Part Discussion, Part Question.
I am a SQL DBA, not a vCenter Admin. We recently installed vCenter 5.1 and our sysadmin team began a procerss of migrating hundreds of VMs to it from 4.1/5.0.
I'm not sure what that all entails but for quite some time, the vCenter application has been generating a lot of transaction logs.
We do trn backups every 15 minutes and these backup files are 400MB+...do the math and thats close to 40GB/day in trn backups.
First comment: I have seen the reccommendation to set the vCenter database to SIMPLE recovery mode in SQL Server. This will surely reduce the size of the transaction log file (LDF) as only the active transactions are kept. But when I see this advice on this blog, people are not considering that SIMPLE recovery mode means you can only restore the database to the point-in-time of the last full or differential backup. If you dont care about point-in-time recovery then SIMPLE makes sense for you. We want to be able to recover to a point-in-time so we use FULL recovery mode and transaction log backups.
Second Comment: Nothing "weird" is happening on the Database side. No long open transactions or log shipping or mirroring not working. I've checked the VLFs for the vc database and the log_reuse_wait flags and transactions are only waiting on the Log backup which is happening correctly every 15 minutes. They are getting backed up and removed.
The big question is what is vCenter doing that it has to generate so much log activity??
I've traced the database activity in profiler a bit and there is a lot of updating in thousands of small transactions...but I'm not sure what its doing...there's one update that runs thousands of times thats updating a long ntext column in table VPX_SDRS_STATS_VM but thats not the only thing happening. I've checked the vCenter agent jobs (that roll up data). They are working and they are not generating the transaction traffic.
So any ideas on what vCenter is doing? Any ideas on how to tone it down? Please no suggestions about tuning SQL Server...I'm pretty sure I've looked into everything on that side....we don't need advice on maintaining the transaction log...we can do that. The question is with the vCenter App. activity itself.
Thanks for your time
Hi,
That does seem to be a bit heavy. Any idea how many hosts and VMs this vCenter is managing?
The first thing I would have your vCenter admin look at would be the settings for statistics collection. Large numbers of hosts/VM with stats collection turned up past the default settings could explain the high rate of change in your database.
EVERY task and event that occurs on vCenter or on any ESX host is logged in the database, so again, the more hosts that are involved, that higher the rate of change you could expect, just in general.
Sorry for the delay in response...I had to get info from my sysadmin team who are the vCenter administrators.
Note again, its not the size of the vCenter databases that is the issue but the amount of transaction logs being generated...data moving in and out of the database via updates, deletes, inserts, etc... Our database size does not seem to be a problem.
here's what our adins responded with:
Right now the vCenter server is set for Information (Normal Logging). The is the middle-of-the-road logging level. As for the statistics that we’re keeping, here are the settings:
We’re well below the level they list here of 50 hosts and 2000 vm’s – and even that it says the estimated space required is only 14.32GB
One more thought to re-iterate....the vcenter collection jobs are running fine and do not seem to be the cause of the transaction bloat. We saw the transaction logs increasing even when the jobs weren't running.
The behavior you are seeing doesn't sound normal to me. I have a relatively new 5.1 environment, but we haven't noticed any weirdness with our transaction logs. We're using Full recovery mode as well.
There is a KB article that doesn't speak to your exact issue, but might give you a hint about what is changing so frequently in your database to cause rapid log growth.
If I had to take a wild guess, I'd start looking at what else besides vCenter and ESXi hosts might be included in the vSphere environment... any plugins, any monitoring apps, backup applications, anything that might be logging tasks/events/history in the database. The fact that the actual mdb files aren't really growing makes me think retention policy is set, which would purge old history/stats and further change the database.
If you can tell what time the logs are growing so much, maybe try to watch the activity in Management Studio during that window and see what applications/processes/hosts are connected to the database and see where the activity is coming from.
If you aren't able to figure it out, your VMware admins should be able to open a case with VMware support and hash it out.
Thanks for the feedback....that link you provided is where I started...it doesn't say much about root causes, just explains FULL vs SIMPLE recovery mode and tells you to truncate and shrink the transaction log.
I will see if our sysadmins can enter a support issue....consider this discussion dormant until we get some feedback....or if someone has an answer
Thanks
Final result:
We reported the problem and got an official response I'm very unsatisfied with.
I provided a SQL trace of 7 minutes of processing that generated 180MB of transaction logs. But it appears they didn't look at it.
They said to:
* Check the SQL Server Version is supported - standard stuff
* Shrink the Transaction Log File - Useless advice, we explained in our report that the MDF and LDF file sizes were not an issue. we were doing TRN backups every 15 minutes....it was the sheer volume of transactions and subsequent large backup files that was the issue.
* Finally they said to check SIMPLE vs FULL recovery mode...We were told to use SIMPLE. Well in our report, we explained we were using FULL because we wanted point-in-time recovery. But they avoided that completely by telling us we must use SIMPLE.
Result, I didnt want to argue with VMWare since our Sysadmins were willing to give up point-in-time recovery for a Differential backup every 6 hours and the ability to recover back to then...they seemed to be ok with potentially losing 6-hours of stats, etc...
But whoever sees this, I hope you understand that VMWare did not really understand the issue: Why is their application generating so much TLOG activity?....and that they are in effect stating they do not support SQL Server's ability to restore to a point-in-time because when this happens they "Punt" to SIMPLE recovery mode. I am a DBA, not a sysadmin and this drives me crazy...but I have other things to work on.:smileyplain:
jshem wrote:
Final result:
We reported the problem and got an official response I'm very unsatisfied with.
I provided a SQL trace of 7 minutes of processing that generated 180MB of transaction logs. But it appears they didn't look at it.
They said to:
* Check the SQL Server Version is supported - standard stuff
* Shrink the Transaction Log File - Useless advice, we explained in our report that the MDF and LDF file sizes were not an issue. we were doing TRN backups every 15 minutes....it was the sheer volume of transactions and subsequent large backup files that was the issue.
* Finally they said to check SIMPLE vs FULL recovery mode...We were told to use SIMPLE. Well in our report, we explained we were using FULL because we wanted point-in-time recovery. But they avoided that completely by telling us we must use SIMPLE.
Result, I didnt want to argue with VMWare since our Sysadmins were willing to give up point-in-time recovery for a Differential backup every 6 hours and the ability to recover back to then...they seemed to be ok with potentially losing 6-hours of stats, etc...
But whoever sees this, I hope you understand that VMWare did not really understand the issue: Why is their application generating so much TLOG activity?....and that they are in effect stating they do not support SQL Server's ability to restore to a point-in-time because when this happens they "Punt" to SIMPLE recovery mode. I am a DBA, not a sysadmin and this drives me crazy...but I have other things to work on.:smileyplain:
I'm sorry for your experience with VMware's support team. My current 5.1 envrionment is not exceedingly complex, but I do have several of the vCenter virtual appliances implemented, external monitoring from Solarwinds, and backups occuring, and I do not see the same transaction log bloat. Our DBA set our tlog backups to every 3 hours.
I still bet there is something configured in your environment that is causing these transactions (correctly, or not), but if your VMware admin team is OK with the current workaround, it probably isn't worth chasing down. There there has to be a reason VMware as a rule recommends/encourages simple recovery mode on the vC database, but they should elaborate on that instead of just trying hiding behind the fact that they recommend simple mode (IMO) when someone calls for support.
I'm an Oracle dba facing the same issues. We also would like our database to run in "archivelogmode" to make a point in time recovery possible. VMWare is really SQL Server oriented and keeps talking about "simple recovery mode".
Anyway, we had the same problems with a tremendous number of redolog files (similar to transaction log). It went upto 195 logs per hour (each of them being 50MB in size). Finally we discovered that one of our monitoring-tools caused this behaviour. Based on API invocation the "Orion" tooling was updating the table "VPX_SDRS_STATS_VM" like crazy. As soon as we stopped "Orion" the number of redologs dropped down to less than an acceptable 10 per hour.
Perhaps you have a similar situation with any other third-party monitoring?
Same problem here, 3.5 years later.
PS Running ESXi v5.5 Update 3b