after having struggled to solve the reason for above error-message over the past weeks, i want to share my experience and the solution.
The information to (really) get rid of the well known error message is allready available online but scattered over different places.
Let me sum things up:
We have three different Vsphere 4.1 Essential Environments with 120 VMs on 3 vm-hosts each. Two of them running since 2 years now, without any problems. They all are deployed using the MS SQL 2005 express-edition supplied with the vcenter server installer. The 4GB datasize limit of this edition should not be a real limit for our usecases.
2 of them started showing the following "transaction log"-full messages in the windows eventlog after having upgraded to vsphere 4.1 U2.
Event 9002, MSSQL$SQLEXP_VIM
The transaction log for database 'VIM_VCDB' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases
This was repeated every 6 hours at 1:01 am, 7:01 am, 1:01pm, 7:01pm each day on both vcenter server hosts.
So - what changed? what to do? Did we hit the 4GB limit of the engine when datasize + logsize is greater then 4GB ?
The workaround on
was easely spotted and applied. I increased the transaction log area limit vom 512MB to 2GB as suggested.
This eased the situation a bit but dit not really solve the underlying problem. The error-messages continued. More problematic: the vcenter server service started to crash - so the problem started to reach my users. Manual shrink's of the logarea solved those cases.
The situation at this point:
Site1: vcenter-database is about 2.4GB in size, transaction log limit is increased to 3, 4 then 5 GB. Still "transaction log" errors.
Site2: vcenter-database is about 1.8GB in size, transaction log limit increased to 2, 3 then 6 GB. Still "transaction log" errors.
At this point we where a little bit puzzled. Our production vcenter database is running since 3 years now, has an 360days retention period but is still only 800 MB in size. Why are the databases of our development enviroments this large?
So. What lead to a solution where the table names VPX_EVENT_ARG and VPX_EVENT i found mentioned in a forum message.
A closer inspection of our vcenter database in site1 using the SQL Server Management Studio showed this picture:
VPX_EVENT was about 700 MB in size, with an additional index of 400MB
VPX_EVENT_ARG was about 400 MB in size with an additional index of 300 MB.
So in sum, this tables did hold about 1.5GB of data. A look into the table led to details for events up to 2 years ago. Seems they never had been cleaned up, despite the retention period was never longer than 180days on those servers. In site1 the EVENT-table had 2.1 million lines of event-data, in site2 we had 1.1 million lines.
As we dont need event information older than 30 days we could easely drop all those information. Using the procedure described here:
we were able to simply drop/erase all of the existing event history, starting from scratch afterwards.
In short: During a maintenance-windows we simply made a full-backup of the database and issued
truncate table vpx_event_arg;
delete from vpx_event;
But - here comes the trick - whereas the "truncate"-step only takes seconds and no transaction-log space at all, the "delete" took 35 minutes and 13GB(!) of transaction log. On site2 it took 15 minutes and 8GB of transaction log size. With the default limits of 1GB or the suggest 2-3GB for the transaction log in place from the tipps above this procedure would never had succeeded. So, we needed 13 GB of transaction log size to erase 1.5GB of actual data consisting of 2.1 million table rows on site1.
Which leads us this this conclusion: The updated vcenter 4.1 (u2 ?) tried to clean up the mess in the event-tables but never had a chance to come to an successfull end with the default (or slightly increased) transaction log limits discussed everywhere. If your disk-space allows, increase the transaction log limit to 20GB (guess your actualy needs based on my numbers above).
The real problem here is, i think, the counterintuitive storage needs for the transaction log when cleaning up internal data. Who had guessed a factor of 10 in the first place? Anyone?
I also suspect, the internal cleanup-procedures would have catched up by increasing the transaction log limit alone - but i havent tested this in practice.
Hope this helps