Every 1-2 hours the vCenter Server service stops. In the event logs I find these two error messages:
Application Log:
Event Type: Error
Event Source: VMware VirtualCenter Server
Event Category: None
Event ID: 1000
Date: 2/4/2011
Time: 7:31:25 AM
User: N/A
Computer: VCENTER
Description:
The description for Event ID ( 1000 ) in Source ( VMware VirtualCenter Server ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: An unrecoverable problem has occurred, stopping the VMware VirtualCenter service. Error: Error[VdbODBCError] (-1) "ODBC error: () - " is returned when executing SQL statement "INSERT INTO VPX_EVENT WITH (ROWLOCK) (EVENT_ID, CHAIN_ID, EVENT_TYPE, EXTENDED_CLASS, CREATE_TIME, USERNAME, CATEGORY, VM_ID, VM_NAME, HOST_ID, HOST_NAME, COMPUTERESOURCE_ID, COMPUTERESOURCE_TYPE, COMPUTERESOURCE_NAME, DATACENTER_ID, DATACENTER_NAME, DATASTORE_ID, DATASTORE_NAME, NETWORK_ID, NETWORK_NAME, NETWORK_TYPE, DVS_ID, DVS_NAME, CHANGE_TAG_ID) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)".
System Log:
Event Type: Error
Event Source: Service Control Manager
Event Category: None
Event ID: 7024
Date: 2/4/2011
Time: 7:31:25 AM
User: N/A
Computer: VCENTER
Description:
The VMware VirtualCenter Server service terminated with service-specific error 2 (0x2).For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
I can restart the vCenter Server service fine after that. It seems like it's trying to write to the DB, fails, and then stops. I have the DB running on an external SQL 2008 server. The VIM_VCDB DB is approx. 9.5 GB, which seems very large to me. I've set the Database Retention Policy to 90 days. I took a full backup of the DB and tried to shrink it 4-5 times, but it doesn't ever really shrink. I shrank the log files as well, and those shrank a lot.
Any thoughts on this? Nothing really changed and this started happening this week.
Thanks-
Rob
We think the problem is related to the SQL job "Past Day stats rollup". This job runs every 30 minutes.
The job fills and deletes rows in the table VPX_HIST_STAT1.
If something goes wrong, the table can grow a lot (every 30 minutes new data can be added but not deleted if the second step of the job fails).
In our case, we had 90,000,000 rows in the table, so the Past Day stats job never finished.
With the DB in Full recovery mode, the log filled up very quickly (followed by a rollback) and the job failed; 30 minutes later it ran again, failed again, and so on.
Changing the recovery model produced a system.outofMemory error instead: SQL does not have enough resources to find and delete the rows in the VPX_HIST_STAT1 table. The TCP/IP connections to the VMware DB then close, and the VMware service stops (approximately every 30 minutes).
What we did to solve the problem:
0- Back up your VirtualCenter database.
1- Put the DB in the Simple recovery model.
2- Generate a CREATE script for the table VPX_HIST_STAT1.
3- Drop the table VPX_HIST_STAT1 (rather than deleting its rows).
4- Recreate the table with the SQL script.
5- Try to run the job Past Day stats rollupVIM_VCDB_T
Everything works OK and the VMware Server service is still alive after 48 hours.
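As a rough illustration of steps 1 and 3 above, the sketch below only builds the T-SQL statements as strings so they can be reviewed before anyone runs them. The database and table names come from this thread. The `dbo` schema, the row-count check and the helper names `quote_ident` and `cleanup_statements` are assumptions made for the example. Steps 2 and 4 are not covered, because the CREATE script has to be generated from the live table.

```python
def quote_ident(name):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def cleanup_statements(db_name="VIM_VCDB", table="VPX_HIST_STAT1", schema="dbo"):
    """Build T-SQL for steps 1 and 3 of the list above.

    The schema default ("dbo") is an assumption; check the real schema first.
    Nothing is executed here; the statements are only returned as text.
    """
    qualified = quote_ident(schema) + "." + quote_ident(table)
    return [
        # Step 1: switch the database to the Simple recovery model.
        "ALTER DATABASE " + quote_ident(db_name) + " SET RECOVERY SIMPLE;",
        # Optional check (not in the original steps): how many rows piled up.
        "SELECT COUNT_BIG(*) FROM " + qualified + ";",
        # Step 3: drop the table. Only do this after the backup (step 0)
        # and after saving the CREATE script (step 2).
        "DROP TABLE " + qualified + ";",
    ]


for statement in cleanup_statements():
    print(statement)
```

Printing the statements first, instead of executing them, leaves room to confirm the schema and the backup before the table is dropped.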
Do you have IIS running on your vCenter host OS?
No, why? I never had it installed.
Event ID 1000 errors are usually related to something conflicting with port 80. Usually it's IIS. http://kb.vmware.com/kb/1015101
Is the SQL DB compatible with the vCenter instance you are running?
http://www.vmware.com/pdf/vsphere4/r40/vsp_compatibility_matrix.pdf
Have you received any errors on the SQL server?
Do you have any processes kicking in around the same time as the VC crashes?
Are you losing network connectivity?
Unfortunately, your event logs are not particularly helpful - so this is tough to troubleshoot.
Have a look at your log files and see if you can provide any more info:
@Troy - IIS was never on the server and nothing else is using port 80. I shut down the vCenter Server services and verified that. Also, my SQL DB is 2008 Standard 64-bit, which is listed as compatible.
@BulletProofFool
The only error I get on the db server is this:
Event Type: Error
Event Source: MSSQLSERVER
Event Category: Server
Event ID: 9002
Date: 2/4/2011
Time: 6:02:23 AM
User: KSDDBS01\sql_server_agent
Computer: KSDDBS01
Description:
The transaction log for database 'VIM_VCDB' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases
I have seen this before (earlier this week). That's what prompted me to lower the Database Retention Policy, perform a full backup (to clear the transaction logs) and shrink it, but it doesn't seem to help.
No other processes are starting when vCenter stops.
Network connectivity is fine. I've been keeping an RDP session open to the vCenter server so I can restart the service quickly.
From searching around, I know the logs I've provided are vague, but I thought I'd put this out there for some help. I am getting some of the same error messages in the vCenter log as on the DB server, but what I'm doing doesn't seem to help. Thoughts?
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] SQL execution failed: UPDATE VPX_ALARM_RUNTIME WITH (ROWLOCK) SET ENTITY_TYPE = ? , STATE_VALUE = ? , METRIC_VALUE = ? , CREATED_TIME = ? , STATUS_VALUE = ? WHERE ENTITY_ID = ? AND ALARM_ID = ? AND EXPRESSION_NAME = ?
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] Execution elapsed time: 1000 ms
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] Diagnostic data from driver is 42000:1:9002:[Microsoft][SQL Server Native Client 10.0][SQL Server]The transaction log for database 'VIM_VCDB' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] Bind parameters:
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] datatype: 1, size: 4,arraySize: 0
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] value = 0
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] datatype: 11, size: 0,arraySize: 0
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] value = ""
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] datatype: 1, size: 4,arraySize: 0
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] value = 9299
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] datatype: 10, size: 23,arraySize: 0
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] datatype: 11, size: 6,arraySize: 0
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] value = "red"
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] datatype: 1, size: 4,arraySize: 0
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] value = 140
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] datatype: 1, size: 4,arraySize: 0
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] value = 4
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] datatype: 11, size: 228,arraySize: 0
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] value = "<obj xmlns="urn:vim25" versionId="4.0" xsi:type="PerfMetricId"><counterId>2</counterId><instance></instance></obj>"
[2011-02-04 08:02:26.624 02088 warning 'VpxdMoLock'] ***WARNING*** Lock vm-140 mode EXCLUSIVE held for 1027 ms
[2011-02-04 08:02:26.624 02088 error 'App'] [ProcessEntityChanged] Unexpected exception
[2011-02-04 08:02:26.624 02088 error 'App'] [CheckAndFireAlarms] Unhandled exception.
[2011-02-04 08:02:26.639 02088 error 'App'] [ScheduledTaskManager] Unhandled exception.
[2011-02-04 08:02:26.655 02088 error 'App'] An unrecoverable problem has occurred, stopping the VMware VirtualCenter service. Error: Error[VdbODBCError] (-1) "ODBC error: (42000) - [Microsoft][SQL Server Native Client 10.0][SQL Server]The transaction log for database 'VIM_VCDB' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases" is returned when executing SQL statement "UPDATE VPX_ALARM_RUNTIME WITH (ROWLOCK) SET ENTITY_TYPE = ? , STATE_VALUE = ? , METRIC_VALUE = ? , CREATED_TIME = ? , STATUS_VALUE = ? WHERE ENTITY_ID = ? AND ALARM_ID = ? AND EXPRESSION_NAME = ?"
[2011-02-04 08:02:26.655 05856 warning 'ProxySvc Req00029'] Error reading from client while waiting for header: class Vmacore::CanceledException(Operation was canceled)
[2011-02-04 08:02:26.655 05928 warning 'ProxySvc Req00018'] Read from server localhost:8085 with pending response, failed with error class Vmacore::CanceledException(Operation was canceled).
[2011-02-04 08:02:26.655 02420 warning 'ProxySvc'] Accept on client connection failed: Operation was canceled
The transaction log for database 'VIM_VCDB' is full.
That is your problem.
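To pull that error out of a long vpxd log instead of reading it by eye, a small parser along the following lines may help. It is only a sketch based on the single "Diagnostic data from driver is" line quoted above. Reading the first field as the ODBC SQLSTATE and the third as the SQL Server error number is an assumption that happens to match this excerpt (42000 and 9002). The meaning of the middle field is not stated in the thread, so it is captured but not interpreted. The function name `parse_driver_diagnostic` is made up for the example.

```python
import re

# Matches lines like:
#   ... Diagnostic data from driver is 42000:1:9002:[Microsoft]...message...
# Field meanings are assumed from the excerpt in this thread.
DIAG_RE = re.compile(
    r"Diagnostic data from driver is "
    r"(?P<sqlstate>[0-9A-Z]{5}):(?P<field2>\d+):(?P<native>-?\d+):(?P<message>.*)"
)


def parse_driver_diagnostic(line):
    """Return (sqlstate, native_error, message), or None if the line has no diagnostic data."""
    match = DIAG_RE.search(line)
    if match is None:
        return None
    return (
        match.group("sqlstate"),
        int(match.group("native")),
        match.group("message").strip(),
    )


sample = (
    "[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] "
    "Diagnostic data from driver is 42000:1:9002:"
    "[Microsoft][SQL Server Native Client 10.0][SQL Server]"
    "The transaction log for database 'VIM_VCDB' is full."
)

print(parse_driver_diagnostic(sample))
```

Running the same function over every line of the log and counting the distinct error numbers would show whether 9002 is the only database error behind the service stops.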
I have performed those exact steps earlier in the week and it is still failing. I did it again tho and we'll see how it goes. Both the data and log files are set to autogrow by 10% so...why is it running out of space in the first place? There is more than enough physical space on the drive...
-Rob
What is your recovery model? Is it a remote DB? If so, are the rollup jobs running? Bottom line: your transaction logs are filling up, and that is what is causing the vCenter Server service to stop.
Yes, this is a remote DB. The recovery model has always been set to Simple. I'm not sure about the rollup jobs. Where/how can I check that?
Look at the SQL Agent: is it running, and do you see the rollup jobs?
Ah, I see now. Yes, the Agent is running. I checked the logs for the rollup job that runs every 30 minutes, and the runs are all failing with the transaction log out of space message. The 10 am job just ran and it completed fine. Hopefully that sticks.
I had the Database Retention Policy set to 180 two days ago and set it to 90 this morning. I wonder if setting that and then shrinking the logs again after that did the trick.
Whoops... never mind... it just went down again. Same error messages.
The logs always state:
To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases
How do I run a query to check that?
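For reference, a minimal sketch of one way to check this. The T-SQL in `LOG_REUSE_QUERY` can be pasted into a query window in SQL Server Management Studio on the instance that hosts the database. It reads the `log_reuse_wait_desc` column that the error message points to, using the database name from this thread. The Python around it is only an illustration: the hint texts are paraphrased summaries of a few documented values, and `explain_wait` is a hypothetical helper, not part of any VMware or Microsoft tool.

```python
# T-SQL to run against the SQL Server instance hosting the vCenter database.
LOG_REUSE_QUERY = """
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = 'VIM_VCDB';
"""

# Paraphrased hints for some of the values log_reuse_wait_desc can return.
WAIT_HINTS = {
    "NOTHING": "Nothing is currently preventing log space from being reused.",
    "CHECKPOINT": "No checkpoint has occurred since the last log truncation.",
    "LOG_BACKUP": "A log backup is needed (Full or Bulk-logged recovery model).",
    "ACTIVE_BACKUP_OR_RESTORE": "A backup or restore is in progress.",
    "ACTIVE_TRANSACTION": "A transaction is still open, for example a long-running one.",
}


def explain_wait(desc):
    """Return a short hint for a log_reuse_wait_desc value (case-insensitive)."""
    key = desc.strip().upper()
    return WAIT_HINTS.get(key, "No hint in this sketch; see the sys.databases documentation.")


print(LOG_REUSE_QUERY)
print(explain_wait("ACTIVE_TRANSACTION"))
```

The value is a snapshot of the moment the query runs, so it is most informative while the rollup job is running or right after it fails.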
I'm not a DBA, so I can't help beyond pointing to a KB article or Google search, which will say to change your recovery model to Simple, and you said you are already on that. If you have a DBA, talk to him/her, or open an SR with VMware.
Did you solve this issue? I'm having the same problem as you.
Try changing the service startup type to Automatic (Delayed Start).
Automatic delay? What do you mean?
If vCenter is installed on W2K8: Run > services.msc > VMware VirtualCenter Server > Properties > Startup type: Automatic (Delayed Start).
If your vCenter server is running on Windows 2008, there is a new startup option for services called "Automatic (Delayed Start)". In other words, this choice still starts the service automatically, but waits until 2 minutes after the system has successfully started the OS.
I opened a ticket with support and they are reviewing the logs, although I think I may have fixed it. The problem occurred when the rollup job ran. After looking a bit closer, that job uses the TempDB to do its processing. The TempDB was either full or too large (I'm not quite sure which). I shrank that DB and its files. The problem occurred another time or two after that, but it has been running fine for the last 36 hours.
-Rob