VMware Cloud Community
pseudoyams
Contributor

vCenter Server service keeps stopping

Every 1-2 hours the vCenter Server service stops. In the event logs I find these two error messages:

Application Log:

Event Type:    Error
Event Source:    VMware VirtualCenter Server
Event Category:    None
Event ID:    1000
Date:        2/4/2011
Time:        7:31:25 AM
User:        N/A
Computer:    VCENTER
Description:
The description for Event ID ( 1000 ) in Source ( VMware VirtualCenter Server ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: An unrecoverable problem has occurred, stopping the VMware VirtualCenter service. Error: Error[VdbODBCError] (-1) "ODBC error: () - " is returned when executing SQL statement "INSERT INTO VPX_EVENT WITH (ROWLOCK) (EVENT_ID, CHAIN_ID, EVENT_TYPE, EXTENDED_CLASS, CREATE_TIME, USERNAME, CATEGORY, VM_ID, VM_NAME, HOST_ID, HOST_NAME, COMPUTERESOURCE_ID, COMPUTERESOURCE_TYPE, COMPUTERESOURCE_NAME, DATACENTER_ID, DATACENTER_NAME, DATASTORE_ID, DATASTORE_NAME, NETWORK_ID, NETWORK_NAME, NETWORK_TYPE, DVS_ID, DVS_NAME, CHANGE_TAG_ID) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)".

System Log:

Event Type:    Error
Event Source:    Service Control Manager
Event Category:    None
Event ID:    7024
Date:        2/4/2011
Time:        7:31:25 AM
User:        N/A
Computer:    VCENTER
Description:
The VMware VirtualCenter Server service terminated with service-specific error 2 (0x2).

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

I can restart the vCenter Server service fine after that. It seems like it's trying to write to the DB, fails, and then stops. I have the DB running on an external SQL 2008 server. The VIM_VCDB DB is approx 9.5 GB, which seems very large to me. I've set the Database Retention Policy to 90 days. I took a full backup of the DB and tried to shrink it 4-5 times, but it never really shrinks. I shrank the log files as well, and those shrank a lot.

Any thoughts on this?  Nothing really changed and this started happening this week.

Thanks-

Rob
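
One way to see where the 9.5 GB is going is to list the largest tables by row count. This is a minimal sketch, assuming the database is named VIM_VCDB as in the post and that the login can read the catalog views. It reads metadata only and does not scan the tables.

```sql
USE VIM_VCDB;

-- Approximate row counts per table, largest first.
-- index_id 0 = heap, 1 = clustered index, so each table is counted once.
SELECT TOP (10)
       s.name      AS table_schema,
       t.name      AS table_name,
       SUM(p.rows) AS approx_rows
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
JOIN sys.partitions AS p ON p.object_id = t.object_id
WHERE p.index_id IN (0, 1)
GROUP BY s.name, t.name
ORDER BY approx_rows DESC;
```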

Accepted Solutions
kfeina
Enthusiast

We think the problem is related to the SQL job "Past Day stats rollup". This job runs every 30 minutes.

The job inserts rows into and deletes rows from the table VPX_HIST_STAT1.

If something goes wrong, the table can grow a lot (every 30 minutes new data is added, but it is not deleted if the second step of the job fails).

In our case, we had 90,000,000 rows in the table, so the Past Day stats rollup never finished.

        With the DB in Full mode, the log filled up very quickly (followed by a rollback) and the job failed. Thirty minutes later it ran again and failed again, and so on.

        Putting the DB in Recovery produced a system.outofMemory error: SQL did not have enough resources to find and delete the rows in the VPX_HIST_STAT1 table. As a result, the TCP/IP connections to the VMware DB were closed, and the VMware service then stopped (roughly every 30 minutes).
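
To check whether VPX_HIST_STAT1 has grown in the way described, the row count and size can be read without scanning the table. This is a sketch, assuming the database name VIM_VCDB from the original post and that the table resolves in the connecting user's default schema.

```sql
USE VIM_VCDB;

-- Reports row count, reserved space, data size and index size for one table.
EXEC sp_spaceused N'VPX_HIST_STAT1';
```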

What we did to solve the problem:

0- Back up your VirtualCenter database.
1- Put the DB in the Simple recovery model.
2- Generate a CREATE script for the table VPX_HIST_STAT1.
3- Drop the table VPX_HIST_STAT1 (or delete from VPX_HIST_STAT1).
4- Recreate the table with the SQL script.
5- Try to run the job Past Day stats rollupVIM_VCDB_T

Everything works now, and the VMware server service is still running after 48 hours.
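
The steps above could be sketched in T-SQL roughly as follows. This is an outline under stated assumptions, not the poster's exact commands: the backup path is a hypothetical placeholder, the dbo schema is assumed, and TRUNCATE TABLE is swapped in for the drop-and-recreate of steps 2-4 because it empties the table with minimal logging while keeping its definition. It discards the collected past-day statistics and fails if a foreign key references the table, so run it only after the backup has completed.

```sql
-- Step 0: full backup (the path is a placeholder; adjust to your server).
BACKUP DATABASE VIM_VCDB
TO DISK = N'D:\Backup\VIM_VCDB_before_cleanup.bak';

-- Step 1: switch to the simple recovery model.
ALTER DATABASE VIM_VCDB SET RECOVERY SIMPLE;

-- Steps 2-4 (substitute): empty the table but keep its definition.
TRUNCATE TABLE VIM_VCDB.dbo.VPX_HIST_STAT1;
```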

20 Replies
Troy_Clavell
Immortal

Do you have IIS running on your vCenter host OS?

pseudoyams
Contributor

No, why?  I never had it installed.

Troy_Clavell
Immortal

Event ID 1000 is usually related to something conflicting with port 80. Usually it's IIS.  http://kb.vmware.com/kb/1015101

Is the SQL DB compatible with the vCenter instance you are running?

http://www.vmware.com/pdf/vsphere4/r40/vsp_compatibility_matrix.pdf

bulletprooffool
Champion

Have you received any errors on the SQL server?

Do you have any processes kicking in around the same time as the VC crashes?

Are you losing network connectivity?

Unfortunately, your event logs are not particularly helpful - so this is tough to troubleshoot.

Have a look at your log files and see if you can provide any more info:

The vCenter Server logs can be viewed from:
  • The vSphere Client connected to vCenter Server (click Home > Administration > System Logs)
  • The vSphere Client connected to VirtualCenter Server (click Administration > System Logs)

The logs are located in %ALLUSERSPROFILE%\Application Data\VMware\VMware VirtualCenter\Logs, which translates to C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\Logs on Windows 2003 and C:\ProgramData\VMware\VMware VirtualCenter\Logs on Windows 2008.

pseudoyams
Contributor

@Troy - IIS was never on the server and nothing else is using port 80. I shut down the vCenter Server services and verified. Also, my SQL DB is 2008 Standard 64-bit, which is listed as being compatible.

@BulletProofFool

The only error I get on the db server is this:

Event Type:    Error
Event Source:    MSSQLSERVER
Event Category:    Server
Event ID:    9002
Date:        2/4/2011
Time:        6:02:23 AM
User:        KSDDBS01\sql_server_agent
Computer:    KSDDBS01
Description:
The transaction log for database 'VIM_VCDB' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases

I have seen this before (earlier this week). That's what prompted me to lower the Database Retention Policy, perform a full backup (to clear the transaction logs) and shrink it, but it doesn't seem to help.

No other processes are starting when vCenter stops.

Network connectivity is fine. I've been keeping an RDP session open to the vCenter server so I can restart the service quickly.

From searching around, I know the logs I've provided are vague, but I thought I'd put this out there for some help. I am getting some of the same error messages as on the DB server, but what I'm doing doesn't seem to help. Thoughts?

[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] SQL execution failed: UPDATE VPX_ALARM_RUNTIME WITH (ROWLOCK) SET ENTITY_TYPE = ? , STATE_VALUE = ? , METRIC_VALUE = ? , CREATED_TIME = ? , STATUS_VALUE = ? WHERE ENTITY_ID = ? AND ALARM_ID = ? AND EXPRESSION_NAME = ?
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] Execution elapsed time: 1000 ms
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] Diagnostic data from driver is 42000:1:9002:[Microsoft][SQL Server Native Client 10.0][SQL Server]The transaction log for database 'VIM_VCDB' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] Bind parameters:
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] datatype: 1, size: 4,arraySize: 0
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] value = 0
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] datatype: 11, size: 0,arraySize: 0
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] value = ""
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] datatype: 1, size: 4,arraySize: 0
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] value = 9299
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] datatype: 10, size: 23,arraySize: 0
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] datatype: 11, size: 6,arraySize: 0
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] value = "red"
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] datatype: 1, size: 4,arraySize: 0
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] value = 140
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] datatype: 1, size: 4,arraySize: 0
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] value = 4
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] datatype: 11, size: 228,arraySize: 0
[2011-02-04 08:02:26.608 02088 error 'App'] [VdbStatement] value = "<obj xmlns="urn:vim25" versionId="4.0" xsi:type="PerfMetricId"><counterId>2</counterId><instance></instance></obj>"
[2011-02-04 08:02:26.624 02088 warning 'VpxdMoLock'] ***WARNING*** Lock vm-140 mode EXCLUSIVE held for 1027 ms
[2011-02-04 08:02:26.624 02088 error 'App'] [ProcessEntityChanged] Unexpected exception
[2011-02-04 08:02:26.624 02088 error 'App'] [CheckAndFireAlarms] Unhandled exception.
[2011-02-04 08:02:26.639 02088 error 'App'] [ScheduledTaskManager] Unhandled exception.
[2011-02-04 08:02:26.655 02088 error 'App'] An unrecoverable problem has occurred, stopping the VMware VirtualCenter service. Error: Error[VdbODBCError] (-1) "ODBC error: (42000) - [Microsoft][SQL Server Native Client 10.0][SQL Server]The transaction log for database 'VIM_VCDB' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases" is returned when executing SQL statement "UPDATE VPX_ALARM_RUNTIME WITH (ROWLOCK) SET ENTITY_TYPE = ? , STATE_VALUE = ? , METRIC_VALUE = ? , CREATED_TIME = ? , STATUS_VALUE = ? WHERE ENTITY_ID = ? AND ALARM_ID = ? AND EXPRESSION_NAME = ?"
[2011-02-04 08:02:26.655 05856 warning 'ProxySvc Req00029'] Error reading from client while waiting for header: class Vmacore::CanceledException(Operation was canceled)
[2011-02-04 08:02:26.655 05928 warning 'ProxySvc Req00018'] Read from server localhost:8085 with pending response, failed with error class Vmacore::CanceledException(Operation was canceled).
[2011-02-04 08:02:26.655 02420 warning 'ProxySvc'] Accept on client connection failed: Operation was canceled

Troy_Clavell
Immortal

The transaction log for database 'VIM_VCDB' is full.

That is your problem.

http://kb.vmware.com/kb/1003980

pseudoyams
Contributor

I performed those exact steps earlier in the week and it is still failing. I did it again though, and we'll see how it goes. Both the data and log files are set to autogrow by 10%, so why is it running out of space in the first place? There is more than enough physical space on the drive...

-Rob
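
Autogrow can still fail to help if a file has a size cap, or if a single large transaction needs more log space than can be allocated in time. This sketch shows the current size, growth setting and maximum size of each file in the database (name VIM_VCDB taken from the post).

```sql
USE VIM_VCDB;

-- size and max_size are in 8 KB pages.
-- max_size = -1 means unlimited, 0 means no growth allowed.
-- growth is a percentage when is_percent_growth = 1,
-- otherwise a number of 8 KB pages.
SELECT name,
       type_desc,
       CAST(size AS bigint) * 8 / 1024 AS size_mb,
       growth,
       is_percent_growth,
       max_size
FROM sys.database_files;
```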

Troy_Clavell
Immortal

What is your recovery model? Is it a remote DB? If so, are the rollup jobs running? The bottom line is that your transaction log is filling up, and this is what is causing the vCenter Server service to stop.

pseudoyams
Contributor

Yes, this is a remote DB. The recovery model has always been set to Simple. I'm not sure about the rollup jobs... where/how can I check that?

Troy_Clavell
Immortal

Look at the SQL Agent: is it running, and do you see the rollup jobs?
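
The rollup jobs and their recent outcomes can also be listed from the msdb database instead of through the SQL Server Agent UI. This is a sketch, assuming the job names contain the word "rollup", as the job name quoted in the accepted solution does.

```sql
-- step_id = 0 is the overall job outcome row.
-- run_status: 0 = failed, 1 = succeeded, 2 = retry, 3 = canceled.
SELECT TOP (20)
       j.name AS job_name,
       h.run_date,   -- integer, YYYYMMDD
       h.run_time,   -- integer, HHMMSS
       h.run_status,
       h.message
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.sysjobhistory AS h ON h.job_id = j.job_id
WHERE j.name LIKE N'%rollup%'
  AND h.step_id = 0
ORDER BY h.run_date DESC, h.run_time DESC;
```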

pseudoyams
Contributor

Ah, I see now. Yes, the Agent is running. I checked the history for the rollup job that runs every 30 minutes, and every run has been failing with the transaction log out-of-space message. The 10 AM run just finished, and it completed fine. Hopefully that sticks.

I had the Database Retention Policy set to 180 days two days ago and set it to 90 this morning. I wonder if setting that and then shrinking the logs again did the trick.

pseudoyams
Contributor

Whoops... never mind... it just went down again. Same error msgs.

The logs always state:

To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases

How do I run a query to check that?
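
The column named in the error message can be read with a short query against sys.databases. This is a sketch using the database name from the error text. DBCC SQLPERF(LOGSPACE) additionally reports the log size and the percentage in use for every database on the instance.

```sql
SELECT name,
       recovery_model_desc,
       log_reuse_wait_desc   -- e.g. NOTHING, CHECKPOINT, ACTIVE_TRANSACTION
FROM sys.databases
WHERE name = N'VIM_VCDB';

DBCC SQLPERF(LOGSPACE);
```

A value of ACTIVE_TRANSACTION would be consistent with one long-running statement, such as a large rollup delete, keeping the log from being reused, though other values are possible.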

Troy_Clavell
Immortal

I'm not a DBA, so I can't help beyond a KB article or Google search, which will tell you to change your recovery model to Simple, and you said you are already using that. If you have a DBA, talk to him/her, or open an SR with VMware.

Maria_hernando
Contributor

Did you solve this issue? I'm having the same problem as you.

MauroBonder
VMware Employee

Try changing the service startup type to Automatic (Delayed Start).

Maria_hernando
Contributor

Automatic delay? What do you mean?

MauroBonder
VMware Employee

If vCenter is installed on W2K8: Run > services.msc > vCenter > Properties > Automatic (Delayed Start)

calladd
Enthusiast

If your vCenter server is running on Windows 2008, there is a new startup option for services called "Automatic (Delayed Start)". With this choice the service still starts automatically, but it waits until 2 minutes after the system has successfully started the OS.

pseudoyams
Contributor

I opened a ticket with support and they are reviewing the logs, although I think I may have fixed it. The problem occurred when the rollup job ran. After looking a bit closer, I found that the job uses TempDB to do its processing. The TempDB was either full or too large (I'm not quite sure which). I shrank that DB and its files. The problem occurred another time or two after that, but it's been running fine for the last 36 hours.

-Rob
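
To tell whether TempDB is actually full, rather than just large, its space usage can be broken down by consumer. This is a sketch, assuming the login has VIEW SERVER STATE permission. Running it while the rollup job is active should give the most useful picture.

```sql
USE tempdb;

-- Page counts are 8 KB pages; dividing by 128 gives MB.
SELECT SUM(unallocated_extent_page_count)       / 128 AS free_mb,
       SUM(user_object_reserved_page_count)     / 128 AS user_objects_mb,
       SUM(internal_object_reserved_page_count) / 128 AS internal_objects_mb,
       SUM(version_store_reserved_page_count)   / 128 AS version_store_mb
FROM sys.dm_db_file_space_usage;
```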
