VMware Cloud Community
ahilton
Contributor

vCenter server service shuts down when ESXi Host connects to it

I have an ESXi server that had some hardware issues which required a firmware update and pulling the CMOS battery. Since that update, every time the ESXi server tries to connect to the vCenter server, the vCenter server hits a database error and shuts itself down. The vCenter server works fine as long as I don't try to connect this troubled host to it. Here are some details...

The vCenter server is version 4.0.0 - build 208111. The ESXi server is version 4.0.0 - build 208167.

Looking in the vpxd logs on the vCenter server, I see the following when the problem host tries to connect...

[2011-12-08 22:54:11.468 00408 error 'App'] An unrecoverable problem has occurred, stopping the VMware VirtualCenter service. Error: Error[VdbODBCError] (-1) "ODBC error: (23000) - [Microsoft][SQL Native Client][SQL Server]Violation of PRIMARY KEY constraint 'PK_LUN_TARGET'. Cannot insert duplicate key in object 'dbo.VPX_LUN_TARGET'." is returned when executing SQL statement "INSERT INTO VPX_LUN_TARGET WITH (ROWLOCK) (SCSI_TARGET_ID, SCSI_LUN_ID, LUN_VAL, KEY_VAL, LINK_KEY, HOST_ID, UPDATE_KEY) VALUES (?, ?, ?, ?, ?, ?, ?)"
[2011-12-08 22:54:11.468 00408 verbose 'App'] Backtrace:
[2011-12-08 22:54:11.468 00408 info 'App'] Forcing shutdown of VMware VirtualCenter now
[2011-12-08 22:54:11.468 02532 trivia 'PropertyCollector'] ComputeFilterUpdate session[70F14976-8A56-4A7A-955B-0F9AF0421944]B7C66306-6CE4-4416-82EA-515F02759196 (1/3 scanned, 1 changes) took 762 microSec
[2011-12-08 22:54:11.468 02532 trivia 'PropertyCollector'] ComputeGUReq took 899 microSec
[2011-12-08 22:54:11.468 02532 trivia 'PropertyCollector'] ApplyQueuedOps (TriggerProcessGUReqs): Session 70F14976-8A56-4A7A-955B-0F9AF0421944
[2011-12-08 22:54:11.468 02532 verbose 'App'] [VpxVmomi] Invoke done: vmodl.query.PropertyCollector.waitForUpdates session: 70F14976-8A56-4A7A-955B-0F9AF0421944
[2011-12-08 22:54:11.468 01028 trivia 'PropertyCollector'] ProcessGUReqs Start: Session 70F14976-8A56-4A7A-955B-0F9AF0421944
[2011-12-08 22:54:11.468 01028 trivia 'PropertyCollector'] ProcessGUReqs End: Session 70F14976-8A56-4A7A-955B-0F9AF0421944 (0 filter updates, 0 GUReqs)
[2011-12-08 22:54:11.468 04392 trivia 'ProxySvc Req00032'] Read from server localhost:8085 failed with error class Vmacore::CanceledException(Operation was canceled).
[2011-12-08 22:54:11.468 02420 trivia 'ProxySvc Req01534'] Read from server localhost:8085 failed with error class Vmacore::CanceledException(Operation was canceled).
[2011-12-08 22:54:11.468 02532 trivia 'SOAP'] Sending response to []: waitForUpdates
[2011-12-08 22:54:11.468 02532 verbose 'SoapAdapter.HTTPService'] HTTP Response: Complete (processed 2262 bytes)
[2011-12-08 22:54:11.468 02532 trivia 'SoapAdapter.HTTPService'] HTTP Response: Flush(lastBlock = true)
[2011-12-08 22:54:11.468 02532 trivia 'SoapAdapter.HTTPService'] HTTP Response: Setting Content-Length: 2262
[2011-12-08 22:54:11.468 02532 trivia 'SoapAdapter.HTTPService'] HTTP Response: Header size is 165
[2011-12-08 22:54:11.468 02532 trivia 'SoapAdapter.HTTPService'] HTTP Response: Writing 2427 bytes to stream
[2011-12-08 22:54:11.468 02532 trivia 'App'] ResponseCompleted(false), request version 272, closeStream false
[2011-12-08 22:54:11.468 02696 trivia 'ProxySvc Req05262 Tunnel'] The server has closed the connection to the tunnel.
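Most of the excerpt above is 'trivia' and 'verbose' noise; the line that matters is the single 'error' entry naming the violated constraint. As a rough sketch (not a VMware tool), the following Python filters log lines of the format shown above and pulls out the constraint and table names. The function name `find_pk_violations` and both regular expressions are assumptions based only on the lines in this post, and the sample error line is abbreviated.

```python
import re

# Line layout assumed from the excerpt above:
# [timestamp thread level 'source'] message
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<thread>\d+) (?P<level>\w+) '(?P<source>[^']*)'\] (?P<msg>.*)$"
)

# Wording of the SQL Server primary key error as it appears in this thread.
PK_RE = re.compile(
    r"Violation of PRIMARY KEY constraint '(?P<constraint>[^']+)'\. "
    r"Cannot insert duplicate key in object '(?P<table>[^']+)'"
)


def find_pk_violations(lines):
    """Return (timestamp, constraint, table) for each 'error' line
    that reports a primary key violation."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("level") != "error":
            continue  # skip trivia/verbose/info lines and unparsable lines
        pk = PK_RE.search(m.group("msg"))
        if pk:
            hits.append((m.group("ts"), pk.group("constraint"), pk.group("table")))
    return hits


# Two lines taken from the excerpt above (the error line is abbreviated).
SAMPLE = [
    "[2011-12-08 22:54:11.468 00408 error 'App'] An unrecoverable problem has occurred, "
    "stopping the VMware VirtualCenter service. Error: Error[VdbODBCError] (-1) "
    "\"ODBC error: (23000) - [Microsoft][SQL Native Client][SQL Server]Violation of "
    "PRIMARY KEY constraint 'PK_LUN_TARGET'. Cannot insert duplicate key in object "
    "'dbo.VPX_LUN_TARGET'.\"",
    "[2011-12-08 22:54:11.468 00408 info 'App'] Forcing shutdown of VMware VirtualCenter now",
]

if __name__ == "__main__":
    for ts, constraint, table in find_pk_violations(SAMPLE):
        print(ts, constraint, table)
```

Pointing the same function at the lines of a full vpxd log file should reduce the output to the entries that name the offending table, here dbo.VPX_LUN_TARGET, which hints that the problem lies in the host's LUN and target data.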
I have tried removing the host from the vCenter server's inventory, but when I re-add it, the vCenter server shuts down again.
Any help or suggestions are appreciated!
Aaron

9 Replies
a_p_
Leadership

Looks like a case of "not able to add host to vcenter". Maybe the solution from that case, deleting the vpxuser from the host, solves your issue too.

André

ahilton
Contributor

Deleting the vpxuser from the host and reconnecting to it did not resolve the problem. Any other suggestions?

Thanks!

zxr45
Contributor

I am getting the exact same errors when adding an ESX 4.0 (build 164009) host to vCenter 4.1 Update 1. Were you able to get this resolved?

I have a support ticket open with VMware and they have not been able to resolve this yet.

Thanks

ahilton
Contributor

I spent 2 hours on a support call with VMware, and their final conclusion was that something was messed up in the database and we had two options. 1. We could attempt to clean up the database, which could take hours of hand-holding by VMware engineers, and they couldn't even guarantee that it would work. 2. The "supported" solution is to rebuild the ESXi server from scratch and re-add it to the cluster. They told me that the server should get a different UUID when it is rebuilt (with a possibility of a duplicate UUID if I give it the same name), and a host with a different UUID should be able to join the cluster without any problems.

I am taking course #3, which is to use this as an opportunity to upgrade my cluster to 5.0, complete with a new (and clean) database. I'll post here if I run into any trouble as I move forward.

zxr45
Contributor

I also spent about 2 hours yesterday with VMware support without any resolution. They suggested setting up a new vCenter, which I did today, and I still got the error when trying to add the host. So after my last post above I looked at the error again (I think I have it memorized). I ran esxcfg-mpath -l on the host and noticed I had multiple dead paths to the SAN. I did a rescan using the vSphere Client and it got rid of all the dead paths but one. Then I tried to add the host back to vCenter and it completed without errors. It looks like vCenter doesn't like more than one dead path showing on the host. Hope that helps.

Next is migrating the VMs off the hosts and doing a reboot, which I hope will clear the remaining dead path.

ahilton
Contributor

I just tried the same on my server - I didn't have any dead paths but went ahead and rescanned. It did not solve the problem. Thanks for the suggestion!

ahilton
Contributor

And why does it say that this question has been answered? Is there any way to change it back?

zxr45
Contributor

No clue, I've just been checking on the posts.

ahilton
Contributor

I found the solution to my problem. Acting on the recommendation of VMware tech support, I rebuilt the ESXi server, but the same problem remained. This time, when I looked at the vpxd logs on the vCenter server, there were some lines showing the data it was trying to insert into the SQL database when it errored out, and among them were two lines with LUN information for the problem LUN. I'm not sure why this information wasn't there before. Anyhow, upon further investigation, I found that the LUN number for this particular mapping had been changed from the correct value to '0' (I found one other mapping with the same incorrect LUN number, for a different server and volume). After deleting the mapping on the SAN and remapping it with the correct LUN number, this host was able to connect to the VMware cluster without any problems. Yeehaw!!!
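For anyone auditing SAN mappings for the same kind of mistake, here is a minimal Python sketch of the check. It assumes (the thread does not confirm this) that the failure came from two volumes being presented to one host on the same target with the same LUN number, and that the mappings are available as a list of (host, target, LUN number, volume) tuples. The function name `find_lun_collisions` and all host, target, and volume names are made up for illustration.

```python
from collections import defaultdict


def find_lun_collisions(mappings):
    """Group mappings by (host, target, lun) and return the groups
    where more than one volume shares the same LUN number."""
    seen = defaultdict(set)
    for host, target, lun, volume in mappings:
        seen[(host, target, lun)].add(volume)
    return {key: sorted(vols) for key, vols in seen.items() if len(vols) > 1}


# Hypothetical mapping export; none of these names come from the thread.
mappings = [
    ("esxi-01", "target-a", 0, "boot-vol"),
    ("esxi-01", "target-a", 0, "datastore-vol"),  # LUN number wrongly reset to 0
    ("esxi-01", "target-a", 2, "other-vol"),
    ("esxi-02", "target-a", 0, "boot-vol-2"),     # same LUN, different host: fine
]

if __name__ == "__main__":
    for (host, target, lun), volumes in find_lun_collisions(mappings).items():
        print(f"{host} {target} LUN {lun}: {', '.join(volumes)}")
```

Any group this reports is a candidate for remapping with a unique LUN number, which is the fix that worked in this case.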

0 Kudos