This morning I noticed a problem with one of our ESX servers (running v3.5): it appears to be no longer responding to connections from the Virtual Infrastructure Client (version 2.5), and it is currently showing as disconnected in the VI Client.
When I tried to open the Configuration tab for this ESX server, I got errors stating that the connection to the host had timed out. I can't recall the exact message, but one said something along the lines of 'vpxa' failed to respond. The log on the VI console has an error relating to this server stating, "unable to access the specific host, it either does not exist, the server software is not responding or there is a network problem".
When I did a quick check of the KBs here in relation to the vpxa message, many of the suggestions pointed to restarting the 'mgmt-vmware' service. I ran this from the ESX console, and the restart appears to have hung at the stage of stopping the VMware host agent. The previous service, '...host agent watchdog', has stopped.
Do I resort to killing the PID for the hostd service? (Bearing in mind that I have about 30 VMs currently running on this server, which I really can't let go offline at the moment.)
Other suggestions would be appreciated.
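Before killing anything, it may help to confirm which hostd processes are actually running. The following is a minimal sketch, not a VMware-documented procedure: the function name `hostd_pids` is made up for illustration, and the assumption that the host agent appears in `ps -ef` output as `vmware-hostd` should be checked on the host itself.

```shell
# hostd_pids: read `ps -ef` style output on stdin and print the PID
# (second column) of every line that mentions vmware-hostd.
# ASSUMPTION: the host agent shows up as "vmware-hostd" in the process
# list; the function name is hypothetical, not part of any VMware tooling.
hostd_pids() {
    # Skip the awk process itself, whose command line contains the pattern.
    awk '/vmware-hostd/ && !/awk/ { print $2 }'
}

# Intended use on the service console (left as a comment so the sketch
# does nothing until it is run deliberately):
#   ps -ef | hostd_pids
```

The output is only a list to review before sending any signal. A watchdog wrapper started with the same binary name on its command line could match the pattern as well.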
Can't you VMotion the VMs elsewhere to work on that host, and maybe restart it in the worst case?
I happened to have the same problem as you: a VM that was backed up with vReplicator and vRanger used to crash. Whenever I tried to reset the VM it would hang, so I tried restarting the mgmt-vmware service, which would also hang after the watchdog. I then pressed Ctrl+Z to stop the process.
Have you tried
service mgmt-vmware restart
Thanks for the info, Sangokan, much appreciated.
As for the problem, it appears to have resolved itself in the time it took me to do a bit of research into restarting the vpxa process. The restart of the mgmt-vmware service did complete; it just took a long time to finish (presumably the shutdown of the host service must flush data to disk before the restart). I didn't have to resort to stopping the vpxa service myself.
After a further dig through the logs, it appears that the VMware Update Manager plug-in was enabled on the VI 2.5 server. It seems that the process hung in the midst of trying to update the ESX 3.5 server, as Update Manager couldn't properly retrieve the updates from the web, so it looks like the vpxa process hung (which I think was the cause of the disconnection from VirtualCenter). I've since removed the update plug-in, as we only have one 3.5 server and a number of other 3.0.x servers.
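Since the slow restart described above eventually finished on its own, one option is to give a stopping process a generous grace period before considering a kill. As a rough sketch only: `wait_for_exit` is a hypothetical helper, not part of ESX, and the timeout in the usage comment is an arbitrary illustration rather than a recommended value.

```shell
# wait_for_exit PID TIMEOUT_SECONDS
# Poll about once per second until PID is gone or TIMEOUT_SECONDS have
# elapsed. Returns 0 if the process has exited, 1 if it is still present
# at the deadline. `kill -0` sends no signal; it only tests whether the
# PID can still be signalled.
wait_for_exit() {
    pid=$1
    remaining=$2
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Example (HOSTD_PID is a placeholder for a PID you have looked up yourself):
#   if wait_for_exit "$HOSTD_PID" 600; then
#       echo "process exited on its own"
#   else
#       echo "still running after 10 minutes"
#   fi
```

The helper only waits and reports. Whether to kill the process after the deadline is left as a manual decision, which seems the safer choice with production VMs on the host.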