VMware Cloud Community
DFATAnt
Enthusiast

Hosts Disconnecting in VirtualCenter

I have a VirtualCenter server that manages ESX servers located all around the world. Our WAN links vary in size (and links out of Australia are generally smaller than those in other countries), but they are sufficient for the ESX servers to connect to VirtualCenter and work.

Now the problem: occasionally I get in to work in the morning to find that an ESX server at one of the remote sites is disconnected from VirtualCenter. I would blame the WAN link, except that we have several ESX servers at the site and only one is disconnected. I know the ESX server is still up and working, and so are its guests. I can also log in to the ESX server directly using the VirtualCenter client.

The only way to get the ESX server connected back in to VirtualCenter is to disconnect it (there is no option to connect) and then connect again. Both the disconnect and the connect take a long time. Occasionally the connect doesn't work, and we have to either restart the mgmt-vmware service or, worse still, reboot the ESX server.

This problem was happening on VirtualCenter 2.0 and now on 2.5.

Does anyone have any suggestions for fine tuning, or can anyone see anything I might be doing that is causing this issue?

Cheers

Ant

Accepted Solutions
RParker
Immortal

Your problem is DNS. I had this problem, and eventually figured out that this is what was happening.

The hosts file on that machine should have both the short name and the FQDN for that ESX host. Then /etc/sysconfig/network should also be updated with the correct host name. Then the VC should be able to ping the ESX host by its FQDN.
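As a rough illustration of the hosts-file part, the sketch below checks whether a hosts-format file has a line listing both the FQDN and the short name of a host. The function name `hosts_has_both` and the sample names in the comments (`esx01`, `example.com`, `10.0.0.5`) are made up for this example and do not come from the thread.

```shell
# Hypothetical helper: succeed if some non-comment line of a hosts-format
# file lists both the FQDN and the short name of the host, e.g. a line like
#   10.0.0.5   esx01.example.com   esx01
# usage: hosts_has_both <hosts-file> <fqdn> <shortname>
hosts_has_both() {
    awk -v fqdn="$2" -v short="$3" '
        { sub(/#.*/, "") }                 # drop comments before matching
        {
            f = 0; s = 0
            for (i = 2; i <= NF; i++) {    # field 1 is the IP address
                if ($i == fqdn)  f = 1
                if ($i == short) s = 1
            }
            if (f && s) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

On the ESX service console this could be called as `hosts_has_both /etc/hosts esx01.example.com esx01 || echo "hosts entry incomplete"`, substituting the real host names.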

After doing those updates, delete the certificates and reissue new ones.

Delete or rename the files in /etc/vmware/ssl and run service mgmt-vmware restart to reissue the certificates. If you do the updates in this order, you can save yourself a reboot and only restart the service once.
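A minimal sketch of the rename-then-restart step, assuming it is run as root on the ESX service console. The helper name `set_aside_ssl_files` and the `.old` suffix are choices made for this example. Renaming rather than deleting keeps the old certificate files around in case they are needed again.

```shell
# Hypothetical helper: rename every regular file in the given directory to
# <name>.old so the files are out of the way but still recoverable.
# A later run overwrites an existing <name>.old for the same file.
# usage: set_aside_ssl_files <ssl-dir>
set_aside_ssl_files() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue              # also skips an unmatched glob
        case $f in *.old) continue ;; esac   # already set aside earlier
        mv "$f" "$f.old"
    done
}

# On the host, the sequence described in the post would then be roughly:
#   set_aside_ssl_files /etc/vmware/ssl
#   service mgmt-vmware restart
```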

Verify that DNS has the proper entries for the host's name and IP, that the VC can reach the ESX host by name, and that the ESX host can reach the VC by FQDN.
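One way to run the ESX-side half of that check is sketched below. It assumes `getent` is available on the service console; if it is not, `ping -c 1` or `nslookup` can stand in. The function name `check_names` is made up for this example. The same check from the VirtualCenter side would be done with `ping` or `nslookup` on that server.

```shell
# Hypothetical helper: report whether each given name resolves through the
# local resolver (hosts file and/or DNS, as configured in /etc/nsswitch.conf).
# Returns non-zero if any name fails to resolve.
# usage: check_names <name>...
check_names() {
    rc=0
    for name in "$@"; do
        if getent hosts "$name" >/dev/null 2>&1; then
            echo "OK $name"
        else
            echo "FAIL $name"
            rc=1
        fi
    done
    return $rc
}
```

For example, `check_names esx01 esx01.example.com vc.example.com` would check the host's short name, its FQDN, and the VirtualCenter server's FQDN in one go (all three names here are placeholders).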

That should fix it.

3 Replies
serracon_suppor
Enthusiast

I have no real solution to that, but reading the "reboot server" sentence made me post a reply.

When you stop the mgmt-vmware service, send another stop for vmware-vpxa.

Additionally, do a "ps aux|grep hostd" and kill any such processes, beginning with the watchdog. Then do a "ps aux|grep vpxa" and kill any hung processes the same way.

You can then start mgmt-vmware and vmware-vpxa, in that sequence.

This might save most of the reboots.
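The sequence above could be scripted roughly as follows. This is only a sketch: the function names `pids_matching` and `restart_mgmt_agents` are invented here, and it assumes it is run as root on the ESX service console. The `ps` output is captured first so that the filter itself never appears in the process list. Nothing runs until `restart_mgmt_agents` is called explicitly.

```shell
# Hypothetical helper: read `ps aux` output on stdin and print the PIDs
# (column 2) of processes whose line contains the given word, listing any
# watchdog processes first so they are killed before what they supervise.
# usage: pids_matching <word> < ps-output
pids_matching() {
    awk -v pat="$1" '
        NR == 1 || !index($0, pat) { next }            # header, non-matches
        index($0, "watchdog")      { print $2; next }  # watchdogs first
                                   { rest[++n] = $2 }
        END { for (i = 1; i <= n; i++) print rest[i] }
    '
}

# Stop both agents, kill leftovers, then start them again in sequence.
restart_mgmt_agents() {
    service mgmt-vmware stop
    service vmware-vpxa stop
    snapshot=$(ps auxww)          # ww: do not truncate command lines
    for name in hostd vpxa; do
        for pid in $(printf '%s\n' "$snapshot" | pids_matching "$name"); do
            kill "$pid"
        done
    done
    service mgmt-vmware start
    service vmware-vpxa start
}
```

It may be safer to print the PIDs with `ps auxww | pids_matching hostd` and review them by hand before letting the script kill anything.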

DFATAnt
Enthusiast

Thanks RParker,

I had DNS set up correctly. The only thing I didn't have was the short name of the ESX host in the hosts file. I think what got the ESX server to connect back in to VirtualCenter was deleting the ssl files and having the certificates reissued. That was much easier than doing a reboot (which is always the last resort).

I'm still at a loss as to why this happened in the first place. I'm not convinced that the lack of a short name in the hosts file would cause this. I had another ESX server from the same site disconnect yesterday after I posted this original message, but it was able to connect back to VirtualCenter without any issue, and it doesn't have its short name in the hosts file.

Thanks again for the help.

Cheers

Ant
