VMware Cloud Community
jerob319
Contributor

Host keeps disconnecting from vCenter Server 4.1

Hi Everyone,

I have tried relentlessly to find a solution to a problem with vCenter 4.1 where a host keeps getting disconnected from vCenter every 90 seconds. Based on the KB articles I have read, it could be related to an IP, DNS, or heartbeat issue, but none of those seem to be misconfigured in my situation.

Anyway, here's the layout of my network: I have the main LAN on a 10.1.1.X network and a DMZ network on a 192.168.200.X IP scheme. There's no DHCP or DNS set up for the DMZ port; I assign every computer a static IP and Google's public DNS servers. Communication works between the subnets: I can ping the DMZ gateway just fine from my 10.x network and vice versa.

My ESXi host is on the DMZ network with an IP address of 192.168.200.11. My vSphere Client 4.1 is on my 10.1.1.x network, since that's where my workstation is, and I use it to connect to the host directly, which works just fine. I've never had an issue with that.

So today I downloaded vCenter 4.1, installed it on my workstation, and added my ESXi host to it with its little wizard. The license, agents, etc. are all added to the host just fine and everything shows up and running. Then, about 90 seconds later, the host gets disconnected and there's a (not responding) message next to it.

Now I'm not sure exactly what the issue is, whether it's a DNS issue, an IP issue, or some config files on the kernel. Solutions I have tried so far, none of which worked:

1) Disabled the firewall on my workstation to see if that was an issue

2) Renamed the xmlconfiguration file to xmlconfiguration.old (as per a VMware KB article I had found)

3) Under vCenter Server Settings, in Runtime Settings, I set the management IP to the IP address of the computer where vCenter is installed, which is 10.1.1.54 in my case.

4) Rebooted ESXi, restarted the management services, disconnected the host from vCenter, and re-added it

Now I'm wondering, since the host is on a DMZ, whether anything has to be done with ports etc. on the router, but I'm not sure exactly what. I found this in one of the VMware KB articles, but I'm not really sure how to interpret it:

--The host goes into the Not Responding mode for a default 90 seconds time after adding it to vCenter Server. In case the vCenter Server is multi-homed, verify that the internal IP (that is reachable by the ESX hosts) is set as the management IP--.

But that's exactly the issue I'm having. Any ideas?
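On a multi-homed machine, one way to see which local address the vCenter workstation actually uses to reach the host is to ask the operating system which source address it would pick. The Python snippet below is a minimal sketch under that assumption: connecting a UDP socket sends no packets, it only makes the kernel choose a route. The function name is hypothetical, and port 902 is just a placeholder destination port.

```python
import socket

def source_ip_for(destination: str, port: int = 902) -> str:
    """Return the local IPv4 address the OS would use to reach `destination`.

    connect() on a UDP socket transmits nothing; it only selects a route,
    after which getsockname() reports the source address chosen for it.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((destination, port))
        return s.getsockname()[0]

# Run on the vCenter workstation, e.g. source_ip_for("192.168.200.11"),
# and compare the result with the management IP set in Runtime Settings.
```

If the address returned is not 10.1.1.54, that would suggest traffic to the DMZ leaves through a different interface than the one configured as the management IP.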

14 Replies
Troy_Clavell
Immortal

I would first add the ESXi host into inventory using its FQDN. Also, set up the /etc/hosts file with the FQDN and IP of the ESXi host as well as the vCenter host. vCenter should have its hosts file updated as well, with the same entries.

Once that is done, see if the disconnects go away.
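As a minimal sketch of the entries described here, the Python snippet below renders hosts-file lines (IP, FQDN, short name) for both machines. The name esxilab.local comes from the host's own /etc/hosts later in this thread; vcenter.local is a hypothetical name for the vCenter workstation. On a Windows vCenter server the same lines go in C:\Windows\System32\drivers\etc\hosts.

```python
def hosts_entries(hosts: dict[str, str]) -> str:
    """Render hosts-file lines of the form: IP<TAB>FQDN<TAB>short name."""
    lines = []
    for fqdn, ip in hosts.items():
        short = fqdn.split(".", 1)[0]  # "esxilab.local" -> "esxilab"
        lines.append(f"{ip}\t{fqdn}\t{short}")
    return "\n".join(lines) + "\n"

# esxilab.local is from this thread; vcenter.local is a hypothetical name.
entries = {
    "esxilab.local": "192.168.200.11",
    "vcenter.local": "10.1.1.54",
}
print(hosts_entries(entries), end="")
```

The same two lines would go on both machines, so each side resolves the other's name without depending on the DMZ having a DNS server.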

bulletprooffool
Champion

The first thing you should always do when you can't connect, or if there is an issue with the connection between vCenter and ESXi/ESX, is to:

1) Check the DNS configuration on the ESXi server, and on the DNS server that ESX points to, making sure you have the appropriate entries
2) Check the host files, etc.: /etc/hosts, /etc/resolv.conf, /etc/sysconfig/network and /etc/vmware/esx.conf
3) Try to disconnect and reconnect your ESXi host from your vCenter inventory; this uninstalls and reinstalls the vCenter agent, using the FQDN first and then the IP address if the FQDN didn't work
4) Try restarting both the vCenter management agent on the ESX host and the ESX host management agent. Learn how to do this here: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100349...
5) If the above didn't do anything for you, it could be lost connectivity to a LUN, which can cause problems with ESX (less now than in earlier versions such as ESX 2.x). Connect to the ESXi host directly with the VI Client and perform a rescan of your storage adapters and LUNs.
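The DNS check in step 1 largely comes down to whether forward and reverse lookups agree. The Python snippet below is one hypothetical way to check that; it uses the standard library resolver by default, and the resolver functions can be swapped out, which also makes it easy to test. A True result only means the names are consistent from wherever the script runs, not from the ESXi host's point of view.

```python
import socket
from typing import Callable

def dns_consistent(
    fqdn: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], str] = lambda ip: socket.gethostbyaddr(ip)[0],
) -> bool:
    """True if `fqdn` resolves to an IP whose reverse lookup returns `fqdn`."""
    try:
        ip = forward(fqdn)
        name = reverse(ip)
    except OSError:  # socket.gaierror and socket.herror both subclass OSError
        return False
    # Compare case-insensitively and ignore a trailing root dot.
    return name.rstrip(".").lower() == fqdn.rstrip(".").lower()
```

For example, dns_consistent("esxilab.local") run on the vCenter server would return False if that name cannot be resolved there at all.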

One day I will virtualise myself . . .
jerob319
Contributor

Imagine my DMZ network as a home network where a user gets DHCP and DNS from their router. My ESXi host IP is statically assigned, and the DNS address I'm using on the kernel is just Google's DNS at 8.8.8.8. Could this be an issue?

I'm connecting to my ESXi host from vCenter by IP address anyway, so where does the FQDN come into the picture? I'm not that well versed in assigning host names to IPs; can you explain why I would need an FQDN when I'm using an IP to connect?

jerob319
Contributor

On my ESXi host, these are the entries. (My ESXi host has a gateway of 192.168.200.1 and an IP address of 192.168.200.11.) There's no DNS server or DHCP server for my DMZ network; it's all static. The IP address of the machine where my vCenter Server is installed is 10.1.1.54.

1) /etc/hosts - 192.168.200.11      esxilab.local esxilab

2) /etc/resolv.conf -
   nameserver 192.168.200.1
   search local

3) /etc/sysconfig/network - This path doesn't exist on my ESXi host

4) /etc/vmware/esx.conf - This is a huge config file of everything set on my ESXi host. What exactly should I look for in this file? I decided to attach it here; maybe someone can take a look?
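To compare what the host is actually configured with against what you expect, a resolv.conf file can be reduced to its nameserver and search entries. The Python snippet below is a minimal sketch that assumes standard resolv.conf syntax (comment lines starting with # or ;). Run on the content posted above, it reports 192.168.200.1 as the only nameserver, while an earlier post in this thread says the kernel uses Google's 8.8.8.8, so it may be worth confirming which one the host really uses.

```python
def parse_resolv_conf(text: str) -> dict[str, list[str]]:
    """Collect the nameserver and search entries from resolv.conf text."""
    result: dict[str, list[str]] = {"nameserver": [], "search": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        key, *values = line.split()
        if key == "nameserver" and values:
            result["nameserver"].append(values[0])
        elif key == "search":
            result["search"] = values  # only the last search line counts
    return result

# The two lines posted above, as they appear on the ESXi host.
posted = "nameserver 192.168.200.1\nsearch local\n"
print(parse_resolv_conf(posted))
# {'nameserver': ['192.168.200.1'], 'search': ['local']}
```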

I'm a real newbie at this, so thanks for all the help.

Radico
Contributor

Could it be some sort of power saving setting of your vCenter NIC?

DSTAVERT
Immortal

Do you have routes on the router that provides the DMZ? Can you ping in both directions: from the ESXi host in the DMZ to the vCenter server, and from the vCenter server to the ESXi host? You can use hosts files on the vCenter server and on the ESXi hosts.
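Routing matters here because the vCenter workstation and the ESXi host sit in different networks, so every packet between them, not only ping but also vCenter's management and heartbeat traffic, has to pass through the router and whatever rules it applies. A minimal Python check, assuming /24 networks (the thread only writes 10.1.1.X and 192.168.200.X):

```python
import ipaddress

def needs_routing(a: str, b: str, prefix: int = 24) -> bool:
    """True if a and b are not in the same IPv4 network of length `prefix`."""
    network = ipaddress.ip_network(f"{a}/{prefix}", strict=False)
    return ipaddress.ip_address(b) not in network

print(needs_routing("10.1.1.54", "192.168.200.11"))      # True: vCenter to ESXi host
print(needs_routing("192.168.200.11", "192.168.200.1"))  # False: ESXi host to its gateway
```

A successful ping only shows that ICMP gets through in both directions; the ports vCenter uses still have to be allowed on their own.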

-- David -- VMware Communities Moderator
jerob319
Contributor

Power setting? Where do I check for something like that on the vCenter server? Can you tell me where, please?

Thanks.

jerob319
Contributor

Yes, I can ping from a computer on the DMZ to the computer where vCenter is installed, and vice versa, so the routes are good. I'm not using any host names in my scenario, only IP addresses on my ESXi hosts, so I'm not sure why I would have to add host names.

Plus, the weird part is that it adds the host and then disconnects after 90 seconds; it's like something just gets mixed up after a while.

Radico
Contributor

Check the properties of the network adapter, on the power savings tab.

jerob319
Contributor

There's no power savings tab on the network adapters. Do you know where the log file is, so I can check it once the host has connected and then disconnected? That way I can see exactly what is triggering the disconnect after 90 seconds.

vtucker
Contributor

On the host in question, check the messages log for anything relating to the uplink. It would look something like:

vobd: May 01 05:57:01.698: 347821098811us: [vob.net.pg.uplink.transition.down] Uplink: vmnic0 is down. Affected portgroup: VM Network. 0 uplinks up. Failed criteria: 130.

The vmkernel log would show a similar message (on classic ESX):

vmkernel: 4:00:37:00.775 cpu3:4099)<6>tg3: vmnic0: Link is down.

At least then we may be able to see which side is more likely to be the cause.

These messages would be in:

/var/log/messages

/var/log/vmkernel
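If the logs are long, a small filter helps. The Python snippet below is a sketch meant to run on a copy of /var/log/messages or /var/log/vmkernel taken off the host. Its two patterns are derived only from the sample lines above and are assumptions, since other builds may word these messages differently.

```python
import re
from typing import Iterable, Iterator, Tuple

# Patterns built from the two sample messages quoted above (assumed formats).
LINK_DOWN = re.compile(
    r"Uplink: (?P<uplink>vmnic\d+) is down"   # vobd style, as in the first sample
    r"|(?P<link>vmnic\d+): Link is down"       # driver style, as in the vmkernel sample
)

def link_down_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (nic, line) for every uplink/link-down message in `lines`."""
    for line in lines:
        match = LINK_DOWN.search(line)
        if match:
            yield match.group("uplink") or match.group("link"), line.rstrip("\n")

# Usage, on a copied log file:
# with open("messages") as log:
#     for nic, line in link_down_events(log):
#         print(nic, line)
```

If nothing matches around the time of the disconnect, that points away from the host's uplinks and toward the network path or the vCenter side.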

NCIE, VCAP-DCA[45] #283, VCP5-DT - http://sev3.net - Severity 3
legalcloud
Contributor

Have you checked that UDP Port 902 is open on your vCenter Windows firewall?

If I understand correctly, ESX hosts send a heartbeat to the vCenter server every 30 seconds or so, and if vCenter doesn't receive it, the host will be disconnected.

I think this port gets opened by default when vCenter is installed. However, I have experienced the issue after joining our vCenter servers to the domain, when the domain firewall profile, which doesn't have this port open, was put in place.
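The behaviour described in this post can be modelled roughly: vCenter flags the host once a set time passes without a heartbeat arriving. The Python snippet below is a simplified, hypothetical model rather than vCenter's actual implementation, and the 60-second timeout is an assumed value for illustration. With a firewall silently dropping the heartbeats, the list of arrivals is empty and the host is flagged a fixed time after it connects, no matter how healthy the rest of the connection looks.

```python
from typing import Iterable, Optional

def not_responding_at(
    heartbeats: Iterable[float],
    now: float,
    connected_at: float = 0.0,
    timeout: float = 60.0,  # assumed value, for illustration only
) -> Optional[float]:
    """Return when a host would be flagged Not Responding, or None.

    heartbeats: arrival times (seconds) of heartbeats received by vCenter.
    The host is flagged as soon as `timeout` seconds pass with no heartbeat.
    """
    last = connected_at
    for t in sorted(heartbeats):
        if t > now:
            break  # ignore heartbeats that have not arrived yet
        if t - last > timeout:
            return last + timeout
        last = t
    if now - last > timeout:
        return last + timeout
    return None

print(not_responding_at([], now=120))                  # 60.0
print(not_responding_at(range(10, 121, 10), now=120))  # None
```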

Here is another forum discussion on this topic

http://communities.vmware.com/thread/156810
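Because the heartbeat is UDP, neither a ping nor a TCP port check shows whether it gets through; a more direct test is to listen on the receiving side. The Python snippet below is one hypothetical way to do that: run the listener on the vCenter server and send a probe from a machine in the DMZ. vCenter normally uses UDP 902 itself, so you would either stop the vCenter service briefly or allow a spare test port (9902 here is an arbitrary choice) with the same firewall rule.

```python
import socket
from typing import Optional, Tuple

def open_listener(bind_ip: str, port: int) -> socket.socket:
    """Bind a UDP socket on the receiving side (the vCenter server)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listener.bind((bind_ip, port))
    return listener

def wait_for_datagram(
    listener: socket.socket, timeout: float = 5.0
) -> Optional[Tuple[bytes, str]]:
    """Return (payload, sender_ip) for the next datagram, or None on timeout."""
    listener.settimeout(timeout)
    try:
        payload, (sender_ip, _sender_port) = listener.recvfrom(2048)
    except socket.timeout:
        return None
    return payload, sender_ip

def send_probe(dest_ip: str, port: int, payload: bytes = b"probe") -> None:
    """Send a single UDP datagram from the sending side (a DMZ machine)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        sender.sendto(payload, (dest_ip, port))

# On the vCenter server:   l = open_listener("10.1.1.54", 9902); print(wait_for_datagram(l, 30))
# On a machine in the DMZ: send_probe("10.1.1.54", 9902)
```

If the probe only arrives with the Windows firewall turned off, the firewall rule rather than routing is the likely culprit.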

AleShima
Contributor

Thanks legalcloud.

That was my problem.

I have two ESX hosts with two network cards, and I recently changed the IP of these servers in vCenter to use the private network (this way I won't be billed by SoftLayer for traffic between the hosts and vCenter: vMotions, backups, etc.). Since then, the servers have been disconnected from vCenter every 90 seconds.

I've checked all the configuration files mentioned in this and other posts, removed and re-added the hosts in vCenter, checked the DNS configuration, etc., with no luck.

But the problem was in fact the firewall configuration on the vCenter server, which wasn't allowing communication from the ESX hosts' private network.

JonRavenscraft
Contributor

I ran into a similar issue with a remote-site ESX 4.1 server connecting to vCenter Server 5.0. The host responded to ping, ports 902 and 903 were open, and our vCenter server was able to get the host to connect for short periods of time, but after about 90 seconds the host would go back to a disconnected state. I was not aware of any heartbeat-related traffic going over SSH (port 22), but when comparing this host to another that was not experiencing the issue, the only difference was:

# esxcfg-firewall -q sshClient

Service sshClient is blocked.

I followed that up with:

[root@localhost ~]# esxcfg-firewall -e sshClient

[root@localhost ~]# esxcfg-firewall -q sshClient

Service sshClient is enabled.

Suddenly the vCenter instance for the host reconnected and has stayed stable and connected ever since.
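For the first checks in this post (the host answering ping, ports 902 and 903 open), a TCP connection test from a workstation is a quick way to cover the port part. This sketch is limited in two ways: it only tests TCP, so it says nothing about the UDP 902 heartbeat, and it tests reachability from wherever it runs, not from the ESX host back to vCenter. The host name in the usage comment is a hypothetical placeholder.

```python
import socket

def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port completes within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name lookup failure
        return False

# Hypothetical example: "esxhost.example.com" stands in for the remote host.
# for port in (902, 903):
#     print(port, tcp_port_open("esxhost.example.com", port))
```

As this post shows, all of these checks can pass while another blocked service still breaks the connection, so comparing firewall settings against a working host is worth doing too.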
