VMware Cloud Community
BDorroh
Contributor

Host Disconnecting

I have an ESX 3.5 host (fully updated) that continually gets disconnected from the HA cluster. Normally it disconnects, stays that way for 5-10 minutes, and then reconnects by itself. It happens daily and usually results in VMs being migrated around. At times, I have to manually disconnect and reconnect the host. I've checked the NIC config and it seems OK (the same as the other three hosts in the cluster), but this is the only host that has this problem. Any thoughts?

11 Replies
java_cat33
Virtuoso

Does the same issue happen if you reboot the host?

Have you ever restarted the vpxa and hostd services?
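
For anyone who has not done this before, here is a minimal sketch of restarting the two agents mentioned above from the service console. It assumes the ESX 3.x init-script names mgmt-vmware (hostd) and vmware-vpxa (vpxa); the restart_agents function and the DRY_RUN variable are illustrative names only, and by default the script just prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: restart the management agents on an ESX 3.x service console.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to run them as root.
DRY_RUN="${DRY_RUN:-1}"

restart_agents() {
    # Assumed init scripts: mgmt-vmware controls hostd, vmware-vpxa controls vpxa.
    for svc in mgmt-vmware vmware-vpxa; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "service $svc restart"
        else
            service "$svc" restart
        fi
    done
}

restart_agents
```

Restarting these agents should not power off running VMs, but check the dry-run output first and expect the host to show as not responding in VC for a short while afterwards.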

Troy_Clavell
Immortal

Does the host disconnect, or does it go into "not responding"? I had this happen yesterday and was told by a senior escalation engineer at VMware, while working on another HA issue, that this is a known issue in U2. It has something to do with hostd spawns: they spawn so much that hostd crashes. The host comes back after a period of time, but from what I was told it's a known issue.

We haven't had a chance to get a fix for our environment yet since we're still working on unrelated HA issues. You should open an SR to get a fix for it.

I was told this only happens in a U2 HA cluster.

Hope this helps a bit.

fmateo
Hot Shot

Hello,

Try removing the server from the cluster and then joining it again. Sometimes that works.

Another thing you could try is to check the cable connection or the switch port.

Regards

BDorroh
Contributor

I'll answer all of these in one place:

Java_Cat33: Yes. We can reboot the host or restart vpxa/hostd and I still have the same issues. It's the only host in the cluster that has this issue.

Troy: I guess it goes into not-responding and gets grayed out. If I leave it long enough, it will come back (90% of the time). Other times, I have to disconnect it and then add it back. It does always come back though, at some point... I actually updated Virtual Center to U3 hoping it would fix this problem. But even after the upgrade, the host still disconnects.

Fmateo: I've thought about making a new cluster and then moving all of the machines in there. That's something I will try this weekend. Also, I've thought about cabling. I'm going to have the local admin swap the console and kernel cables this afternoon. We'll also try different switch ports for both. I hope some combination of all of that will resolve this.

java_cat33
Virtuoso

If your attempts this weekend fail - rebuild the box. Sometimes it's just quicker and easier (and often fixes the problem!)

Rajeev_S
Expert

When it is not responding, does the root directory run out of space? I had an issue where the hostd process kept crashing, the log directory filled up, and there was no space left in the root directory. Because of that, the host became unresponsive.

I renamed the /var/core directory to /var/core1 and renamed it back to the original name after a day. The issue was resolved.
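
A quick way to check for this condition from the service console is sketched below. It only reads the usage of the root filesystem with df and, if it is nearly full, lists what is in /var/core. The parse_used_pct function and the 90% THRESHOLD are illustrative choices, not anything VMware documents.

```shell
#!/bin/sh
# parse_used_pct: read `df -P` output on stdin and print the Capacity
# column of the first filesystem line as a bare number (for example 97).
parse_used_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

THRESHOLD=90   # hypothetical alert level, not a VMware recommendation

pct="$(df -P / | parse_used_pct)"
case "$pct" in
    ''|*[!0-9]*)
        echo "could not read usage for /" ;;
    *)
        echo "root filesystem is ${pct}% full"
        if [ "$pct" -ge "$THRESHOLD" ]; then
            # List core dumps, largest last, to see what is taking the space.
            ls -lSr /var/core 2>/dev/null
        fi ;;
esac
```

Running it while the host is grayed out in VC would show whether a full root filesystem coincides with the disconnects.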

Hope this might help you.

Dean_Holland
Enthusiast

We are seeing the same problem. I have had a case open for a week and a half and the support engineer hasn't managed to diagnose the issue yet.

Did the VMware engineer reference a bug ID that I can have attached to my SR?

RParker
Immortal

Dean_Holland wrote: "We are seeing the same problem. I have had a case open for a week and a half and the support engineer hasn't managed to diagnose the issue yet. Did the VMware engineer reference a bug ID that I can have attached to my SR?"

I know how to fix it. But first verify that your DNS is set up properly. If it is, then follow these steps.

Disconnect the host from VC. Log in directly to the ESX host. Also remove the host from VC so there is no reference to it in VC (you won't lose the VMs).

Remove the vimuser and vimvmx users. Restart the management service (from PuTTY, run service mgmt-vmware restart). If you have the SC, also remove all the certificate files (in /etc/vmware/ssl).

Then wait about 2 minutes, long enough for the service agent to calm down. Then reconnect the host. We had this problem, and between the DNS check and removing/reconnecting the host I figured out where the problem was.

When I say verify the DNS, I mean look at the /etc/hosts file and make sure it's right. VI may show one thing, but the config files may be different.
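
As a rough illustration of that /etc/hosts check, the sketch below prints which IP addresses the hosts file maps the host's own short and fully qualified names to. The hosts_ips function and the HOSTS_FILE variable are made-up names for this example. No address, or more than one address, for either name would be worth a closer look.

```shell
#!/bin/sh
# hosts_ips FILE NAME: print every IP address that FILE maps NAME to,
# whether NAME appears as the canonical name or as an alias.
hosts_ips() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # drop comments before matching
        NF >= 2 {
            for (i = 2; i <= NF; i++)
                if (tolower($i) == tolower(name)) print $1
        }
    ' "$1"
}

HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"

# Look up both the short and the fully qualified name of this host.
for n in "$(hostname -s 2>/dev/null)" "$(hostname -f 2>/dev/null)"; do
    [ -n "$n" ] || continue
    echo "$n -> $(hosts_ips "$HOSTS_FILE" "$n" | sort -u | tr '\n' ' ')"
done
```

Compare the addresses it prints with the service console IP and with what the VirtualCenter server resolves for the same names.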

Dean_Holland
Enthusiast

RParker wrote: "Disconnect the host from VC. Log in directly to the ESX host. Also remove the host from VC so there is no reference to it in VC (you won't lose the VMs)."

Unfortunately removing the host from VC really isn't an option - we've got quite a bit of historical performance data we'd like to keep.

malaysiavm
Expert

http://malaysiavm.com/blog/resolution-esx-hosts-unexpected-disconnected-from-virtual-center-esx-35-u...

Hope this will help.

Craig, vExpert 2009 & 2010, NetApp NCIE, NCDA 8.0.1
Malaysia VMware Communities - http://www.malaysiavm.com
Dean_Holland
Enthusiast

Dean_Holland wrote: "Unfortunately removing the host from VC really isn't an option - we've got quite a bit of historical performance data we'd like to keep."

RParker posted the same solution. Please read the whole thread before replying.
