VMware Cloud Community
t3r0
Contributor

Host keeps disconnecting from vCenter Server

Hello,

We are having some major issues with one of our ESX hosts connected to our vCenter server. It keeps disconnecting from the vCenter server all the time. Reconnecting the host works fine, but after about 1 minute it disconnects again with no error messages. There's just an alert saying that the host connection was lost, and the host is grayed out in vCenter.

Everything was working fine for months, and this started to happen a few days ago. The host is up and running normally, and if we connect to that host directly using the VI client everything seems to be just fine.

It's not likely to be a network issue, since the network connection between the vCenter server and the host is up and working normally when this happens: no packet loss or anything.

There are two hosts connected to that same vcenter server and the other one has no issues at all.

versions:

vSphere 4 essentials

vCenter server 4.0.0, 162856

Both Hosts are ESX 4.0.0, 164009

Any ideas what to do or where to start debugging this issue?

Thanks,

- Tero

20 Replies
bulletprooffool
Champion

Verify your DNS / IP information on the host.

Also try using a different NIC, and of course verify that your NIC is negotiating the correct speed for the switch to which it is connecting.

One day I will virtualise myself . . .
t3r0
Contributor

Thanks for your reply,

DNS and IP configurations on the host are all fine, I just double checked and tested the network connectivity between vcenter server and the host. Everything works there.

I cannot try another NIC: the host is running production virtual machines, and all 8 physical NICs on that host are in use.

Any other advice I could try?

- Tero

t3r0
Contributor

Anyone?

t3r0
Contributor

Hello,

OK, now I'm totally out of ideas why this keeps happening. The network is fine, and the firewall settings on vCenter and on the hosts are (as far as I can tell) configured right.

Host firewall configuration:

Incoming:

SSH Server 22 (TCP)

CIM Secure Server 5989 (TCP)

CIM SLP 427 (TCP, UDP)

CIM Server 5988 (TCP)

Outgoing:

VMware Update Manager 80,9000-9100 (TCP)

VMware Consolidated Backup 443,902 (TCP)

VMware vCenter Agent 902 (UDP)

CIM SLP 427 (TCP, UDP)

Software iSCSI Client 3260 (TCP)

vCenter firewall allow rules have the following:

VMware vCenter Server - Host heartbeat

VMware vCenter Server - HTTP

VMware vCenter Server - HTTPS

VMware vCenter Server - Web Services HTTPS

VMware vCenter Server - VMwareVCMSDS LDAP Port

VMware vCenter Server Web Services HTTP

VMware vCenter Update Manager - SOAP Port

VMware vCenter Update Manager - SSL Port

VMware vCenter Update Manager - Web Port

All the Host services are running normally on the hosts and also the vCenter seems fine AND the other host connected to it works normally.
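
Before hunting for a missed port, it can help to confirm from the vCenter machine that the host's management ports actually answer. The Python sketch below is only an illustration and is not something posted in this thread: it attempts a TCP connection to ports 443 and 902 on the host. The function names and the default port list are assumptions chosen for the example. It cannot confirm the heartbeat that the host sends to vCenter over UDP 902 (the "VMware vCenter Agent 902 (UDP)" rule above), because a successful TCP connect says nothing about UDP delivery.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def probe(host, ports=(443, 902)):
    """Map each port to True/False. The default port list is an assumption."""
    return {port: tcp_port_open(host, port) for port in ports}
```

Run on the vCenter server, a call such as probe("x.x.x.100") returns a dictionary mapping each port to True or False. Comparing the result for the failing host with the result for the working host shows whether the two are treated differently on the TCP side.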

Can anyone help me with this? Is there some other firewall port or something that I've missed?

Any help is greatly appreciated!

- Tero

NZSolly
Contributor

Hi, I just had the exact same issue as you after a host upgrade. Luckily, I have 2 service consoles, so I connected using the 2nd service console; however, the next day I noticed that it had disconnected as well. I used PuTTY to log in and restart the management agents on the host, and also checked the service console memory, which is at 512 MB. I wish I could say I knew what did the trick to stop the disconnections, but really, I went through the host disconnection checklist here on the site. There's a list of about 12 different things to check:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100340...

t3r0
Contributor

Hi, Thanks for your reply!

I've checked all the steps in the KB article about this issue, but none of them helped.

The service console memory is set to 561MB and about 70-100MB of that is unused all the time.

The host has 20 VMs running, CPU usage on the host is about 15% during the daytime (some spikes go up to 25-30%), and the load average is around 1.

-Tero

beyondvm
Hot Shot

Have you tried restarting the management network services on the ESX host? As for DNS, make sure that your VC server can resolve your ESX host(s) via FQDN and hostname, and that your ESX host can resolve your VC host via FQDN and hostname.

Can you give us an idea of how your network is laid out in relation to your VC host and your ESX servers?
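
The name-resolution check described above can be scripted. What follows is a minimal Python sketch, assuming it is run on the vCenter server (and, for the reverse direction, on a machine that uses the same DNS servers as the ESX host). The helper names resolve and names_agree are made up for this example, and it only looks at IPv4 forward lookups.

```python
import socket


def resolve(name):
    """Return the sorted IPv4 addresses for name, or [] if the lookup fails."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def names_agree(short_name, fqdn):
    """True only if both names resolve, and resolve to the same addresses."""
    short_addrs = resolve(short_name)
    fqdn_addrs = resolve(fqdn)
    return bool(short_addrs) and short_addrs == fqdn_addrs
```

If names_agree returns False for a host's short name and FQDN, print the two resolve results side by side: an empty list points at a missing record or search suffix, and differing lists point at a stale record or a hosts file entry.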

---

If you found any of my comments helpful please consider awarding points for "Correct" or "Helpful". Thanks!!!

www.beyondvm.com

t3r0
Contributor

The Host can connect to the vCenter normally via IP and FQDN.

I haven't tried to restart the management daemon. Is it safe while the host has VMs running? Is that going to stop the VMs?

Is restarting as simple as: service mgmt-vmware restart ?

The network layout is really simple: all the computers are on the same network.

x.x.x.25 => vCenter

x.x.x.100 => Non working host

x.x.x.101 => Working host

beyondvm
Hot Shot

Nope, it shouldn't disrupt the VMs; it just restarts the management agent. It will disrupt the connection to virtual center as well, but that's not an issue since it doesn't work already!

t3r0
Contributor

OK, I've just been reading about issues with ESX 3.5 shutting down all the VMs when the management service restarts. So this bug doesn't exist any more in ESX 4?

Should I disable the "start and stop virtual machines with the system" anyway just to be sure?

- Tero

bulletprooffool
Champion

Can you open the firewalls to allow all ports, for a short period to test? Point to point rule?

The other thing to verify is that DNS is fully functional from both ends (i.e. both the host and vCenter can resolve each other).

Also, make sure that you have no hosts files messing you about.

Try connecting by IP to eliminate resolution issues as a test.

2 of the 3 times I had this, the problem was network related.

Once, firewalls and once we had a routing loop.

t3r0
Contributor

It seems that restarting the management service did the trick. Thank you all for the help!

- Tero

beyondvm
Hot Shot

Excellent!

I haven't experienced that bug myself with 3.5 yet either; I would imagine that they fixed it.

patanne
Contributor

I have been experiencing the exact same problem over the last 24 hours and seem to have determined that the disconnects are happening due to a firewall port. I haven't completely confirmed which ports. What I can say is this.

Since its release I have been running several installations of 3.5.x. I have chosen not to perform upgrades. I am migrating VMs to new 4.x host servers, after which I am reinstalling the now-abandoned 3.5.x host with ESX 4.x. As part of the process I also created a whole new vCenter Server in a VMware Workstation VM. All was working great until I migrated the vCenter Server VM to the first ESX 4.x host. At that time I started getting the constant disconnects. Shutting off the firewall on the vCenter Server caused everything to immediately reconnect. While I am not advocating running a machine with the firewall shut off, I am simply saying there appears to be a port, which I have not yet identified, that is causing the disconnect issue.

Nailing down the offending port shouldn't take too long to do. I just need to dig up the right docs first.

What I have not been able to explain is why the issue didn't happen when the vCenter Server VM was running on VMware Workstation. It only started when the VM was migrated to an ESX host.

Bindegal
Contributor

This may or may not be of relevance to your situation, but...

I noticed the exact same problems with hosts disconnecting from the vCenter server.

However, since I installed Update 1 on the vCenter 4 server, the disconnect problem disappeared completely.

Prior to that I tried, among other desperate things, reinstalling the vCenter server = no change.

/Allan

POP593
Contributor

Same issue; however, mine happens only after the vCenter box is joined to the domain. (It works fine while it stays in a workgroup.)

Tested it twice and got the same issue... any ideas?

lihou
Contributor

Same problem here. The vCenter is on Win2008 Std 64-bit. The server is in AD.

At first it was fine; however, after I upgraded all hosts from VI 3.5 to vSphere and patched Windows 2008, the problem started. Hosts keep disconnecting.

After one day of troubleshooting, I found two problems in the Windows Firewall rules. After modifying them, vCenter looks fine so far.

1) VMware vCenter Server - Host heartbeat rule: the profile was somehow 'Public'. It needs to be changed to 'All profiles'.

2) VMware vCenter Server - HTTPS rule: the profile was somehow 'Public'. It needs to be changed to 'All profiles'.

I think all the other VMware rules should be modified as well.
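
To find every VMware rule with the same profile problem in one pass, the firewall rule listing can be scanned with a short script. The Python sketch below is an illustration under stated assumptions, not a tested tool: it assumes the text comes from `netsh advfirewall firewall show rule name=all` on the vCenter server, that each rule is printed with a "Rule Name:" line followed by a "Profiles:" line, and that a rule covering all profiles lists Domain, Private and Public. The function name rules_missing_profiles and the "VMware" name prefix are choices made for this example.

```python
ALL_PROFILES = {"Domain", "Private", "Public"}


def rules_missing_profiles(netsh_output, name_prefix="VMware"):
    """Map each matching rule name to the set of profiles it does not cover."""
    missing = {}
    current_rule = None
    for line in netsh_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # separator lines and blank lines carry no "key: value"
        key, value = key.strip(), value.strip()
        if key == "Rule Name":
            current_rule = value
        elif key == "Profiles" and current_rule is not None:
            # Case-insensitive match, since the rule names are spelled both
            # "VMware" and "VMWare" in this thread.
            if current_rule.lower().startswith(name_prefix.lower()):
                profiles = {p.strip() for p in value.split(",")}
                gap = ALL_PROFILES - profiles
                if gap:
                    missing[current_rule] = gap
    return missing
```

Any rule that shows up in the result is limited to a subset of profiles, like the two rules listed above, and is a candidate for switching to all profiles in the Windows Firewall console.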

merovingianA51
Contributor

Hey folks, just wanted to add that I pretty much experienced the same scenario as Tero. I had migrated all my VMs to two ESX 4 hosts/vCenter 4 a few months ago – everything ran fine… until, during one maintenance window, I shut down both of my ESX hosts (note they did not shut down properly – I had to hard power them – I don’t believe this is related).

Anyways, when I brought them back up I found that one of the hosts was constantly disconnecting. I originally thought it was because my old Virtual center server (2.5) had been restarted and the host was trying to report to it instead, removing itself from the vCenter 4 server. However, I stopped and disabled all the services on the 2.5 server and the problem persisted. I tried all combinations of removing and re-adding the host to vCenter, to no avail. Then I found this thread.

Stopping and starting the management agents solved my problem as well. I did it live, with no effect on the service console or running VMs.

----


"Crippling Microsoft is the geek equivalent of taking down the Death Star"

-TabarnacST