VMware Cloud Community
xcom3
Contributor

ESX 3.5 host is disconnecting all the time

I wonder if anyone else has had the same problem, and if anyone has a solution.

The scenario is this:

I just installed VC Foundation (2.5 U4) and a single ESX 3.5 U4 host. Installation was fine, with no errors. But a very short time after I added the host to VC, it went into a not responding state and is grayed out. The VC and the ESX host are in different subnets, but with all ports open.

If I disconnect and reconnect the host, it comes back online for about 30-60 seconds, and then it happens again. I can ping the server from all subnets the whole time this is happening, and I can also connect the VI Client directly to the ESX host from any subnet.

Is this something of an ARP issue or what? The ESX host is an IBM 3850 M2 server with an onboard Broadcom NetXtreme II dual-port NIC. I also have two dual-port Intel cards in it, but I haven't tried using those instead of the onboard NIC. Maybe that would solve the issue?

Thanks in advance!

7 Replies
COdlk
Hot Shot

Anything in the logs (e.g. /var/log/messages, /var/log/vmkernel, etc.) on the ESX host?

david

AndreTheGiant
Immortal

Is this something of an ARP issue or what?

It seems not, since you say that ping works the whole time.

PS: have you tried to ping in both directions, from VC and from ESX?

Do you connect the ESX host using its IP address or its full hostname?

Andre

If you found this or any other answer useful, please consider allocating points for helpful or correct answers.

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
COdlk
Hot Shot

Do you have any monitoring set up so you can pinpoint exactly when this is happening? Did you check your logs? Talk to your network admin and see if they are seeing anything on the switch.

david
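
For readers who have no monitoring in place, the idea of pinpointing exactly when the host drops can be sketched in a few lines of Python. This is only an illustration, not a VMware tool: the function names `state_changes` and `watch` are made up here, and `probe` is assumed to be any zero-argument callable that returns True when the host looks reachable (for example a TCP connect attempt from the VC server).

```python
import time
from datetime import datetime


def state_changes(samples):
    """Given (timestamp, is_up) samples in time order, keep only the
    samples where the state differs from the previous one. The first
    sample is always kept as the starting state."""
    changes = []
    previous = None
    for timestamp, is_up in samples:
        if is_up != previous:
            changes.append((timestamp, is_up))
            previous = is_up
    return changes


def watch(probe, interval=5.0, rounds=12, sleep=time.sleep, now=datetime.now):
    """Call probe() `rounds` times, waiting `interval` seconds between
    calls, and return the timestamped state changes that were seen."""
    samples = []
    for i in range(rounds):
        samples.append((now(), bool(probe())))
        if i < rounds - 1:
            sleep(interval)
    return state_changes(samples)
```

The timestamps returned by `watch` can then be lined up against the ESX logs and the switch logs to see what else happened at the moment the host went gray.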

mvoss18
Hot Shot

The first thing is to check the logs. I've seen hosts disconnect if there is a storage-related issue.

I've seen random host disconnects in a larger environment and it's a real pain. The fact that vCenter and the host are on different subnets shouldn't matter, since vCenter can be used to manage hosts across a WAN. I would check that the switch ports are set correctly (Auto for GigE) and make sure the network settings for vCenter and the host are set up correctly as well. For example, if you're using NIC teaming on the vCenter server, are there any unusual configurations? Another recommendation we got from VMware is to make sure the Service Console is on its own vSwitch, not shared with VM port groups.

Lastly, you could try creating a VM on the host and installing vCenter in it, then adding the host to this vCenter to see if you still have connection issues.

CiscoKid
Enthusiast

XCOM3, you may be experiencing resource issues on the Service Console that may be causing your disconnects. We are currently experiencing the same issue on HP ProLiant BL480c servers in multiple clusters. Some say it's a problem with the storage, while others point at the HP Agents, memory exhaustion, CPU utilization, etc. I have tried all the recommendations, such as stopping the HP Agents, ensuring proper zoning to the storage arrays, increasing the service console memory to 800MB, giving the service console a 1500MHz CPU reservation, restarting the pegasus service daily, restarting the mgmt-vmware service, and so forth. Restarting the services brings the host online temporarily, but after 2 minutes it shows as not responding again.

It is rumored that VMware is aware of this scenario and is working towards resolving this issue in Update 5, which should be released in the very near future. In the meantime, ensure that the service console memory is set to 800MB if it can be afforded, set the CPU reservation for the service console to 1500MHz, and if it does happen again, issue the following commands from the service console:

service pegasus restart

service mgmt-vmware restart

service vmware-vpxa restart

Please be sure to award points if it resolves your issue (which may not be until you install Update 5). Thanks.

xcom3
Contributor

Thanks for all the replies so far.

It seems that I have some firewall problems as well. I can't do DNS lookups or ping from the ESX host to VC, so I believe I need to correct that first before I go any further.
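
A successful ping only shows that ICMP gets through; it says nothing about the TCP ports the management traffic uses. As a rough way to test a firewall path, the sketch below tries a TCP connection to each port in a list. The helper names are invented for this example, and the port numbers in the usage comment (443 and 902) are the ones commonly cited for VI3 management traffic; treat them as an assumption and check VMware's port documentation for your version. A TCP connect test like this cannot verify UDP traffic.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within
    `timeout` seconds, False on refusal, timeout, or lookup failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def port_report(host, ports, check=tcp_port_open):
    """Return a dict mapping each port to True (reachable) or False."""
    return {port: check(host, port) for port in ports}


# Example usage (hypothetical host name; run it from the subnet you want to test):
#   print(port_report("vc01.example.com", [443, 902]))
```

Running it once from the VC side toward the host and once from the host's subnet toward VC shows whether the firewall treats the two directions differently.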

xcom3
Contributor

Ah.. problem solved!

It turned out that I did indeed have firewall issues; I thought everything was open (I'm not the firewall admin). I also created correct reverse DNS entries. I don't know if ESX or VC is picky about this; I thought they only cared about A records.

But anyway, now it works like a dream :)
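
For anyone who wants to verify the same thing on their own setup, here is a minimal Python sketch that checks whether a name's forward (A) and reverse (PTR) lookups agree. The function name `check_forward_reverse` is made up for this example. It uses the system resolver, so entries in a local hosts file will also be picked up, and it assumes you pass the fully qualified host name.

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname to an IPv4 address, resolve that address back
    to a name, and report whether the two names match."""
    result = {"hostname": hostname, "address": None,
              "reverse_name": None, "consistent": False}
    try:
        address = forward(hostname)
        result["address"] = address
        # gethostbyaddr returns (primary name, alias list, address list)
        reverse_name, _aliases, _addresses = reverse(address)
        result["reverse_name"] = reverse_name
    except OSError as exc:
        # Covers failed forward lookups and missing PTR records
        result["error"] = str(exc)
        return result
    # Compare case-insensitively and ignore a trailing dot
    result["consistent"] = (
        reverse_name.rstrip(".").lower() == hostname.rstrip(".").lower()
    )
    return result


# Example usage (hypothetical names):
#   for name in ("esx01.example.com", "vc01.example.com"):
#       print(check_forward_reverse(name))
```

Running it on both the VC server and a machine in the ESX subnet is worthwhile, since the two sides may be using different DNS servers.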
