VMware Cloud Community
XavierE
Enthusiast

HA not working as expected

  • vSphere 4.0 U1 (2 ESX hosts running 4.0 U1, and vCenter 4.0 U1)

  • HA is configured with the default settings on 2 ESX server nodes (the only non-default settings are das.failuredetectioninterval=5 and das.failuredetectiontime=60000).

  • There's only 1 Service Console on each ESX host (let's call them ESX1 and ESX2).

  • No firewalls in between.

  • The Service Consoles are in the same subnet, as is the default gateway (the default isolation address).

HA had worked fine, but the other day we had network issues that caused ESX1 to lose its network connection. HA did not work correctly, and all VMs on both ESX servers were shut down. I brought everything back up, and in the events I noticed this error: All hosts in the HA cluster CLUSTER in DATA_CENTER were isolated from the network. Check the network configuration for proper network redundancy in the management network.

Later I reproduced the error by manually unplugging the network cables from ESX1. Once again, all VMs on both ESX hosts were shut down. I connected to ESX2 over SSH and confirmed that it had network connectivity: I could ping the default gateway, the vCenter server (by name and IP), and other hosts on the network. From vCenter I could ping ESX2 too.
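For anyone repeating the connectivity check from the Service Console, here is a minimal sketch. It assumes the isolation address is still the default gateway (as in the setup above) and that `route -n` prints the usual Linux routing table; the function names are made up for illustration.

```shell
#!/bin/sh
# Print the default gateway from `route -n` style output read on stdin.
# The default route is the row whose Destination column is 0.0.0.0.
default_gateway() {
  awk '$1 == "0.0.0.0" { print $2; exit }'
}

# Ping the default gateway, which HA uses as the isolation address
# unless configured otherwise. Returns non-zero if no gateway is
# found or the ping fails.
check_isolation_address() {
  gw=$(route -n | default_gateway)
  [ -n "$gw" ] && ping -c 3 "$gw"
}
```

Running `check_isolation_address` on each host while the cables are unplugged from the other one shows whether the surviving host can really reach its isolation address.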

I don't know why vCenter is reporting that all hosts are isolated.

Suggestions? Has anyone experienced the same?

1 Reply
XavierE
Enthusiast

It's been a while since I posted this.

I resolved this by cleaning up the agents on the ESX hosts:

1. Disconnect ESX host from vCenter server.

2. Check whether VMware Automatic Startup/Shutdown is enabled with grep enable /etc/vmware/hostd/vmAutoStart.xml. (Do not proceed if VM autostart is enabled, since the restart in step 5 can then restart VMs. Disable this feature first.)

3. Check which agents are installed and whether the folder /tmp/vmware-root is present with: rpm -qa | egrep -i '(vpx|lgto|aam)'; ls /tmp/vmware-root

4. If the folder /tmp/vmware-root is not present, create it with mkdir -p /tmp/vmware-root. vCenter pushes the agent RPMs into this folder.

5. Uninstall the agents, remove pools.xml, license.cfg and the vpxuser account on the ESX host, then restart mgmt-vmware with the command (the egrep pattern is the same one used in step 3):

rpm -e `rpm -qa | egrep -i '(vpx|lgto|aam)'`; rm -rf /etc/vmware/hostd/pools.xml; rm -rf /etc/vmware/license.cfg; userdel vpxuser; service mgmt-vmware restart

6. Look for BEGIN SERVICES in the hostd log with tail -f /var/log/vmware/hostd.log or grep "BEGIN SERVICES" /var/log/vmware/hostd.log

7. When the host's log shows BEGIN SERVICES, connect the ESX host back to the vCenter server.
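The non-destructive parts of the steps above (3, 4 and 6) can be sketched as small shell functions. This is only an illustration: the function names are made up, and nothing here uninstalls or deletes anything.

```shell
#!/bin/sh
# Step 3: keep only the agent packages from a package list on stdin,
# using the same pattern as in step 3 (vpx, lgto, aam).
filter_agent_packages() {
  grep -Ei '(vpx|lgto|aam)'
}

# Step 4: make sure the staging folder exists.
# Takes the path as an argument, defaulting to /tmp/vmware-root.
ensure_staging_dir() {
  dir="${1:-/tmp/vmware-root}"
  [ -d "$dir" ] || mkdir -p "$dir"
}

# Step 6: succeed if the given log contains BEGIN SERVICES.
# Defaults to the hostd log path used in step 6.
hostd_started() {
  grep -q "BEGIN SERVICES" "${1:-/var/log/vmware/hostd.log}"
}
```

On the host this would be used as `rpm -qa | filter_agent_packages`, then `ensure_staging_dir`, and after the restart `hostd_started && echo "ready to reconnect"`. Note that `hostd_started` also matches a BEGIN SERVICES line left over from an earlier start, so check the timestamp in the log as well.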
