There are two ESX Servers (both version 3.0.1) that were working fine in a cluster with HA & DRS. Today I upgraded both servers to 3.0.2, one at a time, using the tarball. When I put them back in the cluster, two error messages appear:
"Insufficient resources to satisfy HA failover level on cluster in data center."
"HA agent on 'XXXX' in cluster 'XXXX' in "DataCenter" has an error."
Both ESX Servers host just 7 VMs, and only 1 VM is running. We didn't change any configuration after the upgrade.
I checked the /etc/hosts file on both servers. It says:
127.0.0.1 localhost.localdomain localhost
192.168.0.1 esx1
192.168.0.2 esx2
When we enable HA in the cluster settings, both host servers are shown with a red alert. When we disable HA but leave DRS enabled, everything is fine.
Any ideas?
Thanks all.
hihiy
I remember something similar happening with the 3.0.0 to 3.0.1 upgrade, so you might search for that.
Have you tried turning HA off completely on that cluster, waiting for it to finish, and then re-enabling it?
I had a similar issue with HA on my ESX 3.0.2 install - I tried to upgrade all my hosts but failed, so did a bare-metal install. After adding all the hosts back into Virtual Center, I initially got the HA error. I disabled HA on the cluster and re-enabled it, and everything is fine now.
Also, try setting your hosts file to list the localhost entry after all the other entries.
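To illustrate the reordering suggested above, here is a minimal Python sketch that moves the 127.0.0.1 line(s) to the end of a hosts file. The function name is made up for illustration, and it only works on the text of the file. It is meant to be run on a workstation against a copy of /etc/hosts, not on the service console itself.

```python
def move_loopback_last(hosts_text):
    """Return hosts-file text with 127.0.0.1 entries moved after all other lines.

    Hypothetical helper: it reorders lines only and never edits an entry.
    """
    loopback, others = [], []
    for line in hosts_text.splitlines():
        # Ignore trailing comments when looking at the address field.
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] == "127.0.0.1":
            loopback.append(line)
        else:
            others.append(line)
    return "\n".join(others + loopback) + "\n"


# The hosts file from the original post.
example = """127.0.0.1 localhost.localdomain localhost
192.168.0.1 esx1
192.168.0.2 esx2
"""

print(move_loopback_last(example), end="")
```

With the file from the first post, the two esx lines come first and the localhost line ends up last.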
edit - see this thread also: http://www.vmware.com/community/thread.jspa?threadID=97774&tstart=0
Are you running VC 2.0.2? Have you restarted Virtual Center since upgrading?
Message was edited by:
jprior
I take the server out of the cluster first, then remove it from VIC, do the upgrade, add it back to VIC and then to the cluster.
I didn't have any problems doing it that way.
I have seen some HA errors where DNS resolution was the problem.
If you have internal DNS in your network, try simply removing the static entries from the /etc/hosts file.
If you don't, the static entry must have this form:
192.168.1.xxx esx01.domain.com esx01
Have you tried pinging one ESX host from the other by hostname?
Message was edited by:
jfrichard
Have you tried creating a new cluster, enabling HA and then adding the hosts to it? It's a pain, but I had similar issues and that was the only way to resolve them. Review the legato logs to see if there are additional messages regarding the HA error.
Are you able to vMotion between your 2 hosts when they are outside the cluster?
HA is very picky about network settings, so I suggest you check all of yours.
I made a list of everything I need to check before clustering my servers, to standardise the network configuration on all my hosts. Hope this helps somebody.
Ensure the network configuration is correct in the following config files:
1. PuTTY into the ESX host
2. Log on as root
The following you can copy and paste as is
3. vi /etc/sysconfig/network
4. vi /etc/hosts
5. vi /etc/resolv.conf
6. service network restart
7. In the VI Client go to Configuration, Software - Routing. Ensure that the configuration matches what you changed above.
PS: it would seem that case does matter.
8. Migrate all guests off the host.
9. In VI go to Configuration, DNS & Routing and make sure your domain, search domain and host names are correct.
Restart the ESX server.
Any suggestions or comments will be welcome.
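As a rough aid for the checklist above, the three files can be cross-checked with a short Python sketch. Everything here is an assumption for illustration: the function names are made up, the sample nameserver address is invented, and the script only compares text copied from /etc/sysconfig/network, /etc/hosts and /etc/resolv.conf. It assumes the usual HOSTNAME= line in /etc/sysconfig/network, and it compares names case-sensitively because case seems to matter.

```python
def parse_hostname(network_text):
    """Return the HOSTNAME= value from /etc/sysconfig/network text, or None."""
    for line in network_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key == "HOSTNAME":
            return value.strip().strip('"')
    return None


def parse_search_domains(resolv_text):
    """Return domains from the last 'search' or 'domain' line of resolv.conf text."""
    domains = []
    for line in resolv_text.splitlines():
        fields = line.split()
        if fields and fields[0] in ("search", "domain"):
            domains = fields[1:]  # a later line replaces an earlier one
    return domains


def non_loopback_names(hosts_text):
    """Return every name that /etc/hosts text maps to a non-127.x address."""
    names = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and not fields[0].startswith("127."):
            names.update(fields[1:])
    return names


def check_consistency(network_text, hosts_text, resolv_text):
    """Return a list of mismatch descriptions (empty list means consistent)."""
    hostname = parse_hostname(network_text)
    if hostname is None:
        return ["no HOSTNAME= line found in network file"]
    problems = []
    # Case-sensitive on purpose: 'ESX01' and 'esx01' count as different names.
    if hostname not in non_loopback_names(hosts_text):
        problems.append(f"'{hostname}' has no non-loopback entry in hosts file")
    if "." in hostname:
        domain = hostname.split(".", 1)[1]
        if domain not in parse_search_domains(resolv_text):
            problems.append(f"domain '{domain}' is not in resolv.conf search list")
    return problems


# Hypothetical sample contents, not taken from a real host.
network = "NETWORKING=yes\nHOSTNAME=esx01.yourdomain.com\n"
hosts = "127.0.0.1 localhost.localdomain localhost\n192.168.1.1 esx01.yourdomain.com esx01\n"
resolv = "search yourdomain.com\nnameserver 192.168.1.10\n"

for problem in check_consistency(network, hosts, resolv):
    print(problem)
```

An empty result only means the three files agree with each other. It does not check what the VI Client shows under DNS & Routing, so steps 7 and 9 still need doing by hand.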
If you added your ESX hosts to VC by their FQDNs, which is best practice, then you also need the FQDNs of the hosts in your hosts file. The syntax is:
192.168.1.1 esx01.yourdomain.com esx01
Note the lower case (important).
After you've changed that on all ESX hosts, disable HA on the cluster and re-enable it.
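If you want to check a hosts file against that syntax before re-enabling HA, something like the following Python sketch could do it. The function name and messages are my own invention, and it only inspects the text of the file: each non-loopback line should be IP, then FQDN, then short name, all lower case.

```python
def check_hosts_entries(hosts_text):
    """Return a list of problems found in non-loopback IPv4 hosts entries.

    Hypothetical checker for the 'IP FQDN shortname' convention, lower case only.
    """
    problems = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        # Skip blank lines, comments, loopback and IPv6 entries.
        if len(fields) < 2 or fields[0].startswith("127.") or ":" in fields[0]:
            continue
        ip, names = fields[0], fields[1:]
        if "." not in names[0]:
            problems.append(f"{ip}: first name '{names[0]}' is not an FQDN")
        elif len(names) < 2:
            problems.append(f"{ip}: no short name after '{names[0]}'")
        for name in names:
            if name != name.lower():
                problems.append(f"{ip}: '{name}' contains upper-case characters")
    return problems


# The entries from the first post lack the FQDN, so each one is reported.
for problem in check_hosts_entries("192.168.0.1 esx1\n192.168.0.2 esx2\n"):
    print(problem)
```

An entry like "192.168.1.1 esx01.yourdomain.com esx01" passes without any message.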
Hope this helps.