hihiy
Contributor

Urgent: HA error after ESX upgrade from 3.0.1 to 3.0.2

We have two ESX servers (both version 3.0.1) that were working fine in a cluster with HA and DRS. Today I upgraded each server individually to 3.0.2 using the tarball. When I put them back in the cluster, two error messages appear:

"Insufficient resources to satisfy HA failover level on cluster in data center."

"HA agent on 'XXXX' in cluster 'XXXX' in "DataCenter" has an error."

Both ESX servers host just 7 VMs, and only 1 VM is running. We didn't change any configuration after the upgrade.

I checked the /etc/hosts file on both servers. It contains:

127.0.0.1 localhost.localdomain localhost
192.168.0.1 esx1
192.168.0.2 esx2

When we enable HA in the cluster settings, both hosts show a red alert. When we disable HA and keep only DRS enabled, everything is fine.

Any ideas?

Thanks all.

hihiy

WSPSE
Contributor

I remember something similar happening with the 3.0.0 to 3.0.1 upgrade; you might search for that.

Have you tried turning HA off completely on that cluster, waiting for that to finish, and then re-enabling it?

jprior
Enthusiast

I had a similar issue with HA on my ESX 3.0.2 install. I tried to upgrade all my hosts, but the upgrade failed, so I did a bare-metal install. After adding all the hosts back into Virtual Center, I initially got the HA error. I disabled HA on the cluster and re-enabled it, and everything is fine now.

Also, try setting your hosts file to list the localhost entry after all the other entries.
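If you would rather script that reordering than edit /etc/hosts by hand in vi, here is a minimal Python 3 sketch. It assumes you work on a copy of the file's contents; `move_localhost_last` is a hypothetical helper that only returns new text and does not write the file. Treating lines whose address starts with `127.` or equals `::1` as "the localhost entry" is an assumption.

```python
def move_localhost_last(hosts_text):
    """Return hosts-file text with loopback entries placed after all other entries.

    Blank lines are dropped; comment lines keep their relative order among
    the non-loopback entries.
    """
    loopback, others = [], []
    for line in hosts_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # drop blank lines
        if stripped.startswith("#"):
            others.append(stripped)  # keep comments with the regular entries
            continue
        address = stripped.split()[0]
        if address.startswith("127.") or address == "::1":
            loopback.append(stripped)  # assumed loopback addresses
        else:
            others.append(stripped)
    return "\n".join(others + loopback) + "\n"


original = """127.0.0.1 localhost.localdomain localhost
192.168.0.1 esx1
192.168.0.2 esx2
"""
print(move_localhost_last(original), end="")
```

Back up the original file before pasting the result over it.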

Edit: see also this thread: http://www.vmware.com/community/thread.jspa?threadID=97774&tstart=0

Are you running VC 2.0.2? Have you restarted Virtual Center since upgrading?

zbenga
Enthusiast

I take the server out of the cluster first, then remove it from VIC, do the upgrade, and then add it back to VIC and then to the cluster.

I haven't had any problems doing it that way.

admin
Immortal

I have seen HA errors like this where DNS resolution turned out to be the problem.

If you have internal DNS on your network, try simply removing the static entries from the /etc/hosts file.

If you don't, each static entry must include the fully qualified name as well as the short name, for example:

192.168.1.xxx esx01.domain.com esx01

Have you tried pinging one ESX host from the other by hostname?
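To check name resolution from a script as well, here is a minimal Python 3 sketch. It uses the system resolver, so /etc/hosts entries and DNS both count; it looks a name up and then reverse-looks-up the resulting address. `check_resolution` is a hypothetical helper; it tests resolution only, not reachability, so it complements ping rather than replacing it. It also assumes a Python 3 interpreter is available where you run it, which the ESX 3.x service console may not provide.

```python
import socket
import sys


def check_resolution(hostname):
    """Resolve a hostname to an IPv4 address, then resolve that address back.

    Returns a one-line report; lookup errors are reported, not raised.
    """
    try:
        ip = socket.gethostbyname(hostname)  # forward lookup via the system resolver
    except OSError as exc:
        return f"{hostname}: forward lookup FAILED ({exc})"
    try:
        name, aliases, _ = socket.gethostbyaddr(ip)  # reverse lookup of that address
    except OSError as exc:
        return f"{hostname} -> {ip}: reverse lookup FAILED ({exc})"
    alias_text = ", ".join(aliases) if aliases else "none"
    return f"{hostname} -> {ip} -> {name} (aliases: {alias_text})"


if __name__ == "__main__":
    # Pass the short name and the FQDN of the other host, e.g. esx02 esx02.domain.com
    for host in sys.argv[1:]:
        print(check_resolution(host))
```

Running it with both the short name and the FQDN shows whether each one resolves, and whether the reverse lookup returns the name you expect.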

Message was edited by: jfrichard

darren_boyd
Enthusiast

Have you tried creating a new cluster, enabling HA, and then adding the hosts to it? It's a pain, but I experienced similar issues and that was the only way I could resolve them. Also review the Legato logs to see if there are additional messages about the HA error.

admin
Immortal

Are you able to vMotion between your two hosts when they are outside the cluster?

Timber_Wolf
Contributor

HA is very picky about network settings, so I suggest you check all of yours.

I made a list of everything I need to check before clustering my servers, to standardise the network configuration on all my hosts. I hope it helps somebody.

Ensure the network configuration is correct in the following config files:

1. PuTTY into the ESX host.

2. Log on as root.

You can copy and paste the following as is:

3. vi /etc/sysconfig/network

4. vi /etc/hosts

5. vi /etc/resolv.conf

6. service network restart

7. In the VI Client, go to Configuration > Software > Routing and ensure that the configuration matches what you changed above.

PS: It seems that case does matter.

8. Migrate all guests off the host.

9. In the VI Client, go to Configuration > DNS & Routing and make sure your domain, search domain, and host names are correct.

10. Restart the ESX server.
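Steps 3 to 5 can easily drift out of step with each other. A minimal Python 3 sketch that cross-checks copies of the three files follows. `check_network_config` is a hypothetical helper, and its checks are assumptions about what "consistent" means here, not an official VMware checklist: HOSTNAME in /etc/sysconfig/network is fully qualified and lower case, its domain appears on a `domain` or `search` line in /etc/resolv.conf, and the exact HOSTNAME appears on a non-loopback line in /etc/hosts. It is meant to run on copies of the files, not on the service console itself.

```python
import sys


def check_network_config(network_text, resolv_text, hosts_text):
    """Cross-check /etc/sysconfig/network, /etc/resolv.conf and /etc/hosts.

    Returns a list of human-readable problems (empty if nothing was found).
    """
    problems = []

    # /etc/sysconfig/network holds KEY=value lines, e.g. HOSTNAME=esx01.yourdomain.com
    settings = {}
    for line in network_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip().strip('"')
    hostname = settings.get("HOSTNAME", "")
    if not hostname:
        return ["HOSTNAME is not set in /etc/sysconfig/network"]
    if hostname != hostname.lower():
        problems.append(f"HOSTNAME {hostname!r} is not all lower case")
    if "." not in hostname:
        problems.append(f"HOSTNAME {hostname!r} is not fully qualified")
    domain = hostname.lower().partition(".")[2]

    # /etc/resolv.conf: collect the names listed on 'domain' and 'search' lines
    search_domains = []
    for line in resolv_text.splitlines():
        fields = line.split()
        if fields and fields[0] in ("domain", "search"):
            search_domains.extend(name.lower() for name in fields[1:])
    if domain and domain not in search_domains:
        problems.append(f"domain {domain!r} is not listed in /etc/resolv.conf")

    # /etc/hosts: the exact HOSTNAME (case included) should be on a non-loopback line
    found = False
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and not fields[0].startswith("127.") and hostname in fields[1:]:
            found = True
    if not found:
        problems.append(f"{hostname!r} is not on a non-loopback line in /etc/hosts")

    return problems


if __name__ == "__main__":
    # Usage: python3 check_config.py NETWORK RESOLV_CONF HOSTS  (copies of the files)
    if len(sys.argv) != 4:
        print("usage: check_config.py NETWORK RESOLV_CONF HOSTS")
    else:
        texts = []
        for path in sys.argv[1:]:
            with open(path) as handle:
                texts.append(handle.read())
        problems = check_network_config(*texts)
        for problem in problems:
            print("PROBLEM:", problem)
        if not problems:
            print("no problems found")
```

Run it once per host before step 10, so a mismatch shows up before the restart rather than after.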

Any suggestions or comments are welcome.

bjmoore
Enthusiast

If you have added your ESX hosts to VC using their FQDNs, which is best practice, then you also need the hosts' FQDNs in your hosts file. The syntax is:

192.168.1.1 esx01.yourdomain.com esx01

Note the lower case (important).

After you've changed that on all ESX hosts, disable HA on the cluster and re-enable it.
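To check every hosts-file line against that syntax before re-enabling HA, here is a minimal Python 3 sketch. `check_hosts_line` is a hypothetical helper. It assumes the layout above (IP address, then lower-case FQDN, then lower-case short name matching the FQDN's first label) and reports anything else. The standard `127.0.0.1 localhost.localdomain localhost` line passes.

```python
import ipaddress


def check_hosts_line(line):
    """Check one /etc/hosts line against the 'IP FQDN shortname' convention.

    Returns a list of problems; an empty list means the line looks fine.
    """
    fields = line.split("#", 1)[0].split()  # ignore trailing comments
    if not fields:
        return []  # blank or comment-only line
    problems = []
    try:
        ipaddress.ip_address(fields[0])
    except ValueError:
        problems.append(f"{fields[0]!r} is not a valid IP address")
    names = fields[1:]
    if len(names) < 2:
        problems.append("expected both a FQDN and a short name")
        return problems
    fqdn, short = names[0], names[1]
    if "." not in fqdn:
        problems.append(f"first name {fqdn!r} is not fully qualified")
    elif fqdn.split(".", 1)[0] != short:
        problems.append(f"short name {short!r} does not match {fqdn!r}")
    for name in names:
        if name != name.lower():
            problems.append(f"{name!r} is not lower case")
    return problems


for entry in ["192.168.1.1 esx01.yourdomain.com esx01", "192.168.0.1 esx1"]:
    print(entry, "->", check_hosts_line(entry) or "OK")
```

The second sample line, in the short-name-only style from the original question, is reported as missing a FQDN.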

Hope this helps.