cognitoErgo
Contributor

VMware HA

Hello,

I am doing a proof of concept at the moment using 2x Dell PE 2950s. I am trying to configure HA and am running into some problems. I have created the datacentre, added the two ESX hosts and configured all of their networking. I have added a cluster and added both hosts to it. I have configured DNS, and all machines involved, including the VIC, can ping each other via DNS. I can enable the HA agent on either machine, but not on both. For instance, I can take esx1 out of maintenance mode and configure the HA agent and everything is fine; when I take esx2 out of maintenance mode, the HA agent configuration fails, giving nothing more than "Description: HA agent has an error - <date time>, HA agent on <192.168.xxx.xxx> in cluster HA in <datacentre> has an error".

Can anyone suggest what might be causing this and how to fix it?

Thanks in advance

8 Replies
weinstein5
Immortal

Make sure no typos were made when setting the FQDN of your ESX hosts. I have seen this occur when someone mistyped the FQDN on one of the hosts.
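
One quick way to catch a mistyped FQDN is to ask the resolver for every name you expect to work and see which ones fail. The following is only a rough sketch: the helper name `resolve_all` is made up for illustration, and the host names are simply the ones from this thread. It uses the standard `socket.gethostbyname` call, which can be swapped for another function when testing.

```python
import socket

def resolve_all(names, resolver=socket.gethostbyname):
    """Map each name to the address it resolves to, or None on failure."""
    results = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError:
            # socket.gaierror is a subclass of OSError; treat any lookup
            # failure as "does not resolve".
            results[name] = None
    return results

def unresolved(names, resolver=socket.gethostbyname):
    """Return the names that could not be resolved, in input order."""
    results = resolve_all(names, resolver)
    return [name for name in names if results[name] is None]

# Names taken from this thread; adjust to your own environment.
HOSTS_TO_CHECK = ["esx1.test.co.uk", "esx1", "esx2.test.co.uk", "esx2"]
```

Running `unresolved(HOSTS_TO_CHECK)` on the VirtualCenter machine and on each ESX host should return an empty list if every name resolves from that machine.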

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful
cognitoErgo
Contributor

Both FQDNs listed in the VIC are spelt correctly:

Name: esx1

Domain: test.co.uk

Search Domains

test.co.uk

Name: esx2

Domain: test.co.uk

Search Domains

test.co.uk

Thanks for your response.

postfixreload
Hot Shot

Put both ESX server names (FQDN and short name) in your /etc/hosts, then try reconfiguring HA again.
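
To double-check a hosts file for this, something like the sketch below could be used. It is only an illustration: `parse_hosts` and `missing_entries` are made-up helper names, and the IP addresses in the sample are invented placeholders, not the real addresses from this setup.

```python
def parse_hosts(text):
    """Return {name: set of IP strings} from the text of a hosts file."""
    table = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, set()).add(ip)
    return table

def missing_entries(text, fqdns):
    """Return FQDNs or short names that have no entry in the hosts text."""
    table = parse_hosts(text)
    missing = []
    for fqdn in fqdns:
        short = fqdn.split(".", 1)[0]
        for name in (fqdn, short):
            if name not in table:
                missing.append(name)
    return missing

# Sample content with placeholder addresses (assumed, not from the thread).
SAMPLE = """\
127.0.0.1    localhost.localdomain localhost
192.168.0.11 esx1.test.co.uk esx1
192.168.0.12 esx2.test.co.uk
"""

print(missing_entries(SAMPLE, ["esx1.test.co.uk", "esx2.test.co.uk"]))
# prints ['esx2']
```

On a real host you would pass in the contents of /etc/hosts instead of the sample string.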

cognitoErgo
Contributor

I believe they are correct; however, I cannot ping via the short names.

esx1

127.0.0.1 localhost.localdomain localhost esx1

192.168.xxx.xxx esx1.test.co.uk esx1

esx2

127.0.0.1 localhost.localdomain localhost esx2

192.168.xxx.xxx esx2.test.co.uk esx2

cognitoErgo
Contributor

Well, I think I have got a little further, but it still isn't working.

Due to my lack of Windows skills (I live in a Linux world), I found that it helps to make sure the DNS server has actually written its updated records to its data file. Once this was done I could ping via both FQDN and short name. I tried to configure HA again and it failed because esx1 resolved to both 127.0.0.1 and 192.168.xxx.xxx, so I dropped esx1 from the localhost line, and now it doesn't give me any usable error.
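
For anyone hitting the same thing, the double mapping (a host name on both the loopback line and the real address line) can be spotted with a small check like the one below. This is just a sketch under my own assumptions: `ambiguous_names` is a made-up helper, and the 192.168.0.11 address is a placeholder rather than the real one.

```python
import ipaddress

def ambiguous_names(text):
    """Return names mapped to both a loopback and a non-loopback address."""
    seen = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line, comment, or an address with no names
        ip = ipaddress.ip_address(fields[0])
        for name in fields[1:]:
            seen.setdefault(name, set()).add(ip)
    return sorted(
        name
        for name, ips in seen.items()
        if any(ip.is_loopback for ip in ips)
        and any(not ip.is_loopback for ip in ips)
    )

# Hosts file as it was before the fix (placeholder address).
BEFORE = """\
127.0.0.1    localhost.localdomain localhost esx1
192.168.0.11 esx1.test.co.uk esx1
"""

# Same file with esx1 dropped from the localhost line.
AFTER = """\
127.0.0.1    localhost.localdomain localhost
192.168.0.11 esx1.test.co.uk esx1
"""

print(ambiguous_names(BEFORE))  # prints ['esx1']
print(ambiguous_names(AFTER))   # prints []
```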

Cheers,

Erik_Zandboer
Expert

So is HA now activated in your testing environment? "No usable error": does this mean it works, or does it mean it does not work and ESX doesn't tell you why?

Visit my blog at http://www.vmdamentals.com
cognitoErgo
Contributor

Sorry, that wasn't too clear. It still doesn't work, but it doesn't tell me why; it just reports that the HA agent has an error.

Erik_Zandboer
Expert

Oh, OK. I recently ran into this exact situation when I upgraded an environment to ESX 3.5 Update 2. It seems that an old problem is back, which is capitalization of hostnames. Make sure all your hostname references are entirely lower case (so that the DNS name, the configured hostname and the /etc/hosts file all match). That solved the problem for me. Do not forget to reboot your ESX host(s) if you need to change the hostname of the server.

Next, try to reconfigure the host for HA (right-click the host, then select the option). If that fails, try disabling HA at the cluster level, then re-enable HA on the cluster.
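
As a rough illustration of the lower-case check, the sketch below flags any name that contains upper-case characters. The helper name and the list of names are only examples I made up; you would collect the real values yourself from DNS, the configured hostname and /etc/hosts.

```python
def names_with_uppercase(names):
    """Return the names that contain any upper-case character."""
    return [name for name in names if name != name.lower()]

# Example values only; gather the real ones from DNS, the host's configured
# hostname and the /etc/hosts file.
configured = ["esx1.test.co.uk", "ESX2.test.co.uk", "esx2"]

print(names_with_uppercase(configured))  # prints ['ESX2.test.co.uk']
```

Anything the check reports is a candidate for renaming to all lower case before reconfiguring HA.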
