I am doing a proof of concept at the moment using 2x Dell PE 2950s. I am trying to configure HA and I am running into some problems. I have created the datacentre, added the two ESX hosts and configured all of their networking. I have added a cluster and added both hosts to it. I have configured DNS, and all machines involved, including the VIC, can ping each other via DNS. I can enable either machine's HA agent but not both. For instance, I can exit maintenance mode on esx1 and configure the HA agent and everything is fine; when I then take esx2 out of maintenance mode, the HA agent configuration fails, giving nothing more than "Description: HA agent has an error - <date time>, HA agent on <192.168.xxx.xxx> in cluster HA in <datacentre> has an error".
Can anyone give some possibilities on what is causing this and how to fix it?
Thanks in advance
Make sure no typos were made when setting the FQDN of your ESX hosts. I have seen this occur when someone mistyped the FQDN on one of the hosts.
Both FQDNs listed in the VIC are spelt correctly.
Thanks for your response.
Put both ESX server names (FQDN and short) in your /etc/hosts, then try to reconfigure HA again.
I believe they are correct; however, I cannot ping via the short names.
127.0.0.1 localhost.localdomain localhost esx1
192.168.xxx.xxx esx1.test.co.uk esx1
127.0.0.1 localhost.localdomain localhost esx2
192.168.xxx.xxx esx2.test.co.uk esx2
Well, I think I have got a little further, but it still isn't working.
Due to my lack of Windows skills (I live in a Linux world), I have found it helps to make sure the DNS server has written its updated records to its data file. Once this was done I could ping via both FQDN and short name. I tried to configure HA again and it failed because esx1 resolved to both 127.0.0.1 and 192.168.xxx.xxx, so I dropped esx1 from the localhost line, and now it doesn't give me any usable error.
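For anyone who wants to check for the same double-resolution problem, here is a rough sketch in Python that scans hosts-file text for names listed on both a loopback line and a normal address line. The function name and the 192.168.0.11 address are made up for illustration (the real address is masked in this thread), and it only looks at the hosts file, not at DNS.

```python
def names_on_loopback_and_lan(hosts_text):
    """Return names that appear on both a loopback line and a non-loopback line."""
    addresses_by_name = {}
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        for name in names:
            addresses_by_name.setdefault(name.lower(), set()).add(address)

    def is_loopback(address):
        return address.startswith("127.") or address == "::1"

    conflicts = []
    for name, addresses in sorted(addresses_by_name.items()):
        has_loopback = any(is_loopback(a) for a in addresses)
        has_other = any(not is_loopback(a) for a in addresses)
        if has_loopback and has_other:
            conflicts.append(name)
    return conflicts


# Hypothetical address; the layout mirrors the esx1 hosts file posted earlier.
sample = """\
127.0.0.1 localhost.localdomain localhost esx1
192.168.0.11 esx1.test.co.uk esx1
"""

# Same file with esx1 dropped from the localhost line.
fixed = """\
127.0.0.1 localhost.localdomain localhost
192.168.0.11 esx1.test.co.uk esx1
"""

print(names_on_loopback_and_lan(sample))
print(names_on_loopback_and_lan(fixed))
```

On a real host you would pass in the contents of /etc/hosts instead of the sample strings.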
So is HA now activated in your testing environment? "no usable error": Does this mean it works or does this mean it does not work and ESX doesn't tell me why?
Sorry, that wasn't too clear. It still doesn't work, but it doesn't tell me why; it just reports that the HA agent has an error.
Oh, OK. I recently ran into this exact situation when I upgraded an environment to ESX 3.5 Update 2. It seems that an old problem is back, which is capitalization of hostnames. Make sure all your hostname references are in all lower-case letters (so that the DNS name, the configured hostname and the /etc/hosts file all match). That solved the problem for me. Do not forget to reboot your ESX host(s) if you need to change the hostname of the server.
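If it helps, the comparison can be sketched in a few lines of Python. This is only an illustration: the helper names are made up, the "ESX2" spelling is a hypothetical example of the kind of mismatch meant here, and you would have to collect the three names from your own hosts by hand.

```python
def names_with_uppercase(sources):
    """Return (label, name) pairs whose name contains upper-case letters."""
    return [(label, name) for label, name in sources.items() if name != name.lower()]


def names_match_exactly(sources):
    """True only if every source reports the identical, case-sensitive name."""
    return len(set(sources.values())) == 1


# Hypothetical values gathered by hand from one host.
esx2_names = {
    "configured hostname": "ESX2.test.co.uk",
    "DNS record": "esx2.test.co.uk",
    "/etc/hosts": "esx2.test.co.uk",
}

print(names_with_uppercase(esx2_names))
print(names_match_exactly(esx2_names))
```

Anything reported by the first function, or a False from the second, is worth correcting before reconfiguring HA.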
Next, try to reconfigure the host for HA (right-click the host, then select the option). If that fails, try disabling HA at the cluster level, then re-enable HA on the cluster.