8 Replies Last post: Jul 31, 2008 3:01 AM by Erik Zandboer

HA failure with ESX 3.5 update2

Jul 27, 2008 12:26 PM

Erik Zandboer (Expert, 452 posts since Jun 11, 2007)

Hi,

I upgraded my two hosts from ESX 3.5 update1 to ESX 3.5 update2 using Update Manager, after upgrading my VirtualCenter server first. All went well, except that I got an error on HA. One of the hosts showed "HA agent has an error" and the cluster showed "could not contact primary HA agent". This happens sometimes, so I selected "reconfigure for HA" on the failing host. This did not work. OK, so I deselected HA altogether, then reselected HA at cluster level. The same problem occurred.

I finally found the answer: when I looked at the configuration of my hosts, under "DNS and routing", I noticed a little "reboot" sign next to the hostname. Weird, since I did not change the names of the hosts. Rebooting the ESX hosts plus VirtualCenter did not solve the problem. Finally I noticed that my hostnames started with a capital letter, and the DNS name (obviously) did not. After changing the hostname to all lowercase characters, the reboot sign disappeared immediately without any reboot (!!??). I performed a reboot anyway, enabled HA on the cluster and all was well again.

Kinda answered my own question :) However, this may help others with HA problems after the upgrade...

Re: HA failure with ESX 3.5 update2 Jul 27, 2008 12:45 PM
weinstein5 (Champion, 3,999 posts since Nov 19, 2005)
It will be interesting to see whether this is a bug in the upgrade process, i.e. the upgrade capitalizing the host names on the upgraded hosts.
Re: HA failure with ESX 3.5 update2 Jul 28, 2008 1:24 AM
in response to: weinstein5
Erik Zandboer (Expert, 452 posts since Jun 11, 2007)

Hi... I've done some more testing on other environments. It appears that you get a problem when the configured hostname does not match your /etc/hosts file in its use of capital letters. Another environment had a capital letter in the /etc/hosts file but none in the configured hostname, and it also suffered from HA failures (although less severe: "reconfigure for HA" worked in this case).

So basically: If you have HA issues after upgrading to ESX 3.5 update2, make sure you match capitals in your configured hostnames and /etc/hosts file!
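
To check for the mismatch described above, here is a minimal Python sketch (an illustration only, not a VMware tool). It assumes the usual hosts-file layout of an IP address followed by one or more names; the function names and the sample host names are hypothetical.

```python
def hosts_file_names(hosts_text):
    """Return every host name and alias listed in hosts-file text."""
    names = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        names.extend(fields[1:])  # fields[0] is the IP address
    return names


def case_mismatches(configured_hostname, hosts_text):
    """Return hosts-file names that match the configured hostname
    (or its short form) only when letter case is ignored."""
    short = configured_hostname.split(".", 1)[0]
    expected = {
        configured_hostname.lower(): configured_hostname,
        short.lower(): short,
    }
    return [
        name
        for name in hosts_file_names(hosts_text)
        if name.lower() in expected and name != expected[name.lower()]
    ]


# Hypothetical sample data: the hosts file uses a capital letter,
# the configured hostname does not.
sample_hosts = "192.168.1.11 Esx01.example.local Esx01\n127.0.0.1 localhost\n"
print(case_mismatches("esx01.example.local", sample_hosts))
# prints ['Esx01.example.local', 'Esx01']
```

For a real check you would pass in the host's configured name and the contents of /etc/hosts; an empty list means no entry differs only by case.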

Re: HA failure with ESX 3.5 update2 Jul 28, 2008 1:37 AM
in response to: Erik Zandboer
depping (Virtuoso, 1,501 posts since Jan 17, 2005)
VMware Moderator
I can imagine this happens because VirtualCenter functions as some sort of DNS for HA as of 3.5U2. Inconsistent hostnames are probably misinterpreted by VirtualCenter / HA.

Duncan
My virtualisation blog:
http://www.yellow-bricks.com

If you find this information useful, please award points for "correct" or "helpful".

Re: HA failure with ESX 3.5 update2 Jul 28, 2008 1:44 AM
in response to: depping
Erik Zandboer (Expert, 452 posts since Jun 11, 2007)
Sure thing... VMware used to have it as a best practice to fill the hosts file with the ESX servers and VC. Nowadays I think the best practice has changed to not filling the hosts file. But I have seen three environments updated so far, and all had HA problems to some degree. So keep the capitalisation of your hostnames (and hosts file) in mind!
Re: HA failure with ESX 3.5 update2 Jul 28, 2008 1:58 AM
in response to: Erik Zandboer
depping (Virtuoso, 1,501 posts since Jan 17, 2005)
VMware Moderator
Anyway, thanks for the information, and I will blog about this if you don't mind! This is valuable!


Re: HA failure with ESX 3.5 update2 Jul 28, 2008 6:50 AM
in response to: Erik Zandboer
jobl (Enthusiast, 26 posts since May 10, 2005)
I have seen problems with capital letters in hosts files before (on version 3.0.X at least), as the HA scripts try (or tried) to grep for the host name without using "grep -i".
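
The effect of a missing "-i" can be illustrated with a small Python sketch. This is not the actual HA script, which is not shown in this thread; it only mimics a fixed-string grep with and without case-insensitive matching. The helper name and the sample hosts line are hypothetical.

```python
import re


def grep_lines(pattern, text, ignore_case=False):
    """Return the lines of text containing pattern as a literal string,
    roughly like `grep -F` (or `grep -F -i` when ignore_case is True)."""
    flags = re.IGNORECASE if ignore_case else 0
    literal = re.escape(pattern)  # treat the host name as plain text, not a regex
    return [line for line in text.splitlines() if re.search(literal, line, flags)]


# Hypothetical hosts-file line with a capitalised name.
hosts_text = "192.168.1.11 Esx01.example.local Esx01\n"

print(grep_lines("esx01", hosts_text))                    # prints []
print(grep_lines("esx01", hosts_text, ignore_case=True))  # finds the line
```

A case-sensitive search for the lowercase name finds nothing, so a script relying on it would behave as if the host were missing from the file.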
Re: HA failure with ESX 3.5 update2 Jul 31, 2008 2:57 AM
Argyle (Enthusiast, 65 posts since Dec 29, 2006)
Maybe I'm missing something, but in U2 isn't VirtualCenter supposed to take over the DNS role? Why is there in that case still a dependency on the hosts file? It should ignore hosts and DNS entries, should it not?
Re: HA failure with ESX 3.5 update2 Jul 31, 2008 3:01 AM
in response to: Argyle
Erik Zandboer (Expert, 452 posts since Jun 11, 2007)

Uhm, I think the ESX hosts will check their hosts file first, and DNS only after that. So leaving your hosts file empty might solve the problem as well... It used to be best practice to always fill your hosts file (for HA especially), but nowadays I think VMware's best practice is NOT to fill the hosts file (HA uses its own dynamically built hosts file anyway...).
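
The "hosts file first, DNS after" order can be sketched as follows. This is a simplified Python model of that lookup order, assuming it applies as described above; it is not how ESX or HA actually implements name resolution. The function, the addresses and the stand-in DNS lookup are all hypothetical.

```python
def resolve(name, hosts_table, dns_lookup):
    """Look the name up in the hosts table first and fall back to
    DNS only if the hosts table has no entry for it."""
    if name in hosts_table:
        return hosts_table[name], "hosts"
    return dns_lookup(name), "dns"


def fake_dns(name):
    """Stand-in for a real DNS query, returning a fixed address."""
    return "192.168.1.11"


# A stale hosts entry shadows the DNS answer ...
print(resolve("esx01", {"esx01": "192.168.1.99"}, fake_dns))
# ... while an empty hosts table lets DNS answer.
print(resolve("esx01", {}, fake_dns))
```

Under this model a hosts-file entry always wins over DNS, which is why an empty hosts file would take such entries out of the picture.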