Dave365
Contributor

HA problems after VirtualCenter 2.5 update 2

Hey guys,

After upgrading VirtualCenter from 2.5 to 2.5 update 2, a number of the ESX hosts now have the same HA error.

"cmd addnode failed for primary node: /opt/vmware/aam/bin/ft_startup failed"

I have tried the "Reconfigure for VMware HA" and rebooting the hosts but the error still remains.

Anyone any ideas?

Accepted Solutions
dmaster
VMware Employee

Hi Dave365,

VMware HA is still a little buggy.

This may resolve your problem:

Create a new VMware HA cluster and add the ESX hosts to the new cluster.

(If possible, put your ESX hosts in maintenance mode before you remove them from the old VMware HA cluster.)

11 Replies
bguthrie1
Contributor

Just had the same issue. VMware support had me edit my hosts file to use all lowercase, and change the host name to be all lowercase. Apparently the latest upgrade did away with a fix that tolerated mismatched case.
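If you want to check a hosts file for this quickly, here is a minimal Python sketch that lists any hostname or alias containing uppercase letters. The function name `find_mixed_case_hostnames` and the sample addresses and hostnames are made up for illustration, and it only inspects hosts-file style text (an IP address followed by names); it does not touch the ESX host or any VMware tooling.

```python
def find_mixed_case_hostnames(hosts_text):
    """Return (line_number, name) pairs for names that contain uppercase letters."""
    offenders = []
    for line_number, line in enumerate(hosts_text.splitlines(), start=1):
        # Drop trailing comments and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        # The first field is the IP address; the rest are hostnames and aliases.
        fields = entry.split()
        for name in fields[1:]:
            if name != name.lower():
                offenders.append((line_number, name))
    return offenders


# Hypothetical sample content, not taken from a real host.
sample = """\
127.0.0.1   localhost
10.0.0.11   ESX01.example.local ESX01
10.0.0.12   esx02.example.local esx02
"""

for line_number, name in find_mixed_case_hostnames(sample):
    print(f"line {line_number}: {name}")
```

To run it against a real file, you could pass in the text read from a copy of /etc/hosts instead of `sample`.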

Dave365
Contributor

I can see how hostnames could cause a problem, but everything here already seems to be lowercase.

dmaster
VMware Employee

Did you try my suggestion? It looks a little bit stupid, but we have run into a couple of scenarios with HA cluster problems where this was the only working workaround.

JanMolbysBelly
Contributor

I assume it doesn't matter that the hostnames in DNS are in caps?

In our case they are all in caps in DNS, and some hosts are working and some aren't.

bguthrie1
Contributor

I don't see how it would matter. After renaming, I had to create a new cluster and re-add my host. It seems some of the data stored in the cluster does not change with the hostname.

Thanks,

Bill Guthrie MCSE 2003, VCP/VI3

Dave365
Contributor

Sorry dmaster, I was a little too low on resources to try your idea but I've freed up a few hosts now and I was able to try it.

It seems moving the host to a different cluster fixes the problem. And when you move it back to the original cluster the problem returns.

So I guess the solution is simply to make a new cluster, move all your hosts into it, then remove the old empty one. It feels like a shallow victory, but I'll take it :)

DGWS
Contributor

Hi,

I've set the hostname entries in the /etc/hosts file to lowercase, and also edited /etc/vmware/esx.conf and /etc/sysconfig/network so that the hostname is lowercase there too. However, I'm still getting HA errors across a lot of my ESX hosts.

Thanks

JeffXstate
Contributor

I was getting the same error (/opt/vmware/aam/bin/ft_startup) on one of my ESX hosts in a cluster of four. Let's call that Cluster A, with HA enabled. I moved the ESX host that had the error out of Cluster A and back into Cluster A, but that did not fix it.

I then created a new test cluster, Cluster B, with HA enabled on it as well. I moved the ESX host out of Cluster A and into Cluster B. HA was configured successfully on the host in Cluster B (the only ESX host in that cluster). I then moved the ESX host back to Cluster A and HA was enabled successfully. I hope that can help someone else...

dmanconi
Enthusiast

Hi Jeff

That worked a treat until I moved the server back into the original cluster, then the HA error popped up again. Personally, I think one of the other nodes might be causing the issue.

The catch here is that if VMware HA is enabled and you have the HA error on a node, then you can't VMotion anything to that server.

Hmm, it might be time to call VMware, I think.

Cheers

David
