VMware Cloud Community
gerryjohnbarnet
Contributor

ESXi HA failures

Hi,

I have a 2-node ESXi cluster (4.1 U1), and since enabling HA I have been getting recurring intermittent failures reported by HA. Typically it's one of two messages:

"A possible host failure has been detected by HA on host xxxx" or "HA recovered from a total cluster failure in cluster xxx in Data Center yyy"

Sometimes the VC events view then shows that it has recovered and is "healthy"; other times it just stays in what looks like a failed state, with a red ! next to the node. This typically happens on node-2.

I have tried entering and exiting Maintenance mode, which sometimes resets everything to healthy but sometimes does not. I have also disabled and re-enabled HA, which again works for a day or so.

Today I disabled HA, removed both nodes from the cluster, deleted the cluster object, re-added the nodes and re-enabled HA. All steps reported success, but it only lasted a few hours before the errors occurred again.

I don't think name resolution is an issue, as I can ping each node from the other using both the short name and the FQDN.

Any ideas would be greatly appreciated, as it's driving me nuts :)

1 Solution

Accepted Solutions
a_p_
Leadership

Please take a look at http://kb.vmware.com/kb/1026825 to rule out duplicate IP addresses.
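
As a rough illustration of the general idea (this is not taken from the KB article), one common sign of a duplicate IP is the same address showing up with different MAC addresses over time. The sketch below assumes you have collected a few ARP table snapshots as text in the style of `arp -an` output from a machine on the same subnet; the function name, the regex and the sample addresses are all hypothetical.

```python
import re
from collections import defaultdict

# Assumed input style, similar to `arp -an` output, e.g.
# "? (10.0.0.12) at 00:50:56:aa:bb:02 [ether] on eth0"
ARP_LINE = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\)\s+at\s+([0-9A-Fa-f:]{17})")


def find_duplicate_ips(arp_snapshots):
    """Return {ip: sorted list of MACs} for every IP seen with more than one MAC."""
    seen = defaultdict(set)
    for snapshot in arp_snapshots:
        for line in snapshot.splitlines():
            match = ARP_LINE.search(line)
            if match:
                ip, mac = match.groups()
                # Normalise case so the same MAC is not counted twice.
                seen[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}


if __name__ == "__main__":
    # Hypothetical snapshots taken a few minutes apart.
    earlier = "? (10.0.0.11) at 00:50:56:aa:bb:01 [ether] on eth0\n" \
              "? (10.0.0.12) at 00:50:56:aa:bb:02 [ether] on eth0"
    later = "? (10.0.0.11) at 00:50:56:aa:bb:01 [ether] on eth0\n" \
            "? (10.0.0.12) at 00:0C:29:cc:dd:03 [ether] on eth0"
    for ip, macs in find_duplicate_ips([earlier, later]).items():
        print(f"{ip} answered from multiple MACs: {', '.join(macs)}")
```

An address that legitimately moved to a new NIC would also be flagged, so treat any hit as something to investigate rather than as proof of a conflict.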

André

2 Replies
gerryjohnbarnet
Contributor

Andre,

Thank you very much. That was exactly what the problem was. I couldn't find any reference to "Duplicate IP" in either log, but I did see ping failures.

Thanks again,

Gerry
