VMware Cloud Community
mirceaflorin
Contributor

vSphere HA state stuck in "Election"

Hello,

I am currently experimenting with several test environments, and I have the following case:

- 1 vCenter Server 5.0 (build 913577) with 1 cluster of 2 ESXi 5.0 hosts (build 1117897), HA enabled, and 3 powered-on virtual machines

While showing someone what HA does, I powered off the Slave host (with the power button). Everything went OK: the VMs were restarted on the Master ESXi host.

After I powered the Slave back on, I moved 1 machine to the Slave and 2 remained on the Master. Everything looked good, so I tried the same test with the Master host (powered it off with the button), and now the fun begins:

- In the Summary of the Slave host I can see "vSphere HA state: Election", and the cluster shows the error "Cannot find vSphere HA master agent". The VMs that were running on the Master host are still down.

Shouldn't the Slave host take over the Master role (as there is no other ESXi host)? Am I missing something?

Please let me know if you need additional details.

Thank you in advance.

julienvarela
Commander

Hi,

Can you attach the vpxa and fdm logs (located on the host with the problem):

/var/log/vpxa.log

/var/log/fdm.log

Regards,

Julien

Regards, J.Varela http://vthink.fr
mirceaflorin
Contributor

Hi,

Thank you for your reply. I recreated the condition in my test environment (the one described above, running in VMware Workstation).

Shutting down the Slave host restarts the virtual machines on the Master host. Afterwards, I powered on the Slave and moved one machine there with vMotion, so now I have 2 VMs running on the Master and 1 VM running on the Slave.

- Shutting down the Master does not restart the virtual machines on the Slave host.

I should also mention that my Management network has a non-existent gateway configured (192.168.1.1). As I mentioned, I'm running the environment in VMware Workstation, and the management networks are in the same subnet. To be honest, I don't remember why I configured that gateway, but I'm sure I had a reason, something to test! :)

When I configured HA, I always got the warning "vSphere HA agent on this host could not reach isolation address: 192.168.1.1".

So HA works and restarts the VMs when the Slave fails (or, in our case, is powered off), but not when the Master is powered off...

Attached are the fdm.log and vpxa.log from the Slave host, where the VMs should be restarted (as the Master is now offline).

Thank you in advance for your support.

julienvarela
Commander

Hi,

My guess is that your host cannot reach your master host, so it then tries to determine whether it is isolated. The next step is to reach the gateway. In your case the gateway is not reachable, so your host ended up in a "host isolation" state. In this case your host is a master too. That's why you have 2 masters in your cluster.

2013-08-04T20:45:06.190Z [FFC2AB90 error 'Election' opID=SWI-6fd0b774] [ClusterElection::SendAll] sendto 192.168.1.9 failed: Host is down

2013-08-04T20:45:13.759Z [FFDF1B90 warning 'Cluster'] [HostPing::Ping] sendto[ipv4] 192.168.1.1: Host is down
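
To check whether the attached fdm.log shows the same pattern throughout, a short script can pull out the "Host is down" entries. This is a minimal Python sketch: the regular expression is inferred from the two lines quoted above only (an assumption, not a documented fdm.log format), and the helper name `summarize` is hypothetical. Following the reading in this reply, the 'Election' hit is election traffic to the other HA agent (192.168.1.9, presumably the powered-off Master) and the 'Cluster' HostPing hit is the ping to the gateway 192.168.1.1.

```python
import re
from collections import Counter

# Matches fdm.log entries shaped like the two lines quoted above:
#   <timestamp> [<thread> <level> '<component>' ...] [<source>] sendto[...] <IPv4> ...: Host is down
# Assumption: the pattern is inferred from those two samples only, so other
# fdm.log entries may need a looser pattern.
LINE_RE = re.compile(
    r"^(?P<ts>\S+) \[(?P<thread>[0-9A-F]+) (?P<level>\w+) '(?P<component>[^']+)'[^\]]*\] "
    r"\[(?P<source>[^\]]+)\] sendto(?:\[ipv4\])? "
    r"(?P<addr>\d{1,3}(?:\.\d{1,3}){3})(?: failed)?: Host is down"
)


def summarize(lines):
    """Count 'Host is down' failures per (log component, target address)."""
    hits = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            hits[(m["component"], m["addr"])] += 1
    return hits


# The two entries quoted in this reply, used as sample input.
SAMPLE = [
    "2013-08-04T20:45:06.190Z [FFC2AB90 error 'Election' opID=SWI-6fd0b774] "
    "[ClusterElection::SendAll] sendto 192.168.1.9 failed: Host is down",
    "2013-08-04T20:45:13.759Z [FFDF1B90 warning 'Cluster'] "
    "[HostPing::Ping] sendto[ipv4] 192.168.1.1: Host is down",
]

if __name__ == "__main__":
    for (component, addr), n in sorted(summarize(SAMPLE).items()):
        print(f"{component}: {addr} unreachable ({n}x)")
```

To scan a copy of the real log, pass an open file instead of `SAMPLE` (for example `with open("fdm.log") as f: print(summarize(f))`); lines that don't match the pattern are simply ignored.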

Which option is configured for "Host Isolation Response"?

By default, your VMs will not be restarted.

Regards,

Julien

a_p_
Leadership

And when I configured HA, I always had the warning that "vSphere HA agent on this host could not reach isolation address: 192.168.1.1"

This might actually be the reason. If the Master dies, the Slave can only see itself, i.e. there is no reachable isolation address and no election traffic from other hosts. In this case the Slave considers itself isolated and doesn't become a Master. To resolve this, either configure an existing IP address as the default gateway (e.g. the address of the VMware Workstation host's virtual adapter on the network you use) or define an existing isolation address in the advanced settings.
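
The reasoning above can be written as a small decision model. This is a hypothetical Python sketch of the explanation in this thread, not the actual vSphere HA (FDM) election code; the class and field names are made up for illustration. The "isolation address" here means the default gateway, or an address defined in the HA advanced settings (the das.isolationaddress options).

```python
from dataclasses import dataclass


@dataclass
class SurvivorView:
    """What the surviving host observes once the Master stops responding.

    Hypothetical, simplified model of the explanation in this thread; the
    field names are illustrative and are not vSphere HA (FDM) internals.
    """
    hears_other_agents: bool           # election traffic from any other HA agent
    isolation_address_reachable: bool  # gateway or configured isolation address answers ping


def ha_outcome(view: SurvivorView) -> str:
    """Return the outcome this simplified model predicts for the survivor."""
    if view.hears_other_agents:
        # Other agents are still visible, so the host is not isolated.
        return "election: a normal master election runs with the visible peers"
    if not view.isolation_address_reachable:
        # Sees no other agent and cannot ping the isolation address:
        # the host treats itself as isolated and does not become Master.
        return "isolated: no master is elected and the failed host's VMs stay down"
    # Sees no other agent, but the network answers: the Master itself failed.
    return "master: the host elects itself and restarts the failed host's VMs"


if __name__ == "__main__":
    before = SurvivorView(hears_other_agents=False, isolation_address_reachable=False)
    after = SurvivorView(hears_other_agents=False, isolation_address_reachable=True)
    print("non-existent gateway:", ha_outcome(before))
    print("reachable gateway:   ", ha_outcome(after))
```

With the non-existent 192.168.1.1 gateway the surviving host lands in the "isolated" branch, which fits the stuck "Election" state; with a reachable gateway it takes the "master" branch and restarts the VMs, which is what the original poster reports after the fix.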

André

mirceaflorin
Contributor

Thank you all for your support. Indeed, that was the cause. I configured a reachable gateway, and now it works as it should.

My mistake is that I haven't been keeping a list of the changes I make in the test environment (I power it on very rarely, and only to test something). At some point I had a valid gateway (a valid IP address) and everything was working correctly, but for some reason I changed it to an invalid, non-existent IP.

Lesson learned. Thank you again!

a_p_
Leadership

Well, that's exactly it: "Lesson learned". Isn't that actually one of the reasons you have a lab? Break and fix things to learn and understand how they work. ;)

André
