VMware Cloud Community
network_user
Enthusiast

vSphere HA unsuccessfully failed over. Operation is not allowed in current state! ?

Hello,

I am testing vSphere HA with two ESXi hosts in my lab. I have one VM on each ESXi host. When I try to trigger HA by shutting down the switch port connected to one of the hosts, the vSphere cluster tries to fail over the VM on the failed host to the other host, but the failover does not succeed.

The events under the cluster show this warning:

vSphere HA unsuccessfully failed over <virtual-machine> on <host> in cluster <cluster name>. vSphere will retry if the max number of attempts has not been exceeded. Reason: The operation is not allowed in the current state.

Attached is the screenshot of the error.

I tried restarting the vCenter server service and that did not help. What could be wrong here?

Thank you.

Shivani

1 Solution

Accepted Solutions
sajal1
Hot Shot

Hello Shivani,

Don't disable the datastore heartbeat. If you have a physical server, simply power it down by pulling the power plug, or crash it. If you have a virtual ESXi server (vESXi), use "Power Off" directly instead of "Shut Down".

18 Replies
sajal1
Hot Shot

Hello Shivani,

Try powering down the ESXi host instead of shutting down the switch port. Also, did you configure any specific behavior for the VMs in case of an HA failover? Check the following KB article; it does not cover exactly what you are doing, but it explains the situation.

HA checks whether a host is alive in two ways: network heartbeats and datastore (storage) heartbeats. If you only shut down the switch port, the host's lock on the storage path is still in place. Most probably this is leading to a split-brain scenario.

Just try physically powering off the ESXi host to initiate the failover.
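
As a rough illustration of the two heartbeat channels mentioned above, here is a minimal Python sketch of how a master might classify another host. It is a simplified model, not VMware's implementation: the names `HostState` and `classify_host` are hypothetical, and real HA performs additional checks that are left out here.

```python
from enum import Enum


class HostState(Enum):
    LIVE = "live"
    ISOLATED_OR_PARTITIONED = "isolated or partitioned"
    FAILED = "failed"


def classify_host(network_heartbeat: bool, datastore_heartbeat: bool) -> HostState:
    """Simplified view of how the HA master judges another host (assumed model)."""
    if network_heartbeat:
        return HostState.LIVE
    if datastore_heartbeat:
        # The management network is cut, but the host still updates its
        # heartbeat on shared storage, so it is alive and keeps its VM locks.
        return HostState.ISOLATED_OR_PARTITIONED
    # No sign of life on either channel: treat the host as failed.
    return HostState.FAILED


# Switch port shut down, datastore heartbeats still working.
print(classify_host(network_heartbeat=False, datastore_heartbeat=True).value)
# Power cable pulled: both channels go silent.
print(classify_host(network_heartbeat=False, datastore_heartbeat=False).value)
```

In this model, shutting down only the switch port leaves the host "isolated or partitioned" rather than "failed", which is why pulling the power behaves differently.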

network_user
Enthusiast

Hello sajal,

I did disable the datastore heartbeats so that the cluster would think one of the hosts is down. I was thinking that if I physically shut down the server, HA might treat it as a graceful shutdown and not initiate the failover. But I can try that.

I don't have any specific settings for HA on the hosts or VMs.

Thank you.

sajal1
Hot Shot

Hello Shivani,

Don't disable the datastore heartbeat. If you have a physical server, simply power it down by pulling the power plug, or crash it. If you have a virtual ESXi server (vESXi), use "Power Off" directly instead of "Shut Down".

network_user
Enthusiast

Sajal,

I was going by this document from VMware for my HA test, but I will now try what you said.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=205663...

Shivani

network_user
Enthusiast

Hello Sajal,

It does work when I pull the power cable out of one of the servers.

I don't understand why it did not work the way VMware suggests in the article below, by disabling the datastore heartbeat and then shutting down the switch port that one of the servers is connected to. I was expecting that to show similar behavior.

VMware KB: Simulating VMware High Availability failover

Thank you.

Shivani

network_user
Enthusiast

On another topic, should a virtual machine fail back to its original (primary) host once that host is back online? In my case I see that the host is back up, but the VM still resides on the second host. I think I can migrate it to the primary host manually, but I was expecting this to happen on its own once the primary server came back online. What do you think?

Thank you.

Shivani

sajal1
Hot Shot

Hi Shivani,

HA takes care of host failure only; it does not do failback. When a host goes down, HA only restarts the VMs that were running on that host on other available hosts (provided resources are available, and according to the configuration you set). You can use DRS along with it for load balancing. If DRS is enabled, then once the original host comes back up it will load balance the entire cluster, but please note that it will not necessarily bring the original VMs back to this host.
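
To make the "restart only, no failback" behavior concrete, here is a small Python sketch. It is a toy model under stated assumptions: the function names and host names are hypothetical, and the round-robin choice of target host is an assumption made for illustration (real HA placement depends on available resources and your configuration).

```python
from typing import Dict, List


def ha_restart(placement: Dict[str, str], failed_host: str,
               surviving_hosts: List[str]) -> Dict[str, str]:
    """Return the VM -> host placement after HA reacts to a host failure."""
    if not surviving_hosts:
        raise ValueError("no surviving host available for restart")
    new_placement = dict(placement)
    affected = sorted(vm for vm, host in placement.items() if host == failed_host)
    for index, vm in enumerate(affected):
        # Assumption: spread restarted VMs round-robin over the survivors.
        new_placement[vm] = surviving_hosts[index % len(surviving_hosts)]
    return new_placement


def host_recovered(placement: Dict[str, str], recovered_host: str) -> Dict[str, str]:
    """HA takes no action when a host returns, so the placement is unchanged."""
    return dict(placement)


placement = {"vm1": "esxi-a", "vm2": "esxi-b"}
after_failure = ha_restart(placement, "esxi-a", ["esxi-b"])
after_recovery = host_recovered(after_failure, "esxi-a")
print(after_failure)   # {'vm1': 'esxi-b', 'vm2': 'esxi-b'}
print(after_recovery)  # {'vm1': 'esxi-b', 'vm2': 'esxi-b'}
```

Moving vm1 back to esxi-a would be a separate step, either a manual migration or a DRS decision.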

Which version are you testing on? Is it vSphere 5.5? The KB that you were following was valid up to 5.1. That said, it seems that in your case the situation was being treated as a split-brain scenario.

Just out of curiosity, can you try the same method once more (the one you were following)? But please note the master node first. Check from the status tab which of the two nodes is the master, then try disabling the switch port for the slave node and check whether it works.

For more details check : http://pubs.vmware.com/vsphere-55/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-55-availab...

Chapter 2 of the above.

network_user
Enthusiast

Hello Sajal,

All the tests I did were with the slave being taken down, so it does not work even then.

I did not realize that the KB I was looking at was specific to the vSphere version. Thanks for pointing it out.

If it were split brain, then the VM would have been running on both machines (if I understood it right)?

Thank you.

Shivani

network_user
Enthusiast

Sajal,

You were right that it had something to do with the split-brain scenario. I had the host isolation response set to "leave powered on", and since the VM was still powered on on the isolated host, it was not failing over to the master host. Now that I have set the isolation response to "shut down", the VM failed over when I shut down the switch port connected to the slave host, with the datastore heartbeat NOT disabled.

Thanks!!

sajal1
Hot Shot

Hi Shivani,

Yes, in a split brain there would be two instances. Since that was not the case for you, it is not split brain. But since it gives the same error that the KB explains, we need to get into the logs to see exactly what is going on with the failover and get more information.

sajal1
Hot Shot

Seems you got your answer :) Well, in a normal full split brain you should get two VMs. Since you did not, I assumed it may not be the case.

Why don't you try looking into the logs? They give a wealth of information :)

network_user
Enthusiast

Sajal,

Yes, I will take a look at the fdm logs. Thanks for all your guidance and help!

Shivani

admin
Immortal

Hi Shivani,

The behavior depends on the type of datastore the VM resides on. Is it an FC or a network-backed datastore (FC/NFS/iSCSI)? One possible reason it didn't work could be as follows (I am speculating here given the above details, so bear with me if I misstate something):

- The HA master tries to fail over the VM (since in this case the slave is considered dead, as there are no heartbeat datastores during the network isolation).

- If the VM resides on an FC-based datastore, the HA master cannot fail over the VM (i.e. register and power on the VM on the master's host in this case), since the lock is still held by the other host (note that the VM is still powered on).

Thanks for bringing this up - we will also clarify this in the KB article.
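
The lock explanation above can be sketched in a few lines of Python. This is only a toy model of the reasoning in this thread, not how ESXi or HA is implemented: `VmFileLock`, `apply_isolation_response` and `master_restart` are hypothetical names, and linking the held lock to the "not allowed in the current state" message is the speculation from this thread, not a confirmed root cause.

```python
from typing import Optional


class VmFileLock:
    """Toy stand-in for the on-disk lock held by the host running the VM."""

    def __init__(self, owner: Optional[str] = None) -> None:
        self.owner = owner


def apply_isolation_response(lock: VmFileLock, isolated_host: str, response: str) -> None:
    """Model what the isolated host does to its own VM."""
    if response not in ("leave powered on", "power off", "shut down"):
        raise ValueError("unknown isolation response: " + response)
    if response != "leave powered on" and lock.owner == isolated_host:
        # The VM is stopped on the isolated host, so its lock is released.
        lock.owner = None


def master_restart(lock: VmFileLock, master_host: str) -> str:
    """Model the master's attempt to power on the VM on its own host."""
    if lock.owner is not None:
        # Assumed link: a lock still held elsewhere blocks the power-on.
        return "failed: operation is not allowed in the current state"
    lock.owner = master_host
    return "restarted on " + master_host


for response in ("leave powered on", "shut down"):
    lock = VmFileLock(owner="slave")
    apply_isolation_response(lock, "slave", response)
    print(response, "->", master_restart(lock, "master"))
```

With "leave powered on" the slave keeps the lock and the restart attempt fails; with "shut down" the lock is released and the master can power the VM on.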

network_user
Enthusiast

Hello Krishnanm,

The datastore is on a network iSCSI SAN. I agree that since my "host isolation response" was set to "leave powered on", the lock from the slave host was still active, so HA was not able to restart the VM on the master host.

I am wondering in what case the "leave powered on" setting for the isolation response would be used, since you would want to fail over the VM in any case when your host fails.

Thank you.

Shivani

mtrento
Enthusiast

Hello network_user,

From KB 1018325:

If there is a general network failure, such as a switch failure, and your storage is iSCSI or NFS and it is located on the same network as your ESX hosts, all hosts are unable to contact the storage.

In the above scenario, the setting "Leave powered on" is appropriate because the virtual machine is unable to contact the storage or other hosts. Since the virtual machine cannot be moved to another host, it does not need to be powered off.

This is advantageous because virtual machines remain powered on if there is a network issue and are restored when network connectivity has been re-established. The disadvantage is that if a host becomes isolated from the rest, it keeps its virtual machines locked on the storage and other hosts cannot bring them back up.

If you have iSCSI or NFS and your storage is going through the same switch as your virtual machine and ESX traffic, leaving virtual machines powered on as the isolation response is recommended, although you should choose the appropriate option for your environment.

Regards

mjha
Hot Shot

Hi Shivani,

HA only takes care of restarting the VMs on surviving nodes once a node in the cluster goes down. Even if the failed node comes back online after the failure, HA is not going to move VMs back to that host. To achieve this you should have DRS enabled on your cluster, in fully automated mode. But DRS does not guarantee that only those VMs that were restarted by HA will be migrated back to the recovered host. VM migration by DRS depends on the load on the cluster as well as on the individual hosts.

Please consider marking this answer "correct" or "helpful" if you think your query have been answered correctly. Manish Jha | Operations Support Engineer | vCloud Air Operations vExpert 2015-17 | vExpert-NSX | vExpert-Cloud | VCAP6-DCV | VCP6-DCV | RHCE-7 Website : http://vstellar.com

mjha
Hot Shot

In vSphere 5.x, split-brain scenarios do not happen, because HA automatically answers the question that is generated for the VM in a split-brain situation, so the VM runs on only one host at any given time.
