VMware Cloud Community
ManuelDB
Enthusiast

vSphere HA restarts the VM but does not shut down the VM on the source host (vSAN)

Hi,

I'm facing a weird issue on a new VxRail cluster.

The cluster is up and running with all green check marks; vSAN runs on dedicated NICs and management on separate NICs.

HA is configured with the APD/PDL responses disabled, but with the host isolation response enabled and set to restart.

Now, I unplug both vSAN cables (so the host is still accessible from vCenter, but loses access to the vSAN datastore).

What actually happens is that HA restarts the VM on another host, but the VM is not shut down on the source (impacted) host. So I have the VM running on both hosts (if I open the vSphere Client on each of the two hosts, I see the VM running on both), and in vCenter the host shown in the VM details keeps switching between the two hosts that are running the VM.

Tested twice; it is reproducible.

ESXi 7.0 U3k

vCenter 7.0 U3i

I can't find any log entry such as "Cannot shutdown vm" under vCenter -> Monitor -> Events, and I can't see any matching known or resolved issue in later vCenter releases.

Any ideas?

Thanks

Manuel

StephenMoll
Expert

Sounds to me like it is doing what you have asked it to do...

By disabling APD/PDL actions, you are in effect saying to the host: if you lose connection to your storage, do nothing. So the VM is not shut down or reset because of this.

No isolation actions are triggered because the host is not isolated, vCenter can still see it and the host can see other hosts in the HA cluster, it has simply been disconnected from its storage.

The VM is probably being restarted due to "HA VM Protection" actions, yes? This reboots the VM because it has stopped heartbeating. The new VM is functioning normally; the old one, however, appears to be active in vCenter but in reality is not doing anything. How could it, when it has been disconnected from its virtual disks?
ManuelDB
Enthusiast

vSAN does not need APD/PDL, so it is normal to disable them. Anyway, the same behaviour occurred even with APD/PDL enabled.
StephenMoll
Expert

Interesting, I shall add that to my list of things to experiment with, as we are currently evaluating vSAN as a storage solution for our systems. We have to look at failure cases and document system behaviours. We are awaiting the hardware for a full set-up to be assembled, which hopefully will be available in a week or two.

I can only presume that the source VM is able to run using locally cached data, in which case why did HA restart the VM elsewhere? It does sound weird.
depping
Leadership

You need to do a few things when you use vSAN:

  1. Disable the use of the default isolation address
  2. Configure the isolation address to be a reliable IP on the vSAN network
  3. Enable the Isolation Response and set it to power off

All of this is documented here:

https://www.yellow-bricks.com/2017/11/08/vsphere-ha-heartbeat-datastores-isolation-address-vsan/

or

https://core.vmware.com/resource/vmware-vsan-design-guide#sec6870-sub2 
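
As a rough illustration of the three steps above, here is a minimal Python sketch that writes the settings down and checks them. `das.usedefaultisolationaddress` and `das.isolationaddress0` are the vSphere HA advanced option names; the subnet, the IP address, the dictionary layout and the helper `check_vsan_ha_settings` are made up for this example and are not part of any VMware tool.

```python
import ipaddress

# Hypothetical values: replace with your own vSAN subnet and a reliable IP on it.
VSAN_SUBNET = "192.168.50.0/24"

ha_settings = {
    "advanced_options": {
        "das.usedefaultisolationaddress": "false",   # step 1
        "das.isolationaddress0": "192.168.50.1",     # step 2 (placeholder IP)
    },
    "isolation_response": "powerOff",                # step 3
}


def check_vsan_ha_settings(settings, vsan_subnet):
    """Return a list of problems; an empty list means all three steps are covered."""
    problems = []
    opts = settings.get("advanced_options", {})

    # Step 1: the default isolation address must be switched off explicitly.
    if opts.get("das.usedefaultisolationaddress", "true").lower() != "false":
        problems.append("default isolation address is still in use")

    # Step 2: at least one das.isolationaddressX must sit on the vSAN network.
    net = ipaddress.ip_network(vsan_subnet)
    addrs = [v for k, v in opts.items() if k.startswith("das.isolationaddress")]
    if not any(ipaddress.ip_address(a) in net for a in addrs):
        problems.append("no isolation address on the vSAN network")

    # Step 3: the isolation response must power off the VMs.
    if settings.get("isolation_response") != "powerOff":
        problems.append("isolation response is not power off")

    return problems


print(check_vsan_ha_settings(ha_settings, VSAN_SUBNET))  # prints []
```

The check only inspects a plain dictionary; it does not talk to vCenter, so treat it as a checklist in code form rather than a configuration tool.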

depping
Leadership


@StephenMoll wrote:

Interesting, I shall add that to my list of things to experiment with, as we are currently evaluating vSAN as a storage solution for our systems. We have to look at failure cases and document system behaviours. We are awaiting the hardware for a full set-up to be assembled, which hopefully will be available in a week or two.

I can only presume that the source VM is able to run using locally cached data, in which case why did HA restart the VM elsewhere? It does sound weird.


Euuh no, it is not running on locally cached data. The VM actually loses access to storage, but because the isolation address is most likely reachable over a different network (or is not configured correctly), isolation is never declared and, as such, the VM is not powered off.
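
To make that reasoning concrete, here is a deliberately simplified Python model of the isolation decision. It is not the actual HA agent logic: the function name and arguments are invented for illustration, and it assumes that HA agent traffic runs over the vSAN network, so pulling the vSAN cables stops the heartbeats.

```python
def host_declares_isolation(ha_heartbeats_seen, isolation_address_pings):
    """Simplified model: a host declares itself isolated only when it sees
    no HA traffic AND none of its isolation addresses answers a ping."""
    return (not ha_heartbeats_seen) and not any(isolation_address_pings)


# Scenario from this thread: vSAN cables pulled, management network still up.
# Assumption: HA heartbeats travel over the vSAN network, so they are lost.

# Default isolation address (management gateway) still answers:
default_gateway_case = host_declares_isolation(False, [True])

# Isolation address placed on the vSAN network no longer answers:
vsan_address_case = host_declares_isolation(False, [False])

print(default_gateway_case, vsan_address_case)  # prints False True
```

In the first case isolation is never declared, so the isolation response never powers off the original VM; in the second case it is declared and the power-off response can take effect.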

StephenMoll
Expert

Phew! That's a relief to know that can't happen.