VMware Cloud Community
marnow
Enthusiast

2-node vSAN - failure scenario

Hello guys,

I have a 2-node cluster with direct connect. I was recently testing different failure scenarios, and the cluster behaves as desired except in one specific case...

I disconnected both physical uplinks from one ESXi host (VM networks, management, vMotion and witness) and kept the direct connect between the hosts connected. The host became disconnected from the vCenter perspective... but the VM keeps running on that host? I would say that if the host is disconnected from the network, I would like to have the VM restarted on the other host, which has full network connectivity.

I was a little confused about why the VM was not restarted on the other host due to the lack of connectivity... am I missing something? Both hosts are running 6.7u2 and all HA settings are set as recommended in the official guide.

Thanks

Mario

7 Replies
marnow
Enthusiast

Has anybody tested it?

TheBobkin
Champion

Hello Mario,

The issue here is that by default we use the vSAN network for determining HA liveness - your nodes remained connected over the direct connect, and thus there was no HA failure response. The other pertinent thing to note is that quorum of data is also retained from both nodes, so all data Objects remain accessible from both (as opposed to a scenario where the link between the nodes, or the link between a node and the Witness, is lost), and thus nothing is going to get restarted.

I had a think about other potential solutions to this, such as Virtual Machine Monitoring, but from what I have read it doesn't appear to have awareness of whether a VM has network connectivity or not. The response of powering off and re-registering the VMs on the other node could likely be achieved by pointing the isolation addresses somewhere else, but I think this could have the potential to cause more harm than good (e.g. what happens if BOTH hosts lose access to this/these addresses but are otherwise functional?).
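To make the reasoning above concrete, here is a rough toy model in Python. It is only a sketch of the decision logic as described in this thread, not how vSphere HA is actually implemented: all class, field and function names are hypothetical, the vote counts (two data components plus one witness component) are an assumed simple layout, and real HA behaviour also depends on the configured isolation response.

```python
from dataclasses import dataclass


@dataclass
class HostView:
    """What a single host can currently see (hypothetical field names)."""
    peer_reachable_on_ha_network: bool  # HA traffic over the vSAN network
    isolation_address_reachable: bool   # ping to the isolation address(es)
    visible_votes: int                  # object component votes this host can reach
    total_votes: int = 3                # assumed: 2 data components + 1 witness


def ha_state(view: HostView) -> str:
    """Simplified liveness check: isolation needs BOTH signals to fail."""
    if view.peer_reachable_on_ha_network:
        return "connected"
    if view.isolation_address_reachable:
        return "partitioned"
    return "isolated"


def object_accessible(view: HostView) -> bool:
    """An object stays accessible while more than half of its votes are visible."""
    return view.visible_votes * 2 > view.total_votes


def restart_expected(view: HostView) -> bool:
    """Toy rule: HA only acts if the host is isolated or loses its data."""
    return ha_state(view) == "isolated" or not object_accessible(view)


# Scenario from this thread: uplinks pulled, direct connect still up.
uplinks_lost = HostView(
    peer_reachable_on_ha_network=True,
    isolation_address_reachable=False,
    visible_votes=2,
)

# Contrast: the direct connect is lost as well, so the host sees only itself.
fully_cut_off = HostView(
    peer_reachable_on_ha_network=False,
    isolation_address_reachable=False,
    visible_votes=1,
)

print(ha_state(uplinks_lost), restart_expected(uplinks_lost))
print(ha_state(fully_cut_off), restart_expected(fully_cut_off))
```

In this model the host with pulled uplinks never counts as isolated, because the peer is still reachable over the network HA uses for liveness, and the VM's data stays accessible, so nothing triggers a restart even though the VM has lost its own network.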

Potentially depping or someone else stronger in HA might have a more robust solution.

Bob

marnow
Enthusiast

Thanks TheBobkin, that does make sense in this specific case.

I also found depping's response to a similar question under his blog article "vSphere HA heartbeat datastores, the isolation address and vSAN" on Yellow Bricks:

Fred says

6 October, 2018 at 04:19

Hi Duncan

There is a scenario: when the management network is lost but the vSAN network is fine, and some VMs share an uplink with the management network, the VMs cannot be reached and vCenter cannot connect to the ESXi node. But because the vSAN network is not lost, HA and the isolation response will not be triggered.

How should the cluster configuration be designed in this scenario to avoid losing the VMs?

  • Duncan Epping says

    9 October, 2018 at 11:57

    Normally people don’t have VMs sharing the management network to be honest. Not sure how to get around what you are saying right now. I have filed a feature request that would solve the problem, but today it is not available, and I do not know how long this will take. More on this later.
mrgecco
Contributor

marnow, just wondering: did you find a solution for that? I'm struggling with the same situation at the moment.

Or maybe TheBobkin or depping have an answer? depping, did you get any update for us regarding the feature request from 2018?
Thank you very much in advance!

depping
Leadership

No update unfortunately.

marnow
Enthusiast

No solution, I live with it. It has never occurred so far.

mrgecco
Contributor

Thanks for the answer, depping and marnow.
