VMware Cloud Community
baber
Expert

vSphere HA and vSAN

I read this document

https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vsan-planning.doc/GUID-D68890D8-841...

 

but I could not clearly understand why we have to disable HA, then enable vSAN, and after that enable HA again. The document just says that HA traffic traverses the management VMkernel, and that after vSAN is enabled it traverses the vSAN VMkernel. But if we have separate physical adapters for the management and vSAN VMkernels, what problem would occur?

I am really confused: is this related to HA or to host isolation?

Would you please give an example ?

Please mark helpful or correct if my answer resolved your issue.
28 Replies
a_p_
Leadership

Yes, it's basically to avoid conflicts due to network issues. Using the same network for vSAN and HA avoids such conflicts.

From https://cormachogan.com/2013/09/17/vsan-part-9-host-failure-scenarios-vsphere-ha-interop/
~snip~
... Notably, vSphere HA agents communicate over the VSAN network when the hosts participate in a VSAN cluster. The reasoning behind this is that VMware wishes for HA & VSAN nodes to be part of the same partition in the event of a network failure; this avoids conflicts where HA & VSAN end up in different partitions, with each partition laying claim to the same object. ...
~snip~

André

baber
Expert

I read it, but I am still confused because we are using mixed storage (traditional and vSAN) in our environment.

Let me explain the two scenarios:


In a non-vSAN environment (traditional storage only):


I have 4 hosts :
Host1:
Management IP (vmk0) = 10.10.10.10
GW=10.10.10.1

Host2:
Management IP (vmk0)= 10.10.10.20
GW=10.10.10.1

Host3:
Management IP (vmk0)= 10.10.10.30
GW=10.10.10.1

Host4:
Management IP (vmk0)= 10.10.10.40
GW=10.10.10.1

On each host, isolation is checked by pinging the gateway from vmk0. For example, on Host1:
vmk0 (10.10.10.10) has to ping 10.10.10.1; if the ping fails, HA then checks the datastore heartbeat. If the datastore heartbeat is still OK, the host is declared isolated and responds according to the admin's configuration.
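The flow described above can be sketched as a toy Python model. This is illustrative only, not VMware's actual code: the function names and the simplified classification are assumptions, but they follow the documented behavior (no HA traffic + no ping to the isolation address + datastore heartbeat still alive = isolated host).

```python
# Toy model (not VMware code) of how the HA master classifies a host.
# In a non-vSAN cluster, the ping goes from vmk0 to the default
# isolation address (the management gateway, 10.10.10.1 in this example).

def classify_host(sees_ha_traffic: bool,
                  pings_isolation_address: bool,
                  datastore_heartbeat_ok: bool) -> str:
    """Rough classification of a host from the HA master's point of view."""
    if sees_ha_traffic:
        return "connected"
    if pings_isolation_address:
        return "partitioned"   # network up, but cut off from the master
    if datastore_heartbeat_ok:
        return "isolated"      # host alive; isolation response is triggered
    return "failed"            # master restarts its VMs elsewhere

# Host1 loses all management connectivity but still heartbeats
# to a shared datastore: it is isolated, not dead.
print(classify_host(False, False, True))   # isolated
```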

For the vSAN environment (traditional and vSAN storage mixed):

I have 4 hosts :
Host1:
Management IP (vmk0) = 10.10.10.10
GW=10.10.10.1
vSAN IP (vmk1) = 20.20.20.20
GW = 20.20.20.1


Host2:
Management IP (vmk0)= 10.10.10.20
GW=10.10.10.1
vSAN IP (vmk1) = 20.20.20.21
GW = 20.20.20.1


Host3:
Management IP (vmk0)= 10.10.10.30
GW=10.10.10.1
vSAN IP (vmk1) = 20.20.20.22
GW = 20.20.20.1


Host4:
Management IP (vmk0)= 10.10.10.40
GW=10.10.10.1
vSAN IP (vmk1) = 20.20.20.23
GW = 20.20.20.1

Now, according to the doc, in the first step we should set:
das.useDefaultIsolationAddress=false
das.isolationAddress0=20.20.20.1

With this configuration, isolation on each host is now checked by pinging the gateway from vmk1. For example, on Host1:
vmk1 (20.20.20.20) has to ping 20.20.20.1; if the ping fails, the host detects that isolation has happened and responds according to the admin's configuration.

Actually, I am confused about what happens to vmk0 in this mode. Is it no longer used for isolation detection? What happens if vmk0 drops?

depping
Leadership
Leadership

It is really simple; it does not matter that you have traditional storage. If you have vSAN, then:

- HA needs to be disabled before enabling vSAN
- HA can be enabled after enabling vSAN
- the isolation address needs to be reachable from the vSAN network, as the vSAN network is used for HA traffic
baber
Expert

Yes, that is easy, but I just want to understand the concept.

When we run vSAN and change the isolation address (for example, to the vSAN VMkernel gateway), for the ping check:

1- Is the source the vSAN VMkernel and the destination the vSAN VMkernel gateway?

2- What will happen now if the management network fails?

a_p_
Leadership

1- Is the source the vSAN VMkernel and the destination the vSAN VMkernel gateway? --> The destination can be any IP address that can be reached from the vSAN VMkernel ports.
2- What will happen now if the management network fails? --> You won't be able to manage the host. No reason for HA to kick in.

André

baber
Expert

What will happen now if the management network fails? --> You won't be able to manage the host. No reason for HA to kick in.

But previously, in the non-vSAN environment, when the management network failed the host started checking for isolation (first pinging the isolation address, then checking the datastore heartbeat); now that check doesn't happen. Do you mean that when we use vSAN, isolation of the management network is no longer important, and isolation of the vSAN VMkernel is what matters instead?

a_p_
Leadership

HA is still available with vSAN; it just uses the vSAN VMkernel port instead of the Management Network.

@depping explains this in his blog, see e.g. https://www.yellow-bricks.com/2017/11/08/vsphere-ha-heartbeat-datastores-isolation-address-vsan/

André

baber
Expert

My problem is with isolation. In a non-vSAN environment the management VMkernel is used to check the isolation status, but when we run vSAN, the vSAN VMkernel checks the isolation status instead. Now, since we have different pNICs (one for management and one for vSAN), if the management network fails we simply cannot connect to management, and isolation will never be detected, because the isolation check now pings from the vSAN VMkernel to the isolation address (which I set to the vSAN VMkernel's gateway) and no longer pings from the management VMkernel as the source.

 

depping
Leadership

The mechanism is:

If an isolation event occurs, HA pings the defined isolation address via the vSAN VMkernel interface. You define the isolation address via the advanced setting called das.isolationaddress0. There's plenty of material online, and my book (vSphere 6.7 Clustering Deep Dive) also describes this in depth.
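In other words, the only things that change once vSAN is enabled are the VMkernel interface and the address used for that ping. A minimal sketch (the addresses are the example values from this thread, not VMware defaults):

```python
def isolation_check(vsan_enabled: bool) -> tuple[str, str]:
    """Return the (VMkernel interface, isolation address) pair that HA
    pings during isolation detection. Addresses are this thread's examples."""
    if vsan_enabled:
        # HA traffic moved to the vSAN network, so das.isolationaddress0
        # must point at something reachable from vmk1.
        return ("vmk1", "20.20.20.1")
    # Default: the management gateway via the management VMkernel port.
    return ("vmk0", "10.10.10.1")

print(isolation_check(True))    # ('vmk1', '20.20.20.1')
print(isolation_check(False))   # ('vmk0', '10.10.10.1')
```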

baber
Expert

Thanks. I read a few documents about it, but my question is: when we use both types of storage (traditional and vSAN) in one cluster, how is isolation detected for the VMs that reside on traditional storage? The isolation address has been changed to the vSAN VMkernel gateway, and the vSAN VMkernel is the one pinging the isolation address, but for VMs on traditional storage the vSAN VMkernel is not relevant.

depping
Leadership

Isolation detection has absolutely nothing to do with storage; it detects whether the network on which HA is communicating is still functioning.

baber
Expert

But I thought that in the first step the management VMkernel starts to ping the default isolation address (which is the management VMkernel's default gateway); if that fails, it goes to the next step and checks the datastore heartbeat. If a VM's lock file can be found on storage, isolation is considered to have happened and the host acts according to our configuration.

Is my understanding correct?

I first want to make sure of this before asking the next question.

depping
Leadership

Yes, that is the next step. I am still not sure what the problem is here.

baber
Expert

Now, in that cluster I enabled vSAN (so first I disabled HA, then enabled vSAN, and then enabled HA again).

Next I added the following configuration:

das.useDefaultIsolationAddress=false
das.isolationAddress0=<vSAN vmkernel Gateway address>

Now the vSAN VMkernel starts to ping the isolation address (the vSAN VMkernel gateway address); if that fails, isolation is detected on the vSAN network and the host acts according to my configuration.

If the above explanation is correct, then my main question is this: some of my VMs are on shared storage. How is isolation detected for these VMs? Maybe vSAN detects that isolation has happened and restarts the VMs on vSAN, but isolation is not detected for the VMs on shared storage; will both groups of VMs be restarted?

depping
Leadership

If the vSAN network is isolated, then the result is the same for both the VMs on vSAN and the VMs on the other storage system. This triggers the isolation response, which could be, for instance, "power off". That kills all the VMs, which then allows the master to restart them all.
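The key point, that the isolation response is host-wide rather than per-datastore, can be sketched as a toy model (illustrative only; the VM names and state strings are assumptions, not VMware APIs):

```python
def isolation_response(vms_on_host: dict, response: str = "power_off") -> dict:
    """Apply the configured isolation response to every HA-protected VM
    on the isolated host, regardless of which datastore backs each VM."""
    if response == "power_off":
        return {vm: "powered_off" for vm in vms_on_host}
    return {vm: "running" for vm in vms_on_host}   # "leave powered on"

# vm1/vm2 live on vSAN, vm5 on a traditional FC datastore; the
# response is applied to all three, and the master restarts them all.
vms = {"vm1": "vSAN", "vm2": "vSAN", "vm5": "FC"}
print(isolation_response(vms))
# {'vm1': 'powered_off', 'vm2': 'powered_off', 'vm5': 'powered_off'}
```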

baber
Expert

As I understand it: when Host1 detects isolation on the vSAN network (and the response is set to power off and restart VMs), it will power off and restart all VMs residing on that host, even VMs that are not on vSAN and reside on shared storage; both groups are restarted.

Is that correct?

depping
Leadership

Yes

baber
Expert

So, thank you. Now, do you think this behavior is rational? The isolation only happened on Host1's vSAN VMkernel, so there is only a problem for the VMs that reside on vSAN on this host (for example vm1, vm2, vm3). Why should the other VMs be restarted? (vm5, vm6, vm7 are not on vSAN and do not use the vSAN VMkernel, so they are working without any problem.) This is the question I wanted to get to.

depping
Leadership

Again, the isolation response has nothing to do with storage. The same could be said for the "management network" that HA uses. VMs don't use this network, yet an isolation response is still triggered when that network has failed. The isolation response is triggered as a result of the HA functionality not working. Whether I agree or not with the design doesn't really matter, as I can't influence this.

If you don't want to restart the VMs that run on traditional storage, you simply override them.
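That per-VM override can be sketched like this (a toy model; "disabled" stands in for the per-VM override chosen in the cluster's VM Overrides dialog, and the VM names are the examples from this thread):

```python
def isolation_actions(vms, cluster_response="power_off", overrides=None):
    """A per-VM override wins over the cluster-level isolation response."""
    overrides = overrides or {}
    return {vm: overrides.get(vm, cluster_response) for vm in vms}

# Keep vm5 (on traditional storage) untouched even if the host is isolated:
actions = isolation_actions(["vm1", "vm2", "vm5"],
                            overrides={"vm5": "disabled"})
print(actions)
# {'vm1': 'power_off', 'vm2': 'power_off', 'vm5': 'disabled'}
```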

[Attachment: Screenshot 2021-06-23 at 09.41.36.png]
