VMware Cloud Community
tarun2630
Contributor

vSAN VMs getting registered on 2 hosts after VMware HA

I have configured a nested environment in my home lab, running a vSAN stretched cluster in a 6 + 6 + 1 (witness) configuration. Connectivity appears to be fine.

Management IPs of VMs: 10.10.10.x

vSAN IPs, site 1: 30.30.30.10 to 30.30.30.15

vSAN IPs, site 2: 31.31.31.10 to 31.31.31.15

HA isolation addresses: isolation address 0 = 30.30.30.1, isolation address 1 = 31.31.31.1, use default isolation address = false

Storage policies: site mirroring with RAID 5

Keep data on preferred site

Keep data on secondary site
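
For anyone comparing against their own setup, here is a minimal sketch of how the HA settings above could be sanity-checked: it confirms that each site's vSAN network contains at least one configured isolation address. The addresses are the ones listed above; the /24 prefix length, the site labels, and the helper names are assumptions for illustration, and the option keys are the standard vSphere HA advanced option names that I take "isolation address 0/1" and "default isolation address" to mean.

```python
import ipaddress

# vSAN networks per site. The /24 prefix length is an ASSUMPTION;
# the post only lists host addresses .10 to .15 in each range.
SITE_VSAN_NETWORKS = {
    "site1": ipaddress.ip_network("30.30.30.0/24"),
    "site2": ipaddress.ip_network("31.31.31.0/24"),
}

# HA advanced options as described in the post.
HA_ADVANCED_OPTIONS = {
    "das.isolationaddress0": "30.30.30.1",
    "das.isolationaddress1": "31.31.31.1",
    "das.usedefaultisolationaddress": "false",
}


def isolation_addresses(options):
    """Return all das.isolationaddressN values as IP address objects."""
    return [
        ipaddress.ip_address(value)
        for key, value in sorted(options.items())
        if key.startswith("das.isolationaddress")
    ]


def sites_without_isolation_address(networks, options):
    """List the sites whose vSAN network contains no isolation address."""
    addresses = isolation_addresses(options)
    return [
        site
        for site, network in networks.items()
        if not any(address in network for address in addresses)
    ]


if __name__ == "__main__":
    missing = sites_without_isolation_address(SITE_VSAN_NETWORKS, HA_ADVANCED_OPTIONS)
    print("Sites without a local isolation address:", missing)
```

This only checks that the addresses line up with the subnets on paper; it does not verify that each address actually responds to pings from the hosts in its site.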

 

Issue 1 (VMs flickering)

When I isolate the preferred site, HA kicks in and triggers a failover: the VMs with the replicated (site mirroring) policy fail over to the secondary site and work fine. However, when I reconnect the preferred site, the VMs start flickering between two hosts (one in the preferred site and one in the secondary site); in effect, each VM is registered on two hosts. I removed the VM from the inventory and also deployed a new VM, but I still see the same behavior.

Issue 2 (VM in preferred site not losing access)

The VM that uses the "keep data on preferred site" storage policy does not lose connectivity even when the preferred site fails. Its status in vCenter shows as disconnected, but the vSAN component shows as healthy, whereas ideally it should show as absent. Even though the hosts in the preferred site are down and not accessible, the VM continues to work fine and shows as healthy.

Please let me know what I should be looking for configuration-wise.

3 Replies
depping
Leadership

The FDM log should probably tell you what is happening, as FDM is what causes the VMs to be restarted. Without it, it is hard to say why this happens. Are you sure the VMs were shut down in the preferred site? (When you run the tests, check via the command line whether the VMs are killed or not!)
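
As a rough sketch of the log check suggested above: on ESXi the HA agent log is normally /var/log/fdm.log, and `esxcli vm process list` on a host shows which VM processes are still running there. The snippet below simply pulls every line that mentions a given VM out of a copy of a log file. The function names, the local file name, and the search string are assumptions for illustration; the log may refer to the VM by its .vmx path rather than its display name, so the search term may need adjusting.

```python
from pathlib import Path


def lines_mentioning(log_lines, vm_name):
    """Return the log lines that mention vm_name, ignoring case."""
    needle = vm_name.lower()
    return [line.rstrip("\n") for line in log_lines if needle in line.lower()]


def scan_log(path, vm_name):
    """Read a log file copied from a host and filter it by VM name."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with Path(path).open(errors="replace") as handle:
        return lines_mentioning(handle, vm_name)


if __name__ == "__main__":
    # "fdm.log" is a HYPOTHETICAL local copy of /var/log/fdm.log from one host;
    # "tiny core" is a placeholder search term for the VM under test.
    for line in scan_log("fdm.log", "tiny core"):
        print(line)
```

Running this against the log from a host in each site and comparing the timestamps around the reconnect should show which host's agent acted on the VM and when.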

tarun2630
Contributor

@depping 

Please find the attached screenshot. The component is located on host 10.14. I powered down that host, but the vSAN component still showed as active rather than inaccessible. This happens for VMs whose policy is set to keep data on the preferred site or on the secondary site.

If the policy is set to site mirroring with RAID 5, the objects become inaccessible at one of the sites; however, when we restore connectivity, the VMs start flickering.

I am also attaching host logs from 2 hosts. The VM name is tiny core.

 

 

depping
Leadership

I doubt that is the FDM log file. Anyway, it is difficult for me to say why you are seeing what you are seeing without a broader understanding of the environment, the network connectivity, what has failed, etc. If you have a support contract, I would recommend contacting support and asking them to analyze the scenario.
