I have narrowed down the issue.
If you put a host in a 2-node vSAN cluster into maintenance mode and disconnect the vmnic that the witness traffic is using (with Witness Traffic Separation, WTS, implemented), it terminates the VMs on the other node!
This happens regardless of your fault domain settings (I tried both preferred and secondary, no difference).
It only happens if HA is turned on and the host is in maintenance mode.
If you disconnect the witness traffic vmnics when the host isn’t in maintenance mode, nothing happens.
VMware have acknowledged this as a bug and have escalated it to engineering
Thanks for sharing. I have been facing the same issue. VMware and HPE said that it is a bug.
We are waiting for a permanent fix. If you are using HPE servers and the HPE customized VMware image, there is a workaround. Please request it from VMware GSS. The issue seems to be related to the NICMGMTD daemon.
Thanks for the reply
This is on Dell, but it's not related to the ESXi ISO or the hardware, as I can replicate it in my nested lab with the native ESXi builds.
I have figured out a few workarounds:
1) Don't put the host in maintenance mode when working on the witness vmnics (turn off DRS and move the VMs off it, of course)
2) Turn off HA while doing your maintenance
3) Change the advanced vSAN setting VSAN.AutoTerminateGhostVm to 0 on each host (not recommended, as vSAN then won't terminate the VMs if a real host isolation occurs)
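For anyone who wants to script workaround 3 across hosts, here is a rough sketch in Python (which ships on ESXi). It only builds and, by default, prints the esxcli command rather than running it. The option path /VSAN/AutoTerminateGhostVm is my assumption of how the VSAN.AutoTerminateGhostVm setting maps to esxcli, so check it first with "esxcli system settings advanced list -o /VSAN/AutoTerminateGhostVm". The function names are made up for illustration.

```python
import subprocess

# Assumption: the UI setting VSAN.AutoTerminateGhostVm maps to this esxcli option path.
OPTION_PATH = "/VSAN/AutoTerminateGhostVm"


def build_set_command(value):
    """Return the esxcli argument list that sets the option to 0 (off) or 1 (on)."""
    if value not in (0, 1):
        raise ValueError("value must be 0 or 1")
    return ["esxcli", "system", "settings", "advanced", "set",
            "-o", OPTION_PATH, "-i", str(value)]


def apply_setting(value, dry_run=True):
    """Print the command (dry run) or execute it on the local host."""
    cmd = build_set_command(value)
    if dry_run:
        print(" ".join(cmd))
    else:
        # Raises CalledProcessError if esxcli exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run only: shows what would be executed to disable ghost VM termination.
    apply_setting(0)
```

Remember to set it back to 1 after the maintenance, for the reason given in workaround 3.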
What was the workaround VMware gave you?
They gave me a script that refreshes the "NICMGMTD" daemon when its memory allocation becomes full. I scheduled it to run every 5 minutes via crond.
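I can't share the GSS script itself, but the general idea (check memory, restart the daemon when it is nearly full) looks roughly like the sketch below. This is not VMware's script. The threshold, the function names, and the way memory usage is read and the daemon is restarted are all placeholders that you would have to supply.

```python
import subprocess


def should_restart(used_kb, limit_kb, threshold=0.95):
    """True when memory use has reached the given fraction of the limit."""
    if limit_kb <= 0:
        raise ValueError("limit_kb must be positive")
    return used_kb / limit_kb >= threshold


def watchdog_once(read_usage, restart_cmd, threshold=0.95, run=subprocess.run):
    """One check, meant to be called from cron.

    read_usage  -- hypothetical callable returning (used_kb, limit_kb) for the daemon
    restart_cmd -- hypothetical argument list that restarts the daemon
    Returns True if a restart was triggered.
    """
    used_kb, limit_kb = read_usage()
    if should_restart(used_kb, limit_kb, threshold):
        run(restart_cmd, check=True)
        return True
    return False
```

A cron entry with the schedule "*/5 * * * *" would call something like this every 5 minutes.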