We have the following hypothetical situation.
We have 2 ESX 3.5 machines configured with HA.
Each machine has its own IP address on the VMkernel network, each in a different network class, and the two are connected by routing.
What happens if the routing fails and both machines can still ping their gateways (the isolation address) via the service console and VMkernel networks, but have absolutely no connection to each other?
How will this split-brain problem be solved? Would both machines try to start each other's VMs?
Thanks!
The ESX server which does not own a Virtual Machine will attempt to boot it. Whether it succeeds depends on the policy of the owning server: turn off the VM in case of isolation or keep it running. If the VM keeps running, the boot on the other ESX server will fail, because the files are still locked in the VMFS (I've seen this situation in the logfiles).
Indeed, the split-brain situation is solved by file locking. Since each HA-enabled ESX server will see the other disappear while the gateway (or the das.isolationaddress, if one is set) remains visible, each ESX server will conclude that the other ESX host has gone down. The result is that each ESX server will attempt to start the VMs registered on the other ESX server. All these starts will fail, though, because the VMs' files on the (shared!) storage are locked by the host that is still running them.
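To make the locking argument concrete, here is a small toy model in Python. It is only a sketch of the reasoning above, not VMware code: `SharedDatastore`, `LockedError` and `failover_attempt` are hypothetical names, and the real VMFS on-disk locking is far more involved than a dictionary.

```python
class LockedError(Exception):
    """Raised when a VM's files are already locked by another host."""


class SharedDatastore:
    """Toy stand-in for a shared VMFS volume: one lock owner per VM."""

    def __init__(self):
        self._owners = {}  # VM name -> name of the host holding the lock

    def acquire(self, vm, host):
        owner = self._owners.get(vm)
        if owner is not None and owner != host:
            raise LockedError(f"{vm} is locked by {owner}")
        self._owners[vm] = host

    def release(self, vm, host):
        # Only the current owner can drop the lock (e.g. after a power-off).
        if self._owners.get(vm) == host:
            del self._owners[vm]


def failover_attempt(datastore, host, vms_of_peer):
    """Try to start each of the peer's VMs; return (started, failed)."""
    started, failed = [], []
    for vm in vms_of_peer:
        try:
            datastore.acquire(vm, host)
            started.append(vm)
        except LockedError:
            failed.append(vm)
    return started, failed


ds = SharedDatastore()
ds.acquire("vm-a", "esx1")  # vm-a is running on esx1
ds.acquire("vm-b", "esx2")  # vm-b is running on esx2

# Routing fails: each host believes the other is dead and tries a restart.
print(failover_attempt(ds, "esx1", ["vm-b"]))  # ([], ['vm-b'])
print(failover_attempt(ds, "esx2", ["vm-a"]))  # ([], ['vm-a'])
```

In this model a restart only succeeds once the owning host has released the lock, which corresponds to the case where the owner powers the VM off.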
Regardless of HA configuration, no VMs will be shut down or powered off in this scenario. This is because each ESX host thinks that every host but himself has gone down (or is it herself?).
> no VMs will be shut down or powered off
This depends on the isolation response setting. Today, the default is to keep the VMs running, but that wasn't the case in the early versions, where a VM was powered off after 13 seconds.
I have to disagree. Since each ESX server still sees the gateway, it will conclude that he/she is NOT the one that got isolated. Therefore it will not attempt to shut down or power off any VM running on it. The same goes for all other ESX servers in this example.
No matter how HA is configured, in the end no VM will be shut down or powered off.
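The decision I am describing can be written down as a tiny Python function. This is a simplified sketch of the reasoning in this thread, not the actual HA agent logic: `ha_view` and its two boolean inputs are hypothetical, and real HA also involves heartbeat timeouts that are ignored here.

```python
def ha_view(peer_reachable, isolation_address_reachable):
    """Return how a host interprets the network state in this toy model."""
    if peer_reachable:
        return "normal"
    if isolation_address_reachable:
        # The peer is gone but the gateway answers: assume the peer failed
        # and try to restart its VMs; the isolation response is not used.
        return "peer_failed"
    # Neither the peer nor the gateway answers: this host is the isolated
    # one, so the configured isolation response applies.
    return "isolated"


# The scenario from the question: routing is down, both gateways answer.
for host in ("esx1", "esx2"):
    print(host, ha_view(peer_reachable=False, isolation_address_reachable=True))
```

Both hosts end up in the "peer_failed" branch, so in this model the isolation response setting never comes into play.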