VMware Cloud Community
johnnysnq
Contributor

HA config rogue router

We have the following hypothetical situation.

We have 2 ESX 3.5 machines configured with HA.

Each machine has its own IP address on the vmkernel network, each in a different subnet, and the two are connected by routing.

What happens if the routing fails and both machines can still ping their gateways (the isolation address) via the service console and vmkernel networks, but have absolutely no connection to each other?

How will this split-brain problem be solved? Would both machines try to start each other's VMs?

Thanks!

4 Replies
BUGCHK
Commander

The ESX server which does not own a Virtual Machine will attempt to boot it. Whether it succeeds depends on the policy of the owning server: turn off the VM in case of isolation or keep it running. If the VM keeps running, the boot on the other ESX server will fail, because the files are still locked in the VMFS (I've seen this situation in the logfiles).

Erik_Zandboer
Expert

Indeed, the split-brain situation is resolved by file locking. Since each HA-enabled ESX server will see the other disappear while the gateway (or das.isolationaddress target) remains visible, each will conclude that the other ESX host has gone down. As a result, each ESX server will attempt to start the VMs registered on the other ESX server. All of these starts will fail, though, because the files on the (shared!) storage are locked.
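
To make the locking argument concrete, here is a toy sketch in Python. It is not how VMFS or the HA agent is implemented; `SharedDatastore`, `try_lock` and `failover_attempt` are made-up names, and the only thing it models is "one host holds the lock on a running VM's files, so a second power-on attempt fails".

```python
class SharedDatastore:
    """Toy stand-in for shared storage: at most one lock holder per VM.

    Assumption for illustration only; real VMFS locking is more involved.
    """

    def __init__(self):
        self._locks = {}  # VM name -> host currently holding the lock

    def try_lock(self, vm, host):
        """Return True if `host` holds the lock on `vm` after the call."""
        holder = self._locks.get(vm)
        if holder is None or holder == host:
            self._locks[vm] = host
            return True
        return False  # files are locked by the host still running the VM

    def holder(self, vm):
        return self._locks.get(vm)


def failover_attempt(datastore, host, peer_vms):
    """Hypothetical helper: `host` tries to power on each VM of its peer."""
    return {vm: datastore.try_lock(vm, host) for vm in peer_vms}


# Both hosts are up and running their own VMs, so both hold their locks.
datastore = SharedDatastore()
datastore.try_lock("vm-a", "esx1")
datastore.try_lock("vm-b", "esx2")

# Routing between the hosts fails; each host believes the other is down
# and tries to start the other's VMs.
esx1_result = failover_attempt(datastore, "esx1", ["vm-b"])
esx2_result = failover_attempt(datastore, "esx2", ["vm-a"])
```

In this sketch both attempts come back `False` and the original lock holders are unchanged, which is the outcome described above: the restarts fail and the VMs keep running where they were.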

Regardless of the HA configuration, no VMs will be shut down or powered off in this scenario. This is because each ESX host thinks "all but himself" has gone down (or is it herself? ;-) )

Visit my blog at http://www.vmdamentals.com
BUGCHK
Commander

> no VMs will be shut down or powered off

This depends on the isolation response setting. Today the default is to keep the VMs running, but that wasn't the case in the early versions, where a VM was powered off after 13 seconds.

Erik_Zandboer
Expert

I have to disagree. Since each ESX server still sees the gateway, it will conclude that it is NOT the one that got isolated. Therefore it will not attempt to shut down or power off any VM running on it. The same goes for all other ESX servers in this example.

No matter how HA is configured, in the end no VM will be shut down or powered off.
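
The reasoning can be written down as a small decision function. This is only a sketch of the argument made in this post, in Python, and not the actual HA agent logic; `ha_decision` and its parameter and return values are invented for illustration.

```python
def ha_decision(peer_heartbeat_ok, isolation_address_ok,
                isolation_response="power_off"):
    """Sketch of what one host concludes, per the argument in this thread.

    All names and return values here are hypothetical labels.
    """
    if peer_heartbeat_ok:
        # The other host is still visible: nothing to do.
        return "no_action"
    if isolation_address_ok:
        # Peer is gone but the gateway answers: "I am not isolated,
        # the other host must be down", so try to restart its VMs.
        return "attempt_failover_of_peer_vms"
    # Neither peer nor gateway reachable: only now does the
    # isolation response setting come into play.
    return "isolated:" + isolation_response


# The scenario from the question: no peer heartbeat, gateway still pingable.
for response in ("power_off", "leave_powered_on"):
    print(response, "->", ha_decision(False, True, response))
```

With the gateway still reachable, the result is the same for either isolation response value, which is the point being made: the setting is never consulted in this scenario.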
