wxb2744
Contributor

Multiple Host Isolation

Imagine a scenario where we had a four-node HA cluster spread over a campus with two nodes in one location and two in the other. What would the host isolation response be if the network connection between the two sites was lost?

If we lose one host, it is deemed isolated after 12s and then failed after 15s. If we lose two, however, then no host becomes isolated, and I'm assuming that nothing happens.

Now imagine that the datastores are all shared, but some are in one site and some are in the other. The guests running on local datastores would be unaffected. The guests running on remote datastores would fail. The question here is: what would happen to the failed hosts?

Thanks,

Warren Barnes

1 Solution

Accepted Solutions
rpasserini
VMware Employee

Before I answer this in detail, I want to make sure I'm clear on my assumptions:

1. There are 4 hosts in the cluster, two on each side of the stretch. If this is the case, then all 4 hosts are primaries. (The first 5 hosts in any cluster are primaries, so you only get secondaries when there are 6 or more hosts.)

2. If the network goes down between the two sites, will the storage be split-brained as well? I'm assuming it will, based on one of your comments.

So, given site #1 has hosts A and B, and site #2 has hosts C and D:

If, after the split between site 1 and site 2, A and B can still heartbeat with each other, and C and D can heartbeat with each other then there is no isolation response attempted. Isolation responses only kick in when a host cannot heartbeat with any of the other primaries, and it also cannot ping its isolation address (usually the gateway(s)) for the network(s) that host is on.
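The isolation rule above can be sketched in a few lines of Python. This is only an illustrative model, not a VMware API: the `HostView` class and the `is_isolated` function are hypothetical names, and the rule encoded is just the one stated here (no heartbeat with any other primary, and no ping response from the isolation address).

```python
from dataclasses import dataclass, field


@dataclass
class HostView:
    """What one HA host can currently reach (hypothetical model, not a VMware API)."""
    name: str
    # Other primaries this host can still exchange heartbeats with.
    reachable_primaries: set = field(default_factory=set)
    # Whether the host can ping its isolation address (usually the gateway).
    isolation_address_pingable: bool = True


def is_isolated(host: HostView) -> bool:
    """A host is isolated only if BOTH checks fail."""
    return not host.reachable_primaries and not host.isolation_address_pingable


# After the site split, A still heartbeats with B, so A is not isolated
# even if the gateway is unreachable.
host_a = HostView("A", reachable_primaries={"B"}, isolation_address_pingable=False)
# A host that has lost every primary and its gateway is isolated.
lonely = HostView("X", reachable_primaries=set(), isolation_address_pingable=False)

print(is_isolated(host_a), is_isolated(lonely))
```

In the four-host scenario, every host still has one heartbeat partner on its own side, so none of them triggers an isolation response.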

So, what happens is that A & B at site 1 conclude that C & D at site 2 have failed, and vice versa. A and B will attempt to power on the VMs that were running on C and D, and likewise C & D will try to power on the VMs that were on A and B. Now, because some VMs' storage is located at site 1 and other VMs' storage is located at site 2, some of the power-ons might fail, as the storage is not accessible. But as A&B will attempt to power on all of C&D's VMs and C&D will attempt to power on all of A&B's VMs (this is assuming that admission control allows all these power-ons), each VM will end up powered on correctly on either site 1 or site 2.
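To make the outcome concrete, here is a rough Python sketch of where each VM ends up after the partition. It assumes, as in assumption 2, that storage is partitioned along with the network, so a datastore is reachable only from its own site, and that admission control allows every power-on. The `VM` class and `after_partition` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    running_site: int    # site whose host ran the VM before the partition
    datastore_site: int  # site that physically holds the VM's datastore


def after_partition(vms):
    """Map each VM name to (site where it runs with storage, stale process left behind?)."""
    result = {}
    for vm in vms:
        if vm.running_site == vm.datastore_site:
            # The original host keeps its storage; the other site's power-on
            # attempt fails because it cannot reach the datastore.
            result[vm.name] = (vm.running_site, False)
        else:
            # The original host lost its storage; the other site can reach the
            # datastore and powers the VM on, while the old vmx process lingers.
            result[vm.name] = (vm.datastore_site, True)
    return result


vms = [
    VM("vm1", running_site=1, datastore_site=1),
    VM("vm2", running_site=1, datastore_site=2),
    VM("vm3", running_site=2, datastore_site=1),
]
outcome = after_partition(vms)
for name, (site, stale) in outcome.items():
    print(name, "runs at site", site, "| stale vmx left behind:", stale)
```

Under these assumptions every VM ends up running on exactly one side with working storage, and only the VMs whose datastore was remote leave a stale process on their original host.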

Now for the ugly part -- if any of the VMs at site 1 lost their storage in the partition, or vice-versa, then the vmware-vmx processes that represent those VMs will still be running on the host(s) on the side of the partition that lost the storage, and there will also now be a vmware-vmx process representing the same VM running on a host on the other side of the partition that has now acquired a lock on that VM. None of this is an issue until the partition rejoins. That's when the behavior described by Elisha happens -- i.e. the VM will appear to bounce back and forth between the two hosts until the question about the lost lock is answered by pointing the VC client directly at the host. And as he pointed out, the question will be auto-answered by VC for vSphere 4.0 U2 and above.

-- Ron

5 Replies
admin
Immortal

This is classic split brain, an increased risk when running a stretched cluster. Each side will think the other is dead and will try to fail over the VMs. Failover will fail if both sides can see the storage and the disk locks are still held. If side A also loses access to some storage, then VMs running from that storage will lose their disk locks and will be failed over to side B. The vmx processes will actually remain running on side A, but only the ones on side B will have access to the vmdk files, so data corruption is prevented. When the network is restored, vCenter will see the VM running on two hosts, and in the client you'll see the VM switching back and forth between the two hosts, since vCenter doesn't really know how to handle this. The copy of the VM running on side A (which has lost the lock) will issue a question indicating that the lock was lost. If you connect directly to the host and answer the question, the VM will automatically power down, leaving the VM running on side B, and things will clear up in vCenter. In 4.0 Update 2, the question will be auto-answered and things should clear up without user action.
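The cleanup after the network is restored can be pictured with a small Python sketch. It is only a model of the behavior described here, not vCenter or ESX code: `resolve_on_rejoin` is a hypothetical helper, and it assumes exactly one host holds the disk lock for the VM.

```python
def resolve_on_rejoin(instances, lock_holder):
    """Split the hosts running a vmx process for one VM into (kept, powered_off).

    instances: names of hosts that each have a vmx process for the same VM.
    lock_holder: name of the host that currently holds the VM's disk lock.
    """
    # The copy that holds the lock keeps running.
    kept = [host for host in instances if host == lock_holder]
    # Any copy that lost the lock raises the "lock lost" question; answering it
    # (manually, or automatically from 4.0 Update 2) powers that copy down.
    powered_off = [host for host in instances if host != lock_holder]
    return kept, powered_off


kept, powered_off = resolve_on_rejoin(["A", "C"], lock_holder="C")
print("still running on:", kept, "| powered off on:", powered_off)
```

Here the VM originally ran on host A, lost its storage, and was restarted on host C; once the question is answered, only the copy on C remains.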

Elisha

js411
Contributor

What's this "question" you are talking about? Can you give more details?

I thought that secondaries only heartbeat with primaries, so in this scenario, if both nodes in SiteB are secondaries and they get isolated, won't they both go into isolation mode?

Now, if there is a primary, then the secondaries will heartbeat to the primary, and then there is no isolation response, correct?

But what happens when the primaries become split-brained? I assume that there is some way for the primaries to resolve split brain so there is only a single active primary, and to keep the primary in SiteB from becoming the active primary? Or is this manual, and is that the "question" you are referring to?

But what's the point of stretching hosts across sites if you don't have resiliency in the storage? So you need replicated storage, and in that case, how are disk locks held when the storage itself becomes split-brained?

admin
Immortal

I'm attaching a screenshot of the "locklost" message I referred to above. BTW, only ESX 4.0 hosts (and newer) will display this.

wxb2744
Contributor

Thanks, guys. It is more or less as I thought, but it is good to have a second opinion.

Warren
