Re: HA and FT in multi-site cluster?

hutchingsp · ‎10-07-2010

Suppose we have two sites linked by fibre (latency not an issue).

Suppose we have clustered shared storage that can be addressed by a single IP from both sites.

Suppose we put a single vSphere host in each site and use the shared storage.

Now, if we lose a site due to power, or a link gets cut or "something" happens to cause "split brain", how does vSphere deal with it in terms of HA and FT?

The above is fairly generic wording; the specifics would most likely be clustered HP P4000 storage, with one (or two stacked) dedicated switches for iSCSI and vMotion/FT (thinking VLANs) in each location and a dedicated fibre link to the other.

My brain is frankly fried with researching stuff, and I'm struggling to remember/work out how HA and FT cope in "split brain" where both boxes are up, but the link is down (the P4000 shared storage has a "quorum" mechanism so your preferred site seizes control of the group IP and the storage at the other site goes invisible).

Thanks.

weinstein5 · ‎10-07-2010

What you describe is not supported, but in the split-brain scenario you can control the response by setting the Isolation Response in the HA cluster settings. For FT, I believe the primary VM continues to run and no updates are delivered to the backup VM. The other issue I see is on the networking side, because you will need to span the subnets/VLANs across the two sites.

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful

hutchingsp · ‎10-07-2010

Thanks for the info, this is the HP white paper/guide I was looking at:

http://h20195.www2.hp.com/v2/GetPDF.aspx/4AA0-4385ENW.pdf

It does focus more on FT than HA.

I wouldn't see a need for different subnets whilst it's small/simple with just two switches/locations.

admin · ‎10-08-2010

VMware has also documented this jointly with each vendor:

HP: http://kb.vmware.com/kb/1021660

NetApp: http://kb.vmware.com/kb/1001783

EMC: http://kb.vmware.com/kb/1026692

The split-brain scenario is handled differently by each vendor, with varying levels of automation, but it's worth noting that HA uses the storage as the tie-breaker to determine where the VM should be running. As long as the storage provides access to a single copy of the VMDK, HA will work just fine to restart the VM by establishing a file lock on the VMDK. NetApp does not handle this automatically; EMC and HP, however, do provide some automation for these cases. The key for the storage vendor is whether it can accurately distinguish a site/array failure from a network partition (i.e. split brain).
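To make the tie-breaker idea concrete, here is a minimal sketch in Python. It is a toy model only: the class `SharedVmdk`, the function `hosts_allowed_to_run`, and the host names are all hypothetical, and it does not reproduce how VMFS locking or HA actually work internally. It only illustrates the point above: if the storage exposes a single copy of the VMDK, at most one host can hold the lock, so at most one site ends up running the VM.

```python
class SharedVmdk:
    """Toy stand-in for a VMDK on shared storage with one exclusive lock."""

    def __init__(self, name):
        self.name = name
        self.lock_owner = None  # host currently holding the lock, if any

    def try_lock(self, host, can_see_storage):
        # A host that cannot reach the storage can never take the lock.
        if not can_see_storage:
            return False
        # The lock is granted if it is free or already held by this host.
        if self.lock_owner in (None, host):
            self.lock_owner = host
            return True
        return False

    def expire_lock(self, host):
        # Assumption: the lock lapses once its holder loses storage access.
        if self.lock_owner == host:
            self.lock_owner = None


def hosts_allowed_to_run(vmdk, visibility):
    """Return the hosts that manage to lock the VMDK after a partition.

    `visibility` maps host name -> whether that host can still see the storage.
    """
    for host, visible in visibility.items():
        if not visible:
            vmdk.expire_lock(host)
    return [h for h, visible in visibility.items() if vmdk.try_lock(h, visible)]


vmdk = SharedVmdk("vm1.vmdk")
vmdk.try_lock("esx-site-a", True)  # VM initially runs at site A
# Storage quorum picks site B, so site A loses sight of the volume.
winners = hosts_allowed_to_run(vmdk, {"esx-site-a": False, "esx-site-b": True})
print(winners)
```

In this model the storage decides who can see the volume, and the lock does the rest, which is why the vendor's ability to pick a single winning site matters so much.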

hutchingsp · ‎10-08-2010

Thanks, that looks similar to the HP PDF. It seems to deal a lot with FT and node failures (ESX or storage), but not so much with "split brain", which seems a distinct possibility with any kind of link failure?

Not sure how well you know the P4000 stuff, but it has a "quorum" system, so in split brain one site takes over as primary and the other goes offline. If you only have two ESX hosts (one in each site), how would each ESX host deal with not being able to see the other ESX host, while one host can still access the shared storage?

admin · ‎10-08-2010

There are three different possibilities for a split-brain situation:

1. Storage network partition: For EMC, one site is designated as the primary to win every time. For HP, their "quorum" will decide the winning site dynamically, or you can set a preferred site to win every time. For NetApp, there is only one primary site and the secondary would never take over without manual intervention. (Caveat: some partners may have added new functionality, so I may not be up to date on this.)

2. HA network partition: this is governed by the HA "isolation response" setting for the HA cluster. There's a good write-up of this on yellow-bricks: http://www.yellow-bricks.com/2010/03/29/cool-new-ha-feature-coming-up-to-prevent-a-split-brain-situa... and http://www.yellow-bricks.com/2009/05/24/vsphere-ha-isolation-response/

3. Both HA cluster and storage network partition: In this case, I think it depends on what happens with the storage and when it happens. If the storage becomes available in the preferred (winning) site fast enough, HA will perform a failover and restart all VMs on the host in that site. If the storage is not fast enough in declaring the winning site, HA may time out while retrying the VM restarts (since the VMDKs for the failed VMs will still be locked by the losing site). This one may depend mostly on what happens in #1 (above) and on how automated the storage array's recovery is in a network partition.
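The three cases above can be summarised as a small decision function. This is only a sketch of the reasoning in this post, not of any real HA logic: the function name `split_brain_outcome`, its parameters, and the timing values are all hypothetical, and real HA retry behaviour is not modelled. A `storage_failover_s` of `None` stands for an array that never declares a winner without manual intervention.

```python
from typing import Optional


def split_brain_outcome(
    ha_network_up: bool,
    storage_network_up: bool,
    storage_failover_s: Optional[float] = None,
    ha_retry_window_s: float = 0.0,
) -> str:
    """Classify a partition into the cases described above (illustrative only)."""
    if ha_network_up and storage_network_up:
        return "no partition: nothing to do"
    if ha_network_up and not storage_network_up:
        # Case 1: the array's own policy (quorum or preferred site) decides.
        return "storage partition: array policy picks the winning site"
    if not ha_network_up and storage_network_up:
        # Case 2: the HA isolation response setting governs what the hosts do.
        return "HA partition: isolation response applies"
    # Case 3: both partitioned, so the result depends on timing.
    if storage_failover_s is None:
        return "both partitioned: manual storage intervention needed"
    if storage_failover_s <= ha_retry_window_s:
        return "both partitioned: HA restarts VMs in the winning site"
    return "both partitioned: HA restart attempts may time out"


# Hypothetical timings: the array settles in 30 s, HA keeps retrying for 120 s.
print(split_brain_outcome(False, False, storage_failover_s=30, ha_retry_window_s=120))
```

The only case where timing matters is the last one, which matches the point that the outcome there depends mostly on how quickly and how automatically the array resolves case 1.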
