VMware Cloud Community
Whocarez
Contributor

Geographically dispersed HA cluster

We're currently investigating a DR solution for our site. The first thing we looked at was a solution like SRM: mirror the SAN to the other location and let SRM handle the failover of the VMs. In this situation we have two different subnets, so SRM is needed to automate the re-IP of the VMs.

But with the option of a LAN extension we've found a couple of extra options.

The option we like most is a split cluster like the one in the picture below. We like it because it lets us split our DTAP environment: DTA on the second site and P on the first site.

http://www.van-lieshout.com/wp-content/uploads/2009/11/111509_1554_Geographica5.jpg

Due to line capacity we can only replicate asynchronously, so there is a small delay. Is it possible to use this design, and what extra design considerations do we have to take care of?
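
To put a rough number on that replication delay, here is a minimal back-of-the-envelope sketch. The write rates, link rate, and burst length are made-up placeholder values, not measurements from this environment, and the helper names are hypothetical. It only shows how a backlog of unreplicated data builds up while writes outpace the link, and how long the link needs to catch up afterwards.

```python
def unreplicated_mb(write_mb_per_s, link_mb_per_s, burst_seconds):
    """Backlog (MB) built up while writes outpace the link for burst_seconds."""
    return max(0.0, (write_mb_per_s - link_mb_per_s) * burst_seconds)


def catch_up_seconds(backlog_mb, write_mb_per_s, link_mb_per_s):
    """Seconds needed to drain the backlog at the given (lower) write rate."""
    spare = link_mb_per_s - write_mb_per_s
    if spare <= 0:
        return float("inf")  # the link never catches up
    return backlog_mb / spare


# Hypothetical numbers: a 10-minute burst of 20 MB/s writes over a 12 MB/s link.
backlog = unreplicated_mb(write_mb_per_s=20.0, link_mb_per_s=12.0, burst_seconds=600.0)
print(f"data at risk if the site fails now: {backlog:.0f} MB")

# After the burst, writes drop to 4 MB/s and the link starts catching up.
delay = catch_up_seconds(backlog, write_mb_per_s=4.0, link_mb_per_s=12.0)
print(f"time until the remote copy is current again: {delay:.0f} s")
```

The backlog is the data that would be lost if the primary site failed at that moment, which is why the size of the "small delay" matters for the design.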

5 Replies
mcowger
Immortal

This is possible with a few different storage solutions (EMC VPLEX, HP LeftHand/P4000, possibly Compellent LiveVolume), but they all need synchronous-level response times and throughput.

--Matt VCDX #52 blog.cowger.us
Dracolith
Enthusiast

Whocarez wrote:

Due to line capacity we can only replicate asynchronously, so there is a small delay. Is it possible to use this design, and what extra design considerations do we have to take care of?

This is the clincher.

If you don't have the line capacity, resiliency, and sufficiently low latency for synchronous replication, then your sites don't have the characteristics needed to form a proper VMware HA cluster across sites either, because synchronous mirroring is basically a requirement in this scenario.

Even if you do have synchronous mirroring and LAN extension, HA alone is not a DR strategy.

There are other challenging issues in designing such a scenario, such as how to make sure that the loss of one link between datacenters doesn't result in an HA failover that causes a "split brain" situation, with respect to both the network and the VMs.

Whocarez
Contributor

If I'm understanding correctly, the following requirements have to be met first:

If I want an active/active geographically dispersed cluster with HA enabled, I need enough line capacity for real-time (synchronous) replication.

In the current situation, I can split the cluster: use HA within each site but not across both sites (affinity settings).

If the primary site fails, I should be able to manually recover the VMs on the secondary site, knowing that there will be some data loss.

I'm aware of the split brain problem; it could be solved with VMware Heartbeat, if I'm correct.

Dracolith
Enthusiast

Whocarez wrote:

In the current situation, I can split the cluster: use HA within each site but not across both sites (affinity settings).

If the primary site fails, I should be able to manually recover the VMs on the secondary site, knowing that there will be some data loss.

The HA agent in ESXi 5 uses a heartbeat on the datastores themselves. If your VMFS datastore is synchronously mirrored, so that a write can be made on either side, then it can possibly be configured to "look like" the same datastore to all the hosts, so that any write operation on either side is propagated to the opposite side before the write is committed.

In fact, you need not only synchronous mirroring of the files but also mirroring of file locks, so that a VM cannot accidentally be started on both sides simultaneously, either by HA or by a human.

However, if the datastore is only asynchronously mirrored, the datastore will "look different" to different hosts. Different hosts will see different filesystem contents, and for HA purposes the hosts care about that.
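
As a rough illustration of the difference, here is a toy model in Python. It is only a sketch under my own assumptions: the `MirroredDatastore` class, the site names, and the `esx-a1` host name are hypothetical and have nothing to do with how VMFS or any storage array is actually implemented. It shows that with synchronous mirroring both sites always see the same contents (including a lock file), while with asynchronous mirroring the remote site sees stale contents until the replication queue is drained.

```python
from collections import deque


class MirroredDatastore:
    """Toy model of one datastore mirrored between two sites, "a" and "b"."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.copies = {"a": {}, "b": {}}
        self.pending = deque()  # writes not yet shipped to the remote site

    def write(self, site, key, value):
        remote = "b" if site == "a" else "a"
        self.copies[site][key] = value
        if self.synchronous:
            # Synchronous: the remote copy is updated before the write is acknowledged.
            self.copies[remote][key] = value
        else:
            # Asynchronous: acknowledged right away, shipped to the remote site later.
            self.pending.append((remote, key, value))

    def drain(self):
        """Ship all queued writes, as the replication link would do over time."""
        while self.pending:
            remote, key, value = self.pending.popleft()
            self.copies[remote][key] = value

    def consistent(self):
        return self.copies["a"] == self.copies["b"]


sync_ds = MirroredDatastore(synchronous=True)
sync_ds.write("a", "vm1.lock", "held by esx-a1")
print("sync, both sites agree:", sync_ds.consistent())

async_ds = MirroredDatastore(synchronous=False)
async_ds.write("a", "vm1.lock", "held by esx-a1")
print("async, both sites agree:", async_ds.consistent())
print("async, site b sees the lock:", "vm1.lock" in async_ds.copies["b"])
async_ds.drain()
print("async after drain, both sites agree:", async_ds.consistent())
```

In the asynchronous case, a host on site b checking for the lock before the queue drains would not find it, which is exactly the window in which the same VM could be started twice.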

I'm aware of the split brain problem; it could be solved with VMware Heartbeat, if I'm correct.

VMware vCenter Server Heartbeat is a product for keeping vCenter Server itself available by failing it over to a standby instance; it doesn't help with VMware HA.

Other than forcing failover to be a manual operation (in which case, why try to use HA?), the only real ways to prevent split brain in an HA scenario are to have assuredly redundant communication links that you can guarantee will never fail together, or to have a "third site" with some kind of resource, so that you have an odd number of "clustered systems" (sites that can fail) with independent communication paths.
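
The reason an odd number of members helps can be sketched with a simple majority rule. This is a generic quorum illustration, not how VMware HA or any particular storage product implements it; the function name and the member names (`site_a`, `site_b`, `witness`) are hypothetical.

```python
def may_run_workloads(site, reachable, all_members):
    """Return True if `site` can see a strict majority of members, itself included."""
    visible = {site} | (reachable & all_members)
    return len(visible) > len(all_members) / 2


# Three members: two datacenters plus a small resource at a third site.
members = {"site_a", "site_b", "witness"}

# The inter-site link is cut; only site_a can still reach the third site.
print("site_a keeps running:", may_run_workloads("site_a", {"witness"}, members))
print("site_b keeps running:", may_run_workloads("site_b", set(), members))

# With only two members, a cut link leaves neither side with a majority,
# so the sites cannot tell "link down" apart from "other site down".
two_sites = {"site_a", "site_b"}
print("site_a alone has a majority:", may_run_workloads("site_a", set(), two_sites))
print("site_b alone has a majority:", may_run_workloads("site_b", set(), two_sites))
```

With three members, at most one side of any partition can hold a majority, so at most one side keeps its VMs running. With two members, a link failure leaves both sides without a majority, and any rule that lets one side continue on its own also lets the other side do the same.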

mcowger
Immortal

The only real ways to really prevent split brain in a HA scenario are to have assuredly redundant communication links, that you can guarantee will never fail together, or have a "third site", with some kind of resource, so you have an odd number of "clustered systems" (sites that can fail) with independent communication paths.

Indeed, this is how VPLEX solves the problem, with a 'witness' machine (or VM) running at some third site.

--Matt VCDX #52 blog.cowger.us