VMware Cloud Community
migeauxy
Contributor

How to avoid the split-brain situation with HA?

Hi folks

We have a configuration question about HA.

The situation is as follows:

4 ESX servers (A, B, C, D) running ESX 3.x Enterprise, 2 of them in building A and 2 in building B, with the sites linked by one 1 Gbps network link and two FC links.

The servers are connected to SAN storage virtualized with DataCore products: two-way synchronization and mirroring, with the LUNs directly available in building B if the storage in building A crashes (and likewise, if building B crashes the LUNs remain available in A). So it's "online-online".

We want to implement a stretched HA cluster for these 4 nodes so that if one computer room crashes, all the VMs are automatically restarted in the other building, on the mirrored storage.

It seems like a perfect configuration for a DR plan, but...

What will happen if all the links between the buildings fail? No Ethernet anymore, and no FC connection.

Regarding storage, we know what happens: each site will think it is alone and will provide LUN access to its local servers in read/write mode, so all the VMs are independently available.

But how will HA work ?

We are not in an isolation case, because on each site 2 ESX hosts are still running and can see each other through the local network, so there is no way to "play" with the isolation response.

But each site will think that the other site is down; HA will detect this and restart all the VMs from the other site on its own ESX servers, since the LUNs are already present. That means we would have the entire production running twice, independently in each site (oops when the links come back).

Is this the way it will work?

How can we avoid this "split-brain" situation?

Thanks for your help

7 Replies
Oli_L
Enthusiast

I believe that if the ESX hosts cannot ping the configured default gateway, they will deem that there has been a host failure, and HA will kick in and restart the VMs on another host.

So if your link goes down, how your ESX hosts determine whether there has been a host failure depends on how you have configured your gateway.

Your gateway plays an important role here... one site may be OK if the link goes down, but the other site will deem that there has been a host failure.

hope this helps you

Oli

migeauxy
Contributor

Thanks Oli, it helps a bit, but what could tell the difference between a complete failure of the site links and a complete site disaster?

For HA, it doesn't make any difference, if I understand correctly.

If I follow your explanation, we MUST provide a redundant network link between the two sites, so that the gateway for site A is located in site B, and vice versa.

If the gateway still exists in the other building, then HA can tell whether the hosts are down or not.

please continue to post

Yan

patrickds
Expert

Hi,

Instead of changing the gateways, you could add additional addresses to be checked in case of HA heartbeat failures.

This way, you can help HA in getting a better look at the situation.

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1002117&sl...
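In practice those extra addresses go into the cluster's HA advanced options. A minimal sketch along the lines of what the KB describes (the das.* option names are the standard ones; the addresses themselves are hypothetical examples for this stretched setup):

```
das.isolationaddress1 = 10.1.0.1         # e.g. a switch on building A's local network
das.isolationaddress2 = 10.2.0.1         # e.g. a switch on building B's local network
das.usedefaultisolationaddress = false   # optionally skip the default-gateway check
```

With a local address per building, a host only declares itself isolated when it can reach none of these, not merely when the inter-site gateway stops answering.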

Oli_L
Enthusiast

This is a nice idea

However, you are going to have to weigh up the risk factors here...

I think patrickds' solution could be a good one, but obviously you will have to test. Are you basically asking that HA should not act when you get a link failure? If so, then this solution first checks the gateway, then say the COS switch IP address, and only if both are down does HA kick in and restart the VMs on another host, which in a computer-room crash would be a nice DR solution.

However, if the link went down and HA could not reach, say, the dedicated gateway address, then as long as it can still see das.isolationaddress1 no failure would be declared and the ESX hosts would continue to run.
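The check order described here could be sketched as follows. This is only an illustration of the decision logic (VMware HA performs these probes internally; the parameter names below are placeholders, not a real API):

```python
def isolation_response_fires(gateway_reachable, isolation_addr_reachable):
    """A host triggers its isolation response only when it can reach
    neither the default gateway nor any configured das.isolationaddress."""
    return not (gateway_reachable or isolation_addr_reachable)

# Inter-site link down, but the local das.isolationaddress1 still answers:
# no isolation is declared, and the local VMs keep running.
print(isolation_response_fires(False, True))    # False
# Nothing answers at all: the host declares itself isolated.
print(isolation_response_fires(False, False))   # True
```

Note that this only governs the *isolation response* on a surviving host; it does not by itself distinguish a dead remote site from an unreachable one.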

However, if your FC went down and you have mirrored storage across the sites, wouldn't there be a local stop on your DataCore server, causing some failover and disruption? Point being: not a completely unnoticeable failover.

Oli

migeauxy
Contributor

Thanks for the advice.

To be more precise about the storage: the DataCore product makes the storage available in an "always available" configuration, which means that if all the links go down (network and FC), each side of the storage thinks it is the master and presents the volumes to its "local" servers in read/write.

And to answer the question "Are you basically requesting that HA does not work when you get a link failure?": the answer is YES.

If all the links go down, I want HA to NOT operate,

but if the whole computer room goes down, then it HAS to work.

The point is: how can you tell the difference between a complete link break and a complete computer-room disaster, from HA's point of view?

Please post again

Yan

Oli_L
Enthusiast

Well, this is possible if you use patrickds' solution: if your computer room goes down, then the link to the gateway would still be available to the surviving site and HA would fail the VMs over to the cluster in the other site; and if only the link were to go down, then das.isolationaddress1 (say your console switch IP address) would still be available and HA wouldn't do anything. It's like a boolean expression that defines the HA rule. Although one side of your site will hold your default gateway, so bear that in mind.

Remember that it's not the computer room that goes down, it's the hosts. And if both hosts go down because there is a problem in the computer room, then as long as you have sufficient resources available on the cluster hosts in your other site, the VMs will be powered up there. However, you might have a single host failure in your computer room and want the VMs on that host to fail over to the other host in the same room rather than leaving that decision to DRS. You 'can' use the attribute das.defaultfailoverhost in the advanced settings, but then you are not letting HA control the environment for you; and if both hosts go down, this setting doesn't take effect.

One thing to make sure of: do not use automated DRS, and try to keep the VMs running on hosts on the side where their primary storage is located.

migeauxy
Contributor

Thanks again

I'm pretty sure this is a good way to do it if one of the buildings is "primary" and the other one "secondary".

In our case, building A is the redundant site for building B, and building B is the redundant site for building A.

So I would have to have a gateway in the other site (B) for the local site (A), and a gateway in the other site (A) for the local site (B).

I think we will have to go for one HA cluster in site A and another HA cluster in site B.

In case of disaster or link failure, we will have to deal with it manually or with scripting,
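One way such a script could tell the two cases apart is a witness at a third location, reachable over a path independent of the inter-site links. This is only a sketch under that assumption; the addresses are hypothetical, and nothing like this is built into HA 3.x:

```python
import subprocess

OTHER_SITE_HOST = "10.2.0.10"   # an ESX host in the other building (hypothetical)
WITNESS = "10.9.0.1"            # third-site witness on an independent path (hypothetical)

def reachable(host):
    """True if a single ping gets an answer within 2 seconds (Linux ping flags)."""
    return subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        capture_output=True,
    ).returncode == 0

def decide(other_site_up, witness_up):
    """Decide whether it is safe to restart the other site's VMs locally."""
    if other_site_up:
        return "other site alive - do nothing"
    if witness_up:
        # The witness still sees us, so the problem is the other site, not our links.
        return "other site down - restart its VMs here"
    # We can reach nobody: assume our links are cut and hold off.
    return "links down - hold, restarting would cause split-brain"

# Example wiring: decide(reachable(OTHER_SITE_HOST), reachable(WITNESS))
```

The key design choice is the third, independently reachable vantage point: without it, "other building dead" and "all links cut" are indistinguishable from either site, which is exactly the problem described in this thread.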

because in our case, HA will not be able to identify whether the other building is completely down or just "disconnected".

If you can post any further advice, it is welcome.
