VMware Cloud Community
vPatrickS
Enthusiast

vSphere 5.0 - Stretched Storage Cluster - HA behavior?

Hello everyone

//Edit: I cut some lines to make the question a bit more general.

In this case I’m focusing on the scenario where the two datacenters get partitioned.

If the LAN and FC links fail simultaneously, the storage (DataCore) won't be able to deny access to either site, so both sites will have functional storage.

How will vSphere HA react to it?

For now I will assume:

- vSphere 5.0

- HA master in datacenter A

- Only datacenter A has an active gateway

- vCenter Server resides in datacenter B

- Yes, this is not a supported vMSC solution

datacenter A

  • The master in datacenter A will recognize that the hosts/slaves in datacenter B have stopped sending heartbeats.

  • Datastore heartbeats from all hosts in datacenter B will expire.

  • The HA master still receives network heartbeats from its slaves in datacenter A.

  • The master in datacenter A will declare all hosts in datacenter B dead.

  • The master will restart the protected virtual machines from datacenter B.

datacenter B

  • The hosts in datacenter B will recognize that the master in datacenter A has stopped sending heartbeats.

  • The slaves will elect a new master.

  • Because all hosts in datacenter B receive election traffic, they won't trigger an isolation response.

  • The new master will check the poweron file and the protectedlist (or it will communicate with the vCenter Server) to get the necessary information about the virtual machines.

  • The master will restart the protected virtual machines from datacenter A.
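To make the two columns above easier to reason about, here is a minimal toy model in Python. It is only a sketch of the logic as described in this post, not of how the HA agent is actually implemented: the host names, VM names, and the functions `classify` and `restart_candidates` are made up for illustration, and it assumes a master only sees a remote host's datastore heartbeat while the storage link between the sites is up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str    # hypothetical host name
    site: str    # "A" or "B"
    vms: tuple   # names of protected VMs running on this host


def classify(master_site, host, lan_up, storage_link_up):
    """State of `host` as seen by the master located in `master_site`.

    Assumption: heartbeats from the same site are always visible; heartbeats
    from the other site need the corresponding inter-site link.
    """
    same_site = host.site == master_site
    network_heartbeat = same_site or lan_up
    datastore_heartbeat = same_site or storage_link_up
    if network_heartbeat:
        return "live"
    if datastore_heartbeat:
        return "partitioned"  # host is alive, only the network path is gone
    return "dead"             # no sign of life at all from this master's view


def restart_candidates(master_site, hosts, lan_up, storage_link_up):
    """Protected VMs the master in `master_site` would try to restart."""
    vms = []
    for host in hosts:
        if classify(master_site, host, lan_up, storage_link_up) == "dead":
            vms.extend(host.vms)
    return sorted(vms)


HOSTS = [
    Host("esx-a1", "A", ("vm-a1",)),
    Host("esx-a2", "A", ("vm-a2",)),
    Host("esx-b1", "B", ("vm-b1",)),
    Host("esx-b2", "B", ("vm-b2",)),
]

if __name__ == "__main__":
    # LAN and FC both down: each side's master considers the other side dead.
    print("master A restarts:", restart_candidates("A", HOSTS, False, False))
    print("master B restarts:", restart_candidates("B", HOSTS, False, False))
    # Only the LAN down: remote hosts are merely partitioned, nothing restarts.
    print("LAN only:", restart_candidates("A", HOSTS, False, True))
```

In this model, the full partition ends with every VM being a restart candidate on the opposite site while it is still running on its original site, which is exactly the outcome I'm worried about.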

Is this even possible, or am I missing something and completely wrong?

If the vCenter Server runs in datacenter A, the newly elected master could at least use the protectedlist and the poweron file, couldn't it?

With vSphere 4.1, this should also be possible if there is at least one primary node in each datacenter?

At the moment I don't see any way to prevent this from happening (without losing the flexibility of a vSphere cluster) other than not using an unsupported stretched storage "cluster"?

Regards

Patrick


Accepted Solutions
depping
Leadership

I don't know DataCore well enough to say anything about it from a storage perspective, but it seems that what you are saying is that in the case of a site partition / split brain the datastores will be "active" on both sides?

If that is the case, then your assumptions are all true and the outcome could potentially be disastrous.

Not sure why you would want to use a solution like that, to be honest. I would sure hope they do have the concept of "site bias" / "site preference".

vPatrickS
Enthusiast

Hi

Yes, if the mirror and the management links between the two DataCore nodes are lost at the same time, both sides remain active.

DataCore does use something called a "preferred side" (which you can set at the host level), so you can tell the ESXi host via ALUA which side it should access. But because DataCore is a real active-active grid, a host can switch paths to the other side with no problem.

Only when the management network is still alive and just the mirror links are down will DataCore allow access on the "preferred side" and disable access on the "secondary" side.
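Written down as a small truth table, the behavior described above looks roughly like this. It is a sketch of my description only, not DataCore's actual logic or API: the function name `accessible_sides` and the side labels are made up, and the case "mirror up, management down" is not covered above, so the code simply assumes both sides stay active there.

```python
def accessible_sides(mirror_up, mgmt_up, preferred="A"):
    """Sides that keep serving I/O, according to the behavior described above."""
    both = {"A", "B"}
    if mirror_up:
        # Mirror in sync. The mgmt-down sub-case is an assumption, see lead-in.
        return both
    if mgmt_up:
        # Mirror down, but the nodes can still coordinate: preferred side wins.
        return {preferred}
    # Mirror and management both down: the nodes cannot coordinate,
    # so both sides remain active (split brain).
    return both


if __name__ == "__main__":
    for mirror_up in (True, False):
        for mgmt_up in (True, False):
            sides = sorted(accessible_sides(mirror_up, mgmt_up))
            print(f"mirror_up={mirror_up} mgmt_up={mgmt_up} -> {sides}")
```

The last branch is the one that matters for the HA question: with both links gone, each site still has a writable copy of the datastore.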

And you're absolutely right; I'm not a fan of this solution, but my colleagues are …

Thanks!

Patrick

depping
Leadership

I guess the only thing I can suggest is to whiteboard your problem for them and let them "solve" it; that way they might get it. That is what I tend to do: draw out the scenario and then let them go over the failure scenarios.
