This Question is Answered

23 Replies. Last post: Nov 20, 2009 5:03 AM by king@it.ibm.c…

NetApp Metrocluster: "disappointed" posted: Oct 20, 2009 2:40 PM

king@it.ibm.com (Virtuoso, 2,927 posts since Jan 16, 2004)
Just wanted your points of view on the matter.

I have been looking into the subject lately and found it a pretty interesting solution:

http://www.netapp.com/us/library/technical-reports/tr-3548.html

My idea was that you could create a stretched storage server that would survive even if one of the two buildings in the campus collapsed. The "pyramid" on page 4 leads you to think this way: MetroCluster can be used for "Datacenter / Site disasters".

However, it turns out that a complete site disaster (i.e. head + shelves) doesn't trigger an automatic switchover to the other head (and mirrored shelves). You can infer this from this VMware KB (http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1001783) as well as from section 2.11 of the NetApp doc above, which lists the advantages of MetroCluster vs. standard synchronous replication:

1) Low aggregate level RAID mirroring (less performance impact)
2) Automatic switchover to remote copy upon failure
3) Site failover with a single command
4) Simpler to manage than multiple replication relationships
5) No extensive scripting required to make data available after failover

I am OK with them... however, #3 looks a bit simplistic. While you can fail over your storage with a single button... restarting all your VMs on the surviving site can be as problematic as restarting them in a synchronous replication scenario (where at least you have SRM to optimize the whole thing). See the VMware KB above.

I was told that this "limitation" is due to the typical potential split brain issue of clustering solutions (i.e. you don't really know whether the building has collapsed or the sites just lost communication with each other). I can understand this is not trivial.

Don't get me wrong... I think it's a wonderful solution..... but this "little thing" left me with a bad taste in the mouth.....

Comments?

Massimo.

Re: NetApp Metrocluster: "disappointed"

1. Oct 20, 2009 9:26 PM in response to: king@it.ibm.c…
RussellCorey (Hot Shot, 102 posts since May 12, 2008)
Have a look at TR-3788; I'm reviewing it now and it might provide you the information you're looking for.

edit:


Re-read your post. Basically, when you lose an entire site, it's really up to somebody with authority to say "yes, a disaster has occurred" so you can execute the 'cf forcetakeover -f' command. You could obviously automate this with another tool (say Nagios) if you were feeling the need to live on the bleeding edge. Optionally, you can create a PowerShell script on the desktop of some system at the remote site that will ssh into the surviving head and execute the command. If you do it quickly enough then HA will probably just power things on. Otherwise you will need to go through that process manually.
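The "somebody with authority says yes, then a script runs the command over ssh" idea could be sketched roughly like this. It is only an illustration in Python rather than PowerShell: the host name, user, and confirmation word are made-up placeholders, and the command string is simply the one quoted above, so check the exact syntax for your Data ONTAP release before relying on it.

```python
import subprocess

# Hypothetical values: replace with your own surviving controller and account.
SURVIVING_HEAD = "filer-b.example.com"
SSH_USER = "root"
# Command string as quoted in the post above; verify it against the
# documentation for your Data ONTAP release before use.
TAKEOVER_COMMAND = "cf forcetakeover -f"


def build_ssh_argv(host, user, remote_command):
    """Return the argv list for running one command on the filer over ssh."""
    return ["ssh", f"{user}@{host}", remote_command]


def disaster_confirmed(confirmation, expected="DISASTER"):
    """Proceed only if a human typed the agreed confirmation word."""
    return confirmation.strip() == expected


def run_takeover(confirmation, dry_run=True):
    """Build the ssh call; execute it only when confirmed and not a dry run."""
    if not disaster_confirmed(confirmation):
        return None
    argv = build_ssh_argv(SURVIVING_HEAD, SSH_USER, TAKEOVER_COMMAND)
    if not dry_run:
        # Raises CalledProcessError if ssh or the remote command fails.
        subprocess.run(argv, check=True)
    return argv
```

With dry_run left at its default, run_takeover("DISASTER") only returns the command it would have run, which is a cheap way to rehearse the procedure without touching the filer.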


Part of the reason that complete site failover (shelves, array, hosts, and fabric) isn't automated is that failing back is a non-trivial operation. If you have to run at the other site for a few days, you've got to re-sync the mirror, which could take time, etc. Once that's done, though, you can execute your giveback and start VMotioning things back "home."

edit2:

This differs from using SnapMirror in that you don't need to break the mirror and present the devices/volumes as read write to the ESX server hosts on the failover side. It's pretty much just typing a single command to fail over.

The flip side of course is that you need to make sure you have the same subnet at both sites.

Re: NetApp Metrocluster: "disappointed"

2. Oct 20, 2009 9:49 PM in response to: king@it.ibm.c…
Preetom (Novice, 2 posts since Jan 16, 2008)
Agree with Russell. 'Operation Scenario 6' in section 5.2 of TR-3788 (http://media.netapp.com/documents/tr-3788.pdf) should give you the steps to take in a complete site disaster scenario. Please also check the other disaster scenarios described in the document.

As you can see, the recovery steps for a complete site failure are pretty much automated once you have manually issued the 'cf forcetakeover' command from the surviving site.

Re: NetApp Metrocluster: "disappointed"

4. Oct 21, 2009 1:20 AM in response to: king@it.ibm.c…
RussellCorey (Hot Shot, 102 posts since May 12, 2008)
I was thinking about this more, and it occurred to me that one thing is going to be easier with Metrocluster: failback.

With SRM the process will look something like this:

1. Initialize snapmirrors back to the "home" site

2. Reconfigure the SRA to determine replication relationships

3. Build protection groups

4. Build recovery plans

5. Execute recovery plans.

With Metrocluster it's basically:

1. Wait for resync

2. Issue giveback.

3. VMotion VMs back home, or just wait for the secondary site to fail (or not fail)

In either event you're going to want someone with authority to tell you it's okay to "push the button", whether you're kicking off a script to ssh into the filer or pressing the "run" button in SRM. The environment itself is going to dictate which of the two options will be more elegant.

For example, when there is a layer2 network connection (Same VLAN exists in both sites) with a low latency link (DWDM?) then Metrocluster would probably be the best choice.

When the DR site is on a separate network space and there is a higher latency connection between sites, then SRM becomes the better solution.

On the different technologies, Metrocluster is effectively good up to 100km (~60 miles), which is right around where we find a lot of DR sites these days. Additional protection can be provided by snapmirroring data to yet a 3rd site even further away in the event of a massive regional outage. Perhaps your primary site is located in your office complex which is fed by the city grid while your secondary head is in a datacenter on dedicated power feeds? In that situation your secondary metrocluster node could be across the street and still running while your office building is pitch black.

Regarding failures, let's suppose your blade chassis has a major component issue (all the PSUs fail for whatever reason, or perhaps someone unplugged them by accident) or a switch module misconfiguration that disconnects them from the network/SAN.

On switch failures, it's not terribly uncommon for the network/SAN fabric to be powered by DC infrastructure while your actual servers are being fed via AC. I've been in datacenters where all the AC had dropped off and the DC was running strong (this would provide the superman switch scenario). Losing power is a form of component failure.

"To this point I have to disagree. I'd rather use an automated tool such as VMware SRM to semi-automate the recovery end-to-end (from storage to VMs) of an otherwise more complex storage solution .... rather than issuing manually a single easy command at the storage level .... and facing the challenge of restarting hundreds of VMs in the proper order w/o a tool that helps me doing that."

With proper planning you can get the same if not more granular controls. PowerCLI gives us the option to quickly build solutions to problems just like this and still provide the illusion of automation (admittedly it requires scripting, but that's what consultants and professionals are for!)
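To make the "proper order" point concrete, here is a rough sketch of just the ordering logic, written in Python for illustration. The VM names and tier numbers are invented, and the actual power-on step would be whatever your tooling provides (PowerCLI, in the suggestion above); this only shows how a tiered restart plan could be derived from an inventory.

```python
from collections import defaultdict


def build_restart_plan(vm_tiers):
    """Group VM names by tier and return batches in ascending tier order.

    vm_tiers maps a VM name to an integer tier; lower tiers start first.
    """
    batches = defaultdict(list)
    for name, tier in vm_tiers.items():
        batches[tier].append(name)
    # Sort names inside each batch so the plan is stable between runs.
    return [sorted(batches[tier]) for tier in sorted(batches)]


# Hypothetical inventory: databases first, then app servers, then web front ends.
inventory = {"web02": 3, "db01": 1, "app01": 2, "web01": 3, "db02": 1}
plan = build_restart_plan(inventory)

for number, batch in enumerate(plan, start=1):
    print(f"batch {number}: {', '.join(batch)}")
```

Each batch would be powered on and checked before the next one starts, which is also where a validation step for the database tier would naturally fit.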

In a way, Metrocluster provides a simpler overall design at the expense of more infrastructure dollars, allowing you to recover quickly AND giving you the option of pro-active disaster avoidance.

Going back to the hurricane example, if I have my office in Galveston, Texas using Metrocluster with the secondary head/shelves in a datacenter in Houston on higher ground (hosted for example at Level 3), I have the option of VMotioning my entire workload to another datacenter when I get notice of a hurricane warning. You could even go so far as to initiate failover prior to the disaster (without bothering to VMotion beforehand) so that if power IS lost then HA will automatically recover the virtual machines at the secondary site.

It also conveniently provides failover from your DR site back to your primary site without having to configure bi-directional snapmirrors and dealing with 2 sets of protection groups with X number of recovery plans on both sides.

SRM has its virtues and is a great tool, but availability, flexibility, and agility might drive someone to consider Metrocluster, especially if they have the requisite infrastructure in place. If your requirements say "hey, my DR site is across the country on a whole other subnet" then SRM is going to be the better tool. In fact, I could think of some "clever" things to do with SRM to help automate pieces/parts of a Metrocluster failover solution.

Re: NetApp Metrocluster: "disappointed"

5. Oct 21, 2009 2:47 AM in response to: king@it.ibm.c…
leiv (Novice, 3 posts since Feb 20, 2007)
I just finished setting up a stretched Metrocluster with iSCSI and NFS connections to vSphere.

You don't have to do anything to fail over in case of an error. This happens 100% automatically. Note that you have to set the cf.takeover.change_fsid option to off to keep the NFS file system IDs. Otherwise, you'll have to remount the NFS volumes on the vSphere servers.


One thing you'll have to do manually is to fail back when the site that died is back on its feet. The command to do that is "cf giveback". This can also be done with the MMC-based NetApp System Manager.


In my experience, Metrocluster is the best storage solution for vSphere at this time. With great software features and zero downtime in a site failure event, both my customer and I are very happy with Metrocluster.


:)


<Leiv>

Re: NetApp Metrocluster: "disappointed"

6. Oct 21, 2009 2:42 AM in response to: RussellCorey
Preetom (Novice, 2 posts since Jan 16, 2008)

Just to add: you also get the advantage of 'zero data loss' with a MetroCluster-based solution when compared to a replication-based solution with SRM.

The guaranteed RPO of the MetroCluster-based solution is bound to be better.

Re: NetApp Metrocluster: "disappointed"

10. Oct 21, 2009 7:29 AM in response to: king@it.ibm.c…
RussellCorey (Hot Shot, 102 posts since May 12, 2008)

While both SM sync and Metrocluster provide the same RPO, Metrocluster can potentially provide a tighter RTO. SnapMirror requires that you break the mirror and present the volumes/LUNs, then rescan and resignature, then register virtual machines. SRM does this automatically but it can take time.

Optionally you could take a hybrid approach. Back-end DBs could fail over via Metrocluster pretty much as soon as the primary site is unavailable, and SRM could be used to bring up the middle tiers and application front ends after the DB servers have been validated. These scripts can be kicked off early on as part of the SRM workflow.

Re: NetApp Metrocluster: "disappointed"

12. Oct 21, 2009 7:59 AM in response to: king@it.ibm.c…
RussellCorey (Hot Shot, 102 posts since May 12, 2008)

It's a fun discussion for sure.

I definitely agree about the hybrid approach; it violates my core design principle, K.I.S.S.

PowerCLI scripts are pretty easy to write though and very easy to read. I think most enterprise administrators would not have issues automating parts of the BC/DR workflow through scripting even when using SRM.

Re: NetApp Metrocluster: "disappointed"

14. Oct 22, 2009 8:42 AM in response to: king@it.ibm.c…
raadek (Hot Shot, 89 posts since Jun 5, 2006)
Hi all,

I am aware of at least one NetApp MetroCluster implementation with fully automated fail-over capability (i.e. smoking hole scenario at site A will bring site B up without any manual intervention whatsoever).

Normally relying on full automation for fail-over purposes poses a threat of a 'split-brain' scenario, where site B thinks there was a disaster, whilst in fact only network connectivity was lost (sorry if anyone brought that to light already, but I must admit I didn't read all posts thoroughly).

To fill that gap a third node/site must be introduced acting as a witness or tie-breaker:
  • site B can't see site A, but can see a tie-breaker, which sees site A => comms problem between B & A => no fail-over
  • site B can't see site A & can't see a tie-breaker => likely comms problem between B & A & between B & tie-breaker => no fail-over
  • site B can't see site A, but can see a tie-breaker which confirms site A is down => disaster declared => automated fail-over issued
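The three cases above boil down to a small decision function. This is only a sketch of the logic as described in this post, not of how the actual implementation works; the function and parameter names are made up for illustration.

```python
def should_fail_over(b_sees_a, b_sees_tiebreaker, tiebreaker_sees_a):
    """Decide at site B whether to issue an automated fail-over.

    tiebreaker_sees_a is only meaningful when site B can reach the
    tie-breaker; otherwise site B has no second opinion and must not act.
    """
    if b_sees_a:
        # Site A is reachable: nothing to do.
        return False
    if not b_sees_tiebreaker:
        # Site B may be the isolated one: no fail-over.
        return False
    # The tie-breaker is reachable: fail over only if it confirms A is down.
    return not tiebreaker_sees_a


# The three cases from the list, in order.
print(should_fail_over(False, True, True))    # comms problem between B and A
print(should_fail_over(False, False, False))  # B cannot reach the tie-breaker
print(should_fail_over(False, True, False))   # tie-breaker confirms A is down
```

Only the last case returns True, which is the whole point of the witness: site B never declares a disaster on its own view alone.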

Caveats:
This solution has been implemented for a specific customer as a service offer
This solution is only supported at the French Geography level by the (NetApp) French NGS/PS team

Hope it all makes sense!

Regards,
Radek
