
This Question is Answered

1,111 Views 11 Replies Last post: Apr 14, 2009 5:57 PM by cswaters1
cswaters1 Enthusiast vExpert 123 posts since
Jul 28, 2004

Mar 31, 2009 7:39 PM

Failback steps for recovered VMs

Hi,

 

I've just been reading the admin guide and some 3rd party vendor documents on reconfiguring SRM for failback for use in a production design.

 

I notice that there are steps in the admin guide to shut down all virtual machines that were recovered to the recovery site as part of a completed failover (see attachment). The steps guide you to remove the placeholder VM files on the array and delete any files/directories in vCenter at the recovery site that contain VM configuration files created during the protection group creation.

 

This would mean that if you were performing an 'assisted failback' you would need a system outage for all recovered VMs for the time it takes to complete all reconfiguration tasks and perform a test failback. Is this correct, or is this just a cut and paste from the evaluation guide, where we don't care about system outages as it's an eval?

 

We use EMC CLARiiONs for storage and I've reviewed the H5583-VMware_Site_Recovery_Manager_with_EMC_CLARiiON_CX3_and_MirrorViewS_Implementation document (see attachment). This document doesn't mention any of these steps at all.

 

Can anyone who has successfully completed a failback comment?

 

Surely the placeholder VM files on the storage array and any folders/directories in vCenter can be removed after the failback is completed? That way all changes can be made while the recovered VMs are up and running, and no downtime will be experienced by the business. Isn't that the outcome you would want in this situation?

 

Look forward to your opinion on this subject.

 

Craig.

Craig Waters | vExpert 2012 | Melbourne VMware User Group Leader | website: blog.rack.org.au | twitter: @cswaters1 @mvmug | Winner of the 2011 VMUG Leader Awards - President's Award
 
Smoggy Expert VMware Employees 228 posts since
Nov 9, 2005
2. Apr 3, 2009 11:05 PM in response to: cswaters1
Re: Failback steps for recovered VMs

 

Hi Craig,

 

 

I am sure your lack of replies was a Friday afternoon-ism.

 

 

Your basic understanding of the failback process is correct: during failback, when the storage replication is being reversed, you will need to bring down the VMs that you want to fail back. Usually you would not do this en masse, since you would fail back certain groups of LUNs at a time in a predefined order, probably the same order you used to fail over, as that usually represents the "priority" of the VMs to your business.

 

 

I need an example to explain, so let's say we start with Site1 (Protected) and LUNA replicated to LUNA-REP at Site2.

 

 

When we perform a "Test" failover in SRM we will create (or utilize) an array-based snapshot as the storage to boot the recovered VMs from.

 

 

When we perform a "Real" failover in SRM, the SRA will usually split LUNA-REP from LUNA, present LUNA-REP as read/write to the ESX hosts at Site2, and fail over.

 

 

This means that when you fail back, the storage array has to quiesce / lock LUNA-REP and copy its changes back to LUNA at Site1. On nearly all storage arrays I can think of, ensuring consistency means you need to shut down the VMs running on LUNA-REP so that the array replication software can re-sync all the changed data back to the original site. At this time LUNA-REP will be made read-only to the ESX hosts.
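
The sequence above can be sketched as a toy state model in Python. This is only an illustration of the LUNA / LUNA-REP example, not SRM or SRA code: the class name `ReplicaPair`, its fields, and its methods are all hypothetical, and a real array exposes this through its own replication tooling.

```python
from dataclasses import dataclass


@dataclass
class ReplicaPair:
    """Toy model of LUNA (Site1) replicated to LUNA-REP (Site2)."""
    source: str = "LUNA"          # LUN currently presented read/write
    target: str = "LUNA-REP"      # LUN currently on the receiving end
    replicating: bool = True      # True while source -> target mirroring runs
    vms_running_on: str = "LUNA"  # LUN the VMs are currently booted from

    def real_failover(self) -> None:
        # The pair is split and the replica becomes read/write at Site2.
        self.replicating = False
        self.source, self.target = self.target, self.source
        self.vms_running_on = self.source

    def failback(self, vms_shut_down: bool) -> None:
        # Changes made on the replica have to be copied back to Site1.
        if self.replicating:
            raise RuntimeError("pair has not been failed over")
        if not vms_shut_down:
            # The VMs must be down so the resync copies a consistent image.
            raise RuntimeError(f"shut down the VMs on {self.source} first")
        # After the resync the original LUN is writable again.
        self.source, self.target = self.target, self.source
        self.replicating = True
        self.vms_running_on = self.source
```

Calling `real_failover()` and then `failback(vms_shut_down=True)` returns the pair to its starting roles; calling `failback(vms_shut_down=False)` raises, which is the outage the original question is asking about.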

 

 

The amount of time taken to copy the changes back to the original site depends on the array type and its replication software. It is also a function of how long the source and destination LUNs have been "apart", so to speak. If the LUNs have been "apart" for a short time and the rate of change of data is relatively small, then many arrays are able to do incremental resyncs where only changed tracks are copied back, which is much faster than doing a full LUN copy.
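
As a rough illustration of why the incremental resync matters, here is a back-of-the-envelope estimate in Python. Every number in it (LUN size, daily change rate, days apart, link speed, link efficiency) is a made-up assumption for the example, and the function `resync_hours` is hypothetical; real arrays track changes at their own granularity and will not match this exactly.

```python
def resync_hours(data_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Hours needed to copy data_gb over a link of link_mbps megabits/second.

    efficiency is an assumed fraction of the link usable for replication.
    """
    usable_mbps = link_mbps * efficiency
    seconds = (data_gb * 8 * 1000) / usable_mbps  # GB -> megabits, decimal units
    return seconds / 3600


# Assumed example values, not measurements.
lun_size_gb = 500.0
changed_gb_per_day = 10.0
days_apart = 3
link_mbps = 100.0

# Changed data is an upper bound (rewritten tracks count once) and can
# never exceed the size of the LUN itself.
changed_gb = min(changed_gb_per_day * days_apart, lun_size_gb)

incremental = resync_hours(changed_gb, link_mbps)
full = resync_hours(lun_size_gb, link_mbps)
print(f"incremental resync: {incremental:.1f} h, full LUN copy: {full:.1f} h")
```

With these assumed inputs the incremental resync is roughly an hour while the full copy is most of a day, which is the window the VMs on LUNA-REP would have to stay down.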

 

 

During failback the amount of time it takes to clean up placeholders etc. shouldn't be that big a deal, since you can easily script that. One thing I always do in my designs is have a non-replicated datastore at each "recovery" site that holds ALL my placeholder VMs. For me this makes more sense than spreading them around, for 2 reasons:

 

 

1. I know where they ALL are for troubleshooting / cleanup (with scripts).

2. It stops any other admin deleting them by mistake if they stumble across them in a datastore browser window, for example, and think "hmm, a folder with 3 small VM config files in it and no vmdks, must be an old VM we forgot to clean up properly, I'll delete that."
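
A cleanup script along these lines could start by simply listing the placeholder folders. The Python sketch below is an assumption-laden example, not an SRM tool: the function name `find_placeholder_dirs` is hypothetical, it assumes the placeholder datastore is reachable as an ordinary directory tree, and it treats any folder that has a `.vmx` file but no `.vmdk` file as a placeholder. It only reports; it deletes nothing.

```python
import sys
from pathlib import Path
from typing import List


def find_placeholder_dirs(datastore_root: Path) -> List[Path]:
    """Return VM folders that hold a .vmx config file but no .vmdk disks."""
    candidates = []
    for folder in sorted(p for p in datastore_root.iterdir() if p.is_dir()):
        # Collect the file extensions present directly inside this VM folder.
        suffixes = {f.suffix.lower() for f in folder.iterdir() if f.is_file()}
        if ".vmx" in suffixes and ".vmdk" not in suffixes:
            candidates.append(folder)
    return candidates


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: find_placeholders.py <path-to-placeholder-datastore>")
    for folder in find_placeholder_dirs(Path(sys.argv[1])):
        print(folder)
```

Keeping every placeholder on one non-replicated datastore is what makes a check this simple workable: there is a single root to scan, and anything with disks in it stands out as not belonging there.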

 

 

Hope this helps,

 

 

Lee

 

 

 

 

 

PS: will be out (and not online) most of next week...so if you have any follow on questions apologies in advance if my reply is delayed.

 

 

 

 

 

bladeraptor Hot Shot VMware Employees 82 posts since
Dec 19, 2007
4. Apr 6, 2009 12:11 AM in response to: cswaters1
Re: Failback steps for recovered VMs

 

Hi

 

 

I am writing as an EMC Employee

 

 

As you suggest, in the scenario where we have a light site to dark site to light site requirement, the promote feature of MirrorView could indeed be used to fail over and then fail back an SRM environment.

 

 

As you are no doubt aware, SRM is a broad framework designed to allow storage vendors and other solution providers to interoperate through it.

 

 

SRM v1 as a framework was designed to achieve some very specific criteria and to do it on a broad basis to allow as many storage vendors on VMware's HCL as possible  to participate with their relevant replication solutions

 

 

Due to differences in implementation and sophistication, not all vendors have the components in place to be able to fail over and fail back. As the desire to provide a broad, open framework in the first instance was the overriding goal, failback (in the scenario where the production array remained available) was not considered a mandatory element of recovering from a 'smoking hole in the ground' genuine disaster.

 

 

If you are aware of the EMC portfolio you may know that we offer various geo-stretched clustering solutions such as MirrorView Cluster Enabler. This allows the user to fail a cluster between a node located on Site A (say London) and Site B (say New York) and then fail back again. This implementation demonstrates the ability of MirrorView to demote and promote the mirrors as you suggest. In the case of failback from a full failover, reconfiguration of the snapshots is not necessary: we suggest that both the production and the secondary array have snapshots configured, and snapshots are involved only in a test, not in a full failback.

 

 

So the ability to do as you suggest with promoting and demoting the mirrors exists now.

 

 

You are correct in that there is a Celerra failback vCenter plug-in which allows the user to fail back selected or all failed-over SRM Celerra sessions. This was architected by the same team that wrote the SRM plug-in. Basically, having written the scripts to fail the environment over, the same group then wrote the scripts to fail it back the other way and clean up the environment.

 

 

Now it cannot be emphasized enough that this is a purely EMC development and is not a precursor on EMC's part to a wider VMware-driven SRM failback scenario. VMware, as I understand it (and Lee can comment much more authoritatively), is defining the SRM failback framework as we speak, and my understanding is that it will be broader and deeper than the current EMC articulation.

 

 

The logic behind the Celerra failback wizard is being ported to other EMC SRM failback solutions. I have no access to confirmed timescales, but the commitment from my boss, Chad Sakac, is there.

 

 

When it does appear it will ideally work as you suggest and allow a largely painless failover and failback of an SRM environment.

 

 

Note however, that due to the nature of the way in which protection groups are created - these are not automatically recreated upon failback and must be recreated manually.

 

 

Recovery plans, however, do not need to be recreated - the new protection groups can simply be added back into the existing recovery plans.

 

 

I hope this helps

 

 

Kind regards

 

 

Alex Tanner

 

 

bladeraptor Hot Shot VMware Employees 82 posts since
Dec 19, 2007
6. Apr 7, 2009 1:40 AM in response to: cswaters1
Re: Failback steps for recovered VMs

 

Hi Craig,

 

 

You are most welcome. Both Lee and I are based in the UK - so we can understand the backwater comments :]

 

 

I am afraid at this time I cannot give you a firm date, but I will escalate with Chad, try to get something within a month, and get back to you.

 

 

As for the EMC Storage Viewer, it is available now on PowerLink in the following section:

 

 

Home > Support > Technical Documentation and Advisories > White Papers > Configuration/Administration

 

 

The White Paper is titled  White Paper: Using EMC Storage Viewer for Virtual Infrastructure Client - A Detailed Review

 

 

I will email you privately the details of your local EMC VMware specialist and please let me know if you don't get any joy from that route

 

 

I have worked extensively, in my lab and at VMworld US 2008, with failing over and failing back a CLARiiON SRM environment, and it works well. The SRM failback tool for CLARiiON should simplify this process by automating many of the tasks which are done manually now.

 

 

Many thanks

 

 

Alex Tanner

 

 

depping Champion VMware Employees User Moderators vExpert 4,233 posts since
Jan 17, 2005
7. Apr 7, 2009 2:32 AM in response to: cswaters1
Re: Failback steps for recovered VMs
  1. can't comment on this one

  2. http://virtualgeek.typepad.com/virtual_geek/2009/04/where-to-get-the-emc-storage-viewer-vcenter-plugin.html

  3. reach out to Chad via the link in 2). It's his blog and he can answer your questions for sure.

  4. no, I can't comment on the availability of the next version of SRM and its new features. No one can and/or will, I guess...

 

 

 

 

Duncan

VMware Communities User Moderator

-


Blogging: http://www.yellow-bricks.com

Twitter: http://www.twitter.com/depping

 

If you find this information useful, please award points for "correct" or "helpful".

Duncan | Yellow-Bricks.com | Author of the vSphere 5.0 Clustering Deepdive
bladeraptor Hot Shot VMware Employees 82 posts since
Dec 19, 2007
10. Apr 14, 2009 4:36 AM in response to: cswaters1
Re: Failback steps for recovered VMs

 

Hi Craig

 

 

The word from Chad is that the target date at this point, subject to change, is June.

 

 

This is in no way a commitment to make the functionality available at that time, but the engineering teams are looking at that sort of time frame.

 

 

Hope that helps

 

 

Kind regards

 

 

Alex Tanner

 

 
