Solved: How granular is SRM failover?

nyplnyc · ‎04-17-2010

I know it's probably a simple question - I get stuck on the marketing lingo and am just not clear on this.

Is SRM geared to recover an entire site, or can it do that and also fail over individual VMs?

For example, if I deploy:

Site A - 40 VMs, 4 physical servers, SAN, 4 data stores

Site B - 40 VMs, 4 physical servers, SAN, 4 data stores

If I lose 1 server at site A can I have it served from Site B or does it have to be all 40 VMs or an entire data store?

Is there a failback option?

Lastly, is there versioning/snapshots along with the replication (which is handled by the SAN, I know, but are VM snapshots available)?

Thanks!

Michelle_Laveri · ‎04-18-2010

There are a couple of interesting points here...

Firstly, the others in this thread are correct - SRM fails over all the VMs in the datastore. Really, the only way to recover one VM out of a datastore that contains 20 is to have a recovery plan just for it - you place 19 of the VMs into the "don't power on" area, and the single VM in Normal Priority. All 20 get recovered during a test - but effectively only one VM would be active. It's a clumsy workaround which might not serve your purposes - and useless if you want to press run...
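The workaround above can be sketched as a toy model in Python. This is only an illustration of the idea, not the SRM API: `RecoveryPlan`, `single_vm_plan` and the VM names are hypothetical, and the two lists simply stand in for the Normal Priority group and the "don't power on" area.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryPlan:
    """Toy stand-in for a recovery plan (hypothetical, not an SRM object)."""
    power_on: list = field(default_factory=list)         # e.g. Normal Priority
    do_not_power_on: list = field(default_factory=list)  # recovered but left off


def single_vm_plan(datastore_vms, target_vm):
    """Build a plan that powers on only target_vm.

    Every VM on the datastore is still part of the plan, because the
    datastore is recovered as a whole; the others are just left off.
    """
    if target_vm not in datastore_vms:
        raise ValueError(f"{target_vm} is not on this datastore")
    return RecoveryPlan(
        power_on=[target_vm],
        do_not_power_on=[vm for vm in datastore_vms if vm != target_vm],
    )


# A datastore holding 20 VMs, of which we only want vm07 running.
vms = [f"vm{i:02d}" for i in range(1, 21)]
plan = single_vm_plan(vms, "vm07")
print(len(plan.power_on), len(plan.do_not_power_on))  # 1 19
```

Note that all 20 VMs still appear in the plan; only the power-on behaviour differs, which is why the approach stays clumsy.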

We will have to wait until the arrays are VM-aware - where the storage array "knows" that the volumes/LUNs are formatted for VMFS/NFS, and can detect the vmx/vmdk file types. Things are going in that direction - the new CLARiiON CX4 and Navisphere give you excellent visibility of your VMs, right down to the size and type of virtual disk, within Navisphere...

As for what constitutes a failure: some people would argue that the loss of one datastore couldn't be considered a disaster, given that it might be recoverable by some sort of snapshot that is local to the array, rather than by replication to another geographical location. As ever, I guess it depends on the size and scope of your infrastructure...

Finally, automatic failover. With the SDK and some work with .NET it is possible to automate the recovery plans. But I feel it's dangerous territory, because you could easily get split-brain/false positives. Really, if you are looking at that kind of availability, perhaps a stretched HA cluster with something like NetApp's MetroCluster is what needs to be considered...

Regards

Mike Laverick

RTFM Education

http://www.rtfm-ed.co.uk

Author of the SRM Book: http://www.rtfm-ed.co.uk/2010/03/22/new-administrating-vmware-site-recovery-manager-4-0/

Free PDF or at-cost Hard Copy

Regards
Michelle Laverick
@m_laverick
http://www.michellelaverick.com


TimOudin · ‎04-17-2010

Regarding virtual machine granularity: protection is based on the datastore, and that is the most granular object that can be recovered. For a single virtual machine to be recoverable on its own, at this point in SRM's development, it must be the only VM on a LUN.
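To make the datastore granularity concrete, here is a minimal Python sketch. It is an assumption-laden illustration rather than anything SRM exposes: the `placement` mapping, the VM and datastore names, and `vms_affected_by_failover` are all hypothetical.

```python
from collections import defaultdict


def vms_affected_by_failover(placement, vm_name):
    """Return every VM that comes along when vm_name's datastore is recovered.

    placement maps VM name -> datastore name (hypothetical inventory data).
    """
    by_datastore = defaultdict(list)
    for vm, datastore in placement.items():
        by_datastore[datastore].append(vm)
    return sorted(by_datastore[placement[vm_name]])


placement = {"web01": "ds1", "web02": "ds1", "db01": "ds2"}

# web01 shares ds1, so recovering it drags web02 along.
affected_shared = vms_affected_by_failover(placement, "web01")
# db01 is alone on ds2, so it can be recovered by itself.
affected_alone = vms_affected_by_failover(placement, "db01")

print(affected_shared)  # ['web01', 'web02']
print(affected_alone)   # ['db01']
```

Only the VM that sits alone on its datastore can be recovered individually, which matches the one-VM-per-LUN point above.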

There is no versioning, snapshot, or bookmark-like functionality in SRM. All things like that are dependent upon the SAN itself.

Tim Oudin

nyplnyc · ‎04-17-2010

So, thanks for the answer, it brings up another question though...

If it's the data store that fails over, what constitutes a failure? Does that data store have to go bye-bye?

mal_michael · ‎04-17-2010

Failover in SRM does not start automatically. It must be initiated manually, so it's up to you to decide whether to run a recovery process.

Regarding VM snapshots - they are not supported with SRM.

Michelle_Laveri · ‎04-18-2010