VMware Cloud Community

SRM Disaster Recovery Failover Test of Production scenario


At work, we're making preparations to test our DR preparedness of our Production environment with the following scenario.


At our primary Data Centre:

  • Prior to the network isolation, we'll do a final sync of all our vSphere replications
  • Shut down the servers
  • Then the DC will be isolated to simulate the disaster

At our Disaster Recovery Data Centre:

  • Perform a Disaster Recovery of several recovery plans using SRM
  • Manually recover some servers with vSphere Replication, using the Recover option
  • Ensure each service is available from our DR site

Typically, once the primary DC is available again, I'd hit the re-protect option and go back the other way. This time, however, the idea is essentially to shut down the virtual machines at our DR site and power on the virtual machines at our primary DC (we're not copying any changes back this time). I'm sure this part isn't an issue, but I'm thinking of the mess that would be left behind in the vSphere replications (outgoing and incoming) and the state left behind in SRM (i.e. the recovery plans would still think everything is recovered at our DR site). Unfortunately I cannot test what it's going to look like or what I need to do to clean it up, but I'm assuming the clean-up would look something like this (if anyone knows, please advise):

Clean up (When both sites can once again talk with each other):

By this point the servers at our primary DC will have been powered back on and the servers at our DR site will be powered off.

  1. Delete the recovered virtual machines at our DR site
  2. Stop replication (possibly with the Force option). Its current state would be Recovered. I don't think I can reuse the replications without first stopping them, which deletes them.
  3. Delete the recovery plans.

At this point I'd re-configure vSphere replication again and re-create any recovery plans to point back to our Disaster Recovery DC.
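
To make the intended order explicit, here is a rough checklist sketch in Python. It is only an illustration of the sequence above: it does not call SRM, vSphere Replication, or any other VMware API, and the names (`Step`, `CLEANUP_STEPS`, `ready_for_cleanup`) are labels I made up for this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    phase: str   # "clean-up" or "rebuild" (my own labels, not SRM terms)
    action: str  # what I would do by hand in the UI


# The order I am assuming; nothing here is confirmed to be the right procedure.
CLEANUP_STEPS = [
    Step("clean-up", "Delete the recovered virtual machines at the DR site"),
    Step("clean-up", "Stop the replications (possibly with the Force option)"),
    Step("clean-up", "Delete the recovery plans"),
    Step("rebuild", "Re-configure vSphere Replication towards the DR site"),
    Step("rebuild", "Re-create the recovery plans"),
]


def ready_for_cleanup(sites_connected: bool,
                      primary_powered_on: bool,
                      dr_powered_off: bool) -> bool:
    """Preconditions from the post: both sites can talk to each other again,
    primary servers are back on, and the DR copies are powered off."""
    return sites_connected and primary_powered_on and dr_powered_off


def print_runbook(sites_connected: bool,
                  primary_powered_on: bool,
                  dr_powered_off: bool) -> None:
    """Print the numbered checklist, refusing if the preconditions fail."""
    if not ready_for_cleanup(sites_connected, primary_powered_on, dr_powered_off):
        raise RuntimeError("Preconditions not met; do not start the clean-up")
    for number, step in enumerate(CLEANUP_STEPS, start=1):
        print(f"{number}. [{step.phase}] {step.action}")


if __name__ == "__main__":
    print_runbook(sites_connected=True, primary_powered_on=True, dr_powered_off=True)
```

The only point of the precondition check is that I wouldn't want to start deleting anything at the DR site until the primary site is confirmed to be serving again.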

Has anyone done anything similar?


2 Replies

Doing this might very well require rebuilding SRM at both sites, as the SRM servers and the replications will be in two different states.

I would not recommend this as you've outlined it. You would be better served by reprotecting the VMs and then failing back.

Other options: If you want to test your DR plan, I suggest you run an SRM test (if your SRA supports it you can run these with your network disconnected).

Either of these 2 methods would show you that your DR plan works, and leave you much less exposed to risk.



Thanks for your reply, ... Unfortunately I cannot re-protect the SRM recovery plans and fail the services back to their original site for this DR day (the decision isn't with me on this one). I'm hoping it doesn't require a full rebuild of SRM; however, I'm thinking it will require the deletion of all my recovery plans and vSphere replications.

