VMware Cloud Community
Almero
Enthusiast

Using vSphere Replication to test/trial recovery while keeping the primary site online

Hi guys, not sure if I am missing something really obvious.

From what I can gather, the basic "Recovery process" in vSphere Replication 5.1 involves the following workflow

However, our business demands that we test the ability to recover selected VMs at the DR site while NOT impacting the primary site.

Is this possible without SRM, array-based replication or Veeam replication?

Our Setup

PROD vCenter (dedicated VLAN x)

PROD VR appliance (dedicated VLAN y)

PROD HA and DRS cluster

PROD guest networking (VLAN z)

DR vCenter (dedicated VLAN X)

DR VR appliance (dedicated VLAN y)

DR HA and DRS cluster

DR guest networking (VLAN a, isolated, cannot route back to primary site)

No SRM, no possibility of using the preferred array-based replication, and no budget for Veeam replication.

This MUST be done while the primary site remains fully alive.

Basic process to test recovery at DR and come back, as I understand it:

1 Set up replication jobs (done).

2 Web access in DR vCenter > Pause/stop replication on the VMs we want to recover.

3 Web access in DR vCenter > Recover the DR VMs using "Recover with latest changes" (show stopper, as the primary/source VM needs to go down).

4 Assuming there is a way past step 3, we were planning to switch the recovered guests to the DR VLAN a port groups and reconfigure the guest IPs using a PowerShell mass process.

5 Active Directory and DNS will be altered in the DR test site to make the recovered guests usable.

6 Users test at DR on the recovered VMs in DR VLAN a, so this should not route back to or affect the PROD site.

7 Fail the recovered VMs back to the primary site. This once again requires the primary VMs to be down, or EVEN to be removed from inventory!

Procedure here

VMware vSphere 5.1
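
For the guest IP change in the plan above, the new addresses can be worked out ahead of time. The following is only a minimal Python sketch, assuming each guest keeps its host offset when it moves from the PROD guest subnet to the DR one. The subnets shown are documentation ranges rather than real ones, and `remap_ip` is a hypothetical helper. Applying the address inside the guest is a separate step that is not shown.

```python
import ipaddress


def remap_ip(ip: str, src_net: str, dst_net: str) -> str:
    """Map an address from src_net to the same host offset in dst_net.

    Assumption: both subnets use the same prefix length, so every
    host offset in the source subnet exists in the destination.
    """
    src = ipaddress.ip_network(src_net)
    dst = ipaddress.ip_network(dst_net)
    addr = ipaddress.ip_address(ip)
    if src.prefixlen != dst.prefixlen:
        raise ValueError("source and destination subnets differ in size")
    if addr not in src:
        raise ValueError(f"{ip} is not inside {src_net}")
    # Offset of the host within its subnet, re-applied to the DR subnet.
    offset = int(addr) - int(src.network_address)
    return str(dst.network_address + offset)


# Hypothetical guests and subnets, for illustration only.
prod_guests = {"web01": "192.0.2.25", "db01": "192.0.2.40"}
dr_plan = {
    name: remap_ip(ip, "192.0.2.0/24", "198.51.100.0/24")
    for name, ip in prod_guests.items()
}
```

The resulting `dr_plan` mapping could then be exported and fed to whatever mass-reconfiguration script is used on the guests.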

So it looks like there is no way to use vSphere Replication to TEST the ability to fail over, unless you can share secrets with me.

Accepted Solutions
Biliana
VMware Employee

If you are using vSphere Replication without SRM on top of it, Test Recovery is not available. The process you could follow in case you would like to keep your primary site online:

1) Perform Recovery using the second option (Recover with latest available data). In this way you would not need to power off source VMs and your primary site will be online.

2) After recovery is complete, stop the replications (which will be in Recovered state)

3) Power off the recovered VMs and unregister them from the vCenter inventory, but keep the disk files intact.

4) Manually configure all replications using the disks that were left over as initial seeds. This will cause only changes to be synced.

To ease step 4, you could use the vSphere Replication multi-VM configuration wizard. Just make sure that on the target datastore each disk that will be used as an initial copy is placed in a folder named after its VM. Then you could try to configure all VMs at once, letting the wizard search for seeds and confirming that they should be used.
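
The folder requirement above can be checked before opening the wizard. The following is only a minimal Python sketch, assuming you already have a listing of file paths relative to the target datastore root. `misplaced_seeds`, the VM names and the disk names are hypothetical and are not part of any vSphere Replication API.

```python
from pathlib import PurePosixPath


def misplaced_seeds(vm_disks, datastore_files):
    """Return {vm_name: [disk, ...]} for seed disks that are not
    found in a folder named after their VM.

    vm_disks:        mapping of VM name -> list of disk file names
    datastore_files: iterable of paths relative to the datastore root
    """
    present = {PurePosixPath(p) for p in datastore_files}
    missing = {}
    for vm_name, disks in vm_disks.items():
        for disk in disks:
            # Expected location: <VM name>/<disk file>
            if PurePosixPath(vm_name) / disk not in present:
                missing.setdefault(vm_name, []).append(disk)
    return missing


# Hypothetical inventory and datastore listing, for illustration only.
vm_disks = {
    "web01": ["web01.vmdk", "web01_1.vmdk"],
    "db01": ["db01.vmdk"],
}
datastore_files = [
    "web01/web01.vmdk",
    "web01/web01_1.vmdk",
    "old/db01.vmdk",  # seed left in the wrong folder
]
report = misplaced_seeds(vm_disks, datastore_files)
```

An empty result means every seed disk sits where the layout described above expects it. Anything reported should be moved into the folder named after its VM before configuring replication.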

Replies
RubyIvy
Enthusiast

Hi,

Currently the Test Failover + Test Cleanup feature, which works without having to pause or stop replication, is available in VR only when SRM is used on top of it.

For VR without SRM, after performing the failover, the only possible option is to stop the replication. Performing the failover consolidates the replica disks on the target datastores and prepares the VM configuration files.

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful.
Almero
Enthusiast

Hi Ruby, that's what I understood; however, my client simply refuses to purchase the super expensive SRM utility. SRM test failover is the only supported way to accomplish this without having to stop and re-seed replicas.

Any idea where I can find documentation on how to manually (CLI and PowerCLI) prepare the target datastores (where the replicas currently go)?

As you correctly stated, the target replica VMFS contains the HBR files instead of conventional virtual machine files, but certainly, just as SRM prepares the files for normal operation, there must be a manual, "unsupported" way.

I want to test this unsupported process with 490 vSphere replicas running to the DR site.

1> Cut the LAN between PROD and DR, just for the VLANs used by vSphere and VR.

2> PROD is now technically not "aware" of DR, and DR thinks PROD is down.

3> Recover the DR VMs via Web access. Keep them powered down and leave the vNICs disconnected.

4> In my home lab tests, this leaves the DR VMs in conventional VM format: stock-standard VMDKs and VMXs, with no HBRCFG and HBRDISK files.

5> Our storage team now creates a "flash" copy of the Fibre Channel LUNs using IBM SVC, essentially creating a copy of all the VMFS datastores. This takes an hour.

6> After this, the "flashed LUNs" are presented to a new, isolated set of ESXi hosts. (The hosts are zoned in advance and only have fake networks on an isolated Cisco switch. We now have VMs usable in an isolated cluster, which we will use later on.)

7> Remove the DR VMs from inventory without "deleting from disk".

8> The VLANs blocked in step 1 between PROD and DR are now restored.

9> The vSphere Replication appliances can now "talk".

10> Stop the replication jobs, as the VMs should be in Recovered state.

11> Redo vSphere Replication from the PROD side, but use the DR VMs as seeds. (This will most likely cause a MASSIVE CPU storm on the ESXi hosts while the checksums are done; I saw this in tests using vim-cmd hbrsvc/vmreplica.getState vmid.)

Last step> Use the flashed LUNs from step 5 to mount snapshot LUNs in the isolated cluster, add the VMs to inventory, and power them up with the "I moved it" option.

I know this is complex, but I cannot provide a better solution without SRM.

Any better ideas would be greatly appreciated; I am stuck between the I.T. and Finance departments here!

The only thing I could find is a super hack based on Duncan Epping's article, which suggests there might be a way.

Almero
Enthusiast

Biliana, YOU ARE THE BEST!!!

Just tested this, and I have to say, nice "outside the PDF" thinking.

You just saved me from weeks of manual configs.

Best of all, once recovered, the VM is in classic format (only VMX and VMDKs etc.), so I can flash copy this to my isolated DR cluster.

You made my day !