VMware Cloud Community
ServerMonkey
Contributor

SRM 4 Failback

After having SRM 4 and RecoverPoint 3.2 up and running for a while now, I decided to run a sample recovery plan to make sure everything was fine. The failover worked perfectly and only took a minute to run.

The problem is that when I go to fail back, it's not working as I'd expect.

The first thing I did was confirm the host was up and running, then I removed it from the protection group in production and removed the recovery plan that I used to run it.

I logged into the RecoverPoint admin console to swap the replication around and found that the system had already done it for me; the diagram shows traffic flowing in the reverse direction.

So at this point I've gone to reverse things and set up a protection group on the DR cluster so I can start the process of moving the system back onto the production array. The problem is I'm not able to click the Create button as it's greyed out.

Doing a scan with the array manager results in the following warning: "Unable to display datastore groups. Replicated devices could not be matched with datastores in the inventory", and no disks are listed.

Has anyone come across this problem before? Any suggestions out there?

Thanks in advance.

5 Replies
galday
Contributor

One question: did you ever successfully configure any recovery plan from site B to site A?

I mean, if you set up a new LUN at site B, replicate it and rescan, do you see it, and does it let you create a protection group?

BTW, I need to rescan and go back into the screen again before any changes show up.

I would try breaking the replication, deleting the LUNs replicated to the original site, and then re-establishing the replication.

Redo the LUN presentations and try again.

Smoggy
VMware Employee

Forgive me for asking, but can I just sanity check something?

When you say you're getting the "Replicated devices could not be matched with datastores in the inventory" message, which SRM server are you logged into when you see it?

If we assume you started with SiteA and SiteB with replication going A -> B, then you would have had, for example, ProtectionGrp1 at SiteA and RecoveryPlan1 at SiteB.

You then failed over to SiteB, and RecoverPoint, under the covers, reversed the replication in a dynamic swap fashion, so your replication is now going from B -> A.

It sounds like you're hitting the "Rescan Arrays" button while still logged into the SRM server at SiteA. I just need to check whether that is the case, as that would be incorrect.

What you need to do is log in to the SRM server at SiteB and configure the array managers (SRAs) at SiteB. Now that you're at SiteB, the information you enter for "Protected Array" will be that of the RPA environment at SiteB, since that is now in effect the protected array because the replication is going from B -> A. The "Recovery Array" entry screen in SiteB's SRM server would then be populated with the RPA details for SiteA's array. Once that's done and the datastore groups are visible, you create a new protection group in SiteB's SRM server and then a new recovery plan in SiteA's SRM server.

Apologies if I am preaching to the choir here, but that error message just sounded like you might have been running "Rescan Arrays" from the wrong side, or have entered the SRA config information the wrong way round at SiteB.

Ensure that you also rescan for devices on ALL ESX hosts at SiteA before you do anything in SRM at SiteA in terms of recovery, as you need to "refresh" ESX's view of the storage layer.
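
In case a scripted version of that rescan step is useful, here's a rough pyVmomi sketch that loops over every ESX host in the SiteA cluster and rescans HBAs and VMFS volumes. The vCenter address, credentials and cluster name ('vc-sitea.example.com', 'Production') are placeholders for the example, not anything from your environment.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Connect to the SiteA vCenter (placeholder address and credentials).
ctx = ssl._create_unverified_context()
si = SmartConnect(host='vc-sitea.example.com', user='administrator',
                  pwd='password', sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == 'Production')

    # Rescan every host so ESX refreshes its view of the storage layer
    # before SRM at SiteA tries to match replicated devices to datastores.
    for host in cluster.host:
        storage = host.configManager.storageSystem
        storage.RescanAllHba()   # pick up newly presented/replicated LUNs
        storage.RescanVmfs()     # refresh the VMFS volume list
        print('Rescanned', host.name)
finally:
    Disconnect(si)

The same thing can be done by hand in the vSphere Client by rescanning the storage adapters on each host; the script just saves doing it host by host.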

cheers

Lee

bladeraptor
VMware Employee

Hi

I take it that you are using the 'SRM will manage the replication' option in the policy tab for your consistency group?

When you found that the replication had already reversed, did you allow the two sides to get fully synced before failing back, or are you just trying to test a recovery plan on the old production side?

Clearly the disk signatures will be different, and you will need to bring those into the VMware environment so that both sides, production and recovery, can see the volumes at the HBA level, and so that on the 'production' side you have a valid VMFS datastore registered and writable by the ESX host.

Can you explain in detailed bullet points what you have done since failing over, and the state of the RecoverPoint device at each of the various steps?

Many thanks

Alex Tanner

ServerMonkey
Contributor

I think I found the problem; testing it now.

The issue is that we haven't upgraded one host to ESX 4; one cluster member is still on ESX 3.5, and sure enough the LVM.EnableResignature setting wasn't enabled on it.

I'm sure people here know, but if not: ESX 4 performs this function automatically, whereas in version 3.5 you had to enable it in the advanced settings section of the host configuration.
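
For anyone hitting the same thing later, here's a rough pyVmomi sketch of checking and enabling that advanced setting on the remaining 3.5 host and then rescanning. The vCenter and host names ('vcenter.example.com', 'esx35.example.com') and credentials are placeholders; the same flag can also be flipped from the 3.5 service console with esxcfg-advcfg if you prefer.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Connect to vCenter (placeholder address and credentials).
ctx = ssl._create_unverified_context()
si = SmartConnect(host='vcenter.example.com', user='administrator',
                  pwd='password', sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    host = next(h for h in view.view if h.name == 'esx35.example.com')

    opt_mgr = host.configManager.advancedOption

    # Show the current value of the resignature option.
    current = opt_mgr.QueryOptions(name='LVM.EnableResignature')
    print('LVM.EnableResignature =', current[0].value)

    # Enable resignaturing (integer option, 1 = on). Only needed on the
    # ESX 3.5 host; the ESX 4 hosts handle the resignature themselves.
    opt_mgr.UpdateOptions(changedValue=[
        vim.option.OptionValue(key='LVM.EnableResignature', value=1)])

    # Rescan so the host picks up and resignatures the replicated volumes.
    host.configManager.storageSystem.RescanAllHba()
    host.configManager.storageSystem.RescanVmfs()
finally:
    Disconnect(si)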

A silly mistake at the end of the day. Thanks to all who replied.

ServerMonkey
Contributor

Enabling the LVM.EnableResignature advanced setting did the trick.
