VMware Cloud Community
dcoz
Hot Shot

Restoring SRM DB

Hi Guys,

I have been testing recovering SRM and VC in the event of a full DR failover.

The steps I followed were the normal recovery steps for VC, and I followed almost the same steps for SRM.

Then I thought: is this the correct way to get SRM back up and running on the protected side in this scenario?

In the event of a full DR failover would you restore the SRM database on the protected site, to get it back up and running?

The other thing is, if you're using credential-based authentication, would you even worry about SRM certificates?

Thanks for any help

DC

6 Replies
dcoz
Hot Shot

Anyone got any comments?

I would like to hear the community's thoughts on this.

Thanks

DC

pauljawood
Enthusiast

Hi,

The database holds the protection group information that has been specified at the protected site. If the protected site has failed then you will be running on the recovery site which would mean that the protected site would not be available.

If you then need to fail back from the recovery site to the protected site, you would reverse the settings in SRM and go through the steps to reverse the failover from recovery to protected. If you had lost your site completely, you would have the SRM server set up and installed at the protected site but would only connect (pair) it from the recovery site. Once you have set up the protection groups at the recovery site, you then set up the recovery plan at your protected site.

There are many steps that you need to have in place to reverse the failover from the recovery site back to the main protected site. These are simple to follow and will allow you to fail back.

In a failback scenario you have to delete the protection groups anyway. So if you have lost the protected site and are building a new one to fail back to, then as long as you have records (written down on file) of the protection groups you had set up, you should not have to worry about restoring the SQL database: you can recreate it in a matter of minutes and pair your sites together again.
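As a rough illustration of keeping those records on file, here is a minimal Python sketch that stores protection group details as JSON so they can be re-entered by hand later. It does not talk to SRM at all; the field names (`name`, `datastore_group`, `vms`) and the example values are my own placeholders, not anything SRM exports.

```python
import json
from pathlib import Path


def to_json(groups):
    """Serialise a list of protection group records to a JSON string."""
    return json.dumps(groups, indent=2, sort_keys=True)


def from_json(text):
    """Parse protection group records back from a JSON string."""
    return json.loads(text)


def save_protection_groups(groups, path):
    """Write the records to a file kept somewhere outside the protected site."""
    Path(path).write_text(to_json(groups))


# Hypothetical record: every name below is a placeholder.
example_groups = [
    {
        "name": "PG-Example",
        "datastore_group": "example-datastores",
        "vms": ["vm-a", "vm-b"],
    },
]
```

Keeping the file with the rest of your DR documentation at the recovery site means it survives the loss of the protected site.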

-


If you found this helpful then please leave some points.

dcoz
Hot Shot

Hi Paul,

Thanks for the reply.

I am still having issues with re-installing SRM on what is now the recovery side.

Just to give a rundown of what I have done so far:

  1. Wiped the ESX host, SRM server, VC DB server and VC server (to simulate the protected site being a smoking crater)

  2. Created a new VC DB server, restored the VC DB, and resolved orphaned SQL users

  3. Reinstalled VC, pointing to the restored DB

  4. Created a new SRM DB

  5. Installed SRM onto a new server and pointed it to the new DB
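For the orphaned users in step 2, a small helper along these lines can generate the T-SQL to review and run by hand. This is only a sketch: it assumes a SQL Server version that supports `ALTER USER ... WITH LOGIN` (older builds would use `sp_change_users_login` instead) and that each login has the same name as its database user. The database name `VCDB` and user `vpxuser` in the usage note are placeholders, not the real names from this environment.

```python
def quote_ident(name):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def remap_user_statements(database, users):
    """Return T-SQL statements that re-link each database user
    to the server login of the same name."""
    statements = ["USE " + quote_ident(database) + ";"]
    for user in users:
        quoted = quote_ident(user)
        statements.append("ALTER USER " + quoted + " WITH LOGIN = " + quoted + ";")
    return statements
```

Calling `remap_user_statements("VCDB", ["vpxuser"])` returns the `USE` statement followed by one `ALTER USER` line per user, which can be pasted into a query window after checking it.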

The install goes fine, but when I try to connect the now-protected site to the now-recovery site, the recovery site SRM server performs a mini dump and stops the SRM service.

I have also tried restoring the SRM DB and running through the SRM install, choosing not to overwrite the DB, but I still get the same error when trying to connect the now-protected site to the recovery site.

I have attached the log files. I know I'm missing something in the restore process, but at the moment I'm not sure what.

Any help would be appreciated.

Thanks

DC

pauljawood
Enthusiast

Hi,

I have just pulled down the logs and will get back to you later today.

-


If you found this helpful then please leave some points.

pauljawood
Enthusiast

Hi,

Just to check a few things: (to confirm I fully understand)

1. The present state is a failed over situation with your production running on your recovery site

2. A new host, VC and SRM have been installed fresh, with a new empty database for SRM

3. The pairing of the sites is being set up from the Recovery site to the Production site

If the above works then carry on.

4. The shadow/stub files have been deleted from the Recovery site (these would be on either local or SAN storage, and were created when you set up the protection group from the old Production environment to the Recovery site)

5. On the Production site, if you have restored the vCenter DB, you will need to remove from the inventory the VMs that you will be failing back

6. The storage has been connected and the replication has been reversed, so it is now replicating from the Recovery site to the Production site

7. The inventory mapping has been set up on the Recovery site

8. The array management has been set up on the Recovery site

9. The recovery plan has been set up on the Protected site (this will now create entries in the Production inventory)

If everything above has worked, you should be in a position to run 'Test' failovers before running a proper failover (failback).
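If it helps to track where you are, the nine checks above can be kept as a simple ordered checklist. The Python sketch below is just a bookkeeping aid under my own naming (`FAILBACK_STEPS`, `first_incomplete`); it does not query SRM or vCenter, and you mark each step as done yourself.

```python
# The nine checks from this post, in order. Wording is abbreviated.
FAILBACK_STEPS = [
    "Production is running on the Recovery site (failed over)",
    "New host, VC and SRM installed with a new empty SRM database",
    "Site pairing set up from the Recovery site to the Production site",
    "Shadow/stub files deleted from the Recovery site",
    "VMs to be failed back removed from the Production inventory",
    "Storage connected and replication reversed (Recovery to Production)",
    "Inventory mapping set up on the Recovery site",
    "Array management set up on the Recovery site",
    "Recovery plan set up on the Protected site",
]


def first_incomplete(done):
    """Return (step number, description) for the first step not marked
    done, or None when every step has been completed.

    `done` maps 1-based step numbers to True/False."""
    for number, description in enumerate(FAILBACK_STEPS, start=1):
        if not done.get(number, False):
            return number, description
    return None
```

For example, with steps 1 to 5 marked as done, `first_incomplete` points at step 6 (reversing the replication), which is the next thing to sort out before moving on to the mappings.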

The log file you sent over suggests that the Production SRM site is failing to connect to or discover the vCenter or the SRM server on your Recovery site. This is why I would like you to try pairing in reverse.

Please can you also confirm that you are using vSphere U1 (patched) and the latest version of SRM (build versions would help here too).

-


If you found this helpful then please leave some points.

dcoz
Hot Shot

Hi Paul,

Thanks for the replies.

Just to answer your questions:

  1. The present state is a failed over situation with your production running on your recovery site
    Yes, I have failed over to the recovery side and am running at the recovery side.

  2. A new host, VC and SRM have been installed fresh, with a new empty database for SRM
    A new ESX host has been created and Virtual Center restored. For SRM I created a new database and installed SRM as if it were a new installation.

  3. The pairing of the sites is being set up from the Recovery site to the Production site
    Yes, I am trying to establish reciprocity from the recovery side (the side I'm currently running from).

  4. The shadow/stub files have been deleted from the Recovery site (these would be on either local or SAN storage, and were created when you set up the protection group from the old Production environment to the Recovery site)
    Yes, completed.

  5. On the Production site, if you have restored the vCenter DB, you will need to remove from the inventory the VMs that you will be failing back
    All previous VM inventory objects have been removed.

  6. The storage has been connected and the replication has been reversed, so it is now replicating from the Recovery site to the Production site
    This bit I haven't done, so I will get this running and try again.

I like to test as many scenarios as possible with SRM to get a better understanding of how to resolve issues.

Again, I appreciate the help on this.

cheers

DC
