First SRM test

Another Saturday given up (partially at least) in order to test SRM with the various Protection Groups and Recovery Plans. This was mainly a confidence-building exercise for ourselves and our management. A good clean test puts us in a good place for doing ad-hoc tests of plans as we continue to virtualise physical systems. It also proves that I've configured the recovery parameters of each VM correctly and that the plans don't just fall over.

We started off by moving a Domain Controller into the test bubble. The test failed almost immediately, which was somewhat unexpected and disappointing. My manager turned up at that point, so all that red stuff in the Recovery Plan steps didn't look all that promising. I worked out fairly quickly from the error message that the hosts at the Recovery Site could not see the required datastore. We use Recoverpoint to replicate the data, and generally you don't present the destination LUNs to the destination Storage Group. In this case, however, that prevented SRM from making the storage properly available. A 2 minute config change on the CX4-480 and we ran the test again. Success! There's something a bit eerie about seeing Recoverpoint make changes to the Consistency Groups apparently by itself, then vSwitches appear as if by magic, VMs appear and start up and, hey presto, inside 5 minutes we have our Domain Controller running in its own "bubble". We then ran a cleanup to make sure that functionality would work; if anything it was even more impressive. In under a minute all had been returned to normal.

For the second test I added one of the business server protection groups to the test Recovery Plan and we ran through again. It took slightly longer because more VMs had to start, but it was again successful. This gave us the opportunity to attempt logins on the business server. Our failure to log in highlighted that a single Domain Controller with no FSMO roles is not much use, so after a bit of work in NTDSUTIL we were logging in and this test was confirmed as successful.

Tests continued in this fashion to ensure each plan worked and each VM received the correct IP address and started without issues. Each was successful until the very last test. This failed because there was no compatible host for the most important VM (typical!!). It was immediately apparent that the issue was the difference in licensing between production and DR.

In order to host this particular VM we had to upgrade the prod cluster to Enterprise+, as we are required by the business to present 8 vCPUs. Since our DR site is only Enterprise, this machine would not start there. A little experimenting with PowerCLI produced a script which could change the number of vCPUs, but here I ran into a problem with the placement of the script in the Recovery Plan. As an individual step in the RP it was either too early, so the VM had not yet been registered, or too late, as the startup attempt had already occurred. Delving into the properties of the Recovery Config for the VM, I could add the script as a Pre Power On step, but again this was not correctly placed, as it runs just before the main power on but AFTER the initial startup for IP reconfig. Therefore once again the Plan failed for this machine.
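
A mismatch like this could be caught before a test run by comparing each protected VM's vCPU count with what the recovery site's licence allows. Here is a minimal sketch in Python. It is not something we ran on the day. The function name, the VM names and the cap of 4 vCPUs per VM at the DR site are all illustrative assumptions, and a real check would pull the vCPU counts from the vCenter inventory rather than a hard-coded dictionary.

```python
from typing import Dict, List, Tuple


def vms_over_vcpu_cap(vcpus_by_vm: Dict[str, int], dr_vcpu_cap: int) -> List[Tuple[str, int]]:
    """Return (vm_name, vcpu_count) for each VM with more vCPUs than the DR cap."""
    if dr_vcpu_cap < 1:
        raise ValueError("dr_vcpu_cap must be at least 1")
    over = [(name, count) for name, count in vcpus_by_vm.items() if count > dr_vcpu_cap]
    return sorted(over)  # sorted by VM name for a stable report


# Hypothetical inventory: names and counts are made up for illustration.
inventory = {"dc01": 2, "biz-app01": 4, "critical-db01": 8}
ASSUMED_DR_CAP = 4  # assumption: per-VM vCPU limit under the DR site's licence

for name, count in vms_over_vcpu_cap(inventory, ASSUMED_DR_CAP):
    print(f"{name}: {count} vCPUs exceeds the assumed DR cap of {ASSUMED_DR_CAP}")
```

Anything this reports would need either a licence upgrade at DR or a documented manual step before its Recovery Plan can be expected to pass.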

All is not lost: we can buy E+ licences, or we can document that this server will come over from Prod and be manually reconfigured. We could also run PowerCLI to reconfigure the vCPUs, start the VM and inject the correct IP config into it while it is running. There are options and we are aware of the issue.

Whichever option we choose, we will be able to recover this VM, and we can move forward with SRM happy in the knowledge that running tests like these does not interfere with our Prod environment at all and leaves the DR environment in a clean state when finished.

What a superb piece of software! Even more so when combined with Recoverpoint....

Anyway, onwards, preparing for the commissioning of our Celerra in London and then a Commvault installation utilising VNX-5100s as Disk Libraries. Never a dull moment!

Last update: 10-22-2012 07:09 AM