We have SRM up and fully functioning, and so far the few tests we have done have gone wonderfully. I was wondering if anyone, during their testing, came across a doc or whitepaper with different testing scenarios? We want to document and test different recovery plans, covering various configurations and storage parameters of the virtual disks attached to the VMs, along with recovery times.
Just wondering if anyone has or has seen such a document?
RROCK and the rest of the community: we are in almost the same boat, and I didn't want to post a separate question, so here is mine alongside RRock's.
My question is more about how you validate that an SRM test was successful. Just because I ran a test and it shows me that the VMs have come online at the remote site in the 'bubble' doesn't suffice for me or my CIO.
We want to be able to test access inside and from outside the test bubble.
So for example, I will test bringing online our AD server, a file server, a SQL server with a specific application, and our Exchange server. After they are online I wish to perform the following:
1) Bring online inside the test bubble a Windows XP workstation that will allow me to verify AD authentication, file service access, SQL application access, and sending an e-mail to myself.
2) Bring online inside the test bubble a Terminal Server configured to allow external traffic into the bubble.
3) From an XP workstation on my existing network, log into the bubble network and perform all the tasks I performed in step 1.
4) From a remote site, connect via our ISP connection into the TSE server inside the bubble and run similar tests.
I understand the firewall and switch implications here, but I want to hear back on whether you feel this is a proper way to validate the SRM test.
For those of you who have conducted a successful SRM test, what have your validation methodologies been?
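Part of step 1 could be scripted so that every test run produces the same evidence. The following is only a minimal sketch, assuming the recovered servers answer on their usual default TCP ports; the hostnames, the `Check` class, and the `run_checks` helper are hypothetical and not part of SRM. An open port only shows that a service is listening, so it does not replace an actual AD logon, file copy, application query, or test e-mail.

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """One service to probe: a label plus the host and TCP port to try."""
    name: str
    host: str
    port: int


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def run_checks(checks, probe=port_open):
    """Probe every check and return a {name: reachable} dictionary."""
    return {c.name: probe(c.host, c.port) for c in checks}


def report(results):
    """Format the results as PASS/FAIL lines for the test record."""
    return [f"{'PASS' if ok else 'FAIL'}  {name}" for name, ok in results.items()]


# Hypothetical hostnames; the ports are the common defaults for each service.
CHECKS = [
    Check("AD (LDAP)", "dc01.bubble.example", 389),
    Check("File server (SMB)", "fs01.bubble.example", 445),
    Check("SQL Server", "sql01.bubble.example", 1433),
    Check("Exchange (SMTP)", "mail01.bubble.example", 25),
]
```

Running `print("\n".join(report(run_checks(CHECKS))))` from the workstation inside the bubble, and again from a workstation outside it, would give a quick side-by-side of what is reachable from each vantage point.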
When I talk to customers about SRM, DR, and networking, I always position the bubble network for what it essentially is: a safety valve. As you have correctly stated, the bubble network is very secure, so secure in fact that you cannot get traffic in or out. So what do I mean by safety valve? Imagine the alternative. If the "Auto" (aka "bubble") network were not selected as the default for SRM tests, we would expose customers to the risk of accidentally running a misconfigured SRM test that lets VMs connect to production subnets.
The bubble network is therefore the safety net: even if a customer creates a recovery plan and runs it in test mode without ever having read any of the documentation, the VMs in the plan will by default be brought online safely inside the isolated bubble network.
The reality of real-world testing is that the bubble network will only get you so far. Take another scenario: you want to run an end-to-end test, but the RDBMS tier of your application is not in a VM; it's on, say, a large AIX box. You cannot access that RDBMS via the bubble network, and you cannot put it inside the bubble because it's not a virtual machine. You are going to need an actual isolated set of test VLANs created, to which you can connect not only your test VMs but also other platforms (mainframe, Unix, etc.). The whole concept of DR networking and test networking is not really an SRM discussion. It's an architecture decision that customers make, and the same decisions would have to be made whether you were using physical machines or virtual machines. It comes up in SRM conversations a LOT, though, so here are the kinds of questions customers are revisiting:
- What's our strategy for failover? Change IP addresses, or keep the same addresses?
- Do we stretch a layer 2 VLAN?
- Do we fail over layer 3?
- How do we update DNS if we make changes? Is that scripted?
- Do we need a secure set of test VLANs created at the test site that can be used for DR testing? Do these exist?
- If we do re-IP, do we have an existing mechanism, solution, set of scripts, or DHCP reservations that does this for us? (If you do, these same techniques will almost certainly work unaltered against your VMs.)
- Does SRM offer any other options for re-IP if we don't have an existing mechanism? (Yes, it does!)
Once you can answer the above, you will have a better idea of what your solution should look like. Customers I am working with now who are deploying SRM and have answered these questions usually come back with a simple addition to their network setup. They create the required VLANs at the network layer, ensure they have the correct routing, ACLs, etc. in place, and then present these VLANs down the same trunk ports that the recovery site ESX hosts are using. Once this is done, they create portgroups for the new test VLANs, and finally, in their SRM recovery plans, they select these "test" portgroups as the test network rather than the "Auto" default.
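For anyone weighing the re-IP question in the list above, here is a rough sketch of the kind of mapping a home-grown script might compute: keep each VM's host offset and move it into the matching subnet at the recovery site. The subnets, the address, and the `remap_ip` function are made-up examples, and this does not represent SRM's own re-IP options.

```python
import ipaddress


def remap_ip(address: str, source_net: str, target_net: str) -> str:
    """Map an address from the production subnet to the same host offset
    in the recovery (or test) subnet.

    All subnets passed in are illustrative assumptions, not SRM settings.
    """
    src = ipaddress.ip_network(source_net)
    dst = ipaddress.ip_network(target_net)
    ip = ipaddress.ip_address(address)

    if ip not in src:
        raise ValueError(f"{address} is not in {source_net}")
    if src.prefixlen != dst.prefixlen:
        raise ValueError("source and target subnets must be the same size")

    # Host offset within the source subnet, re-applied to the target subnet.
    offset = int(ip) - int(src.network_address)
    return str(dst.network_address + offset)


# Hypothetical example: production 10.1.20.0/24 maps to recovery 10.2.20.0/24.
new_ip = remap_ip("10.1.20.15", "10.1.20.0/24", "10.2.20.0/24")
print(new_ip)  # 10.2.20.15
```

The same old-to-new table could also drive the DNS updates, so the addressing change and the name records stay in step.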
hope that helps.
Yes, Smoggy's response was excellent.
So how many of you have "extended down the trunk" to allow more thorough disaster recovery testing? Any other methods of validation?
I'm scheduled to perform a test on the 19th of December. We are doing just this to try to really capture that we have a valid solution in place.
David Abowitt, VCP