VMware Cloud Community
jrj8541
Contributor

Help with DR test

We are headed to our DR center for a test in the next few months. In our environment we have Active Directory virtualized running on ESX 3.5 servers. We also have a colocated ESX box at our DR center that is housing a domain controller from each of our domains (7 domains in total).

When I set this colocated box up, I set it up for a true disaster, assuming we would march in, immediately begin working off those domain controllers, and seize the FSMO roles. I (admittedly stupidly) did not assume we would start running real DR tests (we didn't previously). Now I am in a bit of a pickle. We will need to use the domain controllers on site and not rely on our company HQ site for anything. However, because these domain controllers need to be current when the DR test is over, we will reconnect the link to company HQ and begin replication again. I now have to seriously rethink my strategy. Here are my ideas so far:

1) Copy the domain controller VMs to a second ESX server and run the DR test off that ESX box. When DR is over, simply delete the copies. Reconnect the original ESX box to the Houston site immediately after the copy, and break the link between the DR site and that box.

Pros: no cleanup afterwards

Cons: will require initial setup time to load ESX box and copy data (~150GB). ESX configuration will be required (bad news as I am the sole person with any sort of ESX knowledge)

2) Break connection to Houston and use current domain controllers for disaster recovery test

Pros: Super fast: AD will be up and running for DR test in less than 30 minutes once on site.

Cons: Dirty AD. Once DR test is finished I will need to orphan the colocated domain controllers and rebuild them from scratch.
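
To put a rough number on the copy time in option 1, a back-of-the-envelope calculation like the one below may help. It is only a sketch: the ~150GB figure comes from the post, but the throughput values are hypothetical placeholders, not measurements, and the function name `copy_minutes` is made up for illustration.

```python
def copy_minutes(size_gb: float, throughput_mb_s: float) -> float:
    """Return the minutes needed to copy size_gb at a sustained throughput."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    size_mb = size_gb * 1024  # treat 1 GB as 1024 MB
    return size_mb / throughput_mb_s / 60


if __name__ == "__main__":
    # Hypothetical sustained rates; measure the real disks and network first.
    for rate in (10, 40, 100):
        print(f"{rate:>4} MB/s: {copy_minutes(150, rate):.0f} min")
```

Whatever the real rate turns out to be, the ESX install and configuration time on the rented box comes on top of this.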

So, those seem like my two best options. If you have a third, I would certainly appreciate it. I don't want to orphan servers, but I also don't want to wait around for data to copy and hold other people up in their tests. The colocated box is running on internal storage (I tried, but there was no budget for external), so cool technology like iSCSI snapshots is a no-go. Also, the ESX box I would have to load in scenario 1 is rented equipment from HP, so I can't require them to have any specific hardware, only the amounts of hardware under our contract.

Any help would be greatly appreciated. If you can't think of another scenario, a response as to which scenario you would pick and why is also helpful to me.

Regards, James

4 Replies
Rodos
Expert

I am no Windows expert (they are two partitions over); hopefully you are.

A third option might be to snapshot them. Isolate them from the production controllers and take a snapshot. When you have completed the DR testing, revert to the snapshot, rolling them back in time.

The problem here would be if the production domain controllers did not like the other ones going back in time. As long as you kept them isolated for the entire duration it should behave like they were just unavailable for the duration.

Maybe a Windows person here has tested this, or you could do some testing yourself. Post back and I can throw the question over the partitions if needed.

Rodos
Blog: http://rodos.haywood.org/
Rodos
Expert

I just confirmed with the Windows guys. It works fine; we have used this technique for lab testing as well as DR tests. You now have three options.


Rodos
Blog: http://rodos.haywood.org/
ejward
Expert

Sounds like the situation I have. We have no real DR solution ... only tests. What we had to do was separate a DR test from a real disaster. They're two different things. For us, in a real disaster, we plug back into a remote location for AD. In a disaster test, plugging into AD would be a disaster in itself.

We have an ESX server at DR with "Test" DCs on it. Our production DCs are physical. Just before our yearly test, we refresh the test DCs, which essentially means doing a P2V of our production DCs. We have 6. We bought PlateSpin PowerConvert to do scheduled daily P2Vs, but we realized that we were only going to replicate them once a year, so that was overkill.

We've tried snapshotting DCs in our lab and had very mixed results. Unless you snapshot them all at exactly the same time, you'll be doing a lot of forced replication and re-syncing when you revert. If you buy Lab Manager, you can snapshot them all at once and, theoretically, they won't be out of sync when you revert them all.

mike_laspina
Champion

I don't see an issue. You should perform this like a real DR test.

1 Break the network link to DR

2 Shut down all the DCs at DR and snapshot them. (They must be down to ensure ESE database integrity.)

3 Bring em up.

4 Do the tests

5 Bring em down

6 Revert to snapshot

7 Bring em up

8 Reconnect to the production network.

The important thing is to make sure that the DCs cannot talk to production until you have reverted to the snapshots.
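
One way to keep that ordering honest on the day is to write the eight steps down as a checklist that refuses out-of-order moves. The sketch below is only a model of the procedure, not a tool that touches ESX or AD: the class `DrTestRunbook`, its method names, and `RunbookError` are all hypothetical, and each method stands in for whatever manual or scripted action you actually perform.

```python
class RunbookError(Exception):
    """Raised when a step is attempted out of order."""


class DrTestRunbook:
    """Models the state of the DR-site DCs and rejects unsafe steps."""

    def __init__(self):
        self.link_up = True        # DR site can reach production
        self.dcs_running = True
        self.has_snapshot = False
        self.reverted = False
        self.diverged = False      # DCs have run since the snapshot, not yet reverted
        self.steps = []

    def break_link(self):
        self.link_up = False
        self.steps.append("break link")

    def shut_down(self):
        self.dcs_running = False
        self.steps.append("shut down")

    def take_snapshot(self):
        if self.link_up:
            raise RunbookError("break the link before taking snapshots")
        if self.dcs_running:
            raise RunbookError("DCs must be powered off for the snapshot")
        self.has_snapshot = True
        self.reverted = False
        self.steps.append("snapshot")

    def bring_up(self):
        self.dcs_running = True
        if self.has_snapshot and not self.reverted:
            self.diverged = True   # anything from here on is test-only state
        self.steps.append("bring up")

    def run_tests(self):
        if self.link_up or not self.dcs_running or not self.has_snapshot:
            raise RunbookError("tests need isolated, running, snapshotted DCs")
        self.steps.append("run tests")

    def revert(self):
        if not self.has_snapshot:
            raise RunbookError("no snapshot to revert to")
        if self.dcs_running:
            raise RunbookError("shut the DCs down before reverting")
        self.reverted = True
        self.diverged = False
        self.steps.append("revert")

    def reconnect(self):
        if self.diverged:
            raise RunbookError("revert to the snapshots before reconnecting")
        self.link_up = True
        self.steps.append("reconnect")


def run_full_test():
    """Walk through steps 1-8 in order and return the finished runbook."""
    rb = DrTestRunbook()
    for step in (rb.break_link, rb.shut_down, rb.take_snapshot, rb.bring_up,
                 rb.run_tests, rb.shut_down, rb.revert, rb.bring_up, rb.reconnect):
        step()
    return rb
```

Calling `reconnect()` after the DCs have been brought up on the snapshot but before `revert()` raises `RunbookError`, which is the one mistake the procedure is built to prevent.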

http://blog.laspina.ca/ vExpert 2009