Re: AD behavior in DR with different IP segment

epaz1000 · ‎02-28-2016

Hi all

I have a source site running VMware, with VMs that include AD domain controllers and Windows servers. I want to replicate all my servers to a DR site in a different location that uses a different IP segment from the source. If I replicate a DC, then on failover I will have to change its IP at the target site, which may cause unexpected results and problems. Alternatively, I could keep a secondary DC at the target site and rely on native DC replication, as most do. But if I run a test failover by joining that DC to the bubble network with all the failed-over servers, one of them may change an attribute (for whatever reason). When I then undo the failover and connect the DC back to production, it will replicate the attribute change back to the source production environment, which may cause issues I want to avoid.
How do you think I should handle this? How do other organizations handle this in a manual DR test where they need the target secondary DC connected to the DR environment? Do they take the chance?

Thanks

ThompsG · ‎02-28-2016

Hi epaz1000,

Before we start, the first thing to note is that failing over Domain Controllers via SRM is not recommended or supported: Site Recovery Manager 6.1 Documentation Center

I believe the reason for this statement, besides the fact that AD has its own replication technology, is that there are a multitude of replication technologies available and some have the potential to introduce inconsistencies if they are not fully understood. Given that AD is naturally highly available (assuming multiple DCs) and that replication is native to the application, it is easier and better to handle availability there. In our case, we have arrays that are based on synchronous block-level replication, so we do actually fail over our DCs via SRM even though we shouldn't - we also have a stretched layer 2 network, which helps.

Getting back to your questions - changing IPs on domain controllers is not a big event as long as you have DNS resolvers configured correctly and don't island your DNS, i.e. leave a DC relying on an IP that no longer exists for its DNS lookups.
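To make the "don't island your DNS" point concrete, here is a minimal Python sketch of the kind of sanity check you could run against your planned post-failover settings. It is only an illustration: the function name, the input format, and the example addresses are all made up, and it works purely on lists you supply rather than querying AD or DNS itself.

```python
from ipaddress import ip_address


def resolver_problems(dc_ip, resolvers, live_dns_ips):
    """Return a list of problems with one DC's DNS client settings.

    dc_ip        -- the DC's own address after failover (hypothetical input)
    resolvers    -- DNS servers configured on the DC's network adapter
    live_dns_ips -- addresses of DNS servers that will exist after failover
    """
    own = ip_address(dc_ip)
    live = {ip_address(ip) for ip in live_dns_ips}

    stale = []     # resolvers pointing at addresses that no longer exist
    partners = []  # resolvers pointing at another DNS server that is alive

    for entry in resolvers:
        addr = ip_address(entry)
        if addr.is_loopback or addr == own:
            continue  # pointing at itself; does not count as a partner
        if addr in live:
            partners.append(entry)
        else:
            stale.append(entry)

    problems = []
    if stale:
        problems.append("resolvers no longer exist: " + ", ".join(stale))
    if not partners:
        problems.append("no live partner DNS server configured (island risk)")
    return problems
```

For example, a DC re-addressed to 10.20.0.10 that still lists only an old source-site server such as 10.10.0.10 plus loopback would be reported with both problems, while one that lists a live DR-site partner would come back clean.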

For "bubble testing" or test failovers - this should be used exactly as the name suggests, i.e. only for testing. Your "bubble" test should not see production, especially if you choose to fail over Domain Controllers. The network the test failover VMs connect to should be completely isolated and guaranteed not to have a route to production - for me this is imperative, as the impact to AD could be quite disastrous if this is allowed to happen. When you have finished the SRM test, the VMs are destroyed, so changes made within the VMs will be lost and therefore not make it to production.

However, if you do have network communication between production and the "bubble" environment, then a change to an AD attribute has a high chance of being replicated back to production, but this will likely be the least of your issues at that point.

As we have multiple hosts at our DR site (Recovery site) and we would like the recovered guests to communicate with each other but NOT with the other VMs at the DR site, we have configured a blackhole VLAN (998) which doesn't route outside itself but is available to all the DR hosts. This means that when we test the failover, these VMs run in isolation. Before doing this we used to just allow SRM to automatically create the "bubble" network per host, which allowed us to confirm that replication was good but didn't give us a full test of inter-VM communication.
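If you want a quick way to confirm that a test network really has no path to production, something along these lines could be run from a VM inside the bubble. This is a rough sketch, not a complete isolation test: the function names and target list are hypothetical, it only probes TCP ports you name, and a clean result does not rule out leaks over UDP or other ports.

```python
import socket


def tcp_connects(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unroutable: treated as "not reachable".
        return False


def isolation_leaks(targets, probe=tcp_connects):
    """Return the (host, port) pairs that are reachable but should not be.

    targets -- production endpoints to probe, e.g. a production DC on
               LDAP (389) or DNS over TCP (53); the list is up to you
    probe   -- callable(host, port) -> bool, replaceable for dry runs
    """
    return [(host, port) for host, port in targets if probe(host, port)]
```

An empty list from `isolation_leaks([("10.10.0.10", 389), ("10.10.0.10", 53)])` (addresses made up) is what you would hope to see from inside the blackhole VLAN; anything else means a production endpoint answered and the test should stop before any DCs are brought up.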

Kind regards.

TBKing · ‎02-29-2016

We have AD DCs permanently living at our recovery site as well.

For the past few tests, we shut down the DCs and created clones to stand up during the test.

When the test was over, we destroyed the clones, along with the VMs created.

This last time around, because we have so many other systems living permanently at the recovery site (PSC, vCenter SRM, other systems) that also need AD, we didn't do anything with the DCs.

The site was isolated from production, and we stood up recovered servers via SRM while leaving the other systems online. When the test was over, the recovered systems were destroyed, the link was re-established, and all seems well.

We did not have to make any changes within AD (password changes, add accounts/machines, change permissions) so we did not have to move/seize any FSMO roles.

AD seems pretty resilient to disconnects and isolation events.
