VMware Cloud Community
ssloh
Contributor

VMware EMC SRDF cross site-standby

Hi folks,

Trying to draft a DR solution for VMware; hoping to get some pointers here.

Environment Setup

Site 1

ESX Server 1 with virtual machines vm1 and vm2 running on LUN1 (EMC box1)

Site 2

ESX Server 2 with virtual machines vm3 and vm4 running on LUN2 (EMC box2)

Site1 LUN1 is replicated to Site2 using SRDF

Site2 LUN2 is replicated to Site1 using SRDF

Questions

1. Is there a VMware HA solution to fail over from ESX1 to ESX2 if ESX1 fails?

2. Is there a way to VMotion vm1 from ESX1 to ESX2?

3. If Site1 fails, will Site2 take over all the VMs and bring them up automatically?

Thanks.

5 Replies
nzsteve
Enthusiast

If I understand correctly, you're looking to use VMotion and HA across different sites? Unless you have a great network between them, that's probably not the way to do it.

If your config is bigger than the 2 ESX servers and 4 VMs you've outlined (I'm guessing you've simplified your config), you should look at the upcoming Site Recovery Manager: http://www.vmware.com/vmworldnews/srm.html

steve

kjb007
Immortal

With your setup:

I'm assuming that both hosts are using the same VC.

Also, as I understand it, the replicated SRDF LUN is read-only, not read-write, so you will need manual intervention to mark the alternate-site copy as read-write.

1. HA requires good network connectivity between the two sites, as HA keeps constant communication between the nodes in the cluster. If you lose connectivity, a node that cannot communicate with the other nodes will by default shut down its VMs. In your case that would be both ESX hosts, so all your VMs would be shut down, waiting for the remaining hosts to pick them up. In this scenario, you will have to give HA alternate addresses to ping to verify that a host is still up and running, but even then you have not overcome the shared-storage read-write problem.

2. VMotion requires a gigabit network, so unless you have an OC-24 or better between the sites, you will not be able to VMotion successfully, which means DRS is also out of the picture. If you do have a fast enough pipe, then as Steve mentioned, your networks will also have to be similar, or at least named similarly, which may require you to set up NAT and/or re-IP the VMs.

3. You can still use your scenario, but without Site Recovery Manager it will not be automated. You can have individual ESX hosts in both sites and present those LUNs to the standby nodes. In a recovery scenario, the LUNs must be made read-write; the ESX hosts can then rescan and find them. At that point, you will need a script or some other means to register the VMs with the new ESX host and change networking as required.
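
To make the isolation behaviour in point 1 concrete, here is a minimal Python sketch of the decision as described above. It is a simplified model and not VMware's implementation: the function name and the boolean inputs are made up for illustration, and the real HA agent has timeouts and settings that this ignores.

```python
def host_is_isolated(peers_reachable, isolation_addrs_reachable):
    """Simplified model: a host treats itself as isolated only when
    no peer host and no configured isolation address answers."""
    return not any(peers_reachable) and not any(isolation_addrs_reachable)


# Inter-site link down, no alternate addresses configured:
# the host cannot see its peer and has nothing else to ping.
link_down_no_alt = host_is_isolated([False], [])

# Same link failure, but a local alternate address still answers,
# so the host does not consider itself isolated.
link_down_with_alt = host_is_isolated([False], [True])

print(link_down_no_alt, link_down_with_alt)
```

With a two-host cluster split across sites, a link failure puts both hosts in the first case at once, which is why the alternate addresses matter.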

Pretty much all of the manual tasks in step 3 are what Site Recovery Manager does in an automated fashion.
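
The host-side part of step 3 could be scripted roughly as follows. This is a minimal Python sketch that only builds the command list for review and does not run anything. It assumes the ESX 3.x service console tools `esxcfg-rescan` and `vmware-cmd -s register`; the adapter name and the .vmx paths are hypothetical placeholders. Making the LUN read-write and changing networking are left out because they depend on your array and network setup.

```python
import shlex


def build_recovery_commands(hba, vmx_paths):
    """Return the service console commands to rescan one adapter and
    register each VM found on the recovered LUN (nothing is executed)."""
    commands = [f"esxcfg-rescan {shlex.quote(hba)}"]
    for vmx in vmx_paths:
        commands.append(f"vmware-cmd -s register {shlex.quote(vmx)}")
    return commands


# Hypothetical adapter and datastore paths, for illustration only.
plan = build_recovery_commands(
    "vmhba1",
    [
        "/vmfs/volumes/LUN1/vm1/vm1.vmx",
        "/vmfs/volumes/LUN1/vm2/vm2.vmx",
    ],
)
for cmd in plan:
    print(cmd)
```

Printing the plan first makes it easy to check the paths against the datastore before anyone runs the commands on the standby host.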

Hope that helps.

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
dtux101
Enthusiast

So if I understand, ESX1 has LUN1 in R1 mode and LUN2 in R2 mode and ESX2 sees LUN1 in R2 mode and LUN2 in R1 mode?

The easiest solution to all your questions (except the VMotion question) is to have a VC server at each site and to set LVM.DisallowSnapshotLUN to 0 on the ESX servers at each site. Make sure the value is 0 on every host, not just some of them.

In this manner, the ESX host at the DR site will be able to mount the VMFS on the failed-over device (formerly R2), and all you will need to do is register the VMs again. For failing back to the primary site, the process is identical. This is essentially what SRM will do (when it appears).

Let me know if you require any help with this and I'll see what I can do.

Also, I saw a fine presentation from VMworld Europe concerning 'cross-country' VMotion - it would be worth a look if you need this feature.

HTH

ctfoster
Expert

Using SRDF in these circumstances will create an R2 that is 'read only'. As part of the failover you'll either have to suspend the SRDF association and make the R2 writable, or 'reverse' the R1 and R2.

Remember, if you are working with a DR scenario you have to assume the R1 site is lost, so some actions like the reverse will not be possible. I assume you are planning to use SYMCLI to script the failover and failback actions?
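
The choice described above can be sketched in a few lines of Python. This is only an illustration: the function name and option labels are invented here, and the SYMCLI verbs named in the comments (`symrdf ... failover`, `symrdf ... swap`) are an assumption about which commands would be scripted, so check them against your SYMCLI documentation.

```python
def srdf_failover_options(r1_site_reachable):
    """List the failover actions that are possible, given whether
    the R1 site can still be reached."""
    # Making the R2 writable is always the baseline action.
    # Assumed to correspond to `symrdf -g <group> failover`.
    options = ["make_r2_writable"]
    if r1_site_reachable:
        # Reversing R1 and R2 is only an option while the R1 site exists.
        # Assumed to involve `symrdf -g <group> swap`.
        options.append("reverse_r1_r2")
    return options


print(srdf_failover_options(r1_site_reachable=False))
print(srdf_failover_options(r1_site_reachable=True))
```

A real disaster corresponds to the first call, so the failover script should not depend on the reverse being available.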

bernworx
Enthusiast

Hi

We have the same requirement. We have four existing VMware ESX servers, two in each site. Our DR requirement is for the VMs to be able to fail over during a disaster and fail back afterwards. We have an existing SRDF facility, which we currently use for physical servers, and DMX storage with mirroring. We tried applying our existing failover procedures to virtual machines in VMware, but we encountered errors during testing. Even with the storage in a "split-mirroring" status, the error states that the other VM attached to the storage is inaccessible. We tried changing the default value from "0" to "1" in the hosts' LVM (enable resignature) settings, but the tests were still not successful. What we have now at the DR site is a snapshot of the storage; we tried to browse the datastore, but no VM folder existed.

Is it possible to have only one VC server for this setup? We have an existing VC at the R1 site. Please advise.

thanks.
