VMware Cloud Community
jacc
Contributor

Disaster Recovery Site with SAN replication almost done. Need suggestions

Hi friends, this is my scenario.

I have 2 sites, connected on the same subnets: subnet A for the service console and subnet B for the VMs. Each site has one storage array (IBM DS8000), and the two arrays replicate with Metro Mirror, so both hold the same data at all times.

At site A I have 4 ESX hosts in a cluster (for local DRS and HA) with 20 VMs, and at site B I have 4 ESX hosts in another cluster. In the site B cluster I created 20 VMs without disks and later added the respective replicated disk to each VM. Each VM has its own datastore, so I have 20 datastores, and some VMs have RDM disks. I have 1 VirtualCenter server for each site.

I have to test the remote site, so the process to move each VM to site B will be manual:

  1. Shut down the guest OS.

  2. Stop replication from site A to site B and begin replication from site B to site A.

  3. Power on the VM at site B.

The problem: between steps 2 and 3, I have to rescan all HBAs at the remote site and then refresh the datastores. This operation takes 2 minutes per ESX host, so it takes at least 10 minutes to get the datastores ready at the remote site. The rescan is necessary because the target disks of the replication are always read-only.

The goal: minimize the time to switch to the remote site. (I currently have MSCS with geographically dispersed nodes on physical servers, and the time to switch is less than 1 minute.) Besides, I need scripts to automate the process.
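
One way to cut the rescan time described above is to start the rescan on all four site B hosts at the same time instead of one host after another, so the total is roughly the 2 minutes of the slowest host rather than the sum. The following is only a minimal sketch under stated assumptions: ssh access as root to each service console is enabled, and the host names (esx-b1 ...) and adapter names (vmhba1 ...) are hypothetical placeholders. By default it only prints the commands (DRY_RUN=1).

```shell
#!/bin/sh
# Sketch only: host names and adapter names are hypothetical placeholders.
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Print or run the rescan of one adapter on one remote ESX host.
# Assumes root ssh access to the service console is enabled.
rescan_adapter() {
    host="$1"
    hba="$2"
    if [ "$DRY_RUN" = "1" ]; then
        echo "ssh root@$host esxcfg-rescan $hba"
    else
        ssh "root@$host" "esxcfg-rescan $hba"
    fi
}

# Rescan every adapter of a single host, one adapter after another.
rescan_host() {
    host="$1"
    shift
    for hba in "$@"; do
        rescan_adapter "$host" "$hba"
    done
}

# Start the rescan on all hosts in the background and wait for all of
# them, so hosts are processed in parallel rather than sequentially.
# $1 is a space-separated adapter list; the remaining args are hosts.
rescan_all_hosts() {
    hbas="$1"
    shift
    for host in "$@"; do
        # $hbas is left unquoted on purpose so it splits into adapters.
        rescan_host "$host" $hbas &
    done
    wait
}
```

A call such as rescan_all_hosts "vmhba1 vmhba2" esx-b1 esx-b2 esx-b3 esx-b4 prints the eight ssh commands; running it with DRY_RUN=0 would execute them. The datastore refresh and the replication reversal on the DS8000 are not covered by this sketch.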

If anyone has suggestions, or knows another scheme for a disaster recovery site without third-party applications, please tell me.

Thanks in advance

jacc

8 Replies
happyhammer
Hot Shot

Take a look at VMware Site Recovery Manager (SRM), which is due for release soon. If your storage gets certified, the whole process is a single click (plus confirmation) from within a VirtualCenter plug-in. It also has a test facility to test DR in an isolated network.

jacc
Contributor

Thanks for your answer, but I need an immediate solution. As I said, I have the disaster recovery site almost done. It works, but it is very manual and the switch takes a lot of time.

EcioBNI
Contributor

Yesterday I read in some VMware documentation that you can disable support for the old VMFS-2 (if you don't use it, of course) and that this could speed up datastore (re)scans.

You could do it manually (in order to verify if it's useful) using the command

vmkload_mod -u vmfs2

"You should see a significant increase in the speed of certain management operations, such as refreshing datastores and rescanning storage adapter"

Cited from "vi3_san_design_deploy.pdf"

I think you can make it permanent by editing /etc/init.d/vmware

NOTE: I've never tried it so use it at your own risk!!
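
To make the manual test above a bit safer, the unload can be wrapped in a check that the module is actually loaded first. This is a hedged sketch, not a tested procedure: it assumes that vmkload_mod -l lists the loaded VMkernel modules by name and that no VMFS-2 volume is in use on the host. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not part of ESX.

# List loaded VMkernel modules (assumes `vmkload_mod -l` prints them).
list_modules() {
    vmkload_mod -l
}

# Succeeds if a module whose name contains "vmfs2" is currently loaded.
vmfs2_loaded() {
    list_modules | grep -q vmfs2
}

# Unload the VMFS-2 driver only when it is loaded; otherwise do nothing.
# Do not use this on a host that still has VMFS-2 volumes.
unload_vmfs2_if_loaded() {
    if vmfs2_loaded; then
        echo "unloading vmfs2"
        vmkload_mod -u vmfs2
    else
        echo "vmfs2 is not loaded, nothing to do"
    fi
}
```

Timing a rescan before and after calling unload_vmfs2_if_loaded (for example with the shell's time keyword) would show whether the change is worth making permanent.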

jacc
Contributor

I am going to test it.

About the disaster recovery implementation: any suggestions? Is there a better way to build a disaster recovery site with VMware at both sites?

Thanks again.

jacc
Contributor

Ecio, do you know how to reverse this command if it does not work?

Besides, I think that VMware rescans the HBAs every X amount of time. Do you know of any setting to disable this automatic rescan?

EcioBNI
Contributor

jacc, I have no access to an ESX server, but I think that "-u" stands for "unload", so maybe "vmkload_mod vmfs2" could work to reload it... Of course, a reboot of the machine will fix the situation too.

You can also try the command without parameters to see other options...

I don't know how to disable the automatic scan you're talking about :(

rpartmann
Hot Shot

Hi,

If your LUNs have low IDs, you could limit the maximum LUN number to be scanned with

esxcfg-advcfg -s 50 /Disk/MaxLUN # (default == 255)

esxcfg-rescan <vmhbaN> # run once per storage adapter
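
The two commands above could be wrapped so that the current value is shown before it is changed and an obviously wrong value is rejected. This is a rough sketch under assumptions: the function name limit_lun_scan and the ADVCFG override variable are made up for illustration, and esxcfg-advcfg -g is used to read the current setting.

```shell
#!/bin/sh
# Sketch only: limit_lun_scan and ADVCFG are illustrative names.
# ADVCFG can be set to "echo" to print the calls instead of running them.
ADVCFG="${ADVCFG:-esxcfg-advcfg}"

# Show the current Disk.MaxLUN value, then set it to the given number.
# The value must be higher than the highest LUN ID in use, otherwise
# those LUNs will no longer be found by a rescan.
limit_lun_scan() {
    max="$1"
    case "$max" in
        ''|*[!0-9]*)
            echo "usage: limit_lun_scan <positive integer>" >&2
            return 1
            ;;
    esac
    if [ "$max" -lt 1 ]; then
        echo "MaxLUN must be at least 1" >&2
        return 1
    fi
    "$ADVCFG" -g /Disk/MaxLUN          # print the value before the change
    "$ADVCFG" -s "$max" /Disk/MaxLUN   # apply the new limit
}
```

After limit_lun_scan 50, rescan each storage adapter so that the new limit is actually used.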

hth,

Reinhard.

ps: Award points if you find answers helpful. Thanks.

mreferre
Champion

The problem is that the scenario you are picturing is generally referred to as a DR scenario, not a Continuous Availability scenario. 100% of the customers I work with are fine with a recovery window of hours... 10 minutes would be a dream for them.

This might be another viable solution (see my post at the end): http://communities.vmware.com/message/840977 ... but obviously it has some constraints in terms of connectivity, etc.

Massimo.

Massimo Re Ferre' VMware vCloud Architect twitter.com/mreferre www.it20.info