VMware Cloud Community
Gaddy
Contributor

VMware site disaster recovery

We have two fully (almost) redundant data centers, bladecenters, SANs, CX3-40, ...

What I'm looking for is a way to implement a failover scenario if one of the sites fails completely. All data (VMFS LUNs, real data LUNs, etc.) is mirrored synchronously between the sites.

While it is very easy to move one VM from site A to B using VMotion, it is not as easy to get the WHOLE thing running on B if A fails completely.

All the EMC stuff could be done with scripting, but this becomes less and less feasible as the number of VMs and applications grows rapidly.

So the storage part of the failover should be as transparent as possible, and I'm looking for some kind of SAN virtualization where the availability of either mirrored disk pool becomes transparent to the OS.

Has anybody done things like this? Possible? Secure/Stable? Supported?

< Thread http://www.vmware.com/community/thread.jspa?messageID=514046 has pointed in this direction, but seems to be a bit outdated ... >

Any hints are welcome. Thx -Gaddy-


Accepted Solutions
Mike_Fink
Enthusiast

The quick answer to your question is no: there is no easy way to VMotion between disk arrays (which is what it sounds like you are asking). There are ways to do it, but none of them are going to be "supported" by VMware in the way you seem to want.

As you pointed out, it is very simple to VMotion between data centers; the problem is that the disks all still reside in the primary data center while all the VMs now run in the secondary data center. That is not very helpful, because the systems will still go down if the primary data center fails.

To directly answer your questions:

Possible: Yes, but likely to be incredibly expensive and complex/difficult

Secure: Yes. You would almost definitely need FC connectivity between the sites (dark fiber). As long as you follow normal security measures, I don't see any problem.

Stable: Probably not. Again, VMotion from Site A to Site B, no problem. VMotion + SAN failover? A whole different story.

Supported: Probably not.

Now, all this said, there are ways to do this. However, I would caution that you had better have a VERY good reason to go down this path. You are thinking along the right lines: you need storage virtualization that presents the SAME LUNs on both sides of the storage link at the same time. There are ways to do it, and I will be happy to go into more detail, but suffice it to say, it's not going to be easy. :)

A MUCH more typical configuration is to have full-time mirroring from one SAN to the other; if site 1 fails, you break the mirrors and reboot all the VMs at site 2. That gives minimal downtime, is much more stable/supported, and will cost far less. In my experience, most applications can deal with 10-15 minutes of downtime during a full data center failure. For those applications that cannot, even VMotion between disk arrays and sites will not help you. You need a stretched cluster or another application-level technology to ensure 100% uptime between 2 sites; VMware/HA/VMotion is never going to provide that kind of uptime (because, no matter what you do, you will still have to reboot the systems in the other site after a primary site failure).
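As a rough illustration of the "break the mirrors and reboot all the VMs at site 2" sequence, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the VM names, the LUN labels, the boot priorities, and the step strings are hypothetical, and it only orders the work. It does not call any EMC or VMware tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    lun: str            # mirrored LUN holding the VM's disks (hypothetical label)
    boot_priority: int  # lower boots first, e.g. domain controllers before apps


def plan_failover(vms):
    """Return the ordered list of steps for restarting everything at site B.

    Each step is a plain string; a real runbook would invoke the array
    vendor's CLI and the VMware management tooling at these points.
    """
    steps = []
    # 1. Break/promote each mirrored LUN exactly once, in a stable order.
    for lun in sorted({vm.lun for vm in vms}):
        steps.append(f"promote mirror of {lun} at site B")
    # 2. Let the site-B hosts discover the promoted LUNs.
    steps.append("rescan storage on site-B hosts")
    # 3. Power on VMs in priority order, ties broken by name.
    for vm in sorted(vms, key=lambda v: (v.boot_priority, v.name)):
        steps.append(f"power on {vm.name}")
    return steps


if __name__ == "__main__":
    inventory = [
        VM("app01", "lun2", 2),
        VM("dc01", "lun1", 1),
        VM("sql01", "lun2", 2),
    ]
    for number, step in enumerate(plan_failover(inventory), start=1):
        print(number, step)
```

Keeping the plan as data (a list of steps) rather than executing it directly makes it easy to review or dry-run before a real site failure, which matters more as the number of VMs grows.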

9 Replies
Cloneranger
Hot Shot

Our DR solution is a complete set of cold clones of all of our production servers' C: drives.

Our SANs are also mirrored, so in the event of a total site A failure (fire, flood, terrorist attack) we would simply verify the integrity of the second SAN's LUNs, attach them as RDMs to the clones, and power them up.

I guess it kind of depends on the apps you run, but MS SQL server and Exchange are happy to run like this.

The only thing we keep live on the DR site is a third domain controller.

esiebert7625
Immortal

Lots of good DR links that might help you out...

VMware users explore disaster recovery options - http://searchservervirtualization.techtarget.com/originalContent/0,289142,sid94_gci1253386,00.html

VMware ESX Server and Storage Architecture Best Practices for Performance, Backup, and Disaster Recovery - http://download3.vmware.com/vmworld/2006/adc9591.pdf

Using Disaster Recovery and Business Continuity Planning to Drive Virtualization in the Production Data Center - http://download3.vmware.com/vmworld/2006/adc9732.pdf

An Aggressive Approach Using P2V to Address Disaster Recovery and Business Continuity Planning - http://download3.vmware.com/vmworld/2006/adc9938.pdf

How Management Learned to Stop Worrying and Love Virtualization: A Brush with Disaster Leads to a Virtualization-Based Disaster Recovery Plan at the Las Vegas Valley Water District - http://download3.vmware.com/vmworld/2006/bct0046.pdf

Leveraging VMware ESX Server in Disaster Recovery Solutions - http://download3.vmware.com/vmworld/2006/bct5070.pdf

Implementing Effective Backup Strategies For Disaster Recovery - http://download3.vmware.com/vmworld/2006/bct9502.pdf

VMware Infrastructure 3 Capabilities for Improving Disaster Recovery - http://download3.vmware.com/vmworld/2006/bct9552.pdf

Using Virtual Infrastructure as a High Availability Platform for Physical Production Servers - http://download3.vmware.com/vmworld/2006/bct9560.pdf

RepliStor: Disaster Recovery and Data Migration Solution for VMware Environments - http://download3.vmware.com/vmworld/2006/bct9636.pdf

VMware ESX Server as a Foundation for High Availability and Disaster Recovery for the Microsoft Server Platform - http://download3.vmware.com/vmworld/2006/bct0107.pdf

Migrating Server Operations from Remote Sites to the Data Center for Disaster Recovery and Protection - http://download3.vmware.com/vmworld/2006/bct0893.pdf

Innovative Approaches for High Availability and Disaster Recovery in Your VMware Infrastructure Environment - http://download3.vmware.com/vmworld/2006/bct9708.pdf

VMware Consolidated Backup for Disaster Recovery - http://download3.vmware.com/vmworld/2006/labs2006/vmworld.06.lab01-VCB-PRESENTATION.pdf

HA/DR of Physical and Virtual Environments Using VMware ESX Server and Double-Take for Virtual Systems - http://download3.vmware.com/vmworld/2006/bct9468.pdf

Platespin P2V DR - http://www.platespin.com/p2vdr/

Double-Take for virtual systems - http://www.doubletake.com/products/virtual-systems/default.aspx

Double-Take Replication in the VMware Environment - http://www.vmware.com/pdf/vmware_doubletake.pdf

Replistor - http://software.emc.com/products/software_az/replistor.htm

ESX 3 Disaster Recovery site options - http://www.vmware.com/community/thread.jspa?messageID=473674

Disaster Recovery Plans - http://www.vmware.com/community/thread.jspa?messageID=517701

Disaster recovery site VM Startup - http://www.vmware.com/community/thread.jspa?messageID=514046

Disaster Recovery for Vms - http://www.vmware.com/community/thread.jspa?messageID=502276

Gaddy
Contributor

@Mike

"A MUCH more typical configuration is to have full time mirroring from one SAN to the other; if you have site 1 fail you break the mirrors and reboot all the VMs at site 2. Minimal downtime, much more stable/supported, and will cost far less."

This is already in place and doing well (no, not really, because getting the VMs rebooted is the "tricky part", with all the EMC scripting involved).

Operations & Mgmt has asked for an easier (?) solution: no scripting, no human intervention, lights out.

$$$ is not the "primary" focus, because we are talking about banking/insurance apps here. Sure, $$$ is definitely the "second primary" focus :)

So from my POV SAN virtualization (we are doing a little StorAge and FalconStor testing here) would be very "elegant" (if not "smart").

But I've not seen anyone doing it. So your answer is pretty much what I expected: could be done, but too "sophisticated".

"... you need storage virtualization that presents the SAME LUNs on both sides of the storage link at the same time. There are ways to do it; and I will be happy to go into more detail ..."

Would you please lift the quilt a bit?

Thanks for your reply -sg-

Mike_Fink
Enthusiast

To lift the quilt a bit:

Take a look at this product from NetApp called MetroCluster (other companies offer similar technology; however, support for this kind of setup will depend solely on the SAN vendor):

http://www.netapp.com/products/enterprise-software/data-protection-software/high-availability/metroc...

Basically, this product stretches the distance between SAN controllers while at the same time mirroring all the data from one SAN controller head to the other. In the event of a hardware failure, the failover between heads and sites is seamless, permitting a SAN failure without any interruption of processing (well, assuming only one side fails, which in this case is a VERY safe assumption to make). The reason is that the LUNs are always presented to VMware from both sides of the array: VMware is aware that it has additional paths to the LUN, but is blissfully unaware that the LUNs are located miles apart. In this case, you could have a failure in almost any portion of your infrastructure without any processing interruption. To truly provide maximum uptime, I would suggest that you set up a cluster between the sites (Microsoft Cluster, for example) and provide not only disk-level redundancy but also OS/app-level redundancy. In this kind of environment, especially with clusters between the sites, you could provide a "no touch" failover from one site to the other with minimal downtime (<1000 ms for the clusters, <5 mins for unclustered systems).

You can do similar things with storage virtualization switches: put a SAN at both sites and have the switch present LUNs back to VMware. The switch will then handle the mirroring between the sites; if either site goes down, the failover between the LUNs is seamless (as, once again, VMware is blissfully unaware of any mirroring going on at the storage level; it just sees more paths to the same LUNs, with no visibility into the fact that they are separated by distance).
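To make the "more paths to the same LUN" idea concrete, here is a toy Python model. It is only a sketch under stated assumptions: the `Path` and `MirroredLun` classes are hypothetical, the mirroring itself is assumed to happen below this layer, and nothing here reflects how ESX multipathing or any vendor's product is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class Path:
    site: str          # site where this path terminates (hypothetical label)
    alive: bool = True


class MirroredLun:
    """One logical LUN reachable through paths that terminate at two sites."""

    def __init__(self, name, paths):
        self.name = name
        self.paths = list(paths)

    def fail_site(self, site):
        # Mark every path through the failed site as dead.
        for path in self.paths:
            if path.site == site:
                path.alive = False

    def submit_io(self):
        """Return the site that served the I/O; raise if no path is left."""
        for path in self.paths:
            if path.alive:
                return path.site
        raise RuntimeError(f"all paths to {self.name} are down")


if __name__ == "__main__":
    lun = MirroredLun("lun1", [Path("A"), Path("B")])
    print(lun.submit_io())   # served through the first live path
    lun.fail_site("A")
    print(lun.submit_io())   # the host keeps using the same LUN via site B
```

The host-side code never learns which site holds the data; it only sees a path go away, which is the property that makes a single-site failure non-disruptive in this model.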

Hope this helps!

SnowCanada
Contributor

The problem with the NetApp solution is that it locks you in to NetApp storage. If you look more deeply at the FalconStor solutions, you will find that site-to-site continuous data replication (CDR) or mirroring can be done even across mismatched hardware. I mean, why put high-priced, high-performance FC storage at a DR site? Stick with lower-cost SATA. FalconStor lets you go agnostic on the storage hardware side. In addition to the CDR, you could implement periodic snapshots to get accurate data pointers as well. Just in case corruption crept in slowly and was then replicated, the snapshots give you known good points in time to roll your storage systems back to.

The FalconStor CDP Virtual Appliance for VMware is now certified and downloadable. The SnapShot Director for ESX is also available to tie up the loose ends.
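The rollback idea above (replicated corruption, recovered from a known good point in time) comes down to picking the newest snapshot taken before the corruption began. A minimal Python sketch, assuming you have a sorted list of snapshot timestamps and an estimate of when the corruption started; the function name and the sample times are hypothetical and unrelated to any FalconStor interface:

```python
from bisect import bisect_left
from datetime import datetime


def last_good_snapshot(snapshot_times, corruption_time):
    """Return the newest snapshot taken strictly before corruption_time.

    snapshot_times must be sorted in ascending order. Returns None when
    every snapshot was taken at or after the corruption.
    """
    # bisect_left finds the first snapshot that is not older than the
    # corruption, so the entry just before it is the last clean one.
    index = bisect_left(snapshot_times, corruption_time)
    return snapshot_times[index - 1] if index > 0 else None


if __name__ == "__main__":
    snapshots = [
        datetime(2007, 1, 1, 0, 0),
        datetime(2007, 1, 1, 6, 0),
        datetime(2007, 1, 1, 12, 0),
    ]
    print(last_good_snapshot(snapshots, datetime(2007, 1, 1, 9, 0)))
```

The snapshot interval bounds how much work is lost on rollback, so the schedule is a trade-off between storage overhead and acceptable data loss.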

Craig

Mike_Fink
Enthusiast

I don't want to make this into a SAN religious war, but I do want to clear up one misconception about NTAP you mentioned below.

NTAP can replicate between FC and SATA disks (and between iSCSI and FCP connectivity) with no problem. However, as SnowCanada stated, using SnapMirror requires NTAP storage on both sides of the link. That said, it is very common to see a primary-side NTAP with all FC disk replicating to a DR-side NTAP with all SATA disks.

Gaddy
Contributor

@Craig

"If you want to look more deeply at FalconStor solutions you will find that site to site continuous data replication or mirroring can be done and even across mismatched hardware."

Craig, IPstore is on my radar, as well as SVC, StorAge, DataCore and the like ...

Although they are technically "interesting", none have been "CERTIFIED" by VMware, which is a huge (?) constraint for some commercial sites.

"I mean why put high priced high performance FC storage at a DR site, stick with lower cost SATA."

1st: IPstore doesn't come for free :)

2nd: I'm not quite sure it's a good idea to run some apps (except for file services) from SATA, even in the case of DR (and mixing disks can be done in almost all relevant arrays)

3rd: but YES, I'd go for SAN virtualization if it makes things easier (and more favourable)

4th: NO, I wouldn't swap out CX for NetApp just to get DR ability (who would?)

Craig, do you currently run an IPstore + VMware environment?

Can you share your experiences?

"The FalconStor CDP Virtual Appliance for VMWare is now certified and downloadable. The SnapShot director for ESX is also available to tie up the loose ends."

How would CDP (data protection) fit directly into DR site protection? (My goal was to fail over completely transparently and not lose any I/O, using sync mirror.)
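The "not lose any I/O with sync mirror" requirement can be illustrated with a small Python model. This is a sketch under simplifying assumptions, not a description of any product: the `MirrorPair` class is hypothetical, a synchronous write is modelled as acknowledged only after both sites hold it, and an asynchronous write is acknowledged once site A holds it and shipped to site B later.

```python
class MirrorPair:
    """Toy model of a LUN mirrored from site A to site B."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.site_b = []      # blocks committed at the DR site
        self.in_flight = []   # blocks queued for site B (async mode only)
        self.acked = []       # blocks acknowledged to the application

    def write(self, block):
        if self.synchronous:
            # Sync: commit remotely before acknowledging the write.
            self.site_b.append(block)
        else:
            # Async: acknowledge now, replicate later.
            self.in_flight.append(block)
        self.acked.append(block)

    def drain(self):
        # Ship all queued blocks to site B (one replication cycle).
        self.site_b.extend(self.in_flight)
        self.in_flight.clear()

    def lost_if_site_a_fails(self):
        """Acknowledged blocks that site B does not hold right now."""
        return [block for block in self.acked if block not in self.site_b]


if __name__ == "__main__":
    sync_pair = MirrorPair(synchronous=True)
    async_pair = MirrorPair(synchronous=False)
    for pair in (sync_pair, async_pair):
        pair.write(1)
        pair.write(2)
    async_pair.drain()
    sync_pair.write(3)
    async_pair.write(3)
    print(sync_pair.lost_if_site_a_fails(), async_pair.lost_if_site_a_fails())
```

In this model the synchronous pair never has an acknowledged write missing at site B, while the asynchronous pair can lose whatever was acknowledged since the last replication cycle. That gap is the distinction between zero-loss failover and point-in-time protection.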

Thanks for your comments -sg-

James-D
Contributor

Gaddy,

I have been looking through the VMware Communities for people who are using VMware with IPStore, and I came across your posts from last year. Did you take the plunge with IPStore and VMware? If you did, I would love to find out how well it works, as I am looking at implementing it here.

James
