Zero downtime with a site failure

So I have a situation where we'd like to have zero downtime between sites. We currently have a production site and a DR site and use SRM to recover in the event of a disaster, but now management would like to look at the options for having zero downtime in the event of a failure at the production site.

This isn't possible with SRM, so I've been looking at other options. It seems it might be possible with Fault Tolerance now in vSphere 6, since we no longer need shared storage for it and have a 1Gb connection between sites that could be used just for this. I was also wondering if HA might work - we have an EMC SAN and have a meeting set with them to learn more about VPLEX, which looks like it would allow us to have shared storage across sites. My thinking is that if we flattened the network (which we'd do for FT as well), we could have a single cluster across the sites running HA?

Any thoughts on this or suggestions?  We're in the early stages of trying to figure out what our options are.


1 Reply
VMware Employee

FT isn't a solution for cross-site protection as that currently isn't supported.

If you're looking for zero downtime, HA isn't an option either, as that technology restarts your VMs after a host fails. Used in conjunction with vMSC (vSphere Metro Storage Cluster, i.e. stretched storage such as EMC VPLEX), you can get cross-site recovery with HA (hosts at site A fail, VMs recover at site B). However, this would not be zero downtime, as the VMs will still have to boot at the recovery site, and the bias of the stretched cluster may have to be changed as well.

vMSC does provide zero-downtime disaster avoidance through the ability to vMotion VMs across sites, but that is not zero-downtime DR. Also note that, compared with SRM, vMSC doesn't provide the ability to test recovery non-disruptively or to orchestrate recovery.
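To make the HA restart point concrete, here is a rough back-of-the-envelope sketch. The class name, field names, and every number in it are made up for illustration and are not measured or documented values; it only shows that the recovery time is the sum of several steps, each of which takes some time, so the total cannot be zero.

```python
from dataclasses import dataclass


@dataclass
class HaRestartEstimate:
    """Hypothetical breakdown of downtime for an HA restart at the other site."""

    detection_s: float    # assumed time for HA to declare the failed hosts dead
    bias_change_s: float  # assumed time to change the storage bias, if required
    vm_boot_s: float      # assumed time for the guest OS and application to start

    def rto_seconds(self) -> float:
        # The VM is unavailable for the whole sequence, so the steps add up.
        return self.detection_s + self.bias_change_s + self.vm_boot_s


# Illustrative numbers only: substitute values measured in your own environment.
estimate = HaRestartEstimate(detection_s=30, bias_change_s=0, vm_boot_s=120)
print(f"Estimated downtime: {estimate.rto_seconds()} seconds")
```

Even with the bias change set to zero, the detection and boot steps leave a non-zero recovery time, which is why HA on a stretched cluster gives fast recovery rather than zero RTO.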

Zero-downtime DR (also referred to as zero RTO) would likely have to be accomplished at the application and DB level, and I'd guess it would be very expensive.

Does this answer your questions?