VMware Cloud Community
shaw22
Contributor

Server mirroring and disaster recovery using VMware

Greetings:

I currently have 11 physical servers that host 23 virtual servers. We are using Hypervisor but plan to move to VMware to achieve the following:

(1) Server mirroring: set up a secondary location over a fiber link (less than a mile away) with identical physical/virtual servers.

Both the primary and secondary locations are on the same subnet. If a server in the primary location fails, VMware should sense it and bring up the mirror server to take over.

Currently we have Direct Attached Storage (DAS) on the servers; we are exploring the SAN option.

Option 1:

Primary location: servers with DAS / Secondary location: identical servers with DAS. The data partition on each primary server is synced to the data partition on the matching secondary server.

Option 2:

Primary location: servers with SAN / Secondary location: identical servers, NO SAN. The secondary servers connect to the SAN in the primary location through the fiber link. It is a shared pipe, so performance will be lower than for servers running from the primary location.

Option 3:

Primary location: servers with SAN / Secondary location: identical servers with SAN. The primary SAN is synced to the secondary SAN. When a primary-location server fails, the secondary-location server starts and runs off the secondary-location SAN. Does this add complexity in managing the sync?

[ Another option is - use the secondary SAN only on a total facility loss of the primary location, but for single server loss, use it as in option 2 ]

Which VMware product(s) will let me achieve this goal?

Thanks

Ken

7 Replies
HeathReynolds
Enthusiast

Do you have a defined Recovery Point Objective? How much data are you willing to lose during this process? If the RPO is zero you need expensive array based synchronous replication.

VMware has a product called vSphere Replication that can replicate from virtually any storage to any storage. I believe the best RPO for this product is 15 minutes, and the RPO you can actually achieve with any sync product depends on the change rate and the available bandwidth.
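To make the change-rate and bandwidth dependency concrete, here is a rough back-of-the-envelope sketch in Python. It is not part of vSphere Replication or any VMware tool; the function name, the 70% link-efficiency factor, and the example figures are all assumptions for illustration. It estimates how long it takes to ship the data that changes during one RPO window and checks whether that fits inside the window.

```python
def replication_keeps_up(change_rate_gb_per_hour, link_mbps, rpo_minutes,
                         link_efficiency=0.7):
    """Estimate whether one RPO window of changes can be sent within the window.

    All inputs are assumptions you must measure yourself:
    - change_rate_gb_per_hour: data changed per hour across replicated VMs (decimal GB)
    - link_mbps: raw link speed in megabits per second
    - link_efficiency: hypothetical fraction of the link usable for replication
    Returns (keeps_up, transfer_minutes).
    """
    # Data changed during one RPO window, converted from gigabytes to gigabits.
    changed_gbit = change_rate_gb_per_hour * (rpo_minutes / 60) * 8
    # Usable throughput in gigabits per minute.
    usable_gbit_per_min = link_mbps * link_efficiency / 1000 * 60
    transfer_minutes = changed_gbit / usable_gbit_per_min
    return transfer_minutes <= rpo_minutes, transfer_minutes


if __name__ == "__main__":
    # Hypothetical figures: 20 GB/hour of change, 1 Gbps link, 15-minute RPO.
    ok, minutes = replication_keeps_up(20, 1000, 15)
    print(f"keeps up: {ok}, transfer time per window: {minutes:.1f} min")
```

If the transfer time approaches or exceeds the RPO window, the target is not realistic on that link without more bandwidth or a lower change rate.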

My sometimes relevant blog on data center networking and virtualization : http://www.heathreynolds.com
shaw22
Contributor

15 to 30 minutes is acceptable.

Total capacity 24TB (18 TB in Use)
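For a sense of scale, the initial full copy of the 18 TB in use can be estimated with a short Python sketch. The link speeds and the 70% efficiency factor below are assumptions, not measurements of the fiber link, and the function name is made up for illustration.

```python
def initial_sync_hours(used_tb, link_gbps, link_efficiency=0.7):
    """Rough time in hours to copy used_tb (decimal terabytes) over a link.

    link_gbps and link_efficiency are assumed values; real throughput
    depends on the storage, the replication engine, and other traffic.
    """
    used_gbit = used_tb * 1000 * 8          # terabytes -> gigabits
    seconds = used_gbit / (link_gbps * link_efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # 18 TB in use (from this thread) over hypothetical 1 Gbps and 10 Gbps links.
    for gbps in (1, 10):
        print(f"{gbps} Gbps: about {initial_sync_hours(18, gbps):.1f} hours")
```

This only covers the first full synchronization; ongoing replication traffic depends on the change rate.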

Data partitions are kept synchronized

- For flat files (Microsoft Office documents)

- For SharePoint and SQL Server: how does VMware replication work?

- For Exchange: how does VMware replication work?

thanks

sajal1
Hot Shot

Hello shaw22,

There are a couple of options you can explore.

Option 1:

This option relies on the fiber link between the sites. Since the sites are less than 1 mile apart, I assume the connection is very good and ping latency is under 10 ms at most.

In this case you do not need identical servers at both sites. Instead, you create a single cluster of ESXi hosts (the hosts in primary and in secondary), all managed by a single vCenter Server, and enable HA on the cluster.

This gives you a high availability environment. If one server goes down at the primary site, its load is distributed over the other servers. How well this works depends on the number of available servers, that is, the number of server failures the environment can sustain. If you go with an N+N model (an equal number of servers at both sites), then if all the servers at the primary site go down, the entire workload essentially moves to the secondary site.
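The "number of server failures the environment can sustain" can be approximated with a small Python sketch. This is only an aggregate-memory estimate under stated assumptions: the 128 GB per host and 32 GB per VM figures are hypothetical, the function is not a VMware API, and it ignores CPU, memory fragmentation across hosts, and HA admission control settings.

```python
def tolerated_host_failures(host_capacity_gb, vm_demand_gb):
    """Worst-case count of hosts that can fail while the surviving hosts
    still have enough total memory for every VM (aggregate check only)."""
    demand = sum(vm_demand_gb)
    # Sort ascending so the worst case (largest hosts failing first)
    # is modeled by removing hosts from the end of the list.
    remaining = sorted(host_capacity_gb)
    failures = 0
    # Keep failing the largest surviving host while the rest still fit the VMs.
    while remaining and sum(remaining[:-1]) >= demand:
        remaining.pop()
        failures += 1
    return failures


if __name__ == "__main__":
    # N+N layout from the thread: 11 hosts per site, 23 VMs.
    # Memory sizes are made-up values for illustration.
    hosts = [128] * 22
    vms = [32] * 23
    n = tolerated_host_failures(hosts, vms)
    print(f"tolerated host failures: {n}, survives full site loss: {n >= 11}")
```

If the result is at least the number of hosts at one site, losing that whole site should still leave enough aggregate capacity, which is the point of the N+N model.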

Also, to protect vCenter Server I would suggest using vCenter Heartbeat, with the primary vCenter at the primary site and the secondary vCenter at the secondary site.

Note that this is not a DR solution. At best it is a stretched cluster configuration, which takes advantage of the fact that if the primary site goes down, everything essentially moves to the secondary site. It is only possible because of the assumptions above (fiber link and 1 mile distance).

Requirement:

In this case all you require is:

1. VMware vSphere (ESXi), VMware vCenter Server, and vCenter Heartbeat (optional but highly recommended to make the solution foolproof).

2. Shared storage (SAN). All the servers at the primary and secondary sites need access to the same SAN so that VM data is available at all times.

NOTE: If the SAN is at the primary site only, then if the SAN goes down everything goes down. To solve this, ideally you should have a SAN at the secondary site as well (albeit a lower-end model), so that if the SAN at primary goes down the secondary is available with all the data. In that case there needs to be data sync between the two SANs.

Option 2:

You keep the local DAS data on all the servers at the primary location. You configure vSphere Replication and set up replication of the VM data from primary to secondary. If a VM goes down at primary, it can then be recovered at the secondary site.

This is a pure replication, VM-level DR solution. Because it does not depend on shared storage between the servers, it can work with the existing DAS.

You would need VMware vSphere and VMware vCenter only.

Option 3:

You use Site Recovery Manager to set up a proper DR solution. In this case you would have servers and VMs at the primary site, and they would be replicated and protected at the secondary site. If the primary site goes down, you fail over to the secondary site.

You can use vSphere Replication for the underlying data replication, or you can use SAN-based replication (in which case a SAN is needed at both sites).

This is a proper DR solution.

You would need:

VMware vSphere, VMware vCenter, Site Recovery Manager

Please note that Option 2 and Option 3 are DR solutions.

What you asked for, "if one server goes down in primary, another server needs to come up at secondary", is not really a DR requirement. It can be achieved with Option 1 (stretched cluster).

A stretched cluster simply takes advantage of High Availability. You can configure VM affinity so that if a server at the primary site goes down, all the VMs running on that server are restarted on a server at the secondary site (thus achieving what you are asking for).

In any case, a SAN is always recommended over local DAS, simply because taking full advantage of virtualization (HA, DRS, Storage DRS, etc.) requires shared storage. The basic idea behind virtualization is to remove the dependency on hardware. If you keep data in local storage and the server goes down, the data goes down with it. If the data is on a SAN, it stays available and any server can run the VM.

Copying data from local storage to the local storage of a remote site is not impossible, but it is not a viable solution and would never be recommended.

I hope this clears things up. Please let me know if you need more clarification. In the meantime, the following documents give more detail.

http://www.vmware.com/files/pdf/products/vCenter/VMware-vCenter-Server-Heartbeat-Datasheet.pdf

http://www.vmware.com/files/pdf/products/SRM/VMware-vCenter-Site-Recovery-Manager-with-vSphere-Repli...

http://www.vmware.com/files/pdf/products/SRM/VMware_vCenter_Site_Recovery_Manager_5.5.pdf

HeathReynolds
Enthusiast

Here you go :

https://www.vmware.com/files/pdf/techpaper/Introduction-to-vSphere-Replication.pdf

My sometimes relevant blog on data center networking and virtualization : http://www.heathreynolds.com
shaw22
Contributor

Thank you, sajal1, for the detailed response. I will review and come back with more questions.

shaw22
Contributor

Thank you, HeathReynolds. I will review and come back.

aleph0
Hot Shot

vSphere Replication (VR) is the way to go.

You can also save on FC CAPEX by using the LAN.

\aleph

http://virtualaleph.blogspot.com/