VMware Cloud Community
grob115
Enthusiast

ESXi across data centres

Am aware that VMware has great features such as vMotion and Fault Tolerance (keeping two instances in sync so the backup can take over immediately if the primary one goes haywire).  However, from what I understand, both of these require shared storage of some sort.

Is there a way I can have two ESXi hosts, and the VMs running on them, synchronized across data centres?  I want to add an additional host for resiliency but don't want to have both in the same location.

Troy_Clavell
Immortal

Just a small correction: FT is for guests (virtual machines), not hosts (ESX/ESXi).  As to your question, you may want to look at VMware vCenter Site Recovery Manager.

AndreTheGiant
Immortal

Solutions could be to use storage replication and SRM.

Or VM replication with tools like Veeam Backup & Replication or Quest vReplicator.

Andre

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
grob115
Enthusiast

Am looking at vCenter Site Recovery Manager now.  Looks like it does require some kind of enterprise SAN-level replication, based on page 2 of the following PDF.

http://www.vmware.com/pdf/srm_40_gettingstarted.pdf

As for the Veeam product, I've tried their free Veeam FastSCP a few times but it fails.  I get better speed by using PuTTY to the ESX host and doing an SCP.  However, this is still too slow, as the best I got was only 3-5MB/s rather than something close to 100Mbps / 8 = 12.5MB/s.  It's also dangerous: my connection to the source ESXi got disconnected, and when I tried to remove the destination's copy before starting the SCP again, I inadvertently deleted my source!

mmarzotto
Contributor

Yes, you are correct about the replication product. For all intents and purposes, SRM is simply a tool to automate the steps to recover VMFS volumes/VMs so you don't have to do all the work yourself. The main underlying component of SRM is the replication tool that you choose and whether it has a supported Storage Replication Adapter (should you go down that route) -- this will depend on what type of array you currently have and what size of pipe you have from data center A to data center B.

bulletprooffool
Champion

grob - these are enterprise-grade tools, with enterprise-grade price tags, and unfortunately they do require SAN-level replication of some sort.

To be honest, companies that can afford SRM generally already have SAN replication in place.

One day I will virtualise myself . . .
LeftHandVSA
Contributor

This might be fun:

In your secondary site, put a Linux VM running OpenFiler (both free) on an ESX host. Create an NFS mount and add it as a datastore to the main site's hosts. Then use PowerShell to snapshot and copy your VMs to the NFS mount. You could schedule the script to run, say, once a day.
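
Not PowerShell, but here's a rough sketch of the same idea straight from the ESXi console (the datastore label nfs-backup, VM name myvm, and VM ID 42 are all made up, and note that a plain cp won't preserve thin provisioning):

# one-time: mount the OpenFiler NFS export as a datastore on the main-site host
esxcfg-nas -a -o <openfiler IP> -s /mnt/vg0/vmbackup nfs-backup

# per run: snapshot the VM so its base disks stop changing, copy, then clean up
vim-cmd vmsvc/getallvms                      # note the VM's ID, e.g. 42
vim-cmd vmsvc/snapshot.create 42 backup "nightly copy" 0 0
cp -r /vmfs/volumes/datastore1/myvm /vmfs/volumes/nfs-backup/myvm-$(date +%Y%m%d)
vim-cmd vmsvc/snapshot.removeall 42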

🙂

grob115
Enthusiast

I've heard about OpenFiler, and there's also another one (can't remember the name); both are free.  However, what am I missing by using this free SAN software as opposed to a real SAN?  And what are the risks of it versus an internal RAID 1?

DSTAVERT
Immortal

Think about how large your VMs are (VMDK sizes) and consider how long it will take to copy that data across your WAN connection. SAN replication generally occurs at the byte or sector level, but even that can require a fairly large WAN connection. The replication may be synchronous, pushing changes to the remote location as they occur on the main-site SAN, or asynchronous, running at some defined interval. That sophistication requires a somewhat high-end SAN.
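
As a rough worked example using the numbers from this thread: a full copy of a 250GB VMDK over a 100Mbps WAN at the theoretical maximum of 12.5MB/s is 250,000MB / 12.5MB/s = 20,000 seconds, roughly 5.5 hours; at a more realistic 3MB/s it's closer to 23 hours. That's why replicating only the changed bytes or sectors is the usual approach.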

There are software solutions that work inside the VM that can replicate a running VM across the WAN but again these will be high end (read expensive) products.

-- David -- VMware Communities Moderator
grob115
Enthusiast

Okay, basically here's the background.  I have an ESXi host with a few VMs running on it, and the largest one has a 250GB VMDK file.

The ESXi host has a RAID 1 configuration but one of the SAS disks failed and even after swapping a new one in, the RAID controller failed to rebuild.

Hosting company recommended migrating everything to a brand new server.  Great.

But then the question is how.

Turns out the easiest and "the only way" to do this is to PuTTY into the ESXi host and do something like the following for each of the VMs.

cd /vmfs/volumes/datastore1/

scp -r '<VM name>' root@<new ESXi IP>:/vmfs/volumes/datastore1/

Even with a direct crossover-cable connection, this goes at a crawling rate of 3MB/s.  In other words, the 250GB VMDK should take about 23 hrs.

Needless to say we tried a few times and each time it failed.

Veeam's FastSCP also doesn't work from where I am.  It's even worse: it's not responding at all.

Also, during the transfer I inadvertently deleted two of my VMs, because I thought I was removing them from the new ESXi host after a stalled SCP.
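
In hindsight, staging the copy in a scratch directory on the destination would have made that mistake much harder.  Something like this (the _incoming directory name is just something I'd pick):

# on the new host: create a staging area, so cleaning up a stalled transfer
# never touches the real VM folders
mkdir -p /vmfs/volumes/datastore1/_incoming

# on the old host: push into the staging area
scp -r '<VM name>' root@<new ESXi IP>:/vmfs/volumes/datastore1/_incoming/

# on the new host, only once the copy has completed cleanly, move it into place
mv '/vmfs/volumes/datastore1/_incoming/<VM name>' /vmfs/volumes/datastore1/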

The bottom line is, what can I do to improve on:

1) Recovery time

2) Data protection

Please throw any options on the table, but it has to be cheap (i.e. getting an extra server is fine, but getting a SAN is not).  Best would be to also be able to save the backed-up data off site, just in case.

DSTAVERT
Immortal

None of this is likely to be cheap. If you have two different locations -- different datacenters -- you will probably have a very large data-transfer fee on both ends. If you have both servers in the same location you will avoid the transfer fee, but that isn't necessarily going to solve your problem. I would suggest that the servers may not have a battery-backed caching RAID controller. Also, if the failed RAID controller didn't finish rebuilding, it will drastically slow copies.

http://www.neverfailgroup.com

http://www.veeam.com/ESX-Replication

http://www.doubletake.com/

-- David -- VMware Communities Moderator
pinnerblinn
Contributor

Another possibility is to use VMware Converter (free) to export the VM to Workstation format and stick it on a large USB drive. Bring the drive to the other data center and use Converter again to bring the VM back in. Just a note: if the VMDK is 250GB but the used data is only, say, 100GB, it will only export 100GB worth of VMDKs. It's old-school sneakernet, but hey, sometimes ya gotta go back to go forward 🙂

grob115
Enthusiast

Thanks guys.  Comments:

1) I was told by the guys at the datacentre that ESXi running on the Dell R210 II isn't able to recognize USB drives.  Either the case doesn't have USB ports (unlikely) or ESXi can't accept a hot-plugged hard drive via USB (more likely); can't remember which of the two.  They even tried plugging in an eSATA drive but couldn't get it mounted.

2) "I would suggest that the servers may not have a battery backed caching RAID controller."

    Sorry, what's the issue with not having a battery-backed cache, aside from an unplanned power outage leaving some information unwritten to disk?

Am seriously considering looking into adding an extra box to run one of these SAN software packages.  This way I can have better resiliency within the SAN box.  Any issues, concerns, or risks?

http://www.openfiler.com/

http://www.starwindsoftware.com/

http://www.datacore.com/

http://www.open-e.com/

http://www.dataplow.com/

mmarzotto
Contributor

grob115 wrote: (quoting the post above in full)

If you are trying to protect the VMs that are running on a degraded host, consider creating a cluster with multiple hosts configured for HA/DRS and shared storage (Openfiler is a nice free program -- I believe the cost comes into play when licensing support, though I could be wrong; I haven't used Openfiler in years).

With the shared storage and cluster setup, you can utilize HA for host failures and Storage vMotion for moving VMs without the pesky SCP -- of course, license costs come into play with this setup, but some license cost up front is a lot cheaper than losing production servers and data, in my opinion.

grob115
Enthusiast

Yeah, I too think this is probably the best route.  Even StarWind and Open-E are only about $800-$900 USD as an up-front investment.  The only running cost will be the extra storage server.  Am only going to run one ESXi host against one storage server, not two ESXi hosts on one storage server; the cost is too much for now.

However, with everything on the storage server, I'm not sure of the risks involved.  Assuming I can only have one RAID controller inside the storage server (not sure if I can have more than one), if that RAID controller screws up I'd still suffer data loss (maybe even complete data loss), right?  Not sure how these SAN software packages can mitigate this.

DSTAVERT
Immortal

A RAID controller without write caching enabled will be a very poor performer. The only way to do write caching safely is to have a battery-backed or flash-backed cache. If I saw correctly, this is an R210, which would not have come with a higher-end controller unless you specifically ordered it that way, so you may be stuck without write caching. I would check into it, though.

Unless you had multiple ESXi hosts connected to the same SAN, it wouldn't be especially useful for normal operation. The SAN would be useful as a backup destination, using something like ghettoVCB http://communities.vmware.com/docs/DOC-8760 to clone the VMs to it. The SAN is connected to the ESXi host as a datastore; in the event of a disaster on the locally attached disks, the backup VMs could be added to inventory and you could be back up and running in a few minutes.
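
For example, a minimal ghettoVCB run looks something like this (the datastore label nfs-backup and the list file name are my own; the full set of options is documented at the link above):

# put ghettoVCB.sh on the host and make it executable
chmod +x ghettoVCB.sh
# edit the variables at the top of the script, e.g.:
#   VM_BACKUP_VOLUME=/vmfs/volumes/nfs-backup
#   DISK_BACKUP_FORMAT=thin
#   VM_BACKUP_ROTATION_COUNT=3
# back up the VMs named in a plain-text list, one display name per line
./ghettoVCB.sh -f vms_to_backup.txt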

-- David -- VMware Communities Moderator
grob115
Enthusiast

Found some commercial off-the-shelf storage solutions.  Take a look at the following.

Correct me if I'm wrong: a SAN allows you to boot the host or guests directly off it, and a NAS can't, right?

Wondering if it's possible to have the VM guests' OS installed on a thick-provisioned disk on the ESXi host's local storage, and then mount a NAS share inside the guests with all the application and business data stored on that mount point (see the sketch after the links below).  Would there be a performance hit compared with a SAN or locally attached storage?

SAN for $7000

http://www.snaphq.com/Snap-Server-SAN-S1000.asp

NAS for $1423

http://www.snaphq.com/Snap-Server-410.asp
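
For reference, this is roughly what I'm picturing inside a Linux guest (the server address and export path are made up):

# mount the NAS export for application/business data
mkdir -p /data
mount -t nfs <NAS IP>:/exports/appdata /data

# or permanently, via a line in /etc/fstab:
# <NAS IP>:/exports/appdata  /data  nfs  defaults,_netdev  0 0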
