Replication to remote (public) site

Historically, I have replicated to a second site on my own LAN. I would now like to replicate to a site that is mine but is not on my LAN (a datacenter cloud site).

While I have a VPN between the sites, I do not want the replication traffic to flow through the VPN because of VPN hardware traffic limitations.  (Turning on replication encryption checks the security box for me.)

I have been working with VMware support, but they have struggled to understand what I want to do, so I was hoping the community might have some insight.

First, is it possible to replicate to a non-LAN site?  If a colleague and I entered into an agreement to be replication targets for each other, could this be done?  We would not have access to each other's LAN, so we would have to rely upon NAT'ing things out to public IPs, source-restricting firewall rules, and of course, replication encryption.  If this is NOT officially possible, then that explains my struggles, but I thought I could do this.  Maybe this requires some VMware 'cloud' pieces to work properly?  I thought I could NAT out vCenter and the replication appliance and be golden (with appropriate firewall rules to source restrict the traffic, of course).

The issue that I am having with this setup is that while I can get replication set up and configured on both sides, can pair my sites, and can create a replication for a VM, the system sets the replication target to the private LAN IP of the replication appliance on the target side so replication traffic doesn't start.  When I was replicating on the same LAN, this would make sense, but in the example of replicating to a colleague site, I would never have visibility to the remote LAN IP, only the public NAT IP.

I have DNS entries set up to point to the IPs that I want to use, but replication seems to be ignoring these DNS entries and is getting the LAN IP of the target appliance somehow on its own.  This seems to be the one piece of VMware software that does NOT rely upon DNS.  I have confirmed on every host and appliance that the host names I use resolve to the IPs that I want to use, but no matter what, the vmx file always ends up with the local LAN IP of the target replication appliance.

I have previously used the 'isolate replication traffic' technique, but I don't need replication traffic to be on a new IP (from the host or appliance perspective) in this case.  Traffic on both the source and target sides needs to stay on the one original network adapter and IP; the target side just needs to be contacted through the NAT'd public IP on the way to the local LAN IP.  The isolate technique would not help me here anyway, because the target appliance appears to send its local IP to the system to use as the target, and any IP I put on extra adapters would be a private LAN IP that is not directly accessible from the source side.  Unfortunately, I am not able to directly assign a public IP to the target appliance.  If it is indeed by design that the target replication appliance 'announces' its local IP as the target that must be used for replications, then I would have to somehow assign a public IP directly to the appliance to fix this.

I have been able to 'hack' this into working by editing the vmx file for a replicated VM directly after setting up a replication job.  The system fills in a 10.x.x.x IP for the hbr_filter.netEncryption.destination variable.  I shut down the VM, edit the vmx file to reflect the public IP of the target appliance, reload the VM config, power on the VM, and voilà, replication picks right up and runs how I want it to run!  I have 4 VMs currently working this way.

This is not a permanent fix, however, as the config gets changed back to the LAN IP at unknown times (maybe with vMotion?  Or as part of standard maintenance tasks?).  Also, it isn't ideal to have to power off a VM to get replication going.  It is also unclear whether this 'hack' would break SRM, which we use but which I won't configure until I have replication solid.

Worst case, I can do this 'hack' for initial replications and let the system move future replications back to the LAN IP over the VPN whenever it's going to do that.  That would at least give me the benefit of faster throughput for the initial replication, but it would be wonderful if I could make this 'hack' permanent to keep replication traffic on the faster path.

(And yes, upgraded VPN hardware is in the future budget, but just not today)

1 Reply

VMware tech confirmed that replication is NOT designed to work between two sites that are not on the same LAN.  So the example of me replicating to a colleague's site (that is NOT a cloud provider) is an example they say should not work.

They pointed to this article that has a note that says NAT is not supported:

As far as I can tell though, the only thing preventing this is that the vmx config file stores the IP address of the target site instead of the host name / FQDN.  If it used the hostname / FQDN, then it would get the IP from a DNS lookup and work just fine.

In my case, unless I find another solution, this is what 'works' for me.  (Use at your own risk: NOT fully vetted, NOT tested with SRM yet.)

1. Power off the VM
2. Create the new replication
3. Wait 10 seconds (for the config to be updated)
4. Edit the vmx file (vi vmname.vmx)
  a. You MUST be connected over SSH to the host that 'owns' that VM right now.
  b. Put the IP you want on this line: hbr_filter.netEncryption.destination = x.x.x.x
5. Look up the VM ID: vim-cmd vmsvc/getallvms | grep -i {vmname}
6. Reload the VM config: vim-cmd vmsvc/reload {vmid}
7. Power on the VM
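
For anyone who would rather script the vmx edit in step 4 than do it by hand in vi, here is a rough sketch of how it could look. The function name set_hbr_destination is my own invention, not a VMware tool. It assumes the value is stored in the usual quoted vmx form (key = "value"), and 203.0.113.10 in the usage comment is just a placeholder public IP. It keeps a .bak copy of the original file next to the vmx. Same caveats as the manual steps: use at your own risk.

```shell
#!/bin/sh
# Sketch only: rewrite the replication destination line in a .vmx file.
# set_hbr_destination is a hypothetical helper name, not part of any VMware tooling.
# Assumption: the vmx file stores the value as  key = "value".

set_hbr_destination() {
    vmx_file="$1"   # path to the VM's .vmx file
    new_ip="$2"     # IP the source side should send replication traffic to
    key='hbr_filter.netEncryption.destination'

    # Stop if replication has not written the key yet (step 3: wait for the config).
    if ! grep -q "^${key} *=" "$vmx_file"; then
        echo "key ${key} not found in ${vmx_file}" >&2
        return 1
    fi

    # Keep a backup, then regenerate the vmx from it with only that one line changed.
    cp "$vmx_file" "${vmx_file}.bak" || return 1
    sed "s|^${key} *=.*|${key} = \"${new_ip}\"|" "${vmx_file}.bak" > "$vmx_file"
}

# Example (run over SSH on the host that currently owns the powered-off VM):
#   set_hbr_destination /vmfs/volumes/{datastore}/{vmname}/{vmname}.vmx 203.0.113.10
#   vim-cmd vmsvc/getallvms | grep -i {vmname}
#   vim-cmd vmsvc/reload {vmid}
```

The reload in steps 5 and 6 is still needed afterwards so the host picks up the edited file.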

Replication will take off using the new IP from the vmx file.  In my case, this means that traffic is sent out our public internet connection and then follows a NAT on the target site from a public IP to the actual private IP of the replication appliance.  I have encryption turned on and source restrict the NAT on the target side in the firewall.

At some point, vSphere WILL change the IP in the vmx file back to the private LAN IP of the replication appliance on the target site.  This may occur with vMotion events or 'reset' type events like pausing replication or reconfiguring replication options.
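
Since I don't know exactly when the value gets reverted, a small check like the one below could be run by hand after a vMotion or a replication reconfigure to see whether a VM is still pointing at the public IP. This is only a sketch: get_hbr_destination and check_hbr_destination are made-up helper names, and it only reads the vmx file, it does not change anything.

```shell
#!/bin/sh
# Sketch only: report whether a .vmx file still has the expected replication destination.
# Both function names are hypothetical helpers, not VMware commands.

get_hbr_destination() {
    # Print the current value of the key, with surrounding quotes (if any) removed.
    sed -n 's|^hbr_filter\.netEncryption\.destination *= *"\{0,1\}\([^"]*\)"\{0,1\} *$|\1|p' "$1"
}

check_hbr_destination() {
    vmx_file="$1"   # path to the VM's .vmx file
    expected="$2"   # the public NAT IP that was written by hand
    current=$(get_hbr_destination "$vmx_file")

    if [ "$current" = "$expected" ]; then
        echo "OK: ${vmx_file} -> ${current}"
        return 0
    fi
    # Either the key is gone or it was changed (for example back to the LAN IP).
    echo "CHANGED: ${vmx_file} -> ${current:-<unset>} (expected ${expected})"
    return 1
}

# Example:
#   check_hbr_destination /vmfs/volumes/{datastore}/{vmname}/{vmname}.vmx 203.0.113.10
```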

I am doing this because my private LAN link to the target appliance is bandwidth limited by a VPN appliance.  Traffic through the VPN is limited to 200 Mbps, while traffic through the public internet can reach 500 Mbps.  So to speed up initial replication time, I am taking this risk.
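
To put rough numbers on that, here is a back-of-the-envelope calculation. The 1000 GB size is a made-up example, not my actual data set, and it assumes the link runs at its full rated speed with no protocol or encryption overhead, so real times will be longer. transfer_hours is just a helper name I picked.

```shell
#!/bin/sh
# Sketch only: idealized time to push an initial replication over a link.
# transfer_hours SIZE_GB RATE_MBPS  (decimal GB and megabits per second)
# Assumes the full line rate is available and ignores all overhead.

transfer_hours() {
    # seconds = GB * 8000 megabits per GB / Mbps; divide by 3600 for hours
    awk -v gb="$1" -v mbps="$2" 'BEGIN { printf "%.1f\n", (gb * 8 * 1000) / mbps / 3600 }'
}

# Example with a hypothetical 1000 GB of initial replication data:
#   transfer_hours 1000 200   # VPN path         -> 11.1
#   transfer_hours 1000 500   # public internet  -> 4.4
```

So under those assumptions the public path finishes in well under half the time of the VPN path, which is the whole reason for the workaround.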

Once all initial replications are complete, I will leave the vmx file alone and let ongoing replications use the VPN LAN path at the slower bandwidth.
