VMware Cloud Community
monarch684
Contributor

vSphere Replication Setup

I have been tasked with setting up vSphere Replication in our environment.  I have the appliance installed and configured.  However, I am receiving the following error when trying to replicate a VM.

ERROR

Operation Failed

Synchronization monitoring has stopped. Please verify replication traffic connectivity between the source host and the target vSphere Replication Server. Synchronization monitoring will resume when connectivity issues are resolved.

Operation ID: 5924d2f3-6423-4346-a884-29f906a3d9f4

From what I have been reading, this is a network configuration issue.  I am unable to figure out where the misconfiguration is.  What do I need to post here so that someone can help me identify the issue?

Basic Details

=================

vSphere Client version 6.7.0.44000

vSphere Replication Appliance 8.3.0.10045 Build 16284275

Thank you in advance.

8 Replies
bbalido9
Contributor

Hi,

Follow steps below:

SSH to the ESXi host where the VM is running.

Get the VM ID:

Example:

vim-cmd vmsvc/getallvms | grep -i (vm name)

Then get the IP of the remote replication server that the VM is sending replication data to.

Example:

vim-cmd hbrsvc/vmreplica.getConfig (VMID)

Then check whether the ESXi host can ping the replication server IP.

Check whether replication traffic uses the default management VMkernel adapter (vmk0) or a dedicated replication VMkernel adapter.

PS: In the web client, check the host's VMkernel adapters to see whether the vSphere Replication service is enabled (box ticked).

Then just attempt to ping target ip.

example:

vmkping -I vmk0 x.x.x.x

If the host cannot ping then troubleshoot with your network team.

PS: If the host can ping but VM replication is still not working, try adding a route on the ESXi host.

Example:

esxcfg-route -a (DR VRMS IP address/32) (actual production gateway)

Once completed, check the VM replication state:

example:

vim-cmd hbrsvc/vmreplica.getState VMID

If there is still no activity, try to sync the VM with this command:

vim-cmd hbrsvc/vmreplica.sync VMID

PS: this should do the trick.

If it is still not working, stop the replication, vMotion the VM to another ESXi host, check that this host can ping the replication IP, and then reconfigure the VM replication.

Hope this helps.

ashilkrishnan
VMware Employee

Hi

Adding to what user 'bbalido9' has already mentioned:

Once you have established that the source ESXi host can ping the target VR appliance, ensure the host can reach the target VR appliance on port 31031. This port is required to initiate replications.

If there is a port connectivity issue, /var/log/vmkernel.log on the source ESXi host will clearly report the error.

monarch684
Contributor

I am able to ping from both hosts from the VMkernel adapters.  There is also already a route.  I did find this in the logs:

2020-06-19T10:36:00.525Z cpu2:2144144)WARNING: Hbr: 554: Connection failed to 10.1.x.x (groupID=GID-540795df-d726-48d9-b7ac-25a372cdcac5): Timeout

2020-06-19T10:36:00.525Z cpu2:2144144)WARNING: Hbr: 4813: Failed to establish connection to [10.1.x.x]:31031(groupID=GID-540795df-d726-48d9-b7ac-25a372cdcac5): Timeout

The 10.1.x.x address is the management IP of the replication appliance.  I am thinking I need to open that port on the appliance, but I am at a loss as to how to do so.  I tried the following to get a list of open ports but received an error.

root@VMReplication [ ~ ]# esxcli network firewall ruleset list

-bash: esxcli: command not found
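When vmkernel.log is long, it can help to pull out only the Hbr connection failures and see which target and port they name. The following is a minimal Python sketch, not a VMware tool. The regular expression is inferred only from the two warning lines quoted above, so other ESXi builds may word the message differently, and the names HBR_FAIL and parse_hbr_failures are made up for this example.

```python
import re

# Pattern for the "Failed to establish connection" warning quoted above.
# The layout is inferred from those sample lines only (an assumption);
# other ESXi builds may format the message differently.
HBR_FAIL = re.compile(
    r"^(?P<time>\S+)\s+cpu\d+:\d+\)WARNING: Hbr: \d+: "
    r"Failed to establish connection to \[(?P<target>[^\]]+)\]:(?P<port>\d+)"
    r"\s*\(groupID=(?P<group>[^)]+)\): (?P<reason>.+)$"
)


def parse_hbr_failures(lines):
    """Return one dict per matching warning line and skip everything else."""
    failures = []
    for line in lines:
        match = HBR_FAIL.match(line.strip())
        if match:
            entry = match.groupdict()
            entry["port"] = int(entry["port"])
            failures.append(entry)
    return failures


# The two warnings from the post, with the address masked as in the original.
sample = [
    "2020-06-19T10:36:00.525Z cpu2:2144144)WARNING: Hbr: 554: Connection failed to 10.1.x.x (groupID=GID-540795df-d726-48d9-b7ac-25a372cdcac5): Timeout",
    "2020-06-19T10:36:00.525Z cpu2:2144144)WARNING: Hbr: 4813: Failed to establish connection to [10.1.x.x]:31031(groupID=GID-540795df-d726-48d9-b7ac-25a372cdcac5): Timeout",
]

for failure in parse_hbr_failures(sample):
    print(failure["time"], failure["target"], failure["port"], failure["reason"])
```

Only the second warning carries the port, so the sketch matches that line and ignores the first. To run it against a real log, pass the lines of a copied vmkernel.log instead of the sample list.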

ashilkrishnan
VMware Employee

It does seem like a port connectivity issue between the host and the VR appliance. The vSphere Replication appliance does not have any port or firewall configuration of its own, and the esxcli commands are for ESXi hosts.

Please check the following:

1. If you are using a dedicated vmkernel instead of management vmkernel, ensure vSphere replication traffic is enabled on it.

2. Ensure the ESXi host has the HBR service enabled on its firewall (see VMware Knowledge Base).

3. Test port connectivity from the ESXi host to the target VR appliance: nc -z VR_IP 31031 (see VMware Knowledge Base).

If the host still fails to establish a connection on port 31031, it's either a physical network device or firewall application that might be blocking the port
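Where nc is not available, the port test in step 3 can be approximated from any machine that has Python. The following is a minimal sketch using only the standard socket module. The function name tcp_port_open and the 3-second timeout are choices made for this example, not anything defined by vSphere Replication.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection resolves the host, connects, and raises OSError
        # (including timeouts and refusals) when the connection cannot be made.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A call such as tcp_port_open("VR_IP", 31031), with VR_IP replaced by the appliance address, returns True or False. Like nc without an explicit source, this lets the operating system pick the outgoing interface from its routing table, so a success here does not by itself prove that traffic leaving a dedicated replication VMkernel adapter takes the same path.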

monarch684
Contributor

This is what I have setup on both hosts:

The hosts are directly connected to each other.  No physical switch/firewall between.

Port properties

Network label Replication

VLAN ID None (0)

TCP/IP stack Default

Enabled services vSphere Replication, vSphere Replication NFC

IPv4 settings

DHCP Disabled

IPv4 address 172.16.1.2 (static) <- The second host is 172.16.1.3

Subnet mask 255.255.255.224

Default gateway 172.16.1.1

DNS server addresses 10.1.x.x 10.1.x.x

NIC settings

MAC address

MTU 1500

Security

Promiscuous mode Reject

MAC address changes Accept

Forged transmits Accept

Traffic shaping

Average bandwidth --

Peak bandwidth --

Burst size --

Teaming and failover

Load balancing Route based on originating virtual port

Network failure detection Link status only

Notify switches Yes

Failback Yes

Active adapters vmnic8

Standby adapters vmnic9

Unused adapters --

Results from nc:

[root@localhost:~] nc -z 10.1.x.x 31031

Connection to 10.1.x.x 31031 port [tcp/*] succeeded!

ashilkrishnan
VMware Employee

If you have configured routing on the ESXi host, can you check whether similar routing is configured on the target VR appliance? This has to be configured manually, and you can use the following document as a reference (refer to step 4):

Configure Routing on the vSphere Replication Appliances in Each Region

monarch684
Contributor

I have set the routes as follows:

Replication Appliance

----------------------------------------------

Destination  Gateway    Genmask

0.0.0.0          10.1.x.x     0.0.0.0

10.1.x.0        0.0.0.0       255.255.255.0

10.1.x.0        10.1.x.x      255.255.255.0

Host

----------------------------------------------

Network     Netmask                  Gateway

default        0.0.0.0                     10.1.x.x

10.1.x.0      255.255.255.0         0.0.0.0    

172.16.0.0  255.255.255.224     0.0.0.0

I still get the following error:

2020-06-22T08:21:46.043Z cpu14:2144144)WARNING: Hbr: 554: Connection failed to 10.1.x.x (groupID=GID-540795df-d726-48d9-b7ac-25a372cdcac5): Timeout

2020-06-22T08:21:46.043Z cpu14:2144144)WARNING: Hbr: 4813: Failed to establish connection to [10.1.x.x]:31031(groupID=GID-540795df-d726-48d9-b7ac-25a372cdcac5): Timeout
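One way to sanity-check the addressing is to compute which addresses the Replication VMkernel adapter can reach directly and compare that with the routes listed above. The following is a minimal Python sketch using the standard ipaddress module. The addresses and masks are copied from this thread, and 10.1.0.0/16 is used only because it covers every possible 10.1.x.x address, not because it is the real subnet.

```python
import ipaddress

# Values copied from the Replication port properties posted earlier in the thread.
vmk = ipaddress.ip_interface("172.16.1.2/255.255.255.224")
peer = ipaddress.ip_address("172.16.1.3")
gateway = ipaddress.ip_address("172.16.1.1")

# Network exactly as it appears in the host routing table above.
listed_route = ipaddress.ip_network("172.16.0.0/255.255.255.224")

# Assumption for illustration: every 10.1.x.x address sits inside this
# block, whatever the masked octets are.
appliance_range = ipaddress.ip_network("10.1.0.0/16")

print("replication subnet:", vmk.network)
print("peer on-link:", peer in vmk.network)
print("gateway on-link:", gateway in vmk.network)
print("vmk inside listed route:", vmk.ip in listed_route)
print("appliance range overlaps replication subnet:",
      appliance_range.overlaps(vmk.network))
```

With these inputs the replication subnet works out to 172.16.1.0/27, which contains the second host and the gateway but no 10.1.x.x address, so the appliance is not directly reachable from that adapter without a route. The 172.16.0.0 entry with mask 255.255.255.224 does not contain 172.16.1.2 either. That may simply be a transcription slip in the post, but it is worth comparing against the actual output of esxcfg-route -l on the host.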

monarch684
Contributor

I ran Wireshark and it appears that the VMs are not talking to each other.  The only traffic I see is from vCenter to Host and vice versa.  How else could I check this?
