I have been tasked with setting up vSphere Replication in our environment. I have the appliance installed and configured. However, I am receiving the following error when trying to replicate a VM.
ERROR
Operation Failed
Synchronization monitoring has stopped. Please verify replication traffic connectivity between the source host and the target vSphere Replication Server. Synchronization monitoring will resume when connectivity issues are resolved.
Operation ID: 5924d2f3-6423-4346-a884-29f906a3d9f4
From what I have been reading, this is a network configuration issue. I am unable to figure out where the misconfiguration is. What do I need to post here so that someone can help me identify the issue?
Basic Details
=================
vSphere Client version 6.7.0.44000
vSphere Replication Appliance 8.3.0.10045 Build 16284275
Thank you in advance.
Hi,
Follow the steps below:
SSH to the ESXi host where the VM is running.
Get the VM ID:
Example:
vim-cmd vmsvc/getallvms |grep -i (vm name)
Then get the IP of the remote replication server that the VM is sending replication data to.
Example:
vim-cmd hbrsvc/vmreplica.getConfig (VMID)
Then check whether the ESXi host can ping the replication server IP.
First check whether replication traffic uses the default management VMkernel adapter (vmk0) or a dedicated replication VMkernel adapter.
PS: In the web client, look at the host's VMkernel adapters to see whether replication is enabled (box ticked).
Then try to ping the target IP.
example:
vmkping -I vmk0 x.x.x.x
If the host cannot ping, troubleshoot with your network team.
PS: If the host can ping but VM replication is still not working, try adding a route on the ESXi host.
example:
esxcfg-route -a (DR VRMS IP address/32) (actual production gateway)
Once that is done, check the VM replication state:
example:
vim-cmd hbrsvc/vmreplica.getState VMID
If there is still no activity, try to trigger a sync of the VM from the command line:
vim-cmd hbrsvc/vmreplica.sync VMID
PS: this should do the trick.
If it is still not working, stop replication, vMotion the VM to another ESXi host, check whether that host can ping the replication IP, and then reconfigure replication for the VM.
Hope this helps.
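The steps above can be strung together in a small script on the source ESXi host. This is only a rough sketch: the helper names (vmid_for_name, check_replication) and the example arguments are made up for illustration, and it assumes the VM name contains no spaces and that the first two columns of the getallvms output are the ID and the name.

```shell
#!/bin/sh
# Sketch only: vmid_for_name and check_replication are hypothetical helper names.

# Read `vim-cmd vmsvc/getallvms` output on stdin and print the ID (first
# column) of the VM whose name (second column) matches $1 exactly.
# Assumes the VM name contains no spaces.
vmid_for_name() {
    awk -v name="$1" '$2 == name { print $1; exit }'
}

# Run the checks from this thread in order for one VM.
# $1 = VM name, $2 = target replication server IP, $3 = VMkernel adapter.
check_replication() {
    vmid=$(vim-cmd vmsvc/getallvms | vmid_for_name "$1")
    if [ -z "$vmid" ]; then
        echo "VM '$1' not found on this host" >&2
        return 1
    fi
    echo "VM ID: $vmid"
    vim-cmd hbrsvc/vmreplica.getConfig "$vmid"
    if vmkping -I "$3" -c 3 "$2"; then
        vim-cmd hbrsvc/vmreplica.getState "$vmid"
    else
        echo "No ping reply from $2 via $3; check the network path" >&2
        return 1
    fi
}

# Example call (placeholders): check_replication MyVM x.x.x.x vmk0
```

Nothing runs until check_replication is called, so the file can be sourced first and the individual helpers tried one at a time.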
Hi
Adding to what user bbalido9 has already mentioned:
Once you have established that the source ESXi host can ping the target VR appliance, make sure the host can also reach the target VR appliance on port 31031. This port is required to initiate replications.
If there is a port connectivity issue, /var/log/vmkernel.log on the source ESXi host will report the error.
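To pull those messages out of the log, something like the following may help. It is a sketch under assumptions: the helper name hbr_failed_targets is made up, and the pattern assumes the warning reads "Hbr: ... Failed to establish connection to [IP]:port", which is the form such warnings take in the log excerpts quoted in this thread.

```shell
#!/bin/sh
# Sketch only: hbr_failed_targets is a hypothetical helper name.

# Read vmkernel log lines on stdin and print each distinct "IP:port" that
# the Hbr module failed to connect to, prefixed by how often it failed.
hbr_failed_targets() {
    grep 'Hbr: .*Failed to establish connection to' |
        sed 's/.*connection to \[\([^]]*\)]:\([0-9]*\).*/\1:\2/' |
        sort | uniq -c
}

# On the source ESXi host:
#   hbr_failed_targets < /var/log/vmkernel.log
```

A target that shows up here with port 31031 points at the host-to-appliance path rather than at the replication configuration itself.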
I am able to ping from both hosts from the VMkernel adapters. There is also already a route. I did find this in the logs:
2020-06-19T10:36:00.525Z cpu2:2144144)WARNING: Hbr: 554: Connection failed to 10.1.x.x (groupID=GID-540795df-d726-48d9-b7ac-25a372cdcac5): Timeout
2020-06-19T10:36:00.525Z cpu2:2144144)WARNING: Hbr: 4813: Failed to establish connection to [10.1.x.x]:31031(groupID=GID-540795df-d726-48d9-b7ac-25a372cdcac5): Timeout
The 10.1.x.x address is the management IP of the replication appliance. I think I need to open that port on the appliance, but I am at a loss as to how to do so. I tried the following to get a list of open ports but received an error.
root@VMReplication [ ~ ]# esxcli network firewall ruleset list
-bash: esxcli: command not found
It does look like a port connectivity issue between the host and the VR appliance. The vSphere Replication appliance has no port or firewall settings of its own to configure, and the esxcli commands are for ESXi hosts, not the appliance.
Please check the following:
1. If you are using a dedicated VMkernel adapter instead of the management VMkernel adapter, ensure vSphere Replication traffic is enabled on it.
2. Ensure the ESXi host has the HBR service enabled on its firewall (see the VMware Knowledge Base).
3. Test port connectivity from the ESXi host to the target VR appliance: nc -z VR_IP 31031
If the host still fails to establish a connection on port 31031, either a physical network device or a firewall application is probably blocking the port.
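Checks 2 and 3 can be scripted on the ESXi host roughly as follows. Treat it as a sketch: the helper names are made up, and it assumes the firewall ruleset is listed under the name HBR with a true/false value in the second column of the esxcli output, so compare against what your host actually prints.

```shell
#!/bin/sh
# Sketch only: ruleset_enabled and check_vr_port are hypothetical helper names.

# Read `esxcli network firewall ruleset list` output on stdin and succeed
# if the ruleset named $1 (case-insensitive) has "true" in the second column.
ruleset_enabled() {
    awk -v want="$1" '
        tolower($1) == tolower(want) && $2 == "true" { found = 1 }
        END { exit !found }'
}

# $1 = target VR appliance IP. Checks the host firewall, then the TCP port.
check_vr_port() {
    if ! esxcli network firewall ruleset list | ruleset_enabled HBR; then
        echo "HBR ruleset does not appear to be enabled on this host" >&2
        return 1
    fi
    nc -z "$1" 31031
}

# Example call (placeholder address): check_vr_port VR_IP
```

Keep in mind that nc follows the host's normal routing, so a success here does not by itself show which VMkernel adapter the connection used.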
This is what I have setup on both hosts:
The hosts are directly connected to each other. No physical switch/firewall between.
Port properties
Network label Replication
VLAN ID None (0)
TCP/IP stack Default
Enabled services vSphere Replication, vSphere Replication NFC
IPv4 settings
DHCP Disabled
IPv4 address 172.16.1.2 (static) <- The second host is 172.16.1.3
Subnet mask 255.255.255.224
Default gateway 172.16.1.1
DNS server addresses 10.1.x.x 10.1.x.x
NIC settings
MAC address
MTU 1500
Security
Promiscuous mode Reject
MAC address changes Accept
Forged transmits Accept
Traffic shaping
Average bandwidth --
Peak bandwidth --
Burst size --
Teaming and failover
Load balancing Route based on originating virtual port
Network failure detection Link status only
Notify switches Yes
Failback Yes
Active adapters vmnic8
Standby adapters vmnic9
Unused adapters --
Results from nc:
[root@localhost:~] nc -z 10.1.x.x 31031
Connection to 10.1.x.x 31031 port [tcp/*] succeeded!
If you have configured routing on the ESXi host, can you check whether similar routing is configured on the target VR appliance? This has to be configured manually; you can use the following document as a reference (see step 4):
Configure Routing on the vSphere Replication Appliances in Each Region
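Before changing anything, it may be worth seeing which interface the appliance would use to reach the host. A minimal sketch, assuming the appliance shell provides the standard Linux ip command; route_dev is a made-up helper name and HOST_VMK_IP is a placeholder for the source host's replication VMkernel address.

```shell
#!/bin/sh
# Sketch only: route_dev is a hypothetical helper name.

# Read `ip route get <address>` output on stdin and print the interface
# named after the "dev" keyword on the first line.
route_dev() {
    awk 'NR == 1 { for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# On the VR appliance (HOST_VMK_IP is a placeholder):
#   ip route get HOST_VMK_IP | route_dev
```

If the interface printed is not the one you expect replication traffic to use, that is a hint the return route on the appliance needs attention.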
I have set the routes as follows:
Replication Appliance
----------------------------------------------
Destination     Gateway         Genmask
0.0.0.0         10.1.x.x        0.0.0.0
10.1.x.0        0.0.0.0         255.255.255.0
10.1.x.0        10.1.x.x        255.255.255.0
Host
----------------------------------------------
Network         Netmask             Gateway
default         0.0.0.0             10.1.x.x
10.1.x.0        255.255.255.0       0.0.0.0
172.16.0.0      255.255.255.224     0.0.0.0
I still get the following error:
2020-06-22T08:21:46.043Z cpu14:2144144)WARNING: Hbr: 554: Connection failed to 10.1.x.x (groupID=GID-540795df-d726-48d9-b7ac-25a372cdcac5): Timeout
2020-06-22T08:21:46.043Z cpu14:2144144)WARNING: Hbr: 4813: Failed to establish connection to [10.1.x.x]:31031(groupID=GID-540795df-d726-48d9-b7ac-25a372cdcac5): Timeout
I ran Wireshark and it appears that the VMs are not talking to each other. The only traffic I see is from vCenter to Host and vice versa. How else could I check this?