VMware Cloud Community
burchell99
Enthusiast

vSphere Replication Add-On Server flooding network

We have deployed vSphere Replication 6.1.1 and it has been working with approximately 60 replications for six months without any issues.

Last week we deployed additional add-on servers at a remote branch (managed by the same vCenter), and now every host in the vCenter, globally, is sending hundreds of MB per hour to the add-on appliances. This is causing network bandwidth problems due to the volume of traffic from the 50 hosts worldwide.

I believe an initial mapping exercise takes place so the appliance can work out which hosts have access to which datastores, but each host is sending around 3 GB per day (roughly 150 GB per day across the 50 hosts) and it shows no sign of ending!

Has anyone experienced this before? I have a P2 support ticket open, but I am reaching out to the community to see whether this is expected behavior, a bug, or something else entirely.

Any advice greatly appreciated

3 Replies
vbrowncoat
Expert

Why did you add additional VR server(s) to your remote site? If you configured replication of VMs to or from the remote site, did you select the VR server at the destination when you configured them?

The traffic flow for VR is this: Source VM > Source Host (the host the VM is running on) > Destination VR appliance or server > Destination Host > Destination Storage

What this means is that to replicate to a remote site, you want a VR server deployed at the remote site and the replication configured to use it.

Please provide some additional details about your topology and configuration so we can get a better idea of your traffic flows.

burchell99
Enthusiast

Thanks for the reply

Additional VR servers were added to the remote site because the replication happens in that country, which is thousands of miles from the primary data centres.

I configured the replication using the add-on VR server and it works as expected

We have the embedded primary vSphere Replication servers, both in the UK, working fine with 60+ replications. The add-on appliances are in the remote country, with currently one replication configured to use them.

My problem is the network traffic being sent to discover all the host-to-datastore mappings within that vCenter (not replication traffic, just the initial discovery).

Support noted that the logs show "750+ repetitions for the same host-datastore combination" and have asked me to increase the CPU and RAM resources on the add-on appliance, which I can do, but I find it hard to believe that is the root cause.

Has anyone seen this before? I am continuing to work with support. We previously had vSphere Replication 5.5 set up in the same topology and configuration and never saw anything like this.

Here is an extract of the log analysis (host names have been changed); a small script to reproduce the tally follows the extract:

---------------------------------------

Searching for references to specific hosts shows 750+ repetitions for the same host-datastore combination:

e.g.

  grep "accessible true" _var_log_vmware_hbrsrv*log | awk '{print $7,$9}'| grep host-87| sort  |uniq -c

     765 host-87 /vmfs/volumes/59302ddb-6aec7dc4-4b6f-0025b56210be

     765 host-87 /vmfs/volumes/59369879-25e5f99c-9f65-0025b5621000

     765 host-87 /vmfs/volumes/5936989f-c24c1d19-5832-0025b5621000

     765 host-87 /vmfs/volumes/593698d0-435294c6-e47a-0025b5621000

     765 host-87 /vmfs/volumes/59369909-177fb6d1-f680-0025b5621000

     765 host-87 /vmfs/volumes/59369957-19dd6dd8-2326-0025b5621000

     765 host-87 /vmfs/volumes/5936999d-dc17bf42-b1e4-0025b5621000

     765 host-87 /vmfs/volumes/593699e2-d8eefcf2-f053-0025b5621000

     765 host-87 /vmfs/volumes/59369a2e-de9ebc18-f769-0025b5621000

     765 host-87 /vmfs/volumes/59369a7a-602c5e98-6a21-0025b5621000

     765 host-87 /vmfs/volumes/59369acb-bf8cad98-0401-0025b5621000

     765 host-87 /vmfs/volumes/59369cd7-24490508-1d64-0025b5621014

     765 host-87 /vmfs/volumes/59369d15-0947c92c-5c02-0025b5621014

     765 host-87 /vmfs/volumes/59369d5d-3d62e08a-2502-0025b5621014

     765 host-87 /vmfs/volumes/59369d95-85aea7f9-0059-0025b5621014

     765 host-87 /vmfs/volumes/59369dcc-22a7847b-7463-0025b5621014

     765 host-87 /vmfs/volumes/59369dfc-38dc64dc-b2b5-0025b5621014

     765 host-87 /vmfs/volumes/59369e28-eead2b5c-c8e7-0025b5621014

     765 host-87 /vmfs/volumes/59369e53-df715bb8-9c97-0025b5621014

     765 host-87 /vmfs/volumes/59369e85-f865fd52-37e3-0025b5621014

     765 host-87 /vmfs/volumes/59369ed9-f78276b8-4d27-0025b5621014

     765 host-87 /vmfs/volumes/598d610a-cce9c00a-5462-0025b5621154

     765 host-87 /vmfs/volumes/59e1d92e-ec292b0d-f16c-0025b5621014

     765 host-87 /vmfs/volumes/59e1d980-188f20ae-6d3b-0025b5621014

     765 host-87 /vmfs/volumes/59e1da19-34f9854e-6e0a-0025b5621014

     765 host-87 /vmfs/volumes/59e1da4e-e5d7659b-1bb4-0025b5621014

     765 host-87 /vmfs/volumes/59e1da7e-fea1c702-f398-0025b5621014

     765 host-87 /vmfs/volumes/59e1dab2-3261a01f-f7de-0025b5621014

     765 host-87 /vmfs/volumes/59eba7e6-fdbf9d18-17e6-0025b56210dc

     765 host-87 /vmfs/volumes/59eba81f-482d1508-256a-0025b56210dc

     765 host-87 /vmfs/volumes/59eba842-a5935f28-29c7-0025b56210dc

     765 host-87 /vmfs/volumes/5a09f7e7-95af91e1-3811-0025b5621168

     765 host-87 /vmfs/volumes/5a09f94d-ee363e1d-d5fc-0025b5621168

     765 host-87 /vmfs/volumes/5a0a20ad-be9563f0-739f-0025b5621168

Entries repeat in this kind of pattern:

     _var_log_vmware_hbrsrv-216.log:2018-01-16T16:03:23.692Z verbose hbrsrv[7F4E524DC760] [Originator@6876 sub=Host opID=hs-init-75f6efae] Host: host-87 Datastore: /vmfs/volumes/59302ddb-6aec7dc4-4b6f-0025b56210be -> 59302ddb-6aec7dc4-4b6f-0025b56210be (name esx01-localstorage) accessible true

_var_log_vmware_hbrsrv-216.log:2018-01-16T16:03:27.607Z verbose hbrsrv[7F4E4B123700] [Originator@6876 sub=Host] Host: host-87 Datastore: /vmfs/volumes/59302ddb-6aec7dc4-4b6f-0025b56210be -> 59302ddb-6aec7dc4-4b6f-0025b56210be (name esx01-localstorage) accessible true

_var_log_vmware_hbrsrv-216.log:2018-01-16T16:03:32.498Z verbose hbrsrv[7F4E4B42F700] [Originator@6876 sub=Host] Host: host-87 Datastore: /vmfs/volumes/59302ddb-6aec7dc4-4b6f-0025b56210be -> 59302ddb-6aec7dc4-4b6f-0025b56210be (name esx01-localstorage) accessible true

_var_log_vmware_hbrsrv-216.log:2018-01-16T16:03:37.162Z verbose hbrsrv[7F4E4B7BD700] [Originator@6876 sub=Host] Host: host-87 Datastore: /vmfs/volumes/59302ddb-6aec7dc4-4b6f-0025b56210be -> 59302ddb-6aec7dc4-4b6f-0025b56210be (name esx01-localstorage) accessible true

_var_log_vmware_hbrsrv-216.log:2018-01-16T16:03:41.706Z verbose hbrsrv[7F4E4B4B1700] [Originator@6876 sub=Host] Host: host-87 Datastore: /vmfs/volumes/59302ddb-6aec7dc4-4b6f-0025b56210be -> 59302ddb-6aec7dc4-4b6f-0025b56210be (name esx01-localstorage) accessible true

_var_log_vmware_hbrsrv-216.log:2018-01-16T16:03:45.897Z verbose hbrsrv[7F4E4B268700] [Originator@6876 sub=Host] Host: host-87 Datastore: /vmfs/volumes/59302ddb-6aec7dc4-4b6f-0025b56210be -> 59302ddb-6aec7dc4-4b6f-0025b56210be (name esx01-localstorage) accessible true
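
If it helps anyone else dig into this, here is a rough Python sketch along the same lines as the grep above: it tallies the "accessible true" lines per host/datastore pair across the extracted hbrsrv logs and prints the first and last timestamps so you can see how quickly the entries repeat. The file-name glob assumes the underscore-style names from the extracted support bundle; adjust it to wherever your logs live.

    #!/usr/bin/env python3
    # Rough sketch: tally the "accessible true" host/datastore lines in the
    # extracted hbrsrv logs and show how often each pair repeats.
    import glob
    import re
    from collections import defaultdict

    # timestamp, host id and datastore path as they appear in the log lines above
    line_re = re.compile(r"(\S+T\S+Z).*Host: (\S+) Datastore: (\S+)")
    hits = defaultdict(list)

    for path in glob.glob("_var_log_vmware_hbrsrv*.log"):
        with open(path, errors="replace") as fh:
            for line in fh:
                if "accessible true" not in line:
                    continue
                m = line_re.search(line)
                if m:
                    stamp, host, ds = m.groups()
                    hits[(host, ds)].append(stamp)

    # worst offenders first
    for (host, ds), stamps in sorted(hits.items(), key=lambda kv: -len(kv[1]))[:20]:
        print(f"{len(stamps):6d}  {host}  {ds}  first={stamps[0]}  last={stamps[-1]}")

In the extract above the same host/datastore line comes back every four to five seconds, which matches the ongoing traffic we are seeing rather than a one-off discovery pass.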

beefy147
Enthusiast

This is now resolved. Working with support, it turned out to be a Storage I/O Control (SIOC) bug.

Disabling SIOC on all datastores (luckily we did not need to use it) resolved the issue.

"Why did ESX send so many datastore events? It is SIOC issue which is fixed in

vsphere60u3.

As a workaround, you can ask customer to disable SIOC in all datastores

Finally restart the vsphere replication appliance"

This workaround was suitable for us! Hopefully it helps someone else one day.
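
For anyone with a lot of datastores to get through, here is a rough pyVmomi (Python) sketch of the same workaround. The vCenter hostname and credentials are placeholders, and the IORM spec/type names are taken from the vSphere API reference, so double-check them against your pyVmomi version before running anything like this; the per-datastore Storage I/O Control setting in the Web Client does the same job, and the vSphere Replication appliance still needs a restart afterwards.

    #!/usr/bin/env python3
    # Rough sketch only: disable Storage I/O Control on every datastore in vCenter.
    # Placeholders: vcenter.example.com and the credentials below.
    # Verify vim.StorageResourceManager.IORMConfigSpec / ConfigureDatastoreIORM_Task
    # against your pyVmomi version before using this for real.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVim.task import WaitForTask
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()            # lab only: skips certificate checks
    si = SmartConnect(host="vcenter.example.com",     # placeholder vCenter
                      user="administrator@vsphere.local",
                      pwd="changeme",
                      sslContext=ctx)
    try:
        content = si.RetrieveContent()
        srm = content.storageResourceManager
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.Datastore], True)

        for ds in view.view:
            iorm = ds.iormConfiguration
            if iorm is not None and iorm.enabled:     # only touch datastores with SIOC on
                print(f"Disabling SIOC on {ds.name}")
                spec = vim.StorageResourceManager.IORMConfigSpec(enabled=False)
                WaitForTask(srm.ConfigureDatastoreIORM_Task(datastore=ds, spec=spec))

        view.DestroyView()
    finally:
        Disconnect(si)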