vSphere/ESXi 6.0, all patches applied. Multicast performance test is failing with 0.00 received bandwidth. All other health checks including the multicast assessment and proactive storage performance tests are passing.
What can I check to determine the problem here?
The switches are: Netgear XS712T ProSafe 12-Port 10 Gigabit Ethernet
All ports are configured with: VLAN ID 1 (default VLAN) as untagged, VLAN 70 (VSAN) tagged.
The VLAN for VSAN, 70, has Multicast forwarding mode "Forward all" activated.
As the VSAN vmkernels are the only entities active on these interfaces and in this VLAN, Multicast snooping is disabled.
As these switches do not support stacking (and therefore no LACP spanning both switches), you will want to use an active/standby config with them.
I suspect the switch-to-switch multicast is not working/performing properly. While I have used these switches, I always used them with A/B configurations, which avoid this.
I would make sure the active paths of the VSAN VMkernel ports are all on the same switch. (These switches should support LLDP; if you switch the vDS from CDP to LLDP and set it up on the switch, you can use it to see which path goes to which switch.)
It looks like you are flooding multicast across the VLAN (this works fine with these switches, just don't try to scale it across multiple VLANs/subnets).
Are you only using one VMkernel port per host for VSAN?
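The same-switch check suggested above can be sketched in a few lines of Python once you have read the LLDP neighbour of each host's active VSAN uplink. This is only an illustration: the host names and switch labels are made up, and the mapping has to be filled in by hand from what LLDP shows you.

```python
from collections import defaultdict

def hosts_by_switch(active_paths):
    """Group hosts by the switch their active VSAN uplink is connected to."""
    groups = defaultdict(list)
    for host, switch in active_paths.items():
        groups[switch].append(host)
    return dict(groups)

def off_majority_hosts(active_paths):
    """Return hosts whose active VSAN uplink is NOT on the most common switch.

    An empty list means every active path lands on the same switch.
    On a tie, the switch seen first is treated as the majority.
    """
    groups = hosts_by_switch(active_paths)
    if len(groups) <= 1:
        return []
    majority = max(groups, key=lambda s: len(groups[s]))
    return sorted(h for s, hs in groups.items() if s != majority for h in hs)

# Hypothetical example data: host -> switch seen via LLDP on the active uplink.
paths = {
    "esx01": "switch-A",
    "esx02": "switch-A",
    "esx03": "switch-B",  # this host's VSAN traffic would cross the inter-switch link
    "esx04": "switch-A",
}
print(off_majority_hosts(paths))
```

Any host this reports is sending its VSAN traffic through the other switch and is the first place I would look.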
I turned off "forward all" in the meantime.
Setup: there is 1 "classic" vSwitch and 1 DvS.
- The management interface is bound to vmnic0 and vmnic1 and they go to two physically separated Gigabit switches belonging to the management network.
- There are 10 DvS Portgroups. DvS is connected to the 10Gig network.
- One DvS Portgroup is called VSAN, has the corresponding vmkernel port and is exclusively bound to vmnic2 on every host (with vmnic3 on standby). These are all physically linked to the same switch.
- Another one is called vMotion, is bound to vmnic3 (with vmnic2 on standby) and does what the name implies.
- The other 8 are VM-networking portgroups for the VMs. They are all bound to vmnic3 (with vmnic2 on standby).
- There is a 2x 10G LACP Inter-Switch-Link (ISL) between the two Netgear switches. In case of a switch outage, the dvPortgroups can fail over to the other switch.
- The VSAN and vMotion VLANs are not routable.
- All VLANs are available as tagged on all ports of these ESXi servers and on the ISL's LACP LAG.
- VSAN multicast traffic goes to both ports of all 4 nodes, as this VLAN is available as tagged on both switches (but only on the ESXi servers), so it travels the ISL.
- VSAN unicast traffic stays inside the same chassis, as the destination MAC addresses exist only on the ports of that same switch (the one all the "vmnic2" ports are connected to), unless a vmkernel port fails over to the other vmnic and thus to the other physical switch.
Summary: one switch is "active" for VSAN traffic only and the other switch is "active" for all other traffic incl. vMotion. Management traffic is isolated on its own physical network.
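To make the failover layout above easier to reason about, here is a small Python model of it. The failover orders are taken from the description; the labels "switch-A" and "switch-B" and the single "VM-networking" entry standing in for the 8 VM portgroups are placeholders I made up for the sketch.

```python
# Failover order per portgroup (first entry is the active uplink).
TEAMING = {
    "VSAN": ["vmnic2", "vmnic3"],
    "vMotion": ["vmnic3", "vmnic2"],
    "VM-networking": ["vmnic3", "vmnic2"],  # stands in for the 8 VM portgroups
}

# Placeholder switch labels: all vmnic2 ports go to one switch, vmnic3 to the other.
UPLINK_TO_SWITCH = {"vmnic2": "switch-A", "vmnic3": "switch-B"}

def active_uplink(portgroup, failed=frozenset()):
    """Return the first uplink in the failover order that has not failed."""
    for uplink in TEAMING[portgroup]:
        if uplink not in failed:
            return uplink
    return None

def active_switch(portgroup, failed=frozenset()):
    """Return the switch that carries the portgroup's traffic, or None."""
    return UPLINK_TO_SWITCH.get(active_uplink(portgroup, failed))

print(active_switch("VSAN"))                     # normal operation
print(active_switch("VSAN", failed={"vmnic2"}))  # after vmnic2 or its switch fails
```

In normal operation VSAN sits alone on one switch; only after a failure of vmnic2 on a host does that host's VSAN traffic move to the other switch and start depending on the ISL for unicast as well.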
In both Netgear switches, under "switching -> Multicast -> Bridge Multicast Forwarding", all VLANs are on the default setting of "forward unregistered" (it used to be "forward all" for the VSAN VLAN, but not anymore).
I have not activated IGMP Snooping either at the moment. A networking guy told me it's not needed because the multicast traffic coming from the VSAN vmkernel interfaces stays within the unroutable VLAN (as it's a single L2 network).
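One way to cross-check multicast delivery on the VLAN independently of the health check is a plain UDP multicast sender and receiver run on two test machines (for example two VMs on a portgroup in that VLAN). The Python sketch below is not what the VSAN health check does internally; the group address, port and IP addresses are arbitrary test values I picked, not VSAN's own.

```python
import ipaddress
import socket
import struct
import time

GROUP = "239.1.2.3"  # assumed test group, not a VSAN group
PORT = 5007          # assumed test port

def is_multicast(addr):
    """True if addr is an IPv4/IPv6 multicast address."""
    return ipaddress.ip_address(addr).is_multicast

def membership_request(group, local_ip):
    """Build the ip_mreq structure used to join `group` on `local_ip`."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(local_ip))

def send(local_ip, count=1000, size=1024, group=GROUP, port=PORT):
    """Send `count` datagrams of `size` bytes to the group from `local_ip`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # stay in the L2 segment
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF, socket.inet_aton(local_ip))
    payload = b"x" * size
    for _ in range(count):
        sock.sendto(payload, (group, port))
    sock.close()

def receive(local_ip, seconds=10.0, group=GROUP, port=PORT):
    """Join the group and return the average received rate in MB/s."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, local_ip))
    sock.settimeout(0.5)
    total = 0
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        try:
            data, _ = sock.recvfrom(65535)
        except socket.timeout:
            continue  # nothing arrived in this interval, keep waiting
        total += len(data)
    sock.close()
    return total / seconds / 1e6
```

Start `receive("<receiver's IP in the VLAN>")` on one machine, then call `send("<sender's IP in the VLAN>")` on another. If the receiver also reports 0.00 when the two machines sit on different switches but not when they share a switch, that points at the ISL rather than at the health check.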
I have experienced the exact same symptoms. VMs seem to be running as expected; however, the network test fails with 0.00 MB/s.
Fresh install from latest binaries.
I am now officially calling it a bug. No matter how many networking experts I talk to and no matter how I configure the networking components, VSAN runs like a dream but this test ALWAYS fails completely.
I've stopped trying as it's a friggin waste of time.