VMware Cloud Community
Bucha7
Contributor

VSAN Split into network partitions

Hello, I am having issues creating a vSAN in the environment I am administering. We are running 5 hosts in a cluster with vCenter 6.5 and ESXi 6.5 (sorry, can't upgrade). When creating the vSAN, it separates the hosts into 5 separate network partitions. This causes the vSAN to show only the storage of a single host (when creating it, we see all green checks and the correct total storage before it configures itself).

We are using a distributed switch with a dedicated vmk on each host for vSAN.

All hosts can ping each other's vmk IP that is dedicated to vSAN.

IGMP snooping is disabled on the switch.

If anyone has run into a similar issue, I am looking for any ideas to try; we have deployed the exact same environment many times, and this is a new issue to me.

Thanks!

Moderator edit by wila: Moved to vSAN discussions

3 Replies
Lalegre
Virtuoso

Hey @Bucha7,

I recommend following this troubleshooting guide for Multicast traffic: https://blogs.vmware.com/vsphere/2014/09/virtual-san-networking-guidelines-multicast.html. I have used it many times in the past.

However, some possible causes are:

  • Having more than one VMkernel with vSAN traffic wrongly selected.
  • Having the VMkernels in different subnets.
  • Specifying wrong netmasks on the VMkernels.
  • The VLAN used for vSAN not being correctly tagged on all the physical ports where the pNICs for the port group are connected (check the Teaming and Failover policy).
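The subnet and netmask bullets above can be sanity-checked offline before touching the hosts. A minimal sketch in Python, where the host names and addresses are made up for illustration:

```python
import ipaddress

# Hypothetical vSAN vmk IP/netmask per host (illustrative values only).
vmks = {
    "esxi-01": "192.168.50.11/24",
    "esxi-02": "192.168.50.12/24",
    "esxi-03": "192.168.50.13/24",
    "esxi-04": "192.168.50.14/24",
    "esxi-05": "192.168.60.15/24",  # wrong subnet -> would partition this host
}

# Every vSAN vmk should land in the same L3 network.
networks = {host: ipaddress.ip_interface(ip).network for host, ip in vmks.items()}
expected = networks["esxi-01"]
mismatched = [host for host, net in networks.items() if net != expected]
print(mismatched)  # ['esxi-05']
```

A host whose vmk resolves to a different network than its peers will form its own partition even though other connectivity looks fine.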

I know you already did some of these checks, but it would be useful if you could confirm that you did all of them.

Also, one trick that sometimes works is to disable vSAN on the cluster (I am presuming you are not actually using it yet), delete the VMkernels, recreate them with vSAN traffic tagged on them, and then enable vSAN again. That way the vSAN traffic is in place before vSAN starts configuring everything.

TheBobkin
Champion

@Bucha7, Specifically which builds of ESXi and vCenter 6.5 are in use? This is potentially relevant here, as 6.5.0d (build 5310538) was when vSAN support for Unicast was introduced (removing the need to configure IGMP etc.). If you are on an earlier build of 6.5 (e.g. GA or U1) and updating to 6.7/7.0 is not possible (but updating within 6.5 is), I would advise updating to the latest 6.5 U3 for numerous reasons.

 

"All hosts can ping each others vmk ip that is dedicated for the vsan"
Specifically, how are you testing this? It should be checked with vmkping, specifying the vSAN-enabled vmk and the correct MTU, e.g.:
Get the vSAN-enabled vmk:
# esxcli vsan network list
Get the MTU and IP of the vmk:
# esxcfg-vmknic -l
Get the above settings from at least 2 nodes and ping from one node to another's IP, for example (adjust the payload size to the vmk MTU, e.g. -s 1472 if the MTU is 1500, -s 8972 if 9000, and change the vmk to the one in use):
# vmkping -I vmk1 -s 8972 -d <otherNodesvSANvmkIP> -c 5
If the vmks are set to 9000 but only 1500 (-s 1472) passes, then the MTU is misconfigured on the vmks or somewhere along the physical path.
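The 28-byte gap between the vmk MTU and the vmkping payload (1472 vs 1500, 8972 vs 9000) is the IPv4 header (20 bytes) plus the ICMP header (8 bytes). A quick sketch of the arithmetic:

```python
# Largest `vmkping -s` payload that fits in one frame with don't-fragment (-d) set.
IPV4_HEADER = 20  # bytes
ICMP_HEADER = 8   # bytes

def vmkping_payload(mtu: int) -> int:
    """Payload size so the ICMP echo exactly fills the configured MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER

print(vmkping_payload(1500))  # 1472
print(vmkping_payload(9000))  # 8972
```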

 

If using build 6.5.0d or later, you can check whether the nodes are using Unicast from 'esxcli vsan cluster get'. If they are, they should have populated unicast agent lists ('esxcli vsan cluster unicastagent list'). If the lists are blank or incomplete, but the nodes are set to use Unicast and vmkping is fine, then this is likely the issue and the lists need to be populated (either via the vSAN Health UI 'vCenter is Authoritative' remediate-hosts option, or via manual addition of entries: https://kb.vmware.com/s/article/2150303).
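For a 5-node cluster, each host's unicast agent list should contain 4 entries, one per other node. A rough sketch of that completeness check in Python, using made-up output (the real column layout of 'esxcli vsan cluster unicastagent list' may differ):

```python
# Hypothetical output of `esxcli vsan cluster unicastagent list`; the column
# layout here is an assumption for illustration, not the exact real format.
sample_output = """\
NodeUuid                              IsWitness  Supports Unicast  IP Address     Port   Iface Name
------------------------------------  ---------  ----------------  -------------  -----  ----------
5b1c0000-0000-0000-0000-000000000002  0          true              192.168.50.12  12321  vmk1
5b1c0000-0000-0000-0000-000000000003  0          true              192.168.50.13  12321  vmk1
5b1c0000-0000-0000-0000-000000000004  0          true              192.168.50.14  12321  vmk1
5b1c0000-0000-0000-0000-000000000005  0          true              192.168.50.15  12321  vmk1
"""

def count_unicast_agents(output: str) -> int:
    """Count data rows, skipping the header and separator lines."""
    rows = [line for line in output.splitlines() if line.strip()]
    return max(0, len(rows) - 2)

cluster_size = 5
agents = count_unicast_agents(sample_output)
if agents != cluster_size - 1:
    print(f"Unicast list incomplete: {agents} entries, expected {cluster_size - 1}")
else:
    print("Unicast agent list looks complete")
```

A host with fewer entries than peers (or an empty list) cannot reach the others over unicast and will show up as its own network partition.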

Bucha7
Contributor

Hello, thank you very much for the response.

- ESXi and vCenter are at the latest patches for 6.5.

- I will attempt the checks you described today. I had only logged into the hosts and run a standard ping to the other hosts' vSAN IP addresses, so that may not have given me the information I was trying to obtain.
