Sorry for the long subject line, but I wanted to be clear on the issue.
I have deployed vCenter 6.7, and I have two HP DL360 Gen10 boxes that I'm installing vSphere 6.7 on. These two boxes have hardware that is all on the HCL, and I'm using 4 SSDs in each of them for vSAN. This includes the 2-port 10Gb adapter that is direct-connected between the two hosts.
I have logged into each of the hosts and set the management interface to also carry the vSAN witness traffic. The vSAN witness box is in a different VLAN and at a different site, but the connection definitely meets the documented requirements: it has 100 Mbps of bandwidth with RTT averaging around 10 ms.
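For reference, this is how witness traffic is typically tagged on a data node with esxcli; a minimal sketch, assuming the management interface is vmk0 (adjust to your vmknic):

```shell
# Run on each of the two data nodes (not on the witness):
# tag the management vmknic to also carry vSAN witness traffic.
esxcli vsan network ip add -i vmk0 -T=witness

# Confirm the tagging: the interface should now be listed
# with traffic type "witness".
esxcli vsan network list
```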
I just set up my vSAN cluster, and every health check on the cluster is passing; however, shortly after the build the witness server throws two warnings:
Host cannot communicate with one or more other nodes in the vSAN enabled cluster
Host with vSAN service enabled is not in the vCenter cluster
I can ping the witness server from either of the two nodes in the cluster, and from the same VLAN I can ping back to the two servers, so I'm not sure why these warnings are showing up.
Also, I can't put the cluster hosts into maintenance mode, since the witness server can't communicate with them.
I thought it might have to do with routing, even though the default routes in my network should handle it. I created static routes on the two ESXi hosts to direct their witness-bound traffic through the default gateway, but this had no effect.
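For anyone following along, static routes on ESXi are added with esxcli; a sketch with placeholder subnet and gateway values (substitute your witness subnet and gateway):

```shell
# Route traffic destined for the witness subnet (placeholder 192.168.110.0/24)
# via a specific gateway (placeholder 10.0.0.1). Run on each data node.
esxcli network ip route ipv4 add --network 192.168.110.0/24 --gateway 10.0.0.1

# Verify the routing table afterwards.
esxcli network ip route ipv4 list
```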
Run the command esxcli vsan cluster get on one of the ESXi hosts and check whether the Sub-Cluster Member Count is 3 (2 data nodes plus the witness).
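Something like this, run from an SSH session on either data node (the exact field name may vary slightly between builds):

```shell
# A healthy 2-node + witness cluster should report 3 members.
esxcli vsan cluster get | grep "Sub-Cluster Member Count"
```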
Also, can you verify which networking mode vSAN traffic is operating in? Since version 6.6, vSAN relies on unicast traffic.
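You can check the unicast peer list directly on each host; a quick sketch:

```shell
# Lists the unicast agents (peer hosts / witness) this node knows about;
# an empty or incomplete list on a 6.6+ cluster usually points to a
# membership or connectivity problem.
esxcli vsan cluster unicastagent list
```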
Are you also getting the warning regarding "All hosts have a vSAN vmknic configured" for the witness node?
Can you SSH to the witness node and paste the output of "esxcli vsan network list"? If two interfaces are listed, please check whether they are on the same subnet.
On your vSAN Witness Appliance, are vmk0 and vmk1 both on the same subnet?
If so, traffic that is tagged "vSAN Traffic" will go out vmk0, not vmk1. This is a multi-homing issue: https://kb.vmware.com/kb/2010877
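To check for that multi-homing condition, you can list the IPv4 configuration of the appliance's vmknics; a sketch:

```shell
# Run on the witness appliance: shows the address/netmask of every vmknic,
# so you can see whether vmk0 and vmk1 land on the same subnet.
esxcli network ip interface ipv4 get
```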
I wrote a blog post about this that goes into more detail specific to the traffic configurations for 2 Node Direct Connect here: https://blogs.vmware.com/virtualblocks/2018/05/16/witness-host-traffic-tagging/
Also, if using ping to check connectivity, be sure to use vmkping -I vmkX so the ping is sourced from the specific interface you are testing.
For example, to ping from vmk1 on a data node to vmk1 on a vSAN Witness Appliance: vmkping -I vmk1 192.168.110.23
I just have a slightly unrelated question that I'm hoping you might be able to advise on.
I am building a 2-node cluster with similar hosts; the only difference is that mine are DL380s. Can I ask whether your 10Gb adapters are the HPE Eth 10Gb 2p 562FLR-T?
Can I also ask how you performed the direct connect using Cat6A crossover cable? How is it connected at the rear of each box: node 1 port 1 to node 2 port 1, or does it matter which port they are connected to?
Also, if you go to the physical adapters for each host in vCenter, do you see each 10Gb card there? For me they seem to be missing, yet I can see them when I run commands from the ESXi CLI on each host.
Any help on this would be most appreciated.
I really have this same question.
I set up the vSAN Witness so vmk0 has both Management and vSAN traffic and vmk1 has no tagged traffic, as your blog post describes. The issue still persists.
The weird thing is that when I disconnect the Management interface on the vSAN witness, the errors go away, yet it becomes disconnected from the cluster.
Does an in-depth guide exist for a 2-Node vSAN cluster with witness traffic on the same subnet as the VM management network?
"The issue still persists."
What ESXi and vCenter build numbers? (e.g. '8169922' not '6.7')
Can you show a screenshot of the error/health check (with all drop-downs and sub-windows) and your Hosts & Clusters inventory in the vSphere Client?
"The weird thing is that when I disconnect the Management interface on the vSAN witness, the errors go away, yet it becomes disconnected from the cluster."
That's not weird: you are disconnecting the only vmk with vSAN traffic enabled, and thus isolating the witness.
"Does an in-depth guide exist for a 2-Node vSAN cluster with witness traffic on the same subnet as the VM management network?"
Jase referenced his blog post covering this above, and also here: