We are seeing issues with our 2-node ROBO cluster after one of the hosts failed and came back online.
Specifically, the previously failed host is reported as being in a separate partition, and the vSAN datastore is showing only half of the capacity it should.
Many cluster health checks are failing. The nodes can ping each other and the witness over the vSAN networks.
Hello,
Check that the host is not in vSAN Maintenance Mode (it can be in this state regardless of the Maintenance Mode state shown in vCenter):
#cmmds-tool find -t NODE_DECOM_STATE -f json
This should show "{\"decomState\": 0, \"decomJobType\": 0 for all hosts.
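If there are more than a couple of entries to eyeball, a small script can flag any host whose decomState is not 0. This is only a rough sketch: it assumes the saved output parses as a JSON list of entries, each with an "owner" field and a "content" field (either an escaped JSON string, as quoted above, or a plain object). The function name and the sample values are made up for illustration.

```python
import json
from typing import List


def hosts_not_at_decom_zero(cmmds_json: str) -> List[str]:
    """Return the "owner" of every entry whose decomState is not 0.

    Assumes the input is a JSON list of entries (an assumption about
    the saved cmmds-tool output, not a documented format).
    """
    flagged = []
    for entry in json.loads(cmmds_json):
        content = entry.get("content", {})
        # The content may be an escaped JSON string or already an object.
        if isinstance(content, str):
            content = json.loads(content)
        # A missing decomState is flagged too, so it is not silently passed.
        if content.get("decomState") != 0:
            flagged.append(entry.get("owner", "unknown"))
    return flagged


# Hypothetical sample data: one clean host, one with a non-zero state.
SAMPLE = json.dumps([
    {"owner": "host-a", "content": json.dumps({"decomState": 0, "decomJobType": 0})},
    {"owner": "host-b", "content": {"decomState": 6, "decomJobType": 1}},
])
print(hosts_not_at_decom_zero(SAMPLE))
```

An empty list would mean every entry reports decomState 0; anything listed is worth a closer look on that host.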
What version of vSAN are you using?
Check multicast connectivity if you are using 6.5 or lower:
What responses do you see when you run these commands on each site?
tcpdump-uw -i <VMk used for vSAN> -s0 udp port 23451
tcpdump-uw -i <VMk used for vSAN> -s0 udp port 12345
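If you save the output of those captures to a file, something like the following can summarize which source IPs were seen sending to each port. It is a sketch only, assuming the usual tcpdump text format ("IP src.port > dst.port:"); the function name is hypothetical, and the addresses in the sample are invented for illustration.

```python
import re
from collections import Counter

# Matches the common tcpdump text form: "IP <src>.<port> > <dst>.<port>:"
PACKET_RE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)


def senders_by_port(capture_text):
    """Map destination port -> Counter of source IPs seen sending to it."""
    seen = {}
    for line in capture_text.splitlines():
        match = PACKET_RE.search(line)
        if not match:
            continue  # skip banner lines and anything not in the expected form
        src_ip, _src_port, _dst_ip, dst_port = match.groups()
        seen.setdefault(int(dst_port), Counter())[src_ip] += 1
    return seen


# Invented sample capture lines, for illustration only.
SAMPLE = """\
12:00:01.000001 IP 192.168.10.11.23451 > 224.2.3.4.23451: UDP, length 200
12:00:01.500000 IP 192.168.10.12.23451 > 224.2.3.4.23451: UDP, length 200
12:00:02.000000 IP 192.168.10.11.12345 > 224.1.2.3.12345: UDP, length 472
"""
for port, sources in senders_by_port(SAMPLE).items():
    print(port, dict(sources))
```

If a host only ever sees its own IP as a sender on these ports, that would suggest the multicast traffic from the other node is not reaching it.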
Check unicast traffic if you are using 6.6 and have it configured.
Bob
There was an issue with multicast on the physical switches.