I recently built a 5-node vSAN cluster using a Dell PowerEdge MX7000 chassis and M740c sleds with 1 SSD and 2 SAS drives each. The chassis has one pair of M5108 Ethernet switches that uplink to the main core. vSAN traffic does not leave the chassis switches.
The 5-node vSAN cluster is healthy and all Skyline Health checks pass. I am currently running Horizon View on this cluster.
I am also running ESXi 6.7 U3+ on these nodes.
I am trying to add 3 more vSAN nodes. They are configured identically to the existing 5 nodes: same model, disk config, network, CPU/memory, etc. I am using only a vSphere Distributed Switch, with an MTU of 9000 configured on the vSAN VMkernel port group.
All physical switch ports in the chassis are configured identically: each has an MTU of 9216, flow control is the same on all of them, etc.
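Because every new host is supposed to match the existing ones, one way to rule out a stray setting is to compare the vSAN vmk MTU across all hosts and against the physical port MTU. The sketch below is illustrative only: the host names and MTU values are hypothetical placeholders you would fill in from each host's VMkernel adapter settings, and the function name `mtu_mismatches` is made up for this example.

```python
from collections import Counter


def mtu_mismatches(vmk_mtus: dict[str, int], switch_port_mtu: int) -> list[str]:
    """Flag hosts whose vSAN vmk MTU differs from the most common value in the
    cluster, or is larger than the physical switch port MTU.

    vmk_mtus is a hypothetical inventory: host name -> MTU on its vSAN vmk.
    """
    # Treat the most common MTU across hosts as the expected cluster value.
    expected, _ = Counter(vmk_mtus.values()).most_common(1)[0]
    problems = []
    for host, mtu in sorted(vmk_mtus.items()):
        if mtu != expected:
            problems.append(f"{host}: vmk MTU {mtu} != cluster value {expected}")
        if mtu > switch_port_mtu:
            problems.append(f"{host}: vmk MTU {mtu} > switch port MTU {switch_port_mtu}")
    return problems
```

For example, with hypothetical hosts `{"esx01": 9000, "esx02": 9000, "esx06": 1500}` and a switch port MTU of 9216, only `esx06` is reported. An empty list means the configured values agree, which does not by itself prove the path passes jumbo frames end to end.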
When I attempted to add one new node, the health check immediately flagged the "vSAN: Basic unicast connectivity check" and "vSAN: MTU check (ping with large packet size)" tests on vmk2, which is the vSAN vmk. I verified with ping tests that all the nodes can reach each other. I can ping with jumbo frames, but I cannot ping with a packet size of 9000 with fragmentation disabled; however, I can't do that on the original 5 nodes either, and they still show OK. As soon as I put the new host back into maintenance mode, the health status goes back to healthy.
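One detail worth checking in the ping test: with don't-fragment set, the ping size is the ICMP payload, and the IPv4 header (20 bytes, no options) plus the ICMP echo header (8 bytes) are added on top of it. A 9000-byte payload therefore needs a 9028-byte packet and is expected to fail on a 9000-MTU vmk even when everything is healthy; the largest payload that fits is 8972. The sketch below just does that arithmetic and builds the commonly used `vmkping` test string (`-I` interface, `-d` don't fragment, `-s` payload size). The target IP is a hypothetical documentation address, and this does not claim to reproduce exactly how the vSAN health check runs its own MTU test.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_df_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in a single IPv4 packet of the
    given MTU when the don't-fragment bit is set."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def vmkping_command(vmk: str, target_ip: str, mtu: int) -> str:
    # -I picks the VMkernel interface, -d sets don't-fragment,
    # -s sets the ICMP payload size in bytes.
    return f"vmkping -I {vmk} -d -s {max_df_ping_payload(mtu)} {target_ip}"
```

For a 9000 MTU this gives a payload of 8972, so the test from one host to another would be `vmkping_command("vmk2", "192.0.2.11", 9000)`, i.e. `vmkping -I vmk2 -d -s 8972 192.0.2.11` with a hypothetical peer address. If that succeeds between every pair of hosts, including the new ones, the jumbo-frame path is likely fine.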
Then I tried adding a different node, and over the course of about 3 hours I retested the health several times; it always showed green. Sometime overnight, something happened with that node that caused all kinds of vSAN issues: Horizon desktops would not power on, and existing sessions were disconnected. The vSAN cluster status showed as partitioned. I had to force the new node off, power it back on, perform a full data migration, and remove it from the cluster to get the cluster healthy again. As before, with only the 5 original nodes, the cluster shows healthy.
What am I missing here? If the 3 new nodes are on the same Ethernet switches with the same configuration, why am I having this problem?
I'm lost on this one.