Do I understand correctly that there is one 10 Gb switch to which all 3 hosts connect via one uplink each? And that each host is then connected via one uplink to a separate 1 Gb switch? And that the 10 Gb is active and the 1 Gb is standby?
If that's true and I imagine a scenario where a single 10 Gb uplink fails on one host, that host will then start using its 1 Gb connection and not be able to communicate with the other hosts still using their 10 Gb connections. It sounds like you're trying to anticipate the failure of the entire 10 Gb layer. In that scenario, I don't see how failover would work with the 1 Gb uplinks all going to separate switches.
From my experience with vSAN and posts to this forum, this seems like the scenario where we get into trouble: using vSAN to save $$$ but not configuring it in an optimal way. I would stop and either re-design or abandon vSAN in this scenario. We procured two Dell 4032 10 Gb switches for around $8,000. Thank you, Zach.
Yes - the customer has a single 10 GbE switch, so all three hosts are connected there. For 1 GbE they have two top-of-rack switches, and the servers are placed in separate racks, putting them on different switches.
For a three-node cluster, having one server fail over to 1 GbE will probably slow the system down, which is better than taking the whole thing offline.
I was reading that it could be an IGMP issue with multicast. I'm going to investigate that further.
Are the 1 Gb switches uplinked to the 10 Gb switch? If so, I could see this working, although as you say it would slow down any vSAN traffic to the host on the 1 Gb network. If not, the host with the failed 10 Gb uplink would be isolated from the rest of the vSAN cluster until the 10 Gb link was brought back online.
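If you end up testing this, a quick scripted check can confirm that every host can reach every other host's vSAN vmkernel interface. This is just a rough Python sketch using paramiko over SSH; the host names, the vmk2 interface, the IPs, and the credentials are all placeholders for whatever your environment actually uses:

```python
# Rough sketch: vmkping every host's vSAN vmkernel IP from every other host.
# Assumes SSH is enabled on the hosts and vmk2 is the vSAN vmkernel port.
import paramiko

VSAN_IPS = {"esx01": "10.0.50.11", "esx02": "10.0.50.12", "esx03": "10.0.50.13"}

for host in VSAN_IPS:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username="root", password="changeme")
    for peer, peer_ip in VSAN_IPS.items():
        if peer == host:
            continue
        # -I pins the ping to the vSAN vmkernel interface, -c sends 3 pings
        _, out, _ = client.exec_command(f"vmkping -I vmk2 -c 3 {peer_ip}")
        print(f"{host} -> {peer}:\n{out.read().decode()}")
    client.close()
```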
As for IGMP and multicast: if you have dedicated switches for vSAN, you can take the easy route and simply enable multicast across the board. But if the switches will carry any traffic other than vSAN, you will need to configure IGMP snooping instead, since flooding multicast everywhere could be problematic.
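If it helps with the IGMP investigation, you can confirm which multicast groups each host's vSAN vmknic is using (224.1.2.3 for the master group and 224.2.3.4 for the agent group by default) with esxcli vsan network list. A sketch along the same SSH lines, with placeholder host names and credentials again:

```python
# Sketch: dump each host's vSAN network config, including the multicast
# group addresses that IGMP snooping must deliver between the hosts.
import paramiko

for host in ("esx01", "esx02", "esx03"):  # placeholder host names
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username="root", password="changeme")
    _, out, _ = client.exec_command("esxcli vsan network list")
    print(f"--- {host} ---\n{out.read().decode()}")
    client.close()
```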
Will you be able to test failover scenarios, or is some of this already in production and therefore cannot be taken offline?
Thank you, Zach.
You might want to focus on establishing the 1 GbE connectivity for vSAN first. Get that all hunky dory. Then:

1. Make sure the uplinks on your switches carry the proper VLAN and have IGMP snooping enabled for that VLAN, as well as the ports facing the hosts. Verify vSAN connectivity.
2. Start the migration to 10 GbE. Apply the same VLAN tagging to your vSAN 10 GbE ports, enable IGMP snooping for the VLAN, and do the same on the corresponding uplinks to your 1 GbE switches.
3. Set up your vDS vSAN port group to use the 10 GbE NIC as a standby for each host.
4. Put the first host in maintenance mode and turn its 1 GbE port off on the 1 GbE switch. See if it fails over to the 10 GbE and verify connectivity. vSAN does not support beacon probing, so failover should be tested at the link level.
5. Re-activate that port and re-run the process for each host.
6. Once all the hosts fail over properly, change the teaming on your vDS vSAN port group to make the 10 GbE uplink active and the 1 GbE standby (see the sketch after this list). Once again, verify functionality.
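For step 6, if you'd rather script the teaming change than click through the vSphere client, here's a rough pyVmomi sketch. The vCenter address, credentials, port group name ("vSAN-PG"), and uplink names are all placeholders for your environment, so treat it as a starting point rather than a drop-in:

```python
# Minimal pyVmomi sketch: make the 10 GbE uplink active and the 1 GbE uplink
# standby on the vSAN distributed port group. All names below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def find_portgroup(content, name):
    """Find a distributed port group by name anywhere in the inventory."""
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.dvs.DistributedVirtualPortgroup], True)
    try:
        return next(pg for pg in view.view if pg.name == name)
    finally:
        view.DestroyView()

ctx = ssl._create_unverified_context()  # lab use only; verify certs in production
si = SmartConnect(host="vcenter.example.local",
                  user="administrator@vsphere.local",
                  pwd="changeme", sslContext=ctx)
try:
    pg = find_portgroup(si.RetrieveContent(), "vSAN-PG")

    # Build a reconfig spec that only touches the uplink failover order.
    order = vim.dvs.VmwareDistributedVirtualSwitch.UplinkPortOrderPolicy()
    order.inherited = False
    order.activeUplinkPort = ["Uplink 1"]   # the 10 GbE uplink
    order.standbyUplinkPort = ["Uplink 2"]  # the 1 GbE uplink

    teaming = vim.dvs.VmwareDistributedVirtualSwitch.UplinkPortTeamingPolicy()
    teaming.inherited = False
    teaming.uplinkPortOrder = order

    port_cfg = vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy()
    port_cfg.uplinkTeamingPolicy = teaming

    spec = vim.dvs.DistributedVirtualPortgroup.ConfigSpec()
    spec.configVersion = pg.config.configVersion  # required so the edit isn't stale
    spec.defaultPortConfig = port_cfg

    pg.ReconfigureDVPortgroup_Task(spec)  # wait on the task as needed
finally:
    Disconnect(si)
```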
Hope this helps in some way.
So... any success?