nyjz1298
Contributor

VSAN with Active 10 GbE and Standby 1 GbE - Resource allocation issue

I've been working with a customer getting vSAN up and running on a 3-node cluster.  Due to their budget they could only allocate a total of three 10 GbE ports (one per host) and want to use 1 GbE ports as standby NICs.  I have a couple of concerns.

1.

The Resource Allocation tab for the VDS these hosts connect to shows "Total bandwidth capacity of 1.00 Gbit/s."

Is vSAN (or any of the other traffic types) actually limited to 1 Gbit/s, or is this just a worst-case figure based on the slowest link speed?

[Attached screenshot: vsan-resource-allocation.JPG]
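For what it's worth, I've also been sanity-checking what each host has actually negotiated, as opposed to what the Resource Allocation tab reports, with a couple of quick commands from the ESXi shell:

    esxcli network nic list            # the Speed column shows the negotiated speed per vmnic (10000 vs 1000)
    esxcli network ip interface list   # confirms which vmkernel interfaces exist and which port group each one sits on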

2.

When testing a failure of the 10 GbE (pulling the fibre), everything would fail over to the 1 GbE connection.  The vSAN vmkernel port would also fail over and even respond to vmkping from the other servers, but the host would drop into a different vSAN network partition group.  You could see the datastore size shrink on all hosts, and repeating the process with the other hosts until they were all on 1 GbE left them all in different partition groups.  Each host would then list only the amount of storage on the datastore that it was providing itself.  Obviously this is a problem.
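For reference, this is roughly how I've been confirming the partitioning from the ESXi shell on each host while testing, rather than just watching the datastore size:

    esxcli vsan cluster get    # a partitioned host lists only itself (or a subset of hosts) under the sub-cluster member UUIDs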

There's only one 10 GbE switch in use, so all three 10 GbE connections go to it.

With the 1 GbE, each server connects to a different physical switch, but they should still have layer 2 adjacency from my understanding, as they're on the same dedicated vSAN VLAN.  vmkping works fine, but I know that vSAN requires multicast to function.  Is it possible, or common, that multicast isn't spanning these switches on the same VLAN?
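To check that, I'm planning to watch for the vSAN cluster multicast traffic directly on a host that has failed over to 1 GbE, something along these lines (vmk2 is just a placeholder for whichever vmkernel port carries vSAN here; the group addresses and ports come out of the network list output):

    esxcli vsan network list                                 # shows the vSAN vmknic plus the master/agent multicast groups and ports
    tcpdump-uw -i vmk2 -n udp port 12345 or udp port 23451   # watch for the cluster heartbeats actually arriving on that interface

If the heartbeats show up while the host is on 10 GbE but stop once it's on the 1 GbE path, that would point pretty clearly at multicast not making it across those switches.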

Let me know.  Thanks!

Joe

zdickinson
Expert

Do I understand that there is one 10 Gb switch to which all 3 hosts connect via one uplink each?  And that you then have each host connected via one uplink to separate 1 Gb switches?  And... and the 10 Gb is active and the 1 Gb is standby?

If that's true and I imagine a scenario where a single 10 Gb uplink fails on one host, that host will then start using its 1 Gb connection and not be able to communicate with the other hosts still using their 10 Gb connections.  It sounds like you're trying to anticipate the failure of the entire 10 Gb layer.  In that scenario, I don't know how it would work with the 1 Gb uplinks all going to separate switches.

From my experience with vSAN and posts to this forum, this seems like the kind of scenario where we get into trouble: using vSAN to save $$$ without configuring it in an optimal way.  I would stop and either re-design or abandon vSAN in this scenario.  We procured two Dell 4032 10 Gb switches for around $8,000.  Thank you, Zach.

nyjz1298
Contributor

Yes - the customer has a single 10 GbE switch, so all three hosts are connected there.  For 1 GbE they have two top-of-rack switches, and the servers are placed in separate racks, which puts them on different switches.

For a three-node cluster, having one server fail over to 1 GbE will probably slow the system down, which is better than taking the whole thing offline.

I was reading that it could be an IGMP issue with multicast.  I'm going to investigate that further.
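Along the same lines, I'll probably watch the IGMP traffic on the vSAN vmkernel port to see whether the hosts' membership reports are going out and whether a querier ever shows up (vmk2 again just being an example name for the vSAN interface):

    tcpdump-uw -i vmk2 -n igmp   # should show periodic membership reports from the host and, if snooping is set up, queries from the switch side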

zdickinson
Expert

Are the 1 Gb switches uplinked to the 10 Gb switch?  If so, then I could see this working, but as you say it would slow down, with vSAN traffic to that host limited to the 1 Gb network.  If not, then the host with the failed 10 Gb uplink would be isolated from the rest of the vSAN cluster until the 10 Gb link was brought back online.

As for IGMP and multicast: if you have dedicated switches for vSAN, you can do the easy config and just enable multicast en masse.  But if the switches will carry any traffic other than vSAN, you will need to configure IGMP snooping instead of enabling multicast en masse, since doing that could be problematic.

Will you be able to test failover scenarios, or is some of this already in production and therefore can't be taken offline?


Thank you, Zach.

jonretting
Enthusiast

You might want to focus on establishing the 1 GbE connectivity for vSAN first and get that solid. Then:
1. Make sure the uplinks between your switches carry the proper VLAN and have IGMP snooping enabled for that VLAN, as well as the ports facing the hosts.
2. Verify vSAN connectivity, then start the migration to 10 GbE: apply the same VLAN tagging to your vSAN 10 GbE ports, enable IGMP snooping for the VLAN, and do the same on the corresponding uplinks to your 1 GbE switch.
3. Set up your VDS vSAN port group to use the 10 GbE NIC as standby for each host.
4. Put the first host in maintenance mode and shut the 1 GbE port on the 1 GbE switch. See if traffic fails over to the 10 GbE uplink and verify connectivity (a quick check is sketched below). Re-activate that port and repeat the process for each host. vSAN does not support beacon probing, so failover should be tested at the link level.
5. Once all the hosts fail over properly, go to the teaming settings on your VDS vSAN port group and move the 10 GbE uplink to active and the 1 GbE to standby. Once again, verify functionality.
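For each of those failover checks, a quick way to confirm the vSAN vmkernel port is actually passing traffic on the surviving uplink is a vmkping between hosts, followed by a look at the cluster membership. The interface name and address here are placeholders:

    vmkping -I vmk2 -d -s 1472 <other host's vSAN IP>   # -d sets don't-fragment, -s 1472 fills a standard 1500-byte MTU frame
    esxcli vsan cluster get                              # confirm the host didn't drop into its own partition group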

Hope this helps in some way.

Cheers

m0ps
Enthusiast

So... any success?

best regards, m0ps