VMware Cloud Community
FatBob74
Contributor

Separate vSAN / Mgmt / vMotion Traffic on Different Switches

Hi,

I want to build a 4-node vSAN hybrid cluster (6.x).

I have 2 x 10 GbE switches dedicated to vSAN traffic,

and 2 x 1 GbE switches for management / VM / vMotion traffic.

Each host has 2 x 10 GbE NICs, with one uplink to each of the 10 GbE switches.

Each host also has 4 x 1 GbE uplinks, two to each of the 1 GbE switches.

My idea is to use the 10 GbE uplinks just for vSAN traffic (separate dvSwitch, one uplink active, one standby, one uplink on each 10 GbE switch).

I would also use two 1 GbE ports for management traffic (separate dvSwitch, one active, one standby, one uplink on each 1 GbE switch),

and two 1 GbE ports for VM traffic (separate dvSwitch, one active, one standby, one uplink on each 1 GbE switch).

vMotion traffic would be on the same uplinks as management, but with active/standby reversed.

Question: is it possible / useful to put the vMotion ports on the 10 GbE uplinks instead (active/standby reversed relative to the vSAN traffic)?

I appreciate any useful comments on my planned setup (I am not a networking guy).

Accepted Solution

JohnNicholson
Enthusiast

Couple of thoughts.  You can run multiple VMkernels for vSAN and vMotion (allowing one on each switch).  For vSAN use separate VLANs/subnets (one on each switch gives you a nice A/B gap), while vMotion ideally needs all of its VMkernels on the same layer 2 subnet.  This allows for maximum resiliency, in theory faster failover, and access to more throughput.  It also has the benefit that, if your cluster isn't large (bigger than, say, 48 ports), you can avoid multicast leaving the switch (which generally requires more wrangling with the network team), as each ToR switch will be its own vSAN network.  You then use NIOC to throttle the traffic types and protect them from each other.  NIOC isn't 100% as effective as SR-IOV (but it's good enough) and will let you maximize your 10 Gbps investment without flooding things out.

  • Management Network VMkernel interface = Explicit failover order = P1 active / P2 standby (10.1.1.1/24, VLAN 100)
  • vMotion VMkernel-A interface = Explicit failover order = P1 active / P2 standby (10.1.2.1/24, VLAN 101)
  • vMotion VMkernel-B interface = Explicit failover order = P2 active / P1 standby (10.1.2.2/24, VLAN 101)
  • Virtual Machine Portgroup = Explicit failover order = P1 active / P2 standby (VLAN 102)
  • Virtual SAN VMkernel-A interface = Explicit failover order = P1 active / P2 standby (or unused)* (10.1.3.1/24, VLAN 103)
  • Virtual SAN VMkernel-B interface = Explicit failover order = P2 active / P1 standby (or unused)* (10.1.4.1/24, VLAN 104)

In this case VLAN 100, 101, and 102 would need to be on all switches. 

*VLANs 103 and 104 could be set up to exist only on their own switch (A/B isolation: set the failover order not to fail over, and don't configure the VLAN on both switches), or to exist on both switches and use standby as described.  This design focuses on keeping host-to-host communication on the same switch: it lowers complications with multicast and lowers latency, since vSAN traffic will not have to hop to another switch unless you run out of switch ports (although with dense 40 Gbps switches using 10 Gbps breakouts you could, in theory, hit the 64-node limit on a single switch).
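
For anyone who wants to script this, here is a minimal PowerCLI sketch of the explicit failover orders above.  The vDS name "dvs-10g", the portgroup names and the uplink names "Uplink 1" / "Uplink 2" are assumptions, so adjust them to whatever your environment actually uses.

    # Assumed names: vDS "dvs-10g"; portgroups "vMotion-A", "vMotion-B", "vSAN-A", "vSAN-B";
    # uplink ports "Uplink 1" (switch A) and "Uplink 2" (switch B).
    $vds = Get-VDSwitch -Name "dvs-10g"

    # "A" portgroups: Uplink 1 active, Uplink 2 standby
    foreach ($pg in "vMotion-A", "vSAN-A") {
        Get-VDPortgroup -Name $pg -VDSwitch $vds |
            Get-VDUplinkTeamingPolicy |
            Set-VDUplinkTeamingPolicy -LoadBalancingPolicy ExplicitFailover `
                -ActiveUplinkPort "Uplink 1" -StandbyUplinkPort "Uplink 2"
    }

    # "B" portgroups: Uplink 2 active, Uplink 1 standby
    foreach ($pg in "vMotion-B", "vSAN-B") {
        Get-VDPortgroup -Name $pg -VDSwitch $vds |
            Get-VDUplinkTeamingPolicy |
            Set-VDUplinkTeamingPolicy -LoadBalancingPolicy ExplicitFailover `
                -ActiveUplinkPort "Uplink 2" -StandbyUplinkPort "Uplink 1"
    }

The Management and Virtual Machine portgroups follow the same pattern with P1 active / P2 standby.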

I'm curious about anyone's thoughts on just disabling failover and forcing each VMkernel to stick to its own switch (accepting loss of communication on that VMkernel in the event of a switch failure).  I'd like to do some lab tests and compare switch/path failover between both of these configurations (vs. a single-VMkernel configuration).
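
If you want to test that pinned variant, the same teaming cmdlet can express it by marking the second uplink as unused instead of standby; again just a sketch using the assumed names from above.

    # Pin vSAN-A to Uplink 1 only: Uplink 2 is marked unused, so this portgroup never fails over.
    Get-VDPortgroup -Name "vSAN-A" -VDSwitch (Get-VDSwitch -Name "dvs-10g") |
        Get-VDUplinkTeamingPolicy |
        Set-VDUplinkTeamingPolicy -LoadBalancingPolicy ExplicitFailover `
            -ActiveUplinkPort "Uplink 1" -UnusedUplinkPort "Uplink 2"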

Some people prefer a "simpler" setup, though (and I'm not opposed to that).

Duncan mapped out an active/passive failover configuration with a single VMkernel of each type per host.

In theory, not being as dependent on NIOC for storage isolation should help latency during the short bursts it takes for NIOC to kick in, compared with the active/active design.

Virtual SAN and Network IO Control

  • Management Network VMkernel interface = Explicit failover order = P1 active / P2 standby
  • vMotion VMkernel interface = Explicit failover order = P1 active / P2 standby
  • Virtual Machine Portgroup = Explicit failover order = P1 active / P2 standby
  • Virtual SAN VMkernel interface = Explicit failover order = P2 active / P1 standby
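
Either way, creating and tagging the VMkernel adapters themselves can be scripted as well.  A sketch for one host, with an assumed host name and placeholder portgroup names, reusing the addressing from the example above (exact parameter support can vary by PowerCLI version):

    # Create and tag VMkernel adapters on one host (host name, portgroup names and IPs are assumptions).
    $vmhost = Get-VMHost -Name "esx01.lab.local"
    $vds    = Get-VDSwitch -Name "dvs-10g"

    # vMotion VMkernel on the vMotion-A portgroup
    New-VMHostNetworkAdapter -VMHost $vmhost -VirtualSwitch $vds -PortGroup "vMotion-A" `
        -IP 10.1.2.1 -SubnetMask 255.255.255.0 -VMotionEnabled $true

    # vSAN VMkernel on the vSAN-A portgroup
    New-VMHostNetworkAdapter -VMHost $vmhost -VirtualSwitch $vds -PortGroup "vSAN-A" `
        -IP 10.1.3.1 -SubnetMask 255.255.255.0 -VsanTrafficEnabled $true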


4 Replies
zdickinson
Expert

We actually did exactly what you're proposing: 10 Gb NIC_a active, NIC_b standby for vSAN, and swapped for vMotion.  Just know that during a switch failure you might want to hold off on vMotions.

vMotions of active VMs with 32 GB of RAM and 4 vCPUs happen in seconds.  It's pretty sweet.  Thank you, Zach.

jonretting
Enthusiast

"I'm curious anyone's thoughts on just disabling failover and forcing it to each kernel to stick to its switch (and accepting loss of communication) on that vKernel in the event of a switch failure.  I'd like to do some lab tests with both and test switch/path failover between both of these configurations (vs. a single vkernel configuration)."

If I have this right, you are suggesting relying on failover at the NIC team level, where a single VMkernel is connected to that team.  Instinct tells me there might be ARP cache issues and a very unpredictable time to a solid link.  The added complexity of the separate multi-VMkernel networks is required to make a failure event predictable.

Oh yeah, and great post! Perfect.

FatBob74
Contributor

Thank you, we are going to separate the traffic and try this.

Do you know if there is a standard setup / best-practice layout for vSAN host network connections anywhere?

I went through the best-practice documents, but there doesn't seem to be a clear, simple draft of a "normal" setup with management, vMotion and vSAN traffic. I need that as a base to discuss my idea with the network people against a "standard setup".
