VMware Cloud Community
ewy
Enthusiast

vSAN stretched cluster network design

I need some help validating this network configuration for a stretched cluster.

I'll summarize what's on the attached (not so good) diagram.

- cluster is running 6.7u1

- Each data site has a Nexus vPC pair as its core.

- L2 is stretched between the data sites. The vSAN VLAN has an SVI advertised via EIGRP, and that SVI is part of an HSRP group with 4 members (1 SVI per Nexus, so 2 per site). This is one of the key things I want to validate.

- Each host has a static route that uses the vSAN SVI (10.10.1.0      255.255.255.0        10.10.0.0   vmk2       MANUAL). I want to make sure this is recommended, or whether I should use something else.

- das.usedefaultisolationaddress = false

- The HA advanced setting das.isolationaddress(1-4) is configured with the following IPs, in this order: 10.10.0.2 (Site A), 10.10.0.4 (Site B), 10.10.0.3 (Site A), 10.10.0.5 (Site B). I want to validate this too.

- As you can see, vSAN witness traffic always flows through the primary site. This is suboptimal routing, but not a big deal unless the HSRP failover time is too high, which could cause vSAN issues after a site failure while 10.10.0.1 becomes available again.

Questions:

1- Should I advertise (route) the vSAN SVIs?

2- Can I use a different route for witness traffic to avoid traversing the ISL?

3- Are the HA advanced settings (isolation addresses) configured properly?

What would you recommend to improve this design? Don't be shy with the details :)

I understand this is very networking-heavy, but that is what we have to deal with when using Ethernet-based distributed storage systems, which HCI is :) As we always hear, network reliability is key to HCI.

Thanks in advance folks!!

Looking forward to your responses.

5 Replies
MikeStoica
Expert

See the vSAN Stretched Cluster Guide (VMware); it has a Design Considerations section.

GreatWhiteTec
VMware Employee

The Stretched Cluster guide is a good start.

The isolation addresses look OK.

I would send the witness through the primary site as that defeats the purpose in case of a network outage. You can use L3 for the witness, as well as WTS (Witness Traffic Separation), to send witness traffic over another network. Witness traffic is not very heavy, just metadata, and the requirements are low IMO.

ewy
Enthusiast

@Great_White_Tec

I think you meant you wouldn't send it through the primary site. I agree, but in my case the gateway used to send witness traffic to the witness site is part of an HSRP group, which means that the gateway will (should) always be available even after the primary site fails.

I have been reading about implementing WTS, since it is available in the version I am running, but I have a couple of things I want to confirm first.

If I understand it correctly, I would add another VMkernel interface to each host, tagged for witness traffic (I could use vmk0 but would rather not).

For instance:

Site A hosts: vmkWitness 10.10.3.0/24 (VLAN 103), gw 10.10.3.1, only located on Site A

Site B hosts: vmkWitness 10.10.4.0/24 (VLAN 104), gw 10.10.4.1, only located on Site B

and then add static routes to each host as follows:

- Site A hosts:

esxcli network ip route ipv4 add -n 10.10.1.0/24 -g 10.10.3.1

- Site B Hosts:

esxcli network ip route ipv4 add -n 10.10.1.0/24 -g 10.10.4.1

- Witness Host

To reach Site A hosts:

esxcli network ip route ipv4 add -n 10.10.3.0/24 -g 10.10.1.1

To reach Site B hosts:

esxcli network ip route ipv4 add -n 10.10.4.0/24 -g 10.10.1.1

10.10.1.1 is the witness host subnet gateway located on the witness site.
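As a sanity check on the plan above, here is a minimal Python sketch that verifies each next hop sits inside the VMkernel subnet of the hosts that use it, and then prints the matching static route commands. The subnets and gateways are the ones from this post; the `PLAN` layout and the `route_commands` helper are hypothetical names made up for illustration, and the sketch assumes `esxcli network ip route ipv4 add` with `-n` and `-g` is the form used to create the routes.

```python
import ipaddress

# Addressing copied from the post; the dictionary layout itself is illustrative.
PLAN = {
    "Site A hosts": {
        "vmk_subnet": "10.10.3.0/24",
        "gateway": "10.10.3.1",
        "destinations": ["10.10.1.0/24"],
    },
    "Site B hosts": {
        "vmk_subnet": "10.10.4.0/24",
        "gateway": "10.10.4.1",
        "destinations": ["10.10.1.0/24"],
    },
    "Witness host": {
        "vmk_subnet": "10.10.1.0/24",
        "gateway": "10.10.1.1",
        "destinations": ["10.10.3.0/24", "10.10.4.0/24"],
    },
}


def route_commands(plan):
    """Return {role: [command, ...]} after basic consistency checks."""
    commands = {}
    for role, cfg in plan.items():
        subnet = ipaddress.ip_network(cfg["vmk_subnet"])
        gateway = ipaddress.ip_address(cfg["gateway"])
        # A static route's next hop has to be reachable on-link from the vmk.
        if gateway not in subnet:
            raise ValueError(f"{role}: gateway {gateway} is not inside {subnet}")
        role_cmds = []
        for dest in cfg["destinations"]:
            dest_net = ipaddress.ip_network(dest)
            # A destination that overlaps the local subnet needs no route.
            if dest_net.overlaps(subnet):
                raise ValueError(f"{role}: {dest_net} overlaps local {subnet}")
            role_cmds.append(
                f"esxcli network ip route ipv4 add -n {dest_net} -g {gateway}"
            )
        commands[role] = role_cmds
    return commands


if __name__ == "__main__":
    for role, cmds in route_commands(PLAN).items():
        print(role)
        for cmd in cmds:
            print("  " + cmd)
```

This only checks that the addressing is self-consistent. It says nothing about whether each gateway stays reachable when a site fails, which depends on where the gateway physically lives.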

Something very similar to this:

Setup Step 5: Validate Networking | vSAN Stretched Cluster Guide | VMware

Jasemccarty
Immortal

Your example is using the same gateway for the vSAN Witness Host to communicate with either site.

Routing to the vSAN Witness Host is required to be done per site.

It is important that the sites communicate with the vSAN Witness Host independently.

If the HSRP gateway is only available in Site A, but Site A has been isolated, then vSAN won't be able to have more than a single site contributing, resulting in inaccessible data.

  • Site A isolated
  • Site B & vSAN Witness Host can't communicate because neither can access the gateway, which resides in the isolated site.
  • Data inaccessible


An HSRP gateway that resides in a single site isn't going to allow for proper failover.

Alternatively, if Site A and Site B can communicate with the vSAN Witness Host independently:

  • Site A isolated
  • Site B & vSAN Witness Host can communicate
  • Data accessible but out of Storage Policy Compliance.
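The two scenarios above can be sketched as a toy model in Python. It is a deliberate simplification: it assumes three fault domains (Site A, Site B, witness), treats data as accessible when at least two of them can still talk to each other, and ignores HSRP timing and everything else about the real network. The function names and the `witness_gateway_site` mapping are hypothetical, introduced only for this illustration.

```python
def reachable_fault_domains(isolated_site, witness_gateway_site):
    """Fault domains that can still communicate once one data site is isolated.

    witness_gateway_site maps each data site ("A" or "B") to the site that
    hosts the gateway it uses to reach the vSAN Witness Host.
    """
    survivor = "B" if isolated_site == "A" else "A"
    alive = {survivor}
    # The survivor reaches the witness only if its gateway is not in the
    # isolated site.
    if witness_gateway_site[survivor] != isolated_site:
        alive.add("witness")
    return alive


def data_accessible(isolated_site, witness_gateway_site):
    # Simplified quorum rule: more than one of the three fault domains
    # must be contributing.
    return len(reachable_fault_domains(isolated_site, witness_gateway_site)) >= 2


# Both sites depend on a gateway that lives only in Site A.
shared_gateway = {"A": "A", "B": "A"}
# Each site routes to the witness through its own local gateway.
per_site_gateways = {"A": "A", "B": "B"}

if __name__ == "__main__":
    for label, layout in (("shared", shared_gateway), ("per-site", per_site_gateways)):
        for isolated in ("A", "B"):
            print(label, "isolate", isolated, "->", data_accessible(isolated, layout))
```

With the shared layout, isolating Site A leaves Site B alone and the data inaccessible; with per-site gateways, either site can be lost and the other still pairs with the witness.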
Jase McCarty - @jasemccarty
Techstarts
Expert

If I understand it correctly, I would add another VMkernel interface to each host, tagged for witness traffic (I could use vmk0 but would rather not).

This is 100% correct. One thing you did not mention, though you have probably taken care of it, is the segregation of witness node traffic. By default, it is attached to vmk0. Unless you wish to keep it that way, everything looks okay. I have created a similar design. In our case (we have leaf and spine), we use Leaf01 and Leaf02 as isolation addresses for the primary site and Leaf03 and Leaf04 as isolation addresses for the secondary site. Leaf01 and Leaf02 are in DC1; Leaf03 and Leaf04 are in DC2.

I have attached the reference architecture from my design.

pastedImage_1.png

With Great Regards,