NSX-T 3.1 Multi-site with failure domains

Petersaints · ‎10-21-2021

Hello all,
Following the Admin Guide (https://docs.vmware.com/en/VMware-NSX-T-Data-Center/3.1/nsxt_31_admin.pdf), for multi-site I created two failure domains: FD1A and FD2A.
FD1A has Edge1 + Edge2 and FD2A has Edge3 + Edge4.

I have two questions:
1 - On Admin Guide page 430, it says: "In case of a full primary site failure, the tier-0 standby and tier-1 standby in the secondary site automatically take over and become the new active gateways." What does that mean? Do I need to create a second T0, connected to the same edge cluster?

2 - In the failure domains scenario, if for some reason edge1 is down (for example during an upgrade), will edge2 become the active edge, or will all the North/South traffic be forwarded by FD2A (edge3 + edge4)?

Thanks.
Regards.

p0wertje · ‎10-22-2021

Hi,

1. No, you do not need to create a second T0. The document describes a T0 in an Active/Standby (A/S) configuration: the active T0 instance is on FD1A and the standby T0 instance is on FD2A. When site FD1 fails, the standby T0 on FD2 becomes active. Keep in mind (see page 425) that you need stretched layer 2 between the two sites: the active and standby T0 instances need to be connected to the same layer 2 networks to work correctly.

2. The answer is on page 430:

In case of a full primary site failure, the tier-0 standby and tier-1 standby in the secondary site
automatically take over and become the new active gateways. In case of a failure of one of the
Edge nodes in the primary site, the same principle applies. For example, in the diagram below,
assume that Edge node 1B hosts Tier-0-Test and Tier-1-Test, Edge node 2A hosts the Tier-0-Test
standby and Edge node 2B hosts the Tier-1-Test standby. If Edge node 1B fails, the standby
Tier-0-Test on Edge node 2A and standby Tier-1-Test on Edge node 2B take over and become
the new active gateways.
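
To make the quoted example concrete, here is a minimal Python sketch of that failover rule. It is only a toy model and not NSX-T code or an NSX API: the `Gateway` class and the `fail_node` function are hypothetical names made up for illustration, and the node labels (1A, 1B, 2A, 2B) are taken from the quoted passage.

```python
from dataclasses import dataclass


@dataclass
class Gateway:
    """Hypothetical record: one A/S gateway and the edge nodes hosting it."""
    name: str
    active: str   # edge node hosting the active instance
    standby: str  # edge node hosting the standby instance


def fail_node(gateways, failed_node):
    """Return {gateway name: node forwarding traffic} after failed_node goes down.

    Only gateways whose active instance was on the failed node move to
    their standby; everything else stays where it was.
    """
    forwarding = {}
    for gw in gateways:
        if gw.active == failed_node:
            forwarding[gw.name] = gw.standby
        else:
            forwarding[gw.name] = gw.active
    return forwarding


# Placement from the quoted Admin Guide example.
gateways = [
    Gateway("Tier-0-Test", active="1B", standby="2A"),
    Gateway("Tier-1-Test", active="1B", standby="2B"),
]

after_failure = fail_node(gateways, "1B")
print(after_failure)
```

In this model, losing Edge node 1B moves both gateways to their standbys in the secondary site, while losing a node that hosts neither instance (for example 1A) changes nothing.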

Cheers,
p0wertje | VCIX6-NV | JNCIS-ENT | vExpert
Please kudo helpful posts and mark the thread as solved if solved

Petersaints · ‎10-22-2021

Hi @p0wertje

Thanks for the response.

So, for question #1, I only create one T0. The standby instance of that same T0 will take over if FD2A becomes active, right?

When you say that L2 needs to be stretched between sites, do you mean only the management network of the edge nodes, or also the TEP networks and the BGP peer networks?

About question #2, I also saw the answer on page 430. But what I understood was that if only one edge in the primary site fails, the second site automatically becomes active and the second edge in the primary site is not used. Is that right?

Thanks!

Regards.

p0wertje · ‎10-22-2021

Hi,

First of all, it all depends on your needs and on what your (physical) network looks like now. There are different approaches for each situation. I am assuming that your situation is similar to the one described in the document you mention. If that is not the case, I would advise you to look at:
NSX-T 3.1 Multi-Location Design Guide (Federation ... - VMware Technology Network VMTN
VMware® NSX-T Reference Design - VMware Technology Network VMTN

#1

Yes. You create one T0 in A/S. The active instance will be on FD1 and the standby instance on FD2 (both the active and standby instances need to be connected to the same L2 networks).

#2
If you had, let's say, 20 T1s in A/S, the active T1s would be spread over the two edge nodes in FD1, with their standby counterparts spread over the two edge nodes in FD2.
An edge node can only host one T0. To leverage all four edge nodes with a T0, you should think about T0 A/A with ECMP.
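
As a rough illustration of the T1 spread, here is a small Python sketch. It assumes a simple round-robin distribution, which is my assumption for the example and not necessarily how NSX-T actually allocates gateways; the `place` function and the `T1-01` style names are hypothetical, and the edge names follow the FD1A/FD2A layout from the first post.

```python
from itertools import cycle


def place(gateway_names, preferred_fd, other_fd):
    """Toy failure-domain-aware placement (round-robin is an assumption).

    Actives rotate over the nodes of the preferred failure domain,
    standbys rotate over the nodes of the other failure domain.
    """
    active_nodes = cycle(preferred_fd)
    standby_nodes = cycle(other_fd)
    return {
        name: (next(active_nodes), next(standby_nodes))
        for name in gateway_names
    }


fd1 = ["edge1", "edge2"]  # FD1A
fd2 = ["edge3", "edge4"]  # FD2A
t1_names = [f"T1-{i:02d}" for i in range(1, 21)]  # 20 T1 gateways in A/S

placement = place(t1_names, fd1, fd2)
for name, (active, standby) in placement.items():
    print(f"{name}: active on {active}, standby on {standby}")
```

With 20 T1s, each edge node in FD1 ends up with 10 active instances and each edge node in FD2 with 10 standby instances, so an active and its standby never share a failure domain.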

Please take a look at the two design docs to see what fits your setup best.

Cheers,
p0wertje | VCIX6-NV | JNCIS-ENT | vExpert
Please kudo helpful posts and mark the thread as solved if solved
