MinoDC
Enthusiast

NSX-T and 2 Sites Standalone with same vCenter


Hello to all...

I need to design an NSX-T integration into an existing infrastructure.

The existing infrastructure consists of 2 sites (Site-A and Site-B) in Active-Active mode, with no shared storage, only an L2-stretched network and an L3 HA network between the sites.

Site-A is primary site; Site-B is secondary site.

Each site has its own Cluster and both clusters are managed by the same vCenter.

Similar to this one:
NSX-T Active Active deploy

I have an NSX-T Professional license...

Workload VMs are replicated between the sites with Veeam, and the VCSA is in HA on the secondary site.

Is it possible to integrate NSX-T into this architecture, so that in case of a Site-A failure everything works on Site-B, and vice versa?

I've read some documentation on the Internet, but have not found a solution...

Can you help me with this difficult task?

Thanks for any suggestions.

p0wertje
Hot Shot

Hi,

 

Have you looked at the NSX-T 3.1 Multi-Location Design Guide (Federation ...) on the VMware Technology Network (VMTN)?
It is a good guide for multi-site.

 

Cheers,
p0wertje | VCIX6-NV | JNCIS-ENT
Please kudo helpful posts and mark the thread as solved if solved
MinoDC
Enthusiast

Hi,

yes, I read it, but my issue is that I have a Professional license, and it does not include the Multisite and Federation features.

For this reason I am looking for a solution that may work.

For example, I was thinking about replicating the Edge VMs with Veeam and restoring NSX-T, but does this solution work, and is it supported?

Or other solutions ...
I accept suggestions 😁

Thanks.

 

 

p0wertje
Hot Shot

Hi,

Just an idea:

Have 4 edge VMs (2 in each DC for local redundancy).
Create a T0 in active-active ECMP to your core.
Deploy a T1 A/S for DC1, with the active in DC1 and the standby in DC2 (use failure domains).
And for DC2, vice versa.
Use a stretched L2 for the VTEP network. (You could do it routed, but then you have to add some routing somewhere for it.)

In this case you will have all your segments in both DCs, and still benefit from having the T1 in the correct datacenter.
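The failure-domain placement described above can be sketched as follows. This is an illustrative model, not NSX-T code; the edge and domain names are made up:

```python
# Illustrative model of failure-domain-aware Tier-1 placement in this
# 2-site design (not NSX-T API code; all names are hypothetical).

# Each edge node is assigned to the failure domain of its site.
EDGES = {
    "edge1": "FD-Site-A",
    "edge2": "FD-Site-A",
    "edge3": "FD-Site-B",
    "edge4": "FD-Site-B",
}

def place_t1(preferred_domain):
    """Pick the active edge in the preferred failure domain and the
    standby edge in the other domain, mimicking what NSX-T does when
    failure-domain-based allocation is enabled on the edge cluster."""
    active = [e for e, fd in EDGES.items() if fd == preferred_domain]
    standby = [e for e, fd in EDGES.items() if fd != preferred_domain]
    return {"active": active[0], "standby": standby[0]}

# T1 for DC1 workloads: active in Site-A, standby lands in Site-B
print(place_t1("FD-Site-A"))  # {'active': 'edge1', 'standby': 'edge3'}
# T1 for DC2 workloads: the mirror image
print(place_t1("FD-Site-B"))  # {'active': 'edge3', 'standby': 'edge1'}
```

The point of the failure domains is exactly this predictability: without them, both the active and the standby could land in the same site.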

Cheers,
p0wertje | VCIX6-NV | JNCIS-ENT
Please kudo helpful posts and mark the thread as solved if solved
MinoDC
Enthusiast

Hi @p0wertje ,

thanks for your reply...

I am not very familiar with the failure domain functionality, so let me explain what I understood.

MinoDC_0-1621957479590.png

I install and configure a 3-node NSX-T Manager cluster.

I create 2 Edges for each site and then create an Edge Cluster with all the Edges.

I create a T0-GW (Act/Act) with 8 uplinks (two for each Edge), enable ECMP, and configure route maps for correct routing in case of a site failure.

I create one T1-GW (Act/Stb) for each site (with only a DR component, or also an SR?).

For each Edge, I configure the Failure Domain (Edge1 and Edge2 in Failure Domain-A; Edge3 and Edge4 in Failure Domain-B). This way, the standby of T1-A will be placed on the Edges of Site-B and the standby of T1-B on those of Site-A, right?

Now, I've some questions... 

Is it possible to create 2 NSX-T Manager nodes in Site-A and 1 in Site-B if the hosts are in two different clusters? (This way I can avoid restoring NSX-T in the event of a site failure.)

Does the T0-GW use the Failure Domain function like the T1-GW when I implement it in Act/Act mode? If not, how will the T1 traffic be forwarded to the Edges if the T0 is not present in the event of a site fault?

 

Sorry and Thanks again...🙏

 

 

 

shank89
Expert

You can split your NSX-T Manager cluster across sites as long as the nodes meet the RTT and other requirements.

The T0 gateways do not support failure domains.

The T1DR (if you have no SR component) uses ECMP paths to the T0DR; each edge acts as a path to a prefix.  https://communities.vmware.com/t5/VMware-NSX-Documents/NSX-T-and-ECMP/ta-p/2840738

Shashank Mohan

VCAP-NV 2020 | VCP-DCV2019 | CCNP Specialist

https://lab2prod.com.au
p0wertje
Hot Shot

Hi,

The 'downside' of this design is that, because of the T0 ECMP to the outside world, you have 4 incoming paths, and they span two datacenters.
I don't know if that is acceptable for you. You might be able to steer it with route maps; I have not tested that, so I don't know the result.

If you really need incoming and outgoing traffic to stay in one datacenter, you could go with
4 edge nodes, but in two edge clusters: one T0 active/standby with the active on a node in DC1 and the standby on a node in DC2, and vice versa.
You can only run one T0 per edge node.
The downside of this is upgrading your edge nodes, because traffic goes over the standby node, and thus the other DC, while you upgrade the active node.

 

Cheers,
p0wertje | VCIX6-NV | JNCIS-ENT
Please kudo helpful posts and mark the thread as solved if solved
shank89
Expert

You can steer traffic to the T0 or the edges, but T1SR-to-T0DR traffic uses 2-tuple load balancing across the paths it has available to the active SRs.
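To illustrate what 2-tuple balancing means for the datapath: each flow is hashed on (source IP, destination IP), so one flow always takes the same path, while different flows spread across all available paths. A rough sketch of the idea (NSX-T's actual hash function is internal; this toy just shows the property):

```python
import ipaddress

def ecmp_path(src_ip: str, dst_ip: str, paths: list) -> str:
    """Pick one of the equal-cost paths from a 2-tuple (src, dst) hash.
    Toy model only: NSX-T's real hash is internal, but the property
    shown here is the same -- a given flow is pinned to one path."""
    key = (int(ipaddress.ip_address(src_ip)),
           int(ipaddress.ip_address(dst_ip)))
    return paths[hash(key) % len(paths)]

# Four edges = four paths toward the active T0SRs.
paths = ["edge1", "edge2", "edge3", "edge4"]

# The same src/dst pair always maps to the same edge:
assert ecmp_path("10.0.0.10", "8.8.8.8", paths) == \
       ecmp_path("10.0.0.10", "8.8.8.8", paths)
```

This is why per-flow ordering is preserved even though the aggregate traffic is balanced over all edges.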

Shashank Mohan

VCAP-NV 2020 | VCP-DCV2019 | CCNP Specialist

https://lab2prod.com.au
MinoDC
Enthusiast

Thanks @shank89  & @p0wertje ...

I don't want to complicate the design with a T1SR... the Professional license does not include the LB feature, so for now let's leave the T1SR out. Thanks, and sorry if I brought it up; it was just to understand better.

Both sites are active, so both are receiving N/S traffic.

Honestly, what I haven't been able to understand is how to configure the T0 when a site goes into fault. (The critical point of this design seems to be precisely the routing of traffic from the T1DR to the T0 in case of a fault.)

We said that:

  • I can distribute the NSX-T Manager nodes across both sites, even if they are in different clusters on the same vCenter. (This solves the NSX-T problem in case of DR.)
  • I can use Failure Domains on the T1 Act/Stb. (This solves the T1 problem in case of DR.)

We said that the T0 doesn't support Failure Domains. So how can the T0 of Site-A be deployed on the Edges of Site-B if I create 2 Edge Clusters?

For the T0 and Route Map I saw this link: https://www.lab2prod.com.au/2020/09/nsx-t-active-active-multisite-part2.html

Is there a specific configuration on the T0 side that I have to do in order not to have problems in case of DR?

Thanks a lot , again 🙂

 
 

 

 

shank89
Expert

To clarify, failure domains are used to predictably place the SR component of T1 gateways.  The DR component is meant to be distributed and does not have an active and standby instance.

For the T0, you can have it Active/Active or Active/Standby; that is up to you.  Whether you have 4 edges placed at either site or not, you steer the traffic using prepends and local preference, as shown in the link you added in your previous response.  If you would like faster failover and are using BGP, consider using BFD.

As with anything, test failover, ensure the behaviour is what you expect and predicted.
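As a toy model of that steering logic (not router configuration; the AS numbers and values are made up): local preference controls which site your core prefers for egress, and AS-path prepending makes the backup site's advertisement less attractive for ingress. BGP prefers the highest local preference first, then the shortest AS path:

```python
from dataclasses import dataclass, field

@dataclass
class Route:
    via: str                 # which site's edges advertise this path
    local_pref: int          # higher wins
    as_path: list = field(default_factory=list)  # longer (prepended) loses

def bgp_best(routes):
    """Simplified BGP best-path selection: highest local-pref, then
    shortest AS path. Real BGP has more tie-breakers after these."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))

# Site-B prepends its own AS, so the core prefers Site-A while it is up:
routes = [
    Route("Site-A", local_pref=200, as_path=[65001]),
    Route("Site-B", local_pref=100, as_path=[65001, 65001, 65001]),
]
assert bgp_best(routes).via == "Site-A"

# When Site-A fails and withdraws its routes, Site-B takes over:
assert bgp_best(routes[1:]).via == "Site-B"
```

Failover here needs no reconfiguration: the less-preferred path simply becomes the only path, which is why BFD (faster withdrawal) is the main knob for convergence time.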

Shashank Mohan

VCAP-NV 2020 | VCP-DCV2019 | CCNP Specialist

https://lab2prod.com.au
MinoDC
Enthusiast

Thanks @shank89 for clarifying the T1SR failure domains.

My understanding is that the T0DR and T1DR are instantiated on each ESXi host of the cluster containing the Edge VM to which the T0 (and consequently the T1) is connected. Right?
If so, the T0DR of Site-A is not present on the hosts of Site-B.
So how will the traffic work in the event of a fault?

If this is not the case, and when I create a T0DR and a T1DR they are distributed to all hosts prepared for NSX-T, then in the event of a fault I will not have problems, as the T0DR and T1DR are already present on the hosts of the other site.

How exactly does it work?

 

 
 

 

 

shank89
Expert

This will come down to how you prep the environment.  If you need segments etc. available in the second site, they will all need to be part of the same overlay transport zone.  Otherwise, the transport nodes in Site-2 will not see the networks you want them to have.

There may be very manual (or scripted) methods of DR to get around this, such as connecting the T0 to an edge cluster at the surviving site once Site-A goes down, but my general recommendation is to simplify DR to avoid any human error. If it is a true DR event, there's enough going on anyway.

You should find what you need from slide 26 onwards. https://www.dropbox.com/s/tvwqhjhbwd7hy4j/Multisite_NSX-T_3.1-v1.0.pptx?dl=0.
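If you did script that manual step, it would amount to a single PATCH of the T0's locale services, pointing them at the surviving site's edge cluster. A minimal sketch against the Policy API; the manager address, object IDs and cluster paths are assumptions, so verify the payload against your NSX-T version before relying on it:

```python
import json
import urllib.request

# Hypothetical values: adjust to your environment.
NSX_MANAGER = "https://nsx-manager.lab.local"
T0_LOCALE = "/policy/api/v1/infra/tier-0s/T0-GW/locale-services/default"

# Re-point the T0 at Site-B's edge cluster after Site-A is lost.
payload = {
    "edge_cluster_path": ("/infra/sites/default/enforcement-points/"
                          "default/edge-clusters/EdgeCluster-SiteB"),
}

def build_rehome_request():
    """Build (but do not send) the PATCH request; a real DR script
    would add authentication and then call urllib.request.urlopen."""
    return urllib.request.Request(
        NSX_MANAGER + T0_LOCALE,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )

req = build_rehome_request()
```

Note this only works while the management plane has write access (at least 2 of 3 managers up), which is another argument for keeping the DR procedure as simple as possible.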

Shashank Mohan

VCAP-NV 2020 | VCP-DCV2019 | CCNP Specialist

https://lab2prod.com.au
MinoDC
Enthusiast

Of course, I will create segments to connect to the T1DR.
All segments of the two sites will be in the same Overlay TZ.
All hosts and Edges of the two sites will be in the same TZs.

Excuse me if I insist, but what I don't understand is whether the T0DR and T1DR are distributed on all the hosts of the cluster containing the Edge to which the T0 (and consequently the T1) is connected, or on all the hosts prepared for NSX-T, regardless of which cluster holds that Edge.

Because depending on how the T0DR and T1DR are distributed, there will be different considerations for DR... right?

I saw the ppt on the DR and NSX-T Multisite, thanks.

shank89
Expert

T1DRs and T0DRs exist on all transport nodes that are prepared for NSX-T.

Shashank Mohan

VCAP-NV 2020 | VCP-DCV2019 | CCNP Specialist

https://lab2prod.com.au
MinoDC
Enthusiast

Ah, OK... sorry, I didn't understand / read it correctly...

Excuse me again... I'll try to summarize everything to see if I understood correctly...

I have two sites like in the drawing above:

  • servers and storage are dedicated for each site
  • each site has its own vmware cluster
  • both the vmware clusters are managed by the same vCenter
  • between the two sites there is an L2-Stretched network
  • The connectivity between site is 10Gb/s and RTT is <150ms

I create:

  • 2 Manager Node in Site-A and 1 Node in Site-B
  • All TZs are the same for both sites
  • All ESXi and Edge node are in the same TZs
  • 2 Edge in each Site but all in the same Edge Cluster
  • 1 T0 in Act/Act mode, with ECMP/Route Maps/BGP/BFD, at each site
  • 1 T1 in Act/Stb mode at each site
  • n Segment Overlay connected to T1 for each site

I don't use Failure Domains because I don't have a T1SR (if I had a T1SR, then I would also use FDs)

In the event of a site failure, everything works (or should 😅), because:

  • NSX-T Manager Node (2 or 1) is active in active site
  • T0DR, T1DR and Segments works in active site, because Host and Edge are in the same TZ

 

I hope I understood correctly...

Thanks again @shank89  🙏 for you patience 😇

0 Kudos
shank89
Expert

The first 5 dot points look good. What is the network you are stretching?

  • 2 Manager Nodes in Site-A and 1 Node in Site-B --- keep in mind that while you only have 1 manager up and running, NSX-T will be in read-only mode.  You need at least 2 up for write access, but it is best to get all nodes up and running ASAP.
  • For the segments to exist at both sites, yes, same TZ for the easiest DR process.
  • All Edges will be in the same TZ so they can route traffic for the segments in those TZs.
  • 2 Edges can be in each site; make sure you understand the traffic flow from hosts to edges. It will be balanced using all paths available to the T0SRs (I linked you to this earlier).
  • Correct: route maps, prepends, peering, BFD as required.
  • A/S is your choice for T1, what is your reason for this?
  • Segments in the Overlay TZ; this TZ will be linked to hosts and edges.
  • T0DR, T1DR and Segments work in the active site, because Host and Edge are in the same TZ -- not exactly, see my note above about datapaths and ECMP.
Shashank Mohan

VCAP-NV 2020 | VCP-DCV2019 | CCNP Specialist

https://lab2prod.com.au


p0wertje
Hot Shot

Hi,

 

Sounds correct.
And if you do not need a T1-SR, you do not have to deploy it; you just have a DR-only T1 then.
And keep the points @shank89 mentioned in mind.

 

Cheers,
p0wertje | VCIX6-NV | JNCIS-ENT
Please kudo helpful posts and mark the thread as solved if solved
MinoDC
Enthusiast

What is the network you are stretching? I can extend all the L2 networks needed.

  • 2 Manager Nodes in Site-A and 1 Node in Site-B --- keep in mind while you only have 1 manager up and running, NSX-T will be in read-only mode. You need at least 2 up for write access, but best to get all nodes up and running ASAP.
    • OK, but if NSX-T is in read-only mode, network traffic keeps working; I just can't change the configuration, right?
  • A/S is your choice for T1, what is your reason for this?
    • No specific reason, but if the T1 is A/S I can better manage the ECMP traffic on the ESXi side (2-tuple), no?
  • Segment in Overlay TZ, this TZ will be linked to hosts and edges
    • Yes... the same Overlay TZ for Segments, Hosts and Edges.
  • T0DR, T1DR and Segments work in the active site, because Host and Edge are in the same TZ -- not exactly, see my note above about datapaths and ECMP.
    • If a site is down, the only paths that work are those of the active site, so the segments should route traffic through the remaining active T0/T1/Edges, right? What is it that I could not understand? I'm sorry.

Just a clarification... all of this works with the Professional license (no Multisite/Federation features), right?

shank89
Expert

The dataplane still works if the management plane is down or in read-only mode.

The choice of A/S is up to you; you will have to work out what is best for your scenario.

Yes, if there is active workload on a segment at the remaining site, and only those hosts and edges are left, the traffic will egress from that site.

I would say so, as this all comes down to cluster design within a single instance of NSX-T.

Shashank Mohan

VCAP-NV 2020 | VCP-DCV2019 | CCNP Specialist

https://lab2prod.com.au
MinoDC
Enthusiast

Perfect ...

Thank you very much for your time and patience

 
 

 

 
