Solved: Re: Active/Active T0 : What's the point? When is it appropriate?

PhoenixVM · ‎06-23-2021

Hey there,

I have been debating between an Active/Active (w/ECMP) and Active/Standby T0 design in my environment.

After considering my options, I'm left wondering why one would use an Active/Active T0 topology at all. An Active/Active topology means that you can't run stateful services at the T0, which leaves you with undesirable workarounds:

1. Don't run stateful services at all (what use is this system without them?)

2. Deploy a T1 to one of your edge clusters and run services there. This would cause all N-S traffic destined to those services to hairpin through the *single* active T1 SR, which defeats the purpose of load-balancing with an Active/Active T0 + ECMP to begin with. This could also create an inefficient traffic path, depending on which T0 SR traffic enters the environment through.
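To make the concern in option 2 concrete, here is a rough sketch in Python of how ECMP spreads flows across the T0 SRs while the stateful service lives on one active T1 SR. It is a toy model, not NSX-T behaviour: the edge names, the flow tuples, and the SHA-256 hash are all assumptions standing in for whatever hash the upstream router actually uses.

```python
import hashlib


def ecmp_pick(flow, next_hops):
    """Pick a next hop by hashing the flow tuple (toy stand-in for a router's ECMP hash)."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]


def classify_flows(flows, t0_edges, active_t1_edge):
    """Count flows that enter on the edge hosting the active T1 SR
    versus flows that enter elsewhere and need an extra edge-to-edge hop."""
    local = extra_hop = 0
    for flow in flows:
        ingress_edge = ecmp_pick(flow, t0_edges)
        if ingress_edge == active_t1_edge:
            local += 1
        else:
            extra_hop += 1
    return local, extra_hop


# Hypothetical two-node edge cluster and 1000 made-up client flows
# (src IP, src port, dst IP, dst port, protocol).
t0_edges = ["edge-01", "edge-02"]
flows = [
    ("198.51.100.%d" % (i % 250 + 1), 40000 + i, "10.0.0.10", 443, "tcp")
    for i in range(1000)
]

local, extra_hop = classify_flows(flows, t0_edges, active_t1_edge="edge-01")
print("entered on the active T1 SR's edge:", local)
print("entered on another edge (extra hop):", extra_hop)
```

With two edges, roughly half of the flows land on the edge that is not hosting the active T1 SR. Whether that extra hop matters in practice is the question the replies below address.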

The only scenario I can conceive of where this would be acceptable or desirable is if you are a service provider creating multiple T1s for different tenants (in which case, it's common sense that a tenant's traffic would only enter/leave via their dedicated T1).

Outside of that scenario, is there any reason to run T0 Active/Active at all?

shank89 · ‎06-23-2021

Hi PhoenixVM,

It's definitely something to consider, but sometimes overthought.

To be honest, the only time I have seen customers go with A/S is for a specific reason, e.g. their upstream devices only support A/S.

A couple of pointers:

- A/A is VVD compliant: https://docs.vmware.com/en/VMware-Validated-Design/6.2/sddc-architecture-and-design-for-a-virtual-in...
- Aside from ECMP, A/A also allows higher throughput.
- Rather than enabling a stateful service on the Tier-0, which would pin all tenancies / T1's to a specific Edge for ingress and egress, the better option is to configure them on a Tier-1, ensuring only traffic that needs to route through a specific Edge does so.
- Depending on your edge cluster design, from the many deployments of NSX-T, the impact of traffic ingressing the Edge not active for the stateful service is non-existent. That is, the impact of traffic ingressing this Edge is unnoticeable. It is common for this to be overthought and over-engineered. Again, this comment depends on appropriate edge cluster design (A/A SRs local and not spanning sites).
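The pinning argument above can be sketched as a small Python model. This is only an illustration under stated assumptions: the tenant names, the `stateful` / `active_edge` fields, and the `pinned_edges` helper are hypothetical, and the Active/Standby T0 is simplified to "the first edge in the list is active".

```python
def pinned_edges(tenants, edges, services_on):
    """Return {tenant: edge or None}; None means the tenant's N-S traffic
    is not tied to one edge and can be spread across all edges by ECMP."""
    if services_on == "tier0":
        # Stateful service on the T0 implies Active/Standby:
        # every tenant is pinned to the single active T0 edge.
        active_t0_edge = edges[0]
        return {name: active_t0_edge for name in tenants}
    if services_on == "tier1":
        # T0 stays Active/Active; only T1s that run a stateful service
        # are pinned, and only to the edge hosting their active SR.
        return {
            name: cfg["active_edge"] if cfg["stateful"] else None
            for name, cfg in tenants.items()
        }
    raise ValueError("services_on must be 'tier0' or 'tier1'")


# Hypothetical tenants: only tenant-a needs a stateful service.
tenants = {
    "tenant-a": {"stateful": True, "active_edge": "edge-02"},
    "tenant-b": {"stateful": False, "active_edge": None},
    "tenant-c": {"stateful": False, "active_edge": None},
}
edges = ["edge-01", "edge-02"]

for design in ("tier0", "tier1"):
    print(design, pinned_edges(tenants, edges, design))
```

In this model, placing the service on the Tier-0 pins all three tenants to one edge, while placing it on a Tier-1 pins only the tenant that actually uses it.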

At the end of the day, the deployment model is up to the customer and their specific requirements. But if you are talking about a single site, with SRs not spanning WAN links etc., then it shouldn't really be a concern.

Cheers

Shashank Mohan

VCIX-NV 2022 | VCP-DCV2019 | CCNP Specialist

https://lab2prod.com.au
LinkedIn https://www.linkedin.com/in/shankmohan/
Twitter @ShankMohan
Author of NSX-T Logical Routing: https://link.springer.com/book/10.1007/978-1-4842-7458-3

PhoenixVM · ‎06-24-2021

Hey Shashank,

"Depending on your edge cluster design, from the many deployments of NSX-T, the impact of traffic ingressing the Edge not active for the stateful service is non-existent."

This insight is very helpful. I've come across a few blogs now where admins discourage deploying a T1 SR in order to avoid the flow above. That gives me something to think about, knowing the universe will not collapse on itself. 🙂

"Rather than enabling a stateful service on the Tier-0 which would pin all tenancies / T1's to a specific Edge for ingress and egress. "

This wasn't top-of-mind for me, given that we're not a traditional service provider. That being said, I'm going with a two-tier architecture to give me options for multiple T1s down the road, so this is a consideration worth bearing in mind.

Thanks for sharing your experience and your logic. It's great to get to the 'why's behind the design decisions.

shank89 · ‎06-24-2021

Not a problem, glad I could help.

If you find the answer adequate, please mark the post as resolved 🙂

