Solved: Re: locale-id / local egress question

rikherlaar · ‎06-28-2016

Dear team

Referring to https://pubs.vmware.com/NSX-62/index.jsp?topic=%2Fcom.vmware.nsx-cross-vcenter-install.doc%2FGUID-98...

I asked this question a while back, so let me try to clarify the gist of it again.

Locale-id and local egress, IIRC, make sense in a multi-site/multi-VC setup where workloads can move between locations as long as they are attached to universal logical switches.

What I don't 100% grasp is whether we need to pay attention to prefix-to-site "affinity". If I move workloads in the same subnet from left to right and vice versa, I can only think of an almost /32-style host routing underpinning the locale-id capabilities. Even if this works from a route-controller to ESG (edge router) perspective, how would you guarantee that the physical router up north understands that we at times have different metrics for the same subnet or host route?

For plain routing you could argue "who cares", but if your ESGs are configured for firewalling, this obviously cannot be ignored.

Can you help me out here?

Kind regards

rik

yak9 · ‎06-28-2016

Hello,

I have been thinking about scenarios like this and discussing them with our network guys, and the only thing we came up with is operational "pinning" of a particular subnet to a particular site.

So you either configure DRS rules (in the case of a stretched cluster) and/or write operational procedures for your staff (maybe even automate VM provisioning to reduce the probability of human error).

For example, you have subnets 192.168.aaa.0/24 (VNI 5aaa) and 192.168.bbb.0/24 (VNI 5bbb) and decide that VMs connected to VNI 5aaa should run in site A, and VMs connected to VNI 5bbb in site B.

Then you must announce those networks to the external world in the following way (using OSPF, BGP or anything else):

preferred route to 192.168.aaa.0/24 is via site A border router, non-preferred route is via site B
preferred route to 192.168.bbb.0/24 is via site B border router, non-preferred route is via site A

This way you have routing resilience in case of a site outage, but if your operational "pinning" is somehow broken (for example, a VM connected to VNI 5aaa ends up in site B), you get asymmetric traffic, which will most likely be dropped by your ESGs (as they are stateful firewalls).
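To make the pinning check concrete, here is a minimal Python sketch of the idea, not anything NSX ships. The prefixes 192.168.10.0/24 and 192.168.20.0/24 are made-up stand-ins for 192.168.aaa.0/24 and 192.168.bbb.0/24, and the function names are hypothetical. It assumes two sites, that local egress always sends a VM's outbound traffic through the ESG of the site it runs in, and that inbound traffic follows the preferred announcement.

```python
import ipaddress

# Assumption: made-up prefixes standing in for 192.168.aaa.0/24 and 192.168.bbb.0/24.
# Each prefix is "pinned" to the site whose border router announces the preferred route.
PREFERRED_INGRESS = {
    ipaddress.ip_network("192.168.10.0/24"): "A",
    ipaddress.ip_network("192.168.20.0/24"): "B",
}


def ingress_site(vm_ip, failed_sites=frozenset()):
    """Return the site through which traffic from outside reaches vm_ip."""
    addr = ipaddress.ip_address(vm_ip)
    for net, site in PREFERRED_INGRESS.items():
        if addr in net:
            if site in failed_sites:
                # Preferred site is down: the non-preferred route takes over.
                return "B" if site == "A" else "A"
            return site
    raise ValueError(f"{vm_ip} is not in any pinned subnet")


def is_asymmetric(vm_ip, vm_site):
    """True if inbound and outbound flows cross different sites' ESGs.

    Assumption: with local egress the VM leaves via the ESG of the site
    it currently runs in (vm_site).
    """
    return ingress_site(vm_ip) != vm_site


if __name__ == "__main__":
    print(is_asymmetric("192.168.10.5", "A"))  # pinning respected
    print(is_asymmetric("192.168.10.5", "B"))  # pinning broken
```

A check like this could run against the inventory as part of the operational procedure, flagging VMs whose current site does not match the site their subnet is pinned to.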

I guess there is just no "easy and clean" solution for this; you have to compromise somehow (maybe do firewalling at the DFW level only, then traffic can be as asymmetric as it likes).

Maybe you should first ask yourself what is more important in this scenario: traffic locality, the ability to freely move VMs between sites, or something else.

Another possibility is described in this great blog post: http://www.routetocloud.com/2016/02/nsx-dual-activeactive-datacenters-bcdr. Look up the "Active/Active Datacenter with mirrored ESGs" section. But there is no traffic locality in this solution.

And something about local ingress: I stumbled upon another great post some time ago: https://networkinferno.net/ingress-optimisation-with-nsx-for-vsphere It is not supported of course, but I hope we will see functionality like this soon.

rikherlaar · ‎06-28-2016

To be sure: I understand this feature is geared towards optimizing "egress" traffic in a multi-site topology (locale-id will see to it that kernel forwarding is based on locally significant routes only). I have not been able to read up on anything yet about return traffic across FWs/IDPSs, so unless we apply SRC-NAT below the actual stateful inspection (to force outgoing traffic to be "unique" per site), I cannot think of a way to make it work off the bat. Happy to be proven wrong though 😉

regards

Rik

rikherlaar · ‎06-28-2016

Hi Yak,

Yeah, that's in line with my thinking alright - thanks for sharing ...

/r

rikherlaar · ‎06-28-2016

Just to confirm that creating site affinity "above" the ESG layer (regardless of whether it is stateless or stateful) can be achieved in different ways, ranging from MED, community-based forwarding, prefix length and AS prepending (BGP) to simple "cost" for OSPF. Your mileage may vary depending on the situation, and it's definitely true that one will want to adhere to a multi-site design by making sure one site is always preferred for both egress and ingress flows on a per-prefix basis (IPv6 is easier as you can be more generous with prefixes). If not... you may as well just deploy a single-site design and accept "wasting" E-W bandwidth (e.g. stretched fabric as underlay with plenty of BW).
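As a rough illustration of how AS prepending and MED steer the upstream router towards one site per prefix, here is a small Python sketch. It is a deliberately simplified subset of BGP best-path selection (only AS-path length, then MED; weight, local preference, origin and the rest are ignored), and it assumes both sites peer from the same AS so that comparing MED is meaningful. The AS number 65001 and the `Advert` class are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advert:
    """One announcement of a prefix as seen by the upstream router (hypothetical model)."""
    site: str
    as_path: tuple  # sequence of AS numbers; prepending repeats the local AS
    med: int = 0    # lower MED is preferred


def best_path(adverts):
    """Pick the preferred announcement.

    Simplified: shorter AS path wins, then lower MED. Real BGP checks
    several other attributes before and after these two.
    """
    if not adverts:
        raise ValueError("prefix is unreachable: no announcements left")
    return min(adverts, key=lambda a: (len(a.as_path), a.med))


# Site A announces the prefix normally; site B prepends its AS twice,
# which makes B the non-preferred (backup) entry point for this prefix.
site_a = Advert(site="A", as_path=(65001,))
site_b = Advert(site="B", as_path=(65001, 65001, 65001))

if __name__ == "__main__":
    print(best_path([site_a, site_b]).site)  # both sites up
    print(best_path([site_b]).site)          # site A withdrawn
```

The same per-prefix preference would be mirrored for the other subnet (site B announces normally, site A prepends), so each prefix has exactly one preferred site for ingress.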

rgds

r/

mjovanovic · ‎06-14-2018

It's an old post, but it seems unanswered, and since I have had to handle this topic a lot, here are the 3 options:

- Use GSLB

- Route injection with /32 host routes for the VMs that are moved

- Enable asymmetric routing

There's no other way, so I guess the idea is to design the architecture with all these limitations in mind.
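For the /32 route injection option, the mechanism it relies on is longest-prefix match: a host route for the moved VM beats the /24 of its home site. A minimal Python sketch of that lookup follows; the addresses and the `lookup` helper are made up for illustration, and nothing here is an NSX or router API.

```python
import ipaddress


def lookup(routes, dst):
    """Return the site chosen by longest-prefix match for destination dst.

    routes maps ip_network objects to a site name (hypothetical model of
    the upstream router's table).
    """
    addr = ipaddress.ip_address(dst)
    matches = [(net, site) for net, site in routes.items() if addr in net]
    if not matches:
        raise LookupError(f"no route to {dst}")
    # The most specific prefix (largest prefix length) wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# The subnet is announced from site A.
routes = {ipaddress.ip_network("192.168.10.0/24"): "A"}
before_move = lookup(routes, "192.168.10.5")

# The VM 192.168.10.5 moves to site B, so a /32 host route is injected there.
routes[ipaddress.ip_network("192.168.10.5/32")] = "B"
after_move = lookup(routes, "192.168.10.5")
neighbour = lookup(routes, "192.168.10.6")  # other VMs still follow the /24

if __name__ == "__main__":
    print(before_move, after_move, neighbour)
```

The trade-off is table size and churn: every moved VM adds a host route upstream, which is why this tends to be reserved for a limited number of workloads.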

Cheers, www.matscloud.com @matjovanovic VCIX6-NV
