VMware Networking Community
rajeevsrikant
Expert

NSX Edge ECMP + NAT

At present I have 2 NSX Edge Gateways in ECMP mode with OSPF. Attached is the diagram for reference.

I also need to use NAT on my edge gateway. I need NAT to access a few networks.

Let's say Network A (10.10.0.0/16) & Network B (20.20.0.0/16). These networks will not be advertised outside, so they need NAT to be reachable from outside.

Since ECMP is configured, I will not be able to use NAT on these edges, because NAT is a stateful service.

Below is my plan. I would like to know if this is the right approach.

1. Set up a new Edge Gateway in HA (Active-Standby).

2. Configure NAT on the new NSX Edge Gateway.

3. Set up a new DLR (with Control VM) for Network A & Network B. The new DLR will be the default gateway for Network A & B.

4. Keep using the existing NSX controllers. No need to set up new NSX controllers.

Please let me know if my approach is right.

Mainly points 3 & 4: is a new DLR (with Control VM) mandatory?

24 Replies
bayupw
Leadership

There is no diagram attached to your post.

You can have ECMP at the aggregation layer and do NAT on an Edge behind the ECMP Edges.

See the diagram below (the tenant on the left).

pastedImage_0.png

Bayu Wibowo | VCIX6-DCV/NV
Author of VMware NSX Cookbook http://bit.ly/NSXCookbook
https://github.com/bayupw/PowerNSX-Scripts
https://nz.linkedin.com/in/bayupw | twitter @bayupw
rajeevsrikant
Expert

Sorry, I missed it.

rajeevsrikant
Expert

I saw the diagram you shared.

A few questions regarding it:

  1. From the DLR, the exit point will be the Tenant NSX Edge with HA. Is my understanding right?
  2. Is it correct that a separate DLR is not required in this scenario?

I would also like to know whether my proposal has any drawbacks.

I prefer not to change anything in my existing topology. In the diagram I shared, I have drawn a box around the current setup. The right-hand side is the new setup I am planning.

bayupw
Leadership

In the diagram I shared, there is one DLR per tenant.

The Edge HA tenant (on the left) has a different DLR from the Edge ECMP tenant (on the right).

If you would like separate Edge Gateways with different networks/functions, you would need to create a separate DLR.

Please also note that connecting multiple DLRs to a single Edge is not supported as per the design guide. The diagram is below.

pastedImage_1.png

Regarding your diagram, it is possible, but I would suggest creating a separate uplink network for Edge#3 & Edge#4 to connect to the physical router, rather than sharing the same layer 2 network with Edge#1 and Edge#2.

The same goes for the DLR transit network to the Edge: use a separate transit logical switch between the new DLR and the Edge HA pair.

The physical router could be confused if the same network is advertised through different routers with the same cost without ECMP.

rajeevsrikant
Expert

Yeah, I got it.

I am planning to have a separate uplink network for Edge#3 & Edge#4 to connect to the physical router. The representation in my diagram is wrong; I will correct it.

I will also have a separate transit network from the new DLR to the new Edge.

rajeevsrikant
Expert

One more question:

The Edge GW that will be set up in HA will have an uplink that is logically connected to both my LAN routers (OSPF between the Edge & the physical devices).

So from the Edge there will be 2 paths to reach outside (one via physical device 1 & the other via physical device 2).

Is this considered ECMP, and should I enable the ECMP option on my NSX Edge?

rajeevsrikant
Expert

One more question.

With Edge HA & 2 upstream routers, is it recommended to run OSPF, or are static routes preferred?

bayupw
Leadership

For your first question, it depends on your requirements.

Yes, you can use ECMP as per the design guide diagram below.

pastedImage_0.png

But stateful services do not work with ECMP, because ECMP can cause asymmetric routing and the stateful services will then fail.

So if you have stateful services such as load balancer, edge firewall, or NAT, don't use ECMP.

You can set the primary physical router as the primary path and the secondary physical router as the backup path and use cost (or administrative distance for static routes) to set the primary router as the preferred path.
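As a rough illustration of the primary/backup idea (this is plain Python, not NSX configuration), the sketch below picks the lowest-cost next hop for a prefix and falls back to the other one when the primary disappears. The `Route` class, the router names, and the cost values 1 and 10 are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str   # illustrative name, not a real device
    cost: int       # OSPF cost (or admin distance for static routes)


def preferred(routes):
    """Return the lowest-cost route(s); more than one entry means a tie."""
    if not routes:
        return []
    best = min(r.cost for r in routes)
    return [r for r in routes if r.cost == best]


routes = [
    Route("0.0.0.0/0", "physical-router-1", 1),    # primary path
    Route("0.0.0.0/0", "physical-router-2", 10),   # backup path
]

# Normal operation: only the cheaper path is preferred.
primary = preferred(routes)

# Primary router lost: the backup path becomes the best one left.
backup = preferred([r for r in routes if r.next_hop != "physical-router-1"])

print([r.next_hop for r in primary])
print([r.next_hop for r in backup])
```

With unequal costs there is never a tie, so traffic from the Edge uses a single, predictable next hop until the primary fails.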

Use a different interface for connection to the secondary physical router

pastedImage_3.png

If the two routers are on the same network, you can also use an FHRP and peer with the routers' virtual IP address.

For your second question, again, it depends on your requirements.

If your environment is pretty static, then static routes should be fine.

But if the environment is dynamic and new networks (logical switches) often need to be added, you could use dynamic routing such as OSPF/BGP so you don't need to manually add routes every time you need to advertise new networks.

For an ECMP setup, you would need dynamic routing so the ECMP routes can be added/removed automatically by the routing protocol.

rajeevsrikant
Expert

Thanks.

The diagram you have shown is the setup I am planning to implement.

I need to use NAT, so I will not enable ECMP towards my 2 upstream physical routers.

Initially my understanding was that NAT should be avoided only when both Edge Gateways are active.

From your reply I understand that even if only 1 Edge GW is active, NAT should not be enabled if it has 2 equal-cost uplinks to 2 different routers.

Regarding the 2nd question, whether to use static routes or OSPF:

My preference is the design with minimal downtime in case of failure of either the Edge or the Control VM.

Static:

- If I use only static routes, there will be no DLR Control VM, so the DLR Control VM is not a failure point.

- If I use only static routes and the active Edge GW fails, how long does it normally take for traffic to flow through the standby Edge Gateway (including the time for the standby GW to become active)?

OSPF:

- I need a DLR Control VM. If the active Control VM fails, there will be downtime. To avoid this, I need to add static routes on the Edge Gateway alongside OSPF & redistribute them.

- If the active Edge GW fails, there will be downtime until the routes switch over to the standby Edge.

- To reduce the downtime, the OSPF timers need to be tuned to the minimum Hello/Dead interval.

So please suggest which option I should choose.

rajeevsrikant
Expert

Further to the above, I have attached the static route design.

1. Edge GW in Active-Standby

2. No DLR Control VM

3. Edge GW will form single L2 Connectivity to 2 Physical routers.

4. HSRP will be configured on Physical Routers.

5. Edge GW will be configured with default route with Next Hop IP as HSRP IP Address.

6. NAT will be enabled in the NSX Edge GW.

7. Physical routers will have OSPF with equal cost paths to outside network.

Routing from the NSX Edge to the outside network will always go via Physical Router#1 because of HSRP priority.

So South -> North traffic is no issue.

What will happen to the route from Physical Router#2 to the NSX Edge Gateway?

If North -> South traffic comes from Physical Router#2 to the NSX Edge, will this create any problem for NAT?

rajeevsrikant
Expert

Attached diagram for reference.

rajeevsrikant
Expert

bayupw​ & others

Need your inputs below.

pastedImage_0.png

I am planning to deploy a similar topology in my environment. The Edge will be Active-Standby (HA) so that I can enable NAT.

The Edge will have 2 uplinks, one to each of the 2 Physical Routers.

I will disable ECMP on both the DLR & the Edge.

The question is that from the Edge there are 2 paths (Edge -> Physical Router#1, Edge -> Physical Router#2).

Option:1

Edge -> Physical Router#1 - OSPF Cost 1

Edge -> Physical Router#2 - OSPF Cost 1

Equal cost from the Edge to both Physical Routers.

Will NAT work in this scenario?

Option:2

Edge -> Physical Router#1 - OSPF Cost 1

Edge -> Physical Router#2 - OSPF Cost 10

Router 1 is always preferred because of its lower OSPF cost.

Which option is the right one for me to choose?

rajeevsrikant
Expert

My understanding is that equal costs from Edge GW#1 to Physical Router#1 & Physical Router#2 are not considered ECMP.

NAT should work. The reason I feel NAT should work is that the traffic IN & OUT goes via the same Edge GW.

Let me know if my understanding is right.

rajeevsrikant
Expert

any inputs pls

tspires
Enthusiast

You are correct in your thinking here.

So long as all traffic enters/exits the ESG on the same interface, it doesn't matter where it is coming from upstream. ECMP upstream to your physical devices would work just fine. That being said, it doesn't seem likely to have any benefit, since throughput would be bottlenecked at your ESG unless you're connecting to toasters at your edge.

The reason ECMP through multiple ESGs doesn't work with NAT, firewalling and load balancing is that these are stateful services that rely on symmetric routing. ECMP through multiple ESGs causes asymmetric routing, and the state of each individual traffic flow is held only in the ESG the session was initiated through. This isn't an NSX problem; it is an age-old TCP/IP and stateful services problem that you often hit in the traditional networking world.
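To make the state problem concrete, here is a toy Python model (not anything NSX actually runs). Each `Edge` object keeps its own session table, and return traffic is accepted only by the edge that saw the flow start. The class, function names, and addresses are all made up for illustration.

```python
class Edge:
    """Toy stateful edge: remembers flows whose first packet it forwarded."""

    def __init__(self, name):
        self.name = name
        self.sessions = set()

    def outbound(self, flow):
        # The first packet of a flow creates state on this edge only.
        self.sessions.add(flow)

    def inbound(self, flow):
        # Return traffic is accepted only if this edge holds the state.
        return flow in self.sessions


def return_traffic_ok(out_edge, back_edge, flow):
    out_edge.outbound(flow)
    return back_edge.inbound(flow)


# (source IP, source port, destination IP, destination port), all illustrative
flow = ("10.10.1.5", 40000, "198.51.100.7", 443)

# Symmetric: out and back through the same ESG.
esg1 = Edge("ESG-1")
symmetric = return_traffic_ok(esg1, esg1, flow)

# Asymmetric: ECMP sends the reply through a different ESG.
esg1, esg2 = Edge("ESG-1"), Edge("ESG-2")
asymmetric = return_traffic_ok(esg1, esg2, flow)

print("symmetric path accepted:", symmetric)
print("asymmetric path accepted:", asymmetric)
```

The second case fails because ESG-2 never saw the session start, which is the situation ECMP across multiple ESGs can create.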

Hopefully that makes sense! Let me know if you need any clarification.

-Trevor

arahimidris
Contributor

Your understanding is correct here. Run OSPF with the physical routers and enable ECMP on the ESG to ensure both physical routers are in the data path. Make sure to enable GR (graceful restart) on the physical routers, the DLR Control VM and the ESG. Also tune the OSPF Hello/Dead timers to 30/120 secs between the DLR and the ESG, and between the ESG (NAT) and the physical routers.
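One way to read the timer suggestion: an OSPF neighbor keeps the adjacency as long as it hears a Hello before its Dead timer expires, so longer timers leave more room for an Edge HA failover to finish. The small sketch below does that arithmetic. The 15 second failover time is a hypothetical figure chosen for the example, not a measured or documented value, and the 1/3 second timers are just a sample "aggressive" setting.

```python
def adjacency_survives(hello_s, dead_s, failover_s):
    """Rough check: does the neighbor hear a Hello again before Dead expires?

    Worst case, the failure happens just before the next Hello was due, so
    the time left on the neighbor's Dead timer is about dead_s - hello_s.
    """
    budget_s = dead_s - hello_s
    return failover_s <= budget_s


failover_s = 15  # hypothetical Edge HA failover time, for illustration only

relaxed = adjacency_survives(30, 120, failover_s)    # timers suggested above
aggressive = adjacency_survives(1, 3, failover_s)    # sample minimal timers

print("30/120 keeps adjacency:", relaxed)
print("1/3 keeps adjacency:", aggressive)
```

Under these assumptions the relaxed timers ride through the failover, while very short timers would drop the adjacency before the standby takes over.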

rajeevsrikant
Expert

Thanks.

Below is my proposed diagram which I am planning to implement.

In Edge I have 2 interfaces.

Interface#1 - 10.10.100.128/27

Interface#2 - 10.10.100.193/27

So from the Edge Gateway's perspective (ECMP disabled) there are 2 equal paths to the Physical Routers (R#1 & R#2).

So there is a chance that traffic may leave from Interface#1 & the return traffic may arrive on Interface#2.

In this case, if I enable NAT on the ESG, will it work?

What is the recommended design in HA mode: a single outside interface on the Edge, or will the design below work? (Considering I will be using NAT on the Edge GW in future.)

pastedImage_0.png

rajeevsrikant
Expert

Any inputs pls

tspires
Enthusiast

This design won't work with NAT from my understanding because the traffic could be hitting the ESG from different interfaces unless you're engineering traffic upstream. You also map NAT to a specific IP that lives in the same subnet as your uplink interface, so even if you engineered traffic on your physical network to come back on the same interface you'd have to NAT each IP twice and you'd have two different external IPs for each internal IP.
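A quick way to see the "NAT IP lives in the uplink's subnet" point is to check which uplink subnet a translated address falls into. The sketch below uses Python's standard `ipaddress` module with the two interface addresses from the post above; the translated IP `10.10.100.130` and the helper name `owning_uplinks` are hypothetical.

```python
import ipaddress

# Uplink addresses as given in the post above.
uplinks = {
    "Interface#1": ipaddress.ip_interface("10.10.100.128/27"),
    "Interface#2": ipaddress.ip_interface("10.10.100.193/27"),
}


def owning_uplinks(translated_ip):
    """Return the uplinks whose subnet contains the translated (NAT) IP."""
    ip = ipaddress.ip_address(translated_ip)
    return [name for name, iface in uplinks.items() if ip in iface.network]


nat_ip = "10.10.100.130"  # hypothetical external IP for one internal host
owners = owning_uplinks(nat_ip)

print(owners)
```

Because the two uplinks sit in different /27 subnets, a given external IP can belong to only one of them, so covering both uplinks would need a second NAT rule with a second external IP.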

You can however make it work if you give the ESG just a single uplink and put the two physical router uplinks in the same subnet. That's what I would do if I were you.

The other option is just to do an active/standby configuration on your edge uplinks, but that still wouldn't resolve the issue of having to configure NAT on each uplink interface. You need a single ESG uplink interface if you want to do NAT effectively.
