I would like your input on two different NSX deployment options.
Option 1: NSX with VXLAN
- DLR and NSX Edge Gateway are deployed.
- Micro-segmentation is achieved.
- This is the standard design for any NSX deployment.
Option 2: NSX without VXLAN
- No DLR or NSX Edge Gateway.
- Micro-segmentation is achieved.
- The physical switch provides the required routing.
I understand that without VXLAN I can still extend the network to my DR site.
But I would like to know: if I go with option 2, what are the drawbacks compared with option 1, and what benefits does option 1 bring compared to option 2?
The first thing that comes to mind is of course that in the first scenario, you have optimized East/West routing and no more unnecessary hairpinning to and from the physical switches. You can also stretch your L2 segments over L3 using VXLAN (great for Cross-vCenter NSX deployments and Disaster Recovery!). Lastly, when you have everything in software, it's very easy to add or remove functionality. Creating new networking constructs (LB, routing, new switches, etc.) has never been easier or faster, and you don't have to change anything on the physical network.
Also, don't forget that you've paid for the routing and switching functionality of NSX. That doesn't mean that you HAVE to use it, but if your environment can handle the extra overhead (NSX Controllers, DLR Control VMs, ESGs, etc.), I would definitely advise you to go with option 1. Personally, I would only suggest the second option to (very) small customers that don't have a lot of ESXi hosts and therefore probably have no resources available for this extra overhead. These customers can still do micro-segmentation, which for a lot of customers is the biggest win in terms of security and the automation thereof.
Thanks. Here is my view on your reply; correct me if I am wrong.
The first thing that comes to mind is of course that in the first scenario, you have optimized East/West routing and no more unnecessary hairpinning to and from the physical switches.
- In the option with VXLAN, if the workloads are on different hosts, the traffic still flows through the physical switch. Routing is handled by the DLR, but the VXLAN-encapsulated traffic still has to traverse the physical network.
- This is the same with option 2. The only difference I can see is that in option 2, if the VMs are on the same host but in different subnets, the traffic still needs to go out to the physical L3 switch.
You can also stretch your L2 segments over L3 using VXLAN (great for Cross-vCenter NSX deployments and Disaster Recovery!). Lastly, when you have everything in software, it's very easy to add or remove functionality. Creating new networking constructs (LB, routing, new switches, etc.) has never been easier or faster, and you don't have to change anything on the physical network.
- I agree with this. But our requirements do not include Cross-vCenter NSX or DR, so this point is not relevant in my scenario.
One more thing I am considering: in option 1, if an NSX Edge Gateway goes down, there is a network interruption of roughly 10-14 seconds, even with the Edges in ECMP.
But in option 2 I can reduce this. With no NSX Edge, a failure could only come from a physical switch, and in that case the switchover to the redundant switch will be faster, since the connectivity from the ESXi host to the physical switch is L2 rather than L3.
Let me know if my understanding above is correct.
Any inputs?
I would just like to share that I have a few customers that do both options:
Option 1 for non-critical workloads and option 2 for critical workloads.
The reason is that they want to get familiar with NSX deployment and operations (day-2 operations, monitoring, etc.) before implementing network virtualization (VXLAN) on critical workloads such as production.
Then, in phase 2, critical workloads can be on-boarded to VXLAN and migrated per network or per application, with bridging if required.
If you have blade servers, you can potentially avoid the hair-pinning traffic if the hosts are connected to the same blade switch.
Micro-segmentation, as noted in the previous post, is possible with both options, so this important use case is not a deciding factor.
Regarding option 2: there is no DLR or Edge router, and the physical L3 switch is the gateway. With this option, how will the VLAN-to-VXLAN conversion take place? The logical switches are VXLAN-based port groups, and the dvS on the host is L2. I think at least bridged DLR functionality is needed, since the default gateway on the L3 switch is VLAN-based.
One exception may be a physical L3 switch acting as a hardware VTEP, similar to the VTEP interfaces on the ESXi hosts, but that may require specific L3 switch licensing and compatibility with NSX. With this option the gateway IP address can live on the physical L3 switch, but all east-west inter-VLAN traffic passes through this switch, which increases its load, the number of hops, and the delay between VMs, and decreases throughput. As pointed out in the previous post, this may result in suboptimal traffic flows.
Another option may be to skip the Edge and use only a DLR plus the physical L3 switch. With this, the default gateway of the VMs is the DLR, and the default gateway of the DLR is the physical router. If this option is chosen, there may be difficulties providing services, tenant isolation, and multi-tenancy. Also, north-south traffic has to pass through the DI (Designated Instance), which limits traffic in that direction to about 10 Gbps total.
Regarding the ECMP convergence time of 10-14 seconds when one Edge is lost: was this observed with default OSPF timers, or with tuned hello and hold timers? If convergence time is important, these timers can be reduced to 1 and 3 seconds, as described below:
http://blog.vcumulus.co.uk/nsx-6-2-edge-services-gateway-choices/
Option 1.1: Stateful Active/Standby HA Model
I think this option is not being considered because a single Edge is limited to about 10 Gbps of throughput. With this option, the "Declare Dead Time" value is important.
"The Standby NSX Edge leverages the expiration of a specific 'Declare Dead Time' timer to detect the failure of its Active peer. This timer is configured by default to 15 seconds and can be tuned down to a minimum value of 6 seconds (via UI or API call), and it is the main factor influencing the traffic outage experienced with this HA model." - See more at: http://blog.vcumulus.co.uk/nsx-6-2-edge-services-gateway-choices/
Option 1.2: ECMP HA Model
Since the ECMP HA model is being considered, the timers that should be tuned are the hello and hold timers.
The length of the outage is determined by how fast the physical router and the DLR Control VM can time out the adjacency to the failed Edge Gateway. It is therefore recommended to tune aggressively the hello/hold time timers (1/3 seconds).
In contrast to the HA standalone model, the failure of one NSX Edge now affects only a subset of the north-south flows (the ones that were handled by the failed unit). This is an important factor contributing to an overall better recovery functionality with this ECMP HA model.
- See more at: http://blog.vcumulus.co.uk/nsx-6-2-edge-services-gateway-choices/
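The effect of this tuning on the outage window can be sketched with a rough calculation. This is a simplified model, not a measurement: it only covers how long neighbors take to declare a silently failed Edge dead, and ignores SPF recalculation and forwarding-table update time, which is why the observed 10-14 seconds can exceed the raw timer values.

```python
# Rough model of OSPF dead-neighbor detection: the dead/hold timer counts
# from the last hello received, so detection takes between (dead - hello)
# and the full dead interval after a silent failure. Illustrative only;
# real convergence adds SPF and FIB-update time on top.
def detection_window_s(hello_s, dead_s):
    """Return (best_case, worst_case) seconds until a failed peer is declared down."""
    return (dead_s - hello_s, dead_s)

defaults = detection_window_s(10, 40)  # common OSPF broadcast-network defaults
tuned = detection_window_s(1, 3)       # aggressive 1/3 s tuning from the post
print(defaults, tuned)  # (30, 40) (2, 3)
```

This makes the motivation for the 1/3-second tuning concrete: detection drops from tens of seconds to a few seconds.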
https://networkinferno.net/nsx-compendium
"vSphere HA should be enabled for the NSX edge VM’s to help achieve higher levels of availability during failure. It is recommended also that timers are aggressively tuned. Hello and hold timers should be 1 and 3 seconds respectively to speed up traffic recovery."
There are three places to configure these timers: the physical L3 switch, the Edge, and the DLR Control VM, and the values must match on all three. OSPF neighbors only form an adjacency and exchange routing information when these timers match:
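The three-way match described above is easy to get wrong, so a pre-change sanity check can help. The sketch below is illustrative (the device names and values are made up, not from any real inventory); it simply flags any device whose (hello, hold) pair differs from the others.

```python
# Minimal consistency check for the OSPF timers across the three devices
# that must agree: physical L3 switch, NSX Edge, and DLR Control VM.
# A mismatch on any one of them prevents the adjacency from forming.
def timers_match(devices):
    """devices: dict of name -> (hello_s, hold_s). Returns names that deviate
    from the first device's timers (empty list means all consistent)."""
    reference = next(iter(devices.values()))
    return [name for name, timers in devices.items() if timers != reference]

# Hypothetical example: the DLR Control VM was left with a different hold timer.
ospf_peers = {
    "physical-l3-switch": (1, 3),
    "nsx-edge": (1, 3),
    "dlr-control-vm": (1, 4),
}
print(timers_match(ospf_peers))  # ['dlr-control-vm']
```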
https://learningnetwork.cisco.com/thread/7745
Sending Hello packets

"Hello packets are sent out each functioning router interface. They are used to discover and maintain neighbor relationships.[6] On broadcast and NBMA networks, Hello Packets are also used to elect the Designated Router and Backup Designated Router.

The format of an Hello packet is detailed in Section A.3.2. The Hello Packet contains the router's Router Priority (used in choosing the Designated Router), and the interval between Hello Packets sent out the interface (HelloInterval). The Hello Packet also indicates how often a neighbor must be heard from to remain active (RouterDeadInterval). Both HelloInterval and RouterDeadInterval must be the same for all routers attached to a common network..."
https://networklessons.com/ospf/ospf-hello-and-dead-interval/
The hold timer is generally 4x the hello timer, depending on the network type.
http://packetlife.net/blog/2008/may/14/hello-timer-behaviors/
Adjusting the HelloInterval changes the interval that Hello messages are sent. When you configure the HelloInterval, the RouterDeadInterval is automatically set to four times the HelloInterval. If you want to speed up OSPF convergence, you can reduce the HelloInterval. The RouterDeadInterval will automatically be reduced as well. You will still need to miss 4 Hello messages from your neighbor, but the process of recognizing a down neighbor will be faster.
How these timers are changed on the physical L3 switch differs by switch type and model, but the logic is the same. On some platforms, changing the hello timer automatically updates the hold timer, so no separate hold-timer configuration is necessary.
On the NSX side, one option is to configure the timers with the REST API; the other is through the GUI:
https://vmtechie.wordpress.com/2016/11/27/learn-nsx-part-13-configure-ospf-for-dlr/
During the OSPF configuration steps, the hello and hold timers are set in the "New Area to Interface Mapping" window.
This is the same place as on the Edge; the GUI is identical. The only point to remember is that the timers must match.
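For the REST-API route, a sketch of building the per-interface OSPF timer fragment is shown below. The element names (`ospfInterface`, `helloInterval`, `deadInterval`) follow the NSX-v routing API used with `PUT /api/4.0/edges/{edge-id}/routing/config/ospf`, but this is an assumption to verify against the API guide for your NSX version; the vnic and area values here are purely illustrative.

```python
# Build the <ospfInterface> fragment carrying tuned 1/3 second timers for
# one Edge/DLR uplink. This only constructs the XML; actually applying it
# requires an authenticated PUT of the full OSPF config to NSX Manager.
import xml.etree.ElementTree as ET

def ospf_interface_xml(vnic, area_id, hello_s=1, dead_s=3):
    iface = ET.Element("ospfInterface")
    for tag, value in (("vnic", vnic), ("areaId", area_id),
                       ("helloInterval", hello_s), ("deadInterval", dead_s)):
        ET.SubElement(iface, tag).text = str(value)
    return ET.tostring(iface, encoding="unicode")

print(ospf_interface_xml(0, 0))
# <ospfInterface><vnic>0</vnic><areaId>0</areaId><helloInterval>1</helloInterval><deadInterval>3</deadInterval></ospfInterface>
```

Whichever method is used, the same 1/3 values must then be configured on the physical L3 switch and the DLR Control VM.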
Troubleshooting OSPF on the Edge may also be helpful:
Thanks for the clear explanation. One of the key reasons we are considering option 2 is the failure behavior of the NSX Edge Gateway devices.
I had posted the same question in a different topic (please see below):
When the NSX Edge goes down there is downtime (this is acceptable), but when the Edge comes back up there is again downtime of 5-6 seconds.
I am not able to understand why there is downtime when the Edge is coming back up (the OSPF timers are already tuned as aggressively as possible).
Any inputs, please?
The Edge Gateway can be deployed in an HA configuration, so there is no downtime.
Thanks, but the text below says that there will be downtime even with HA.