
NSX-T N-VDS VLAN Pinning

Posted by oziltener Aug 19, 2019

Dear readers

As you are probably aware, NSX-T uses its own vSwitch called the N-VDS. The N-VDS is primarily used to encapsulate and decapsulate GENEVE overlay traffic between NSX-T transport nodes, and it also supports the distributed firewall (dFW) for micro-segmentation. The N-VDS requires its own dedicated pNIC interfaces; these pNICs cannot be shared with vSphere vSwitches (vDS or vSS). In a typical NSX-T deployment, each transport node has one or two Tunnel End Points (TEPs) to terminate the GENEVE overlay traffic. The number of TEPs is directly related to the attached Uplink Profile. If you use an uplink teaming policy of "Failover", only a single TEP is used. With a teaming policy of "Load Balance Source", a TEP is assigned to each physical NIC. Such a "Load Balance Source" Uplink Profile is shown below and will be used for this lab exercise.

Screen Shot 2019-08-19 at 20.07.00.png

The mapping of the "Uplinks" is as follows:

  • ActiveUplink1 is the pNIC (vmnic2) connected to ToR switch NY-CAT3750G-A
  • ActiveUplink2 is the pNIC (vmnic3) connected to ToR switch NY-CAT3750G-B

 

Additionally, you can see that VLAN 150 is used to carry the GENEVE-encapsulated traffic.
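If you want to double-check the resulting TEP VMkernel interfaces on the host itself, the standard ESXi commands are sufficient. A small sketch (the vmk numbering NSX-T uses for the TEPs varies per deployment, so vmk10 below is just an assumption):

[root@NY-ESX72A:~] esxcfg-vmknic -l                               # lists all VMkernel NICs; the TEP vmks typically show up with the "vxlan" netstack
[root@NY-ESX72A:~] esxcli network ip interface ipv4 get -i vmk10  # shows the IPv4 address of one TEP vmk (vmk10 is an assumption)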

 

However, the N-VDS can also be used for VLAN-based segments. VLAN-based segments are very similar to vDS portgroups. In deployments where your hosts have only two pNICs and both pNICs are used for the N-VDS (yes, for redundancy reasons), you have to use VLAN-based segments to carry the VMkernel interfaces (e.g. mgmt, vMotion or vSAN). When your VLAN-based segments carry VMkernel interface traffic and you use an Uplink Profile as shown above, it is difficult to figure out on which pNIC the VMkernel traffic is carried, as this traffic follows the default teaming policy, in our case "Load Balance Source". Please note that VLAN-based segments are not limited to VMkernel traffic; such segments can also carry regular virtual machine traffic.

 

There are often good reasons to do traffic steering and get a predictable traffic flow behavior; for example, you might want to carry Management and vMotion VMkernel traffic under normal conditions (all physical links up) on pNIC_A and vSAN on pNIC_B. The top two reasons are:

1.) predict the forwarding traffic pattern under normal conditions (all links up) and align, for example, the VMkernel traffic with the active first-hop gateway protocol instance (e.g. HSRP)

2.) reduce ISL traffic between the two ToR switches or ToR-to-spine traffic for high-volume traffic (e.g. vSAN or vMotion), along with predictable and low-latency forwarding (assume, for example, you have 20 hosts in a single rack and all hosts use the left ToR switch for vSAN; in such a situation the ISL does not carry any vSAN traffic)

 

This is where NSX-T "VLAN Pinning" comes into the game. The term "VLAN Pinning" is referred to as "Named Teaming Policy" in the NSX-T public documentation; personally, I like the term "VLAN Pinning". In the lab exercise for this blog, I would like to show you how to configure "VLAN Pinning". The physical lab setup looks like the diagram below:

Physical Host Representation-Version1.png

Only host NY-ESX72A is relevant for this exercise. Host NY-ESX72A is attached to two Top of Rack (ToR) Layer 3 switches, called NY-CAT3750G-A and NY-CAT3750G-B. As you see, this host has four pNICs (vmnic0...3), but only vmnic2 and vmnic3, which are assigned to the N-VDS, are relevant for this lab exercise. On host NY-ESX72A, I have created three additional "artificial/dummy" VMkernel interfaces (vmk3, vmk4, vmk5). Each of the three VMkernel interfaces is attached to a dedicated NSX-T VLAN-based segment. The screenshot below shows the three VMkernel interfaces, all attached to a dedicated VLAN-based segment owned by the N-VDS (NY-NVDS), and, as an example, the MAC address of vmk3.

Screen Shot 2019-08-19 at 21.00.26.png
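The same information is also available from the ESXi shell, if you prefer it over the vSphere Client. A small sketch:

[root@NY-ESX72A:~] esxcli network ip interface list   # lists vmk0...vmk5 with their MAC addresses and the attached (opaque) NSX segment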

 

The simplified logical setup is shown below:

Logical Representation-default-teaming-Version1.png

 

 

From the NSX-T perspective, we have configured three VLAN-based segments. These VLAN-based segments were created with the new policy UI/API.

NSX-T-VLAN-Segments-red-marked.png

The policy UI/API, introduced with NSX-T 2.4.0, is the preferred interface for the majority of NSX-T deployments. The "legacy" UI/API is still available and is visible in the UI under the "Advanced Networking & Security" tab.

 

As already mentioned, the three VLAN-based segments use the default teaming policy (Load Balance Source), so the VMkernel traffic is distributed over the two pNICs (vmnic2 and vmnic3). Hence, we typically cannot predict which of the ToR switches will learn the MAC address of each of the three VMkernel interfaces. Before we move forward and configure "VLAN Pinning", let's see how the traffic of the three VMkernel interfaces is distributed. One of the easiest ways is to check the MAC address table of the two ToR switches for interface Gi1/0/10.
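For reference, the kind of command you would run on the Catalyst switches looks like this; a sketch (Gi1/0/10 is the port facing host NY-ESX72A on each ToR switch in my lab):

NY-CAT3750G-A#show mac address-table interface GigabitEthernet1/0/10

Alternatively, you can get the same answer from the host side with esxtop (press 'n' for the network view); the TEAM-PNIC column shows which vmnic each port is currently using.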

Screen Shot 2019-08-19 at 20.53.59.png

As you can see, NY-CAT3750G-A learns only the MAC address of vmk3 (0050.5663.f4eb), whereas NY-CAT3750G-B learns the MAC addresses of vmk4 (0050.5667.50eb) and vmk5 (0050.566d.410d). With the default teaming option "Load Balance Source", the administrator has no option to steer this traffic. Please ignore the two MAC addresses learned in VLAN 150; these are TEP MAC addresses.

 

Before we configure VLAN Pinning, let's assume we want vmk3 and vmk4 to be learned on NY-CAT3750G-A and vmk5 on NY-CAT3750G-B (when all links are up). We will use two new "Named Teaming Policies" with a failover teaming policy. The traffic flows should look like the diagram below (a dotted line means "standby link").

Logical Representation-vlan-pinning-teaming-Version1.png

The first step is to create the two additional "Named Teaming Policies". Please compare this screenshot with the very first screenshot above, and make sure you use exactly the same uplink names (ActiveUplink1 and ActiveUplink2) as in the default teaming policy.

Edit-Uplink-Profile.png
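For API users, the same uplink profile can be expressed roughly as shown below. This is only a sketch against the NSX-T manager API (PUT /api/v1/host-switch-profiles/<uplink-profile-id>); the display name and the two policy names ("Pinning-to-Uplink1"/"Pinning-to-Uplink2") are examples, and fields such as _revision are omitted:

{
  "resource_type": "UplinkHostSwitchProfile",
  "display_name": "NY-UPLINK-PROFILE",
  "transport_vlan": 150,
  "teaming": {
    "policy": "LOADBALANCE_SRCID",
    "active_list": [
      { "uplink_name": "ActiveUplink1", "uplink_type": "PNIC" },
      { "uplink_name": "ActiveUplink2", "uplink_type": "PNIC" }
    ]
  },
  "named_teamings": [
    {
      "name": "Pinning-to-Uplink1",
      "policy": "FAILOVER_ORDER",
      "active_list": [ { "uplink_name": "ActiveUplink1", "uplink_type": "PNIC" } ],
      "standby_list": [ { "uplink_name": "ActiveUplink2", "uplink_type": "PNIC" } ]
    },
    {
      "name": "Pinning-to-Uplink2",
      "policy": "FAILOVER_ORDER",
      "active_list": [ { "uplink_name": "ActiveUplink2", "uplink_type": "PNIC" } ],
      "standby_list": [ { "uplink_name": "ActiveUplink1", "uplink_type": "PNIC" } ]
    }
  ]
}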

 

The second step is to make these two new "Named Teaming Policies" available on the associated VLAN transport zone (TZ).

Edit-TZ-for-vlan-pinning.png
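Again for API users, the VLAN transport zone carries the list of named teaming policies it exposes to its segments. A minimal sketch (PUT /api/v1/transport-zones/<vlan-tz-id>; the display name is an example and _revision is omitted):

{
  "resource_type": "TransportZone",
  "display_name": "NY-VLAN-TZ",
  "host_switch_name": "NY-NVDS",
  "transport_type": "VLAN",
  "uplink_teaming_policy_names": [ "Pinning-to-Uplink1", "Pinning-to-Uplink2" ]
}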

The third and last step is to edit the three VLAN-based segments according to your traffic steering policy. As you can see, we unfortunately still need to edit the VLAN-based segments in the "legacy" "Advanced Networking & Security" UI section. We plan to make this editing option available in the new policy UI/API in a future NSX-T release.

NY-VLAN-SEGMENT-90.png

NY-VLAN-SEGMENT-91.png

NY-VLAN-SEGMENT-92.png
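The same edit can also be done through the manager ("legacy") API, where the logical switch object carries the teaming policy name. A sketch for one of the three segments (PUT /api/v1/logical-switches/<logical-switch-id>; the VLAN ID is derived from the segment name and is an example):

{
  "resource_type": "LogicalSwitch",
  "display_name": "NY-VLAN-SEGMENT-90",
  "transport_zone_id": "<vlan-tz-id>",
  "admin_state": "UP",
  "vlan": 90,
  "uplink_teaming_policy_name": "Pinning-to-Uplink1"
}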

As soon as you edit the VLAN-based segments with the new "Named Teaming Policies", the ToR switches immediately learn the MAC addresses on the associated physical interfaces.

After applying "VLAN Pinning" through the two new "Named Teaming Policies", the two ToR switches learn the MAC addresses in the following way:

Catalyst-MAC-table-with-vlan-pinning.png

As you can see, NY-CAT3750G-A now learns the MAC addresses of vmk3 and vmk4, whereas NY-CAT3750G-B learns only the MAC address of vmk5.

I hope you had a bit of fun reading this NSX-T VLAN Pinning write-up.

 

 

Software Inventory:

vSphere version: 6.5.0, build 13635690

vCenter version: 6.5.0, build 10964411

NSX-T version: 2.4.1.0.0.13716575

 

Blog history

Version 1.0 - 19.08.2019 - first published version

Dear readers

I was recently at a customer site where we discussed the details of NSX-T north/south connectivity with active/active edge node virtual machines to maximize throughput and resiliency. Achieving the highest north-to-south (and vice versa) bandwidth requires the installation of multiple edge nodes in active/active mode leveraging ECMP routing.

But let's first have a basic view of an NSX-T ECMP deployment.

In a typical deployment, the physical router is a Layer 3 leaf switch acting as the Top of Rack (ToR) device; two of them are required to provide redundancy. NSX-T supports basically two edge node deployment options: active/standby and active/active. For maximizing throughput and the highest level of resiliency, the active/active deployment option is the right choice. NSX-T is able to install up to eight paths leveraging ECMP routing. As you are most likely already familiar with NSX-T, you know that NSX-T requires the Service Router (SR) component on each individual edge node (VM or bare metal) to set up the BGP peering with the physical router. But have you ever thought about what eight ECMP path entries really mean? Are these eight paths counted on the Tier0 logical router, on the edge node itself, or somewhere else?

 

Before we talk about the eight ECMP paths, let us have a closer look at the physical setup. For this exercise I have only 4 ESXi hosts available in my lab. Each host is equipped with four 1Gbit/s pNICs. Two of these ESXi hosts are purely used to provide CPU and memory resources to the edge node VMs, and the other two ESXi hosts are prepared with NSX-T (NSX-T VIBs installed). The two "Edge" ESXi hosts have two vDS, each configured with two pNICs. The first vDS is used for vmk0 management, vMotion and IP storage; the second vDS is used for the Tunnel End Point (TEP) GENEVE-encapsulated traffic and the routed uplink traffic towards the ToR switches. The edge node VMs act as NSX-T transport nodes; they typically have two or three N-VDS embedded (a future release will support a single N-VDS per edge node). The two compute hosts are prepared with NSX-T; they also act as transport nodes, and they have a slightly different vSwitch setup. The first vSwitch is again a vDS with two pNICs and is used for vmk0 management, vMotion and IP storage. The other two pNICs are assigned to the NSX-T N-VDS, which is responsible for the TEP traffic. The diagram below shows the simplified physical setup.

Physical Host Representation-Version1.png

As you can easily see, the two "Edge" vSphere hosts have a total of eight edge node VMs installed. This is a purpose-built "Edge" vSphere cluster that serves edge node VMs only. Is this kind of deployment recommended for a real customer deployment? It depends :-)

Having 4 pNICs is probably a good choice, but most likely 10Gbit/s or 25Gbit/s interfaces are preferred, respectively required, instead of 1Gbit/s interfaces for high bandwidth throughput. When you host more than one edge node VM per ESXi host, I recommend using at least 25Gbit/s interfaces. As our focus is on maximizing throughput and resiliency, a customer deployment would likely have 4 or more ESXi hosts in the "Edge" vSphere cluster. Other aspects should be considered as well, like the storage system used (e.g. vSAN), operational aspects (e.g. maintenance mode) or vSphere cluster settings. For this lab, "small" sized edge node VMs are used; a real deployment should use "large" sized edge node VMs where maximum throughput is required. A dedicated, purpose-built "Edge" vSphere cluster can be considered best practice when maximum throughput and the highest resiliency along with operational simplification are required. Here are two additional screenshots of the edge node VM deployment in my lab.

Screen Shot 2019-08-06 at 06.06.20.png

Screen Shot 2019-08-06 at 06.14.38.png

 

Now that we have an idea of how the physical environment looks, it is time to move forward and dig into the logical routing design.

 

Multi Tier Logical Routing-Version1.png

To keep the diagram simple, it shows only a single compute transport node (NY-ESX70A) and only six of the eight edge node VMs. All eight edge node VMs are assigned to a single NSX-T edge cluster, and this edge cluster is assigned to the Tier0 logical router. The logical design shows a two-tier architecture with a Tier0 logical router and two Tier1 logical routers; this is a very common design. Centralized services are not deployed at the Tier1 level in this exercise. A Tier0 logical router consists in almost all cases (as you normally want to use static or dynamic routing to reach the physical world) of a Service Router (SR) and a Distributed Router (DR). Only the edge node can host the Service Router (SR). As already said, the Tier1 logical router in this exercise has only the DR component instantiated; a Service Router (SR) is not required, as centralized services (e.g. load balancer) are not configured. Each SR has two eBGP peerings with the physical routers. Please keep in mind that only the two overlay segments green-240 and blue-241 are user-configured segments. Workload VMs are attached to these overlay segments, which provide VM mobility across physical boundaries. The segment between the Tier0 SR and DR and the segments between the Tier0 DR and the Tier1 DRs are overlay segments automatically configured by NSX-T, including the IP address assignment.
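If you want to verify the routing components and the BGP peerings on one of the edge nodes, the NSX-T edge CLI is the quickest way. A short sketch (the edge node name is a placeholder; the VRF ID of the Tier0 SR is taken from the output of the first command):

NY-EDGE-01> get logical-routers          # lists the Tier0 SR and the DR instances on this edge node, including their VRF IDs
NY-EDGE-01> vrf 1                        # enter the VRF context of the Tier0 SR
NY-EDGE-01(tier0_sr)> get bgp neighbor   # shows the two eBGP peerings with the ToR switches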

Meanwhile, you might have already recognized that eight edge nodes could equal eight ECMP paths. Yes, this is true... but where are these eight ECMP paths installed in the routing, respectively the forwarding, table? These eight paths are installed neither on the logical construct "Tier0 logical router" nor on a single edge node. The eight ECMP paths are installed on the Tier0 DR component of each individual compute transport node, in our case on the NY-ESX70A Tier0 DR and the NY-ESX71A Tier0 DR. The CLI output below shows the forwarding table on the compute transport node NY-ESX70A.

 

IPv4 Forwarding Table NY-ESX70A Tier0 DR

NY-ESX70A> get logical-router e4a0be38-e1b6-458a-8fad-d47222d04875 forwarding ipv4
                                   Logical Routers Forwarding Table - IPv4
--------------------------------------------------------------------------------------------------------------
Flags Legend: [U: Up], [G: Gateway], [C: Connected], [I: Interface]
[H: Host], [R: Reject], [B: Blackhole], [F: Soft Flush], [E: ECMP]

                   Network                               Gateway                Type               Interface UUID
==============================================================================================================
0.0.0.0/0                                              169.254.0.2              UGE     48d83fc7-1117-4a28-92c0-7cd7597e525f
0.0.0.0/0                                              169.254.0.3              UGE     48d83fc7-1117-4a28-92c0-7cd7597e525f
0.0.0.0/0                                              169.254.0.4              UGE     48d83fc7-1117-4a28-92c0-7cd7597e525f
0.0.0.0/0                                              169.254.0.5              UGE     48d83fc7-1117-4a28-92c0-7cd7597e525f
0.0.0.0/0                                              169.254.0.6              UGE     48d83fc7-1117-4a28-92c0-7cd7597e525f
0.0.0.0/0                                              169.254.0.7              UGE     48d83fc7-1117-4a28-92c0-7cd7597e525f
0.0.0.0/0                                              169.254.0.8              UGE     48d83fc7-1117-4a28-92c0-7cd7597e525f
0.0.0.0/0                                              169.254.0.9              UGE     48d83fc7-1117-4a28-92c0-7cd7597e525f
100.64.48.0/31                                           0.0.0.0                UCI     03ae946a-bef4-45f5-a807-8e74fea878b6
100.64.48.2/31                                           0.0.0.0                UCI     923cbdaf-ad8a-45ce-9d9f-81d984c426e4
169.254.0.0/25                                           0.0.0.0                UCI     48d83fc7-1117-4a28-92c0-7cd7597e525f
--snip--

Each compute transport node can distribute the traffic sourced from the attached workload VMs from south to north across these eight paths (as we have eight different next hops), one path per Service Router. With such an active/active ECMP deployment we can maximize the forwarding bandwidth from south to north. This is shown in the diagram below.

Multi Tier Logical Routing-South-to-North-Version1.png

On the other hand, from north to south, each ToR switch has eight paths installed (indicated with "multipath") to reach the destination networks green-240 and blue-241. The ToR switch will distribute the traffic from the physical world across all eight next hops. Here we also achieve the maximum throughput from north to south. Let's have a look at the routing table of the two ToR switches for the destination network green-240.

 

BGP Table for "green" prefix 172.16.240.0/24 on RouterA and RouterB

NY-CAT3750G-A#show ip bgp 172.16.240.0/24
BGP routing table entry for 172.16.240.0/24, version 189
Paths: (9 available, best #8, table Default-IP-Routing-Table)
Multipath: eBGP
Flag: 0x1800
  Advertised to update-groups:
     1          2
  64513
    172.16.160.20 from 172.16.160.20 (172.16.160.20)
      Origin incomplete, metric 0, localpref 100, valid, external, multipath
  64513
    172.16.160.22 from 172.16.160.22 (172.16.160.22)
      Origin incomplete, metric 0, localpref 100, valid, external, multipath
  64513
    172.16.160.23 from 172.16.160.23 (172.16.160.23)
      Origin incomplete, metric 0, localpref 100, valid, external, multipath
  64513
    172.16.160.21 from 172.16.160.21 (172.16.160.21)
      Origin incomplete, metric 0, localpref 100, valid, external, multipath
  64513
    172.16.160.27 from 172.16.160.27 (172.16.160.27)
      Origin incomplete, metric 0, localpref 100, valid, external, multipath
  64513
    172.16.160.26 from 172.16.160.26 (172.16.160.26)
      Origin incomplete, metric 0, localpref 100, valid, external, multipath
  64513
    172.16.160.25 from 172.16.160.25 (172.16.160.25)
      Origin incomplete, metric 0, localpref 100, valid, external, multipath
  64513
    172.16.160.24 from 172.16.160.24 (172.16.160.24)
      Origin incomplete, metric 0, localpref 100, valid, external, multipath, best
  64513
    172.16.3.11 (metric 11) from 172.16.3.11 (172.16.3.11)
      Origin incomplete, metric 0, localpref 100, valid, internal
NY-CAT3750G-A#

NY-CAT3750G-B#show ip bgp 172.16.240.0/24
BGP routing table entry for 172.16.240.0/24, version 201
Paths: (9 available, best #9, table Default-IP-Routing-Table)
Multipath: eBGP
Flag: 0x1800
  Advertised to update-groups:
     1          2
  64513
    172.16.161.20 from 172.16.161.20 (172.16.160.20)
      Origin incomplete, metric 0, localpref 100, valid, external, multipath
  64513
    172.16.161.23 from 172.16.161.23 (172.16.160.23)
      Origin incomplete, metric 0, localpref 100, valid, external, multipath
  64513
    172.16.161.21 from 172.16.161.21 (172.16.160.21)
      Origin incomplete, metric 0, localpref 100, valid, external, multipath
  64513
    172.16.161.26 from 172.16.161.26 (172.16.160.26)
      Origin incomplete, metric 0, localpref 100, valid, external, multipath
  64513
    172.16.161.22 from 172.16.161.22 (172.16.160.22)
      Origin incomplete, metric 0, localpref 100, valid, external, multipath
  64513
    172.16.161.27 from 172.16.161.27 (172.16.160.27)
      Origin incomplete, metric 0, localpref 100, valid, external, multipath
  64513
    172.16.161.25 from 172.16.161.25 (172.16.160.25)
      Origin incomplete, metric 0, localpref 100, valid, external, multipath
  64513
    172.16.3.10 (metric 11) from 172.16.3.10 (172.16.3.10)
      Origin incomplete, metric 0, localpref 100, valid, internal
  64513
    172.16.161.24 from 172.16.161.24 (172.16.160.24)
      Origin incomplete, metric 0, localpref 100, valid, external, multipath, best
NY-CAT3750G-B#
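One small but important detail on the physical side: the ToR switches only install all eight eBGP paths (the "multipath" flag in the output above) if BGP multipath is enabled. A minimal IOS sketch (the ToR-local AS number 64512 is an assumption; 64513 is the NSX-T AS as seen in the output above):

router bgp 64512
 ! allow up to eight eBGP paths to be installed into the routing table
 maximum-paths 8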

 

Traffic arriving from the ToR switches at a Service Router (SR) is routed locally on that edge node before it is forwarded (GENEVE-encapsulated) to the destination VM. This is shown in the next diagram below.

Multi Tier Logical Routing-North-to-South-Version1.png

And what is the final conclusion of this little lab exercise?

Each individual Service Router on an edge node provides exactly one next hop to each individual compute transport node. The number of BGP peerings per edge node VM is not relevant for the eight ECMP paths; the number of edge nodes is relevant. Theoretically, a single eBGP peering from each edge node would achieve the same number of ECMP paths. But please keep in mind that two BGP sessions per edge node provide better resiliency. I hope you had a bit of fun reading this NSX-T ECMP edge node write-up.

 

Software Inventory:

vSphere version: 6.5.0, build 13635690

vCenter version: 6.5.0, build 10964411

NSX-T version: 2.4.1.0.0.13716575

 

Blog history

Version 1.0 - 06.08.2019 - first published version

Version 1.1 - 19.08.2019 - minor changes

Dear readers

This is the second blog of a series related to NSX-T. This second part provides the information required to better understand the implications of a centralized service in NSX-T. While the first blog provided an introduction to the lab setup, this second blog discusses the impact of adding a Tier-1 Edge Firewall for the tenant BLUE. The diagram below shows the logical representation of the lab setup with the Edge Firewall attached to the Tier-1 uplink interface of the Logical Router for tenant BLUE.

Blog-Diagram-2.1.png

 

For this blog I have chosen to add an Edge Firewall to a Tier-1 Logical Router, but I could have also selected a Load Balancer, a VPN service or a NAT service; the implications for the "internal" NSX-T networking are similar. However, please keep in mind that with NSX-T 2.3 not all centralized services are supported at the Tier-1 level (for example VPN) or at Tier-0 (for example Load Balancer), and not all services (for example DHCP or Metadata Proxy) will instantiate a Service Router.

 

Before I move forward and try to explain what happens under the hood when you enable an Edge Firewall, I would like to provide some additional information about the diagram below.

Blog-Diagram-2.2.png

I am sure you are already familiar with the diagram above, as we talked about it in my first blog. Each of the four Transport Nodes (TN) has the two tenant Tier-1 Logical Routers instantiated. Inside each Transport Node there are two Logical Switches with VNI 17295 and 17296 used between the Tier-1 tenant DRs and the Tier-0 DR. These two automatically instantiated (sometimes referred to as auto-plumbing) transit overlay Logical Switches have the subnets 100.64.144.18/31 and 100.64.144.20/31 automatically assigned. Internal filtering avoids duplicate IP address challenges, in the same way NSX-T already does for the gateway IP (.254) on the Logical Switches 17289 and 17294 where the VMs are attached. Each of these Tier-1 to Tier-0 transit Logical Switches (17295 and 17296) could be shown as linked together in the diagram, but as internal filtering takes place, this is irrelevant for the moment.

The intra Tier-0 Logical Switch with VNI 17292 is used to forward traffic between the Tier-0 DRs and northbound via the Service Routers (SR). This Logical Switch 17292 again has an automatically assigned IP subnet (169.254.0.0/28). Each Tier-0 DR is assigned the same IP address (.1), but the two Service Routers use different IPs (.2 and .3); otherwise the Tier-0 DR would not be able to forward based on equal cost with two different next hops.
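You can see exactly these two next hops on a compute transport node with the same host CLI used in the ECMP write-up above; a short sketch (take the Tier-0 DR UUID from the output of the first command):

ESX70A> get logical-routers                                   # note the UUID of the Tier-0 DR
ESX70A> get logical-router <tier0-dr-uuid> forwarding ipv4    # the default route shows the two ECMP next hops 169.254.0.2 and 169.254.0.3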

 

Before the network administrator is able to configure an Edge Firewall for tenant BLUE at the Tier-1 level, he has to assign an edge-cluster to the Tier-1 Logical Router, along with the edge-cluster members. This is shown in the screenshot below.

Blog2-add-edge-nodes-to-Tier1-BLUE.png

Please be aware that as soon as you assign an edge-cluster to a Tier-1 Logical Router, a Service Router is automatically instantiated, independent of the Edge Firewall.

 

These two new Service Routers run on the edge-nodes in active/standby mode. Please see the next diagram below.

Blog2-routing-tier1-blue-overview.png

 

The configuration of the tenant BLUE Edge Firewall itself is shown in the next screenshot. For this lab we use the default firewall policy.

Blog2-enable-edge-firewall.png

This simple configuration step of adding the two edge-nodes to the Tier-1 Logical Router for tenant BLUE causes NSX-T to "re-organize" the internal auto-plumbing network. To understand what is happening under the hood, I have divided these internal network changes into four steps instead of showing only the final result.

 

In step 1, NSX-T internally disconnects the Tier-0 DR from the Tier-1 DR for the BLUE tenant, as the northbound traffic needs to be redirected to the two edge-nodes where the Tier-1 Service Routers are running. The internal Logical Switch with VNI 17295 is now explicitly linked together across the four Transport Nodes (TN).

Blog-Diagram-2.3.png

 

In step 2, NSX-T automatically instantiates on each edge-node a new Service Router at the Tier-1 level for the tenant BLUE, with an Edge Firewall. The Service Routers operate in active/standby mode. In this example, the Service Router running on Transport Node EN1-TN is active, whereas the Service Router running on EN2-TN is standby. The Tier-1 Service Router uplink interface with the IP address 100.64.144.19 is accordingly either UP or DOWN.

Blog-Diagram-2.4.png
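A quick way to verify this active/standby behavior is the edge node CLI; a sketch (the prompt names are placeholders, and the VRF ID is taken from the output of the first command):

EN1> get logical-routers        # now additionally lists a SERVICE_ROUTER_TIER1 for tenant BLUE, including its VRF ID
EN1> vrf 4                      # enter the VRF context of the new Tier-1 SR (4 is just an example)
EN1(tier1_sr)> get interfaces   # the uplink with 100.64.144.19 is up on the active edge node and down on the standby one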

 

In step 3, NSX-T connects the Tier-1 Service Router and the Distributed Router for the BLUE tenant together. For this connection, a new Logical Switch with VNI 17288 is added. Again, the Service Router running on EN1-TN has the active interface with the IP address 169.254.0.2 up, and the corresponding interface of the Service Router on EN2-TN is down. This ensures that only the active Service Router can forward traffic.

Blog-Diagram-2.5.png

 

In the final step 4, NSX-T extends the Logical Switch with VNI 17288 to the two compute Transport Nodes ESX70A and ESX71A. This extension is required so that traffic from, for example, vm1 can be routed on the local host before it is forwarded to the Edge Transport Nodes. Finally, NSX-T adds the required static routing between the different Distributed and Service Routers. NSX-T performs all these steps automatically under the hood.

Blog-Diagram-2.6.png

 

The next diagram below shows a traffic flow between vm1 and vm3. The traffic sourced from vm1 first hits the local DR of the BLUE tenant on ESX70A-TN. The traffic then needs to be forwarded to the active Tier-1 Service Router (SR) with the Edge Firewall, running on Edge Transport Node EN1-TN. The traffic then reaches the Tier-0 DR on EN1-TN, is forwarded to the RED Tier-1 DR, and finally arrives at vm3. The return traffic first hits the local DR of the RED tenant on ESX71A-TN before it reaches the Tier-0 DR on the same host. The next hop is the BLUE Tier-1 Service Router (SR). The Edge Firewall inspects the return traffic and forwards it locally to the BLUE Tier-1 DR before the traffic finally arrives back at vm1. The majority of the traffic is handled locally on EN1-TN. The bandwidth used between the physical hosts, and therefore the GENEVE-encapsulated traffic, is the same as without the Edge Firewall. But as everybody can imagine, an edge-node which might host multiple Edge Firewalls for multiple tenants, or any other centralized services, should be sized and designed accordingly.

Blog-Diagram-2.7.png

 

I hope you had a bit of fun reading these two blogs. Feel free to share them!

 

Lab Software Details:

NSX-T: 2.3.0.0

vSphere: 6.5.0 Update 1 Build 5969303

vCenter:  6.5 Update 1d Build 2143838

 

Version 1.0 - 10.12.2018

Dear readers

This is the first blog of a series related to NSX-T. This first part provides a simple introduction to the most relevant information required to better understand the implications of a centralized service in NSX-T. A centralized service could be, for example, a Load Balancer or an Edge Firewall.

 

NSX-T has the ability to do distributed routing and supports a distributed firewall. Distributed routing in the sense that each host which is prepared for NSX-T can do local routing. From the logical view, this part is called the Distributed Router (DR). The DR is part of a Logical Router (LR), and this LR can be configured at the Tier-0 or at the Tier-1 level. Distributed routing is perfect for scale and can reduce the bandwidth utilization of each physical NIC on the host, as the routing decision is made on the local host. For example, when the source and the destination VM are located on the same host but connected to different IP subnets and therefore attached to different overlay Logical Switches, the traffic never leaves the host. All traffic forwarding is processed on the host itself instead of in the physical network, for example on the ToR switch.

Each host which is prepared with NSX-T and attached to an NSX-T Transport Zone is called a Transport Node (TN). Transport Nodes implicitly have an N-VDS configured, which for example provides the GENEVE Tunnel Endpoint and is responsible for the distributed firewall processing. However, there are services like load balancing or edge firewalling which are not distributed services. VMware calls these services "centralized services". Centralized services instantiate a Service Router (SR), and this SR runs on an NSX-T edge-node (EN). An edge-node can be a VM or a bare metal server. Each edge-node is also a Transport Node (TN).

 

Let's now have a look at a simple two-tier NSX-T topology with a tenant BLUE and a tenant RED. Both have, for now, no centralized services enabled at the Tier-1 level. For the north-south connectivity to the physical world, there is already a centralized service instantiated at Tier-0. We don't want to focus on this north-south routing part, but as we later want to understand what it means to have a centralized service configured on a Tier-1 logical router, it is important to understand this part as well, because north-south routing is also a centralized service. The diagram below shows the logical representation of a simple lab setup. This lab setup will later be used to instantiate a centralized service at a Tier-1 Logical Router.

Blog-Diagram-1.png

For those who would like to get a better understanding of the topology, I have included a diagram of the physical view below. In this lab we actually use 4 ESXi hosts. For simplification we focus in this blog on the ESXi hypervisor instead of KVM, even though we could build a similar lab with KVM too. On each of the two Transport Nodes ESX70A-TN and ESX71A-TN a VM is installed. The two other hosts ESX50A and ESX51A are NOT* prepared for NSX-T, but each of them hosts a single edge-node VM (EN1 and EN2). These two edge-nodes don't have to run on two different ESXi hosts, but it is recommended for redundancy reasons.

Blog-Diagram-2.png

As shown in the next diagram, we now combine the physical and the logical view. The two Transport Nodes ESX70A-TN and ESX71A-TN have only DRs instantiated at the Tier-1 and Tier-0 level, but no Service Router. That means the Logical Router consists of only a DR. These DRs at the Tier-1 level provide the gateway (.254) for the attached Logical Switch. The tenant BLUE uses VNI 17289 and the tenant RED uses VNI 17294; NSX-T assigns these VNIs out of a VNI pool (default pool: 5000 - 65535). The edge-node VMs, now shown as Edge Transport Nodes (EN1-TN and EN2-TN), have the same Tier-1 and Tier-0 DRs instantiated, but only the Tier-0 includes a Service Router (SR).

Blog-Diagram-1.3.png

The two Tier-1 Logical Routers, respectively their DRs, can only talk to each other via the green Tier-0 DR. But before you are able to attach the two Tier-1 DRs to a Tier-0 DR, a Tier-0 Logical Router is required, and a Tier-0 Logical Router mandates the assignment of an edge-cluster during its configuration. Let's assume at this point that we have already configured two edge-node VMs and that these edge-node VMs are assigned to an edge-cluster. A Tier-0 Logical Router always consists of a Distributed Router (DR) and, depending on the node type, a Service Router as well. A Service Router is always required for the Tier-0 Logical Router, as the Service Router is responsible for the routing connectivity to the physical world, but the Service Router is only instantiated on the edge-nodes. In this lab, both Service Routers are configured on the two edge-nodes, respectively Edge Transport Nodes, in active/active mode to provide ECMP to the physical world.

All the internal transit links, as shown in the diagram below, are automatically configured by NSX-T. The only task for the network administrator is to connect the Tier-0 DR to the Tier-1 DRs.

The northbound connection to the physical world further requires the configuration of an additional VLAN-based Transport Zone (or better, two Transport Zones for routing redundancy) plus the routing peering (typically eBGP). Below is the resulting logical network topology.

One might ask: why does NSX-T instantiate the two Tier-1 DRs on each edge-node too? Well, this is required for optimized forwarding. As already mentioned, routing decisions are always made on the host where the traffic is sourced. Assume vm1 in tenant BLUE would like to talk to a server in the physical world. Traffic sourced at vm1 is forwarded to its local gateway on the Tier-1 DR and then to the Tier-0 DR on the same host. From the Tier-0 DR the traffic is then forwarded to the left Tier-0 SR on EN1-TN (let's assume the traffic is hashed accordingly), and then the flow reaches the external destination. The return traffic first reaches the Tier-0 SR on EN2-TN (let's assume again based on the hash), then the traffic is forwarded locally to the Tier-0 DR on the same Edge Transport Node and then to the Tier-1 DR in tenant BLUE. The traffic does not leave EN2-TN until it locally reaches the Logical Switch where vm1 is attached. This is what is called optimized forwarding, which is possible due to the distributed NSX-T architecture. The traffic needs to be forwarded only once over the physical data center infrastructure, and is therefore GENEVE-encapsulated only once per direction!

Blog-Diagram-1.4.png

For now we close this first blog. In the second blog we will dive into the instantiation of a centralized service at Tier-1. I hope you had a bit of fun reading this first write-up.

 

 

 

 

*Today, NSX-T also supports running edge-node VMs on NSX-T prepared hosts. This capability is important to combine compute and edge-node services on the same host.

Version 1.0 - 19.11.2018

Version 1.1 - 27.11.2018 (minor changes)

Version 1.2 - 04.12.2018 (cosmetic changes)

Version 1.3 - 10.12.2018 (link for second blog added)