2020

Dear readers

Welcome to this new blog post talking about static routing with the NSX-T Tier-0 Gateway. The majority of our customers are using BGP for the Tier-0 Gateway to Top of Rack (ToR) switches connectivity to exchange IP prefixes. For those customers who prefer static routing, this blog post talks about the two design options.

  • Design Option 1: Static Routing using SVI as Next Hop with NSX-T Edge Node in Active/Active Mode to support ECMP for North/South
  • Design Option 2: Static Routing using SVI as Next Hop with NSX-T Edge Node in Active/Standby Mode using HA VIP

I have the impression that the second design option, a Tier-0 Gateway with two NSX-T Edge Nodes in Active/Standby mode using HA VIP, is widely known, but the first option with NSX-T Edge Nodes in Active/Active mode leveraging ECMP with static routing is fairly unknown. This first option is, for example, also a valid Enterprise PKS (new name is Tanzu Kubernetes Grid Integrated Edition, TKGI) design option (with shared Tier-1 Gateway), and it can be used with vSphere 7 with Kubernetes (Project Pacific) as well where BGP is not allowed or not preferred. I am sure the reader is aware that a Tier-0 Gateway in Active/Active mode cannot be enabled for stateful services (e.g. Edge firewall).

 

Before we start to configure these two different design options, we need to describe the overall lab topology, the physical and logical setup along with the NSX-T Edge Node setup including the NSX-T Edge Node main installation steps. For both options we will configure only a single N-VDS on the NSX-T Edge Node. This is not a requirement, but it is considered a pretty simple design option. The other popular design options consist of typically three embedded N-VDS on the NSX-T Edge Node for design option 1 and two embedded N-VDS on the NSX-T Edge Node for design option 2.

 

Logical Lab Topology

The lab setup is pretty simple. For an easy comparison between those two options, I have configured both design options in parallel. The most relevant part for this blog post is between the two Tier-0 Gateways and the two ToR switches acting as Layer 3 Leaf switches. The configuration and design for the Tier-1 Gateway and the compute vSphere cluster hosting the eight workload Ubuntu VMs is identical for both design options. There is only a single Tier-1 Gateway per Tier-0 Gateway configured, each with two overlay segments. The eight workload Ubuntu VMs are installed on a separate compute vSphere cluster called NY-CLUSTER-COMPUTE1 with only two ESXi hosts and are evenly distributed across the two ESXi hosts. Those two compute ESXi hosts are prepared with NSX-T and have only a single overlay Transport Zone configured. The four NSX-T Edge Node VMs are running on another vSphere cluster, called NY-CLUSTER-EDGE1. This vSphere cluster has again only two ESXi hosts. A third vSphere cluster called NY-CLUSTER-MGMT is used for the management components, like vCenter and the NSX-T managers. Details about the compute and management vSphere clusters are not relevant for this blog post and hence are deliberately omitted.

The diagram below shows the NSX-T logical topology, the most relevant vSphere objects and, underneath, the NSX-T overlay and VLAN segments (for the NSX-T Edge Node North/South connectivity).

Overall Lab Topology Combined.png

 

Physical Setup

Let's first have a look at the physical setup used for our four NSX-T VM-based Edge Nodes. Understanding the physical setup is no less important than understanding the logical setup. Two Nexus 3048 ToR switches configured as Layer 3 Leaf switches are used. They have a Layer 3 connection towards a single spine (not shown) and two Layer 2 trunks combined into a single portchannel with LACP between the two ToR switches. The two ESXi hosts (ny-esx50a and ny-esx51a) each have 4 pNICs in total, assigned to two different virtual Distributed Switches (vDS). Please note, the Nexus 3048 switches are not configured with Cisco vPC, even though this would also be a valid option.

Networking – Physical Diagram.png

The relevant physical links for the NSX-T Edge Nodes connectivity are the four green links only connected to vDS2.

 

Those two ESXi hosts (ny-esx50a and ny-esx51a) are NOT prepared. The two ESXi hosts belong to a single vSphere Cluster exclusively used for NSX-T Edge Node VMs. There are a few good reasons NOT to prepare those ESXi hosts with NSX-T where you host only NSX-T Edge Node VMs:

  • It is not required
  • Better NSX-T upgrade-ability (you don't need to evacuate the NSX-T VM-based Edge Nodes during host NSX-T software upgrade with vMotion to enter maintenance mode; every vMotion of the NSX-T VM-based Edge Node will cause a short unnecessary data plane glitch)
  • Shorter NSX-T upgrade cycles (for every NSX-T upgrade you only need to upgrade the ESXi hosts which are used for the payload VMs and the NSX-T VM-based Edge Nodes, but not the ESXi hosts where you have your Edge Nodes deployed)
  • vSphere HA can be turned off (do we want to restart a highly loaded packet forwarding node like an NSX-T Edge Node on another host in a vSphere HA event? No, I don't think so, as the routing HA model reacts faster in a failure event)
  • Simplified DRS settings (do we want to move an NSX-T VM-based Edge Node with vMotion to balance the resources?)
  • Typically a resource pool is not required

We should never underestimate how important smooth upgrade cycles are. Upgrade cycles are time consuming events and are typically required multiple times per year.

Keeping the ESXi hosts NOT prepared for NSX-T is considered best practice and should be the choice in any NSX-T deployment which can afford a dedicated vSphere Cluster only for NSX-T VM-based Edge Nodes. Installing NSX-T on the ESXi hosts where you have deployed your NSX-T VM-based Edge Nodes (called collapsed design) is valid too, and appropriate for customers who have a low number of ESXi hosts and want to keep the CAPEX costs low.

 

ESXi Host vSphere Networking

The first virtual Distributed Switch (vDS1) is used for the host vmkernel networking only. The typical vmkernel interfaces are attached to three different port groups. The second virtual Distributed Switch (vDS2) is used for the NSX-T VM-based Edge Node networking only. All virtual Distributed Switch port groups are tagged with the appropriate VLAN id, with the exception of the three uplink trunk port groups (more details later). Both virtual Distributed Switches are configured for MTU 9000 bytes, and I am using a different Geneve Tunnel End Point (TEP) VLAN for the Compute ESXi hosts (VLAN 150 for ny-esx70a and ny-esx71a) and for the NSX-T VM-based Edge Nodes (VLAN 151) running on the ESXi hosts ny-esx50a and ny-esx51a. In such a setup this is not a requirement, but it helps to distribute the BUM traffic replication effort leveraging the hierarchical 2-Tier replication mode. The "dummy" port group is used to connect the unused NSX-T Edge Node fast path interface (fp-ethX); attaching it to a dummy port group prevents NSX-T from reporting the interface with admin status down.

 

Table 1 - vDS Setup Overview

Name in Diagram: vDS1
vDS Name: NY-vDS-ESX5x-EDGE1
Physical Interfaces: vmnic0 and vmnic1
Port Groups:

  • NY-vDS-PG-ESX5x-EDGE1-VMK0-Mgmt50
  • NY-vDS-PG-ESX5x-EDGE1-VMK1-vMotion51
  • NY-vDS-PG-ESX5x-EDGE1-VMK2-ipStorage52

Name in Diagram: vDS2
vDS Name: NY-vDS-ESX5x-EDGE2
Physical Interfaces: vmnic2 and vmnic3
Port Groups:

  • NY-vDS-PG-ESX5x-EDGE2-EDGE-Mgmt60 (Uplink 1 active, Uplink 2 standby)
  • NY-vDS-PG-ESX5x-EDGE2-EDGE-TrunkA (Uplink 1 active, Uplink 2 unused)
  • NY-vDS-PG-ESX5x-EDGE2-EDGE-TrunkB (Uplink 1 unused, Uplink 2 active)
  • Ny-vDS-PG-ESX5x-EDGE2-EDGE-TrunkC (Uplink 1 active, Uplink 2 active)
  • NY-vDS-PG-ESX5x-EDGE2-Dummy999 (Uplink 1 and Uplink 2 are unused)

 

The combined diagram below shows the most relevant NY-vDS-ESX5x-EDGE2 port group settings regarding VLAN trunking and Teaming and Failover.

vDS2 trunk port groups A and B and C.png

 

Logical VLAN Setup

The ToR switches are configured with the four relevant VLANs (60, 151, 160 and 161) for the NSX-T Edge Nodes and the associated Switched Virtual Interfaces (SVI). The VLANs 151, 160 and 161 (VLAN 161 is not used in design option 2) are carried over the three vDS trunk port groups (NY-vDS-PG-ESX5x-EDGE2-EDGE-TrunkA, NY-vDS-PG-ESX5x-EDGE2-EDGE-TrunkB and NY-vDS-PG-ESX5x-EDGE2-EDGE-TrunkC). The SVIs on the Nexus 3048 for Edge Management (VLAN 60) and for the Edge Node TEP (VLAN 151) are configured with HSRPv2 with a VIP of .254. The two SVIs on the Nexus 3048 for the Uplink VLANs (160 and 161) are configured without HSRP. VLAN 999, the dummy VLAN, does not exist on the ToR switches. The Tier-1 Gateway is not shown in the diagrams below.

 

Please note, the dotted lines to SVI161 and SVI160 respectively indicate that the VLAN/SVI configuration on the ToR switch exists, but is not used for the static routing when using Active/Active ECMP with static routing (design option 1).

The dotted line to SVI161 in design option 2 indicates that the VLAN/SVI configuration on the ToR switches exists, but is not used for the static routing when using Active/Standby with HA VIP with static routing. More details about the static routing are shown in a later step.

Networking – Logical VLAN Diagram Option 1&2.png

 

 

NSX-T Edge Node Deployment

The NSX-T Edge Node deployment option with the single Edge Node N-VDS is simple and has been discussed in one of my other blog posts. In this lab exercise I have done an NSX-T Edge Node ova installation, followed by the "join" command, followed by the final step of the NSX-T Edge Transport Node configuration. The NSX-T UI installation option is valid as well, but my personal preference is the ova deployment option. The most relevant steps for such an NSX-T Edge Node setup are the correct placement of the dot1q tagging and the correct mapping of the NSX-T Edge Node interfaces to the virtual Distributed Switch (vDS2) trunk port groups (A & B for option 1 and C for option 2) as shown in the diagrams below.

 

The diagram below shows the NSX-T Edge Node overall setup and the network selection for the NSX-T Edge Node 20 & 21 during the ova deployment for the design option 1:

Networking – NSX-T Edge Combined Design 1.png

 

The diagram below shows the NSX-T Edge Node overall setup and the network selection for the NSX-T Edge Node 22 & 23 during the ova deployment for the design option 2:

Networking – NSX-T Edge Combined Design 2.png

After the successful ova deployment the "join" command must be used to connect the management plane of the NSX-T Edge Nodes to the NSX-T managers. The "join" command requires the NSX-T manager thumbprint. Jump with SSH to the first NSX-T manager and read the API thumbprint. Then jump via SSH to every ova-deployed NSX-T Edge Node and execute the "join" command. The two steps are shown in the table below:

 

Table 2 - NSX-T Edge Node "join" to the NSX-T Managers

Step: Read API Thumbprint
Device: first NSX-T manager
Comments: N/A
Command Example:

ny-nsxt-manager-21> get certificate api thumbprint
ea90e8cc7adb6d66994a9ecc0a930ad4bfd1d09f668a3857e252ee8f74ba1eb4

Step: Join the NSX-T Manager for each NSX-T Edge Node
Device: all NSX-T Edge Nodes previously deployed through ova
Comments: NSX-T will sync the configuration with the two other NSX-T managers. Do not join using the NSX-T manager VIP FQDN/IP.
Command Example:

ny-nsxt-edge-node-20> join management-plane ny-nsxt-manager-21.corp.local thumbprint ea90e8cc7adb6d66994a9ecc0a930ad4bfd1d09f668a3857e252ee8f74ba1eb4 username admin
Password for API user:
Node successfully registered as Fabric Node: 437e2972-bc40-11ea-b89c-005056970bf2

ny-nsxt-edge-node-20>

--- do the same for all other NSX-T Edge Nodes ---
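
If several Edge Nodes have to be joined, the command line from Table 2 can be rendered by a small helper. The sketch below is only an illustration in Python: the function name build_join_command is hypothetical, and it assumes the API thumbprint is a 64-character hexadecimal string, as in the example output above. It does not talk to NSX-T; it only builds the string that is pasted into the Edge Node CLI.

```python
import re

# Assumption: the API thumbprint is 64 hexadecimal characters,
# as in the example output shown in Table 2.
THUMBPRINT_RE = re.compile(r"^[0-9a-f]{64}$")


def build_join_command(manager_fqdn, thumbprint, username="admin"):
    """Return the Edge Node CLI line used in Table 2 (hypothetical helper)."""
    thumbprint = thumbprint.strip().lower()
    if not THUMBPRINT_RE.match(thumbprint):
        raise ValueError("thumbprint must be 64 hexadecimal characters")
    return (
        f"join management-plane {manager_fqdn} "
        f"thumbprint {thumbprint} username {username}"
    )


# Values taken from the lab example above.
print(build_join_command(
    "ny-nsxt-manager-21.corp.local",
    "ea90e8cc7adb6d66994a9ecc0a930ad4bfd1d09f668a3857e252ee8f74ba1eb4",
))
```

Checking the thumbprint length before pasting the command helps to catch a truncated copy from the SSH session early.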

 

The resulting UI after the "join" command is shown below. The configuration state must be "Configure NSX".

NSX-T View after Edge Join.png

 

NSX-T Edge Transport Node Configuration

Before we can start with the NSX-T Edge Transport Node configuration, we need to be sure that the Uplink Profiles are ready. The two design options require two different Uplink Profiles. The two diagrams below show the two different Uplink Profiles for the NSX-T Edge Transport Nodes:

NY-EDGE-UPLINK-PROFILE-COMBINED.png

The Uplink Profile "NY-EDGE-UPLINK-PROFILE-SRC-ID-TEP-VLAN151" is used for design option 1 and is required for Multi-TEP with the teaming policy "LOADBALANCE_SRCID" with two Active Uplinks (EDGE-UPLINK01 and EDGE-UPLINK02). Two additional named teaming policies are configured for a proper ECMP dataplane forwarding; please see blog post "Single NSX-T Edge Node N-VDS with correct VLAN pinning" for more details. I am using the same named teaming configuration for design option 1 as in the other blog post where I have used BGP instead of static routing. As mentioned already, the dot1q tagging (Transport VLAN = 151) for the two TEP interfaces is required as part of this Uplink Profile configuration.

 

The Uplink Profile "NY-EDGE-UPLINK-PROFILE-FAILOVER-TEP-VLAN151" is used for design option 2 and requires the teaming policy "FAILOVER_ORDER" with only a single Active Uplink (EDGE-UPLINK01). Named teaming policies are not required. Again the dot1q tagging for the single TEP interface (Transport VLAN = 151) is required as part of this Uplink Profile configuration.

 

The NSX-T Edge Transport Node configuration itself is straightforward and is shown in the two diagrams below for a single NSX-T Edge Transport Node per design option.

Edge Transport Node Combined.png

NSX-T Edge Transport Nodes 20 & 21 (design option 1) are using the previously configured Uplink Profile "NY-EDGE-UPLINK-PROFILE-SRC-ID-TEP-VLAN151". Two static TEP IP addresses are configured and the two Uplinks (EDGE-UPLINK01 & EDGE-UPLINK02) are mapped to the fast path interfaces (fp-eth0 & fp-eth1).

 

NSX-T Edge Transport Nodes 22 & 23 (design option 2) are using the previously configured Uplink Profile "NY-EDGE-UPLINK-PROFILE-FAILOVER-TEP-VLAN151". A single static TEP IP address is configured and the single Uplink (EDGE-UPLINK01) is mapped to the fast path interface (fp-eth0).

 

Please note, the required configuration of the two NSX-T Transport Zones and the single N-VDS switch is not shown.

 

The NSX-T Edge Transport Nodes ny-nsxt-edge-node-20 and ny-nsxt-edge-node-21 are assigned to the NSX-T Edge cluster NY-NSXT-EDGE-CLUSTER01, and the NSX-T Edge Transport Nodes ny-nsxt-edge-node-22 and ny-nsxt-edge-node-23 are assigned to the NSX-T Edge cluster NY-NSXT-EDGE-CLUSTER02. This NSX-T Edge cluster configuration is also not shown.

 

NSX-T Tier-0 Gateway Configuration

The base NSX-T Tier-0 Gateway configuration is straightforward and is shown in the two diagrams below.

The Tier-0 Gateway NY-T0-GATEWAY-01 (design option 1) is configured in Active/Active mode along with the association with the NSX-T Edge Cluster NY-NSXT-EDGE-CLUSTER01.

The Tier-0 Gateway NY-T0-GATEWAY-02 (design option 2) is configured in Active/Standby mode along with the association with the NSX-T Edge Cluster NY-NSXT-EDGE-CLUSTER02. In this example preemptive is selected and the first NSX-T Edge Transport Node (ny-nsxt-edge-node-22) is the preferred Edge Transport Node (the active node when both nodes are up and running).

NY-T0-Gateway Combined Design 1&2.png

The next step of Tier-0 Gateway configuration is about the Layer 3 interfaces (LIF) for the northbound connectivity towards the ToR switches.

The next two diagrams show the IP topology including the ToR switches IP configuration along with the resulting NSX-T Tier-0 Gateway Layer 3 interface configuration for design option 1 (A/A ECMP).

Networking – IP Diagram Combined Option 1.png

The next diagrams show the IP topology including the ToR switches IP configuration along with the resulting NSX-T Tier-0 Gateway interface configuration for design option 2 (A/S HA VIP).

Networking – IP Diagram Combined Option 2.png

The HA VIP configuration requires that both NSX-T Edge Transport Node interfaces belong to the same Layer 2 segment. Here I am using the previously configured Layer 3 interfaces (LIF); both belong to the same VLAN segment 160 (NY-T0-VLAN-SEGMENT-160).

NY-T0-Gateway-02-HA VIP Design 2.png

 

All the previous steps are probably known by the majority of the readers. However, the next step is about the static routing configuration; these steps highlight the relevant configuration to achieve ECMP with two NSX-T Edge Transport Nodes in Active/Active mode.

 

Design Option 1 Static Routing (A/A ECMP)

The first step in design option 1 is the Tier-0 static route configuration for northbound traffic. The most common way is to configure default routes northbound.

Two default routes, each with a different Next Hop (172.16.160.254 and 172.16.161.254), are configured on the NY-T0-GATEWAY-01. This is the first step to achieve ECMP for northbound traffic towards the ToR switches. The diagram below shows the corresponding NSX-T Tier-0 Gateway static routing configuration. Please keep in mind that at the NSX-T Edge Transport Node level, each Edge Transport Node will have two default route entries. This is shown in the table below.

The difference between the logical construct configuration (Tier-0 Gateway) and the "physical" construct configuration (the Edge Transport Nodes) might already be known, as we have the same behavior with BGP. This approach limits configuration errors. With BGP we typically configure only two BGP peers towards the two ToR switches, but each NSX-T Edge Transport Node gets two BGP sessions realized.
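
The relation between the logical configuration and its realization can be pictured with a toy model. The following Python sketch is only an illustration and says nothing about how NSX-T implements the realization internally; the names StaticRoute and realize are hypothetical. It copies the two Tier-0 default routes to each of the two Edge Transport Nodes, which matches the two default route entries per node described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticRoute:
    prefix: str
    next_hop: str


def realize(routes, edge_nodes):
    """Copy every Tier-0 static route to every Edge Transport Node (toy model)."""
    return {node: list(routes) for node in edge_nodes}


# Logical configuration on NY-T0-GATEWAY-01: two default routes.
tier0_routes = [
    StaticRoute("0.0.0.0/0", "172.16.160.254"),
    StaticRoute("0.0.0.0/0", "172.16.161.254"),
]
edge_nodes = ["ny-nsxt-edge-node-20", "ny-nsxt-edge-node-21"]

realized = realize(tier0_routes, edge_nodes)
for node, routes in realized.items():
    for route in routes:
        print(f"{node}: {route.prefix} via {route.next_hop}")
```

Two configured routes result in four realized entries, two on each Service Router.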

 

The diagram below shows the setup with the two default routes (in black) northbound.

Networking – IP StaticRouting North Diagram Combined Option 1.png

 

Please note, the steps to configure the Tier-1 Gateway (NY-T1-GATEWAY-GREEN) and to connect it to the Tier-0 Gateway are not shown.

 

Table 3 - NSX-T Edge Transport Node Routing Table for Design Option 1 (A/A ECMP)

ny-nsxt-edge-node-20 (Service Router)

ny-nsxt-edge-node-20(tier0_sr)> get route 0.0.0.0/0

Flags: t0c - Tier0-Connected, t0s - Tier0-Static, b - BGP,
t0n - Tier0-NAT, t1s - Tier1-Static, t1c - Tier1-Connected,
t1n: Tier1-NAT, t1l: Tier1-LB VIP, t1ls: Tier1-LB SNAT,
t1d: Tier1-DNS FORWARDER, t1ipsec: Tier1-IPSec, isr: Inter-SR,
> - selected route, * - FIB route

Total number of routes: 1

t0s> * 0.0.0.0/0 [1/0] via 172.16.160.254, uplink-307, 03:29:43
t0s> * 0.0.0.0/0 [1/0] via 172.16.161.254, uplink-309, 03:29:43
ny-nsxt-edge-node-20(tier0_sr)>

ny-nsxt-edge-node-21 (Service Router)

ny-nsxt-edge-node-21(tier0_sr)> get route 0.0.0.0/0

Flags: t0c - Tier0-Connected, t0s - Tier0-Static, b - BGP,
t0n - Tier0-NAT, t1s - Tier1-Static, t1c - Tier1-Connected,
t1n: Tier1-NAT, t1l: Tier1-LB VIP, t1ls: Tier1-LB SNAT,
t1d: Tier1-DNS FORWARDER, t1ipsec: Tier1-IPSec, isr: Inter-SR,
> - selected route, * - FIB route

Total number of routes: 1

t0s> * 0.0.0.0/0 [1/0] via 172.16.160.254, uplink-292, 03:30:42
t0s> * 0.0.0.0/0 [1/0] via 172.16.161.254, uplink-306, 03:30:42
ny-nsxt-edge-node-21(tier0_sr)>

 

The second step is to configure static routing southbound from the ToR switches towards the NSX-T Edge Transport Nodes. This step is required to achieve ECMP for southbound traffic. Each ToR switch is configured with four static routes in total to forward traffic to the destination overlay networks within NSX-T. We can easily see that each NSX-T Edge Transport Node is used twice as Next Hop for the static route entries.

Networking – IP StaticRouting South Diagram Option 1.png

Table 4 - Nexus ToR Switches Static Routing Configuration and Resulting Routing Table for Design Option 1 (A/A ECMP)

NY-N3K-LEAF-10

ip route 172.16.240.0/24 Vlan160 172.16.160.20
ip route 172.16.240.0/24 Vlan160 172.16.160.21

ip route 172.16.241.0/24 Vlan160 172.16.160.20
ip route 172.16.241.0/24 Vlan160 172.16.160.21

NY-N3K-LEAF-11

ip route 172.16.240.0/24 Vlan161 172.16.161.20
ip route 172.16.240.0/24 Vlan161 172.16.161.21

ip route 172.16.241.0/24 Vlan161 172.16.161.20
ip route 172.16.241.0/24 Vlan161 172.16.161.21

NY-N3K-LEAF-10# show ip route static

IP Route Table for VRF "default"

'*' denotes best ucast next-hop

'**' denotes best mcast next-hop

'[x/y]' denotes [preference/metric]

'%<string>' in via output denotes VRF <string>

 

172.16.240.0/24, ubest/mbest: 2/0

    *via 172.16.160.20, Vlan160, [1/0], 03:26:44, static

    *via 172.16.160.21, Vlan160, [1/0], 03:26:58, static

172.16.241.0/24, ubest/mbest: 2/0

    *via 172.16.160.20, Vlan160, [1/0], 03:26:44, static

    *via 172.16.160.21, Vlan160, [1/0], 03:26:58, static

---snip---

 

NY-N3K-LEAF-10#

NY-N3K-LEAF-11# show ip route static

IP Route Table for VRF "default"

'*' denotes best ucast next-hop

'**' denotes best mcast next-hop

'[x/y]' denotes [preference/metric]

'%<string>' in via output denotes VRF <string>

 

172.16.240.0/24, ubest/mbest: 2/0

    *via 172.16.161.20, Vlan161, [1/0], 03:27:39, static

    *via 172.16.161.21, Vlan161, [1/0], 03:27:51, static

172.16.241.0/24, ubest/mbest: 2/0

    *via 172.16.161.20, Vlan161, [1/0], 03:27:39, static

    *via 172.16.161.21, Vlan161, [1/0], 03:27:51, static

---snip---

 

NY-N3K-LEAF-11#
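
The static route lines in Table 4 follow a regular pattern: every overlay prefix is combined with every Edge Node uplink interface in the ToR switch's own uplink VLAN. A minimal Python sketch of that pattern is shown below. It relies on a convention of this lab only, namely that the third octet of the uplink subnet equals the VLAN id (172.16.160.0/24 for VLAN 160); the function name static_routes is hypothetical.

```python
from typing import List

OVERLAY_PREFIXES = ["172.16.240.0/24", "172.16.241.0/24"]
EDGE_HOST_IDS = [20, 21]  # last octet of the Edge Node uplink interfaces

# One uplink VLAN per ToR switch, as in the lab topology.
TOR_UPLINK_VLAN = {"NY-N3K-LEAF-10": 160, "NY-N3K-LEAF-11": 161}


def static_routes(vlan: int) -> List[str]:
    """Build the NX-OS static route lines for one ToR switch.

    Lab convention (assumption): uplink subnet is 172.16.<vlan>.0/24.
    """
    lines = []
    for prefix in OVERLAY_PREFIXES:
        for host in EDGE_HOST_IDS:
            lines.append(f"ip route {prefix} Vlan{vlan} 172.16.{vlan}.{host}")
    return lines


for switch, vlan in TOR_UPLINK_VLAN.items():
    print(f"! {switch}")
    for line in static_routes(vlan):
        print(line)
```

Two prefixes times two Edge Nodes give the four static routes per ToR switch mentioned above.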


Again, these steps are straightforward and show how we can achieve ECMP with static routing for North/South traffic. But what happens if, for example, one of the two NSX-T Edge Transport Nodes is down? Let's assume ny-nsxt-edge-node-20 is down. Traffic from the Spine switches is still forwarded to both ToR switches, and once the ECMP hash is calculated, the traffic is forwarded to one of the four Next Hops (the four Edge Transport Node Layer 3 interfaces). Depending on the hash result, this could be Next Hop 172.16.160.20 or 172.16.161.20; both interfaces belong to ny-nsxt-edge-node-20. This traffic will be blackholed and dropped!

But why do the ToR switches still announce the overlay networks 172.16.240.0/24 and 172.16.241.0/24 to the Spine switches? The reason is simple: for both ToR switches the static route entries are still valid, as VLAN 160/161 and/or the Next Hop are still UP. So from the ToR switch routing table perspective all is fine. These static route entries will potentially never go down, as the Next Hop IP addresses belong to VLAN 160 or VLAN 161, and these VLANs stay in the state UP as long as a single physical port is UP and part of one of these VLANs (assuming the ToR switch is up and running). Even when all attached ESXi hosts are down, the inter-switch link between the ToR switches is still UP and hence VLAN 160 and VLAN 161 are still UP. Please keep in mind that this problem does not exist with BGP, as we have BGP keepalives: once the NSX-T Edge Transport Node is down, the ToR switch tears down the BGP session and invalidates the local route entries.
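
The blackholing effect can be made visible with a small simulation. The Python sketch below is a simplified model for one ToR switch (NY-N3K-LEAF-10) and rests on several assumptions: the SHA-256 based hash is only a deterministic stand-in for the switch ECMP hash, which is hardware specific, and the 200 client flows with source addresses from 10.0.0.0/24 are invented for the example. Both next hops stay in the ECMP set because the static routes remain installed while the SVI is up.

```python
import hashlib

# Next hops of NY-N3K-LEAF-10 for the overlay prefixes and the Edge Node behind each.
NEXT_HOPS = {
    "172.16.160.20": "ny-nsxt-edge-node-20",
    "172.16.160.21": "ny-nsxt-edge-node-21",
}


def pick_next_hop(flow, next_hops):
    """Deterministic stand-in for the ECMP hash (assumption, not the real algorithm)."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]


def blackholed_flows(flows, failed_node):
    """Return the flows whose chosen next hop belongs to the failed Edge Node."""
    hops = sorted(NEXT_HOPS)  # static routes stay installed, so both hops remain
    return [f for f in flows if NEXT_HOPS[pick_next_hop(f, hops)] == failed_node]


# Hypothetical flows: (source IP, destination IP, protocol, source port, destination port).
flows = [(f"10.0.0.{i}", "172.16.240.10", 6, 40000 + i, 443) for i in range(1, 201)]

lost = blackholed_flows(flows, "ny-nsxt-edge-node-20")
print(f"{len(lost)} of {len(flows)} flows are hashed to the failed Edge Node")
```

With two equal-cost next hops, roughly half of the flows can be expected to land on the failed node, and nothing on the ToR switch removes that next hop.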

But how could we solve the blackholing issue with static routing? The answer is Bidirectional Forwarding Detection (BFD) for static routing.

 

What is BFD?

BFD is nothing else than a purpose-built keepalive protocol that routing protocols, including first hop redundancy protocols (e.g. HSRP or VRRP), typically subscribe to. Various protocols can piggyback on a single BFD session. In the context of NSX-T, BFD can detect link failures in milliseconds or sub-seconds (NSX-T Bare Metal Edge Nodes with 3 x 50ms) or near sub-seconds (NSX-T VM-based Edge Nodes with 3 x 500ms). All protocols have some way of detecting failure, usually timer-related. Tuning these timers can theoretically get you sub-second failure detection too, but this produces unnecessarily high overhead, as these protocols weren't designed with that in mind. BFD was specifically built for fast failure detection while maintaining low CPU load. Please keep in mind, if you have, as an example, BGP running between two directly connected physical routers, there's no need to have BFD sessions for link failure detection, as the routing protocol will detect the link-down event instantly. But for two routers (e.g. Tier-0 Gateways) connected through intermediate Layer 2/3 nodes (physical infra, vDS, etc.), where the routing protocol cannot detect a link-down event, the failure event must be detected through a dead timer. Welcome to the virtual world!! BFD was later enhanced to support static routing too. The driver for using BFD with static routing was not low CPU load or fast failure detection; it was about extending static routes with keepalives.
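
The timer values mentioned above translate into a detection time as follows. The Python sketch below follows the asynchronous mode rule from the BFD specification (RFC 5880): the local detection time is the multiplier received from the peer times the agreed interval, which is the greater of the local required minimum RX interval and the peer's desired minimum TX interval. The function name detection_time_ms is hypothetical, and both sides are assumed to use the same timers, as in this lab.

```python
def detection_time_ms(local_min_rx_ms, remote_min_tx_ms, remote_multiplier):
    """BFD asynchronous mode detection time in milliseconds (per RFC 5880)."""
    agreed_interval_ms = max(local_min_rx_ms, remote_min_tx_ms)
    return agreed_interval_ms * remote_multiplier


# NSX-T VM-based Edge Node: 3 x 500ms
print("VM-based Edge Node:", detection_time_ms(500, 500, 3), "ms")   # 1500 ms

# NSX-T Bare Metal Edge Node: 3 x 50ms
print("Bare Metal Edge Node:", detection_time_ms(50, 50, 3), "ms")   # 150 ms
```

The 1500 ms for the VM-based Edge Node is the same value that the ToR switch reports as holddown time for the sessions on the uplink VLANs in Table 7.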

 

So how can we apply BFD for static routing in our lab? There are multiple configuration steps required.

Before we can associate BFD with the static routes on the NSX-T Tier-0 Gateway NY-T0-GATEWAY-01, the creation of a BFD profile for static routes is required. This is shown in the diagram below. I am using the same BFD parameters (Interval=500ms and Declare Dead Multiple=3) that NSX-T 3.0 defines as the default for BFD registered for BGP.

NY-T0-Gateway-01-BFD-Profile Design 1.png

The next step is the configuration of BFD peers for static routing at Tier-0 Gateway level. I am using the same Next Hop IP addresses (172.16.160.254 and 172.16.161.254) for the BFD peers as I have used for the static routes northbound towards the ToR switches. Again, this BFD peer configuration is configured at Tier-0 Gateway level, but the realization of the BFD peers happens at Edge Transport Node level. On each of the two NSX-T Edge Transport Nodes (Service Router) two BFD sessions are realized. The appropriate BFD peer source interface on the Tier-0 Gateway (the Layer 3 LIF) is automatically selected by NSX-T, but as you can see, NSX-T allows you to specify the BFD source interface too.

NY-T0-Gateway-01-BFD for staticRouting with Design 1.png

The table below shows the global BFD timer configuration and the BFD peers with source and peer (destination) IP.

Table 5 - NSX-T Edge Transport Node BFD Configuration

ny-nsxt-edge-node-20 (Service Router)
ny-nsxt-edge-node-21 (Service Router)

ny-nsxt-edge-node-20(tier0_sr)> get bfd-config

Logical Router

UUID           : 1cfd7da2-f37c-4108-8f19-7725822f0552

vrf            : 2

lr-id          : 8193

name           : SR-NY-T0-GATEWAY-01

type           : PLR-SR

 

Global BFD configuration

    Enabled        : True

    Min RX Interval: 500

    Min TX Interval: 500

    Min RX TTL     : 255

    Multiplier     : 3

 

 

Port               : 64a2e029-ad69-4ce1-a40e-def0956a9d2d

 

Session BFD configuration

 

   Source         : 172.16.160.20

    Peer           : 172.16.160.254

    Enabled        : True

    Min RX Interval: 500

    Min TX Interval: 500

    Min RX TTL     : 255

    Multiplier     : 3

 

 

Port               : 371a9b3f-d669-493a-a46b-161d3536b261

 

Session BFD configuration

 

    Source         : 172.16.161.20

    Peer           : 172.16.161.254

    Enabled        : True

    Min RX Interval: 500

    Min TX Interval: 500

    Min RX TTL     : 255

    Multiplier     : 3

 

ny-nsxt-edge-node-20(tier0_sr)>

ny-nsxt-edge-node-21(tier0_sr)> get bfd-config

Logical Router

UUID           : a2ea4cbc-c486-46a1-a663-c9c5815253af

vrf            : 1

lr-id          : 8194

name           : SR-NY-T0-GATEWAY-01

type           : PLR-SR

 

Global BFD configuration

    Enabled        : True

    Min RX Interval: 500

    Min TX Interval: 500

    Min RX TTL     : 255

    Multiplier     : 3

 

 

Port               : a5454564-ef1c-4e30-922f-9876b9df38df

 

Session BFD configuration

 

   Source         : 172.16.160.21

    Peer           : 172.16.160.254

    Enabled        : True

    Min RX Interval: 500

    Min TX Interval: 500

    Min RX TTL     : 255

    Multiplier     : 3

 

 

Port               : 8423e83b-0a69-44f4-90d1-07d8ece4f55e

 

Session BFD configuration

 

   Source         : 172.16.161.21

    Peer           : 172.16.161.254

    Enabled        : True

    Min RX Interval: 500

    Min TX Interval: 500

    Min RX TTL     : 255

    Multiplier     : 3

 

ny-nsxt-edge-node-21(tier0_sr)>

 

BFD in general, and for static routing as well, requires that the peering side is configured with BFD too, to ensure that BFD keepalives are sent out and replied to. Once the BFD peers are configured on the Tier-0 Gateway, the ToR switches require the appropriate BFD peer configuration too. This is shown in the table below. Each ToR switch gets two BFD peer configurations, one for each NSX-T Edge Transport Node.

Table 6 - Nexus ToR Switches BFD for Static Routing Configuration

NY-N3K-LEAF-10

feature bfd
!
ip route static bfd Vlan160 172.16.160.20
ip route static bfd Vlan160 172.16.160.21

NY-N3K-LEAF-11

feature bfd
!
ip route static bfd Vlan161 172.16.161.20
ip route static bfd Vlan161 172.16.161.21

 

Once both ends of the BFD peers are configured correctly, the BFD sessions should come up and the static route should be installed into the routing table.
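
The effect of tying a static route to a BFD session can be sketched with a toy routing table. The Python model below is an assumption-laden illustration and not a description of the NX-OS implementation: the class TrackedStaticRoutes is hypothetical, and it only models next hops that are tracked by BFD. A next hop is offered for forwarding only while its BFD session is up.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedStaticRoutes:
    """Toy model: a static route is installed only while BFD to its next hop is up."""
    routes: dict = field(default_factory=dict)   # prefix -> list of next hops
    bfd_up: dict = field(default_factory=dict)   # next hop -> session state

    def add(self, prefix, next_hop):
        self.routes.setdefault(prefix, []).append(next_hop)
        self.bfd_up.setdefault(next_hop, False)

    def set_bfd(self, next_hop, up):
        self.bfd_up[next_hop] = up

    def installed(self, prefix):
        return [nh for nh in self.routes.get(prefix, []) if self.bfd_up.get(nh, False)]


# NY-N3K-LEAF-10 with the next hops from Table 4.
leaf10 = TrackedStaticRoutes()
for next_hop in ("172.16.160.20", "172.16.160.21"):
    leaf10.add("172.16.240.0/24", next_hop)
    leaf10.set_bfd(next_hop, True)

print(leaf10.installed("172.16.240.0/24"))  # both next hops while both sessions are up

# ny-nsxt-edge-node-20 fails: its BFD session goes down and the next hop is withdrawn.
leaf10.set_bfd("172.16.160.20", False)
print(leaf10.installed("172.16.240.0/24"))  # only 172.16.160.21 remains
```

In this model the failed Edge Node no longer attracts traffic once the session is declared down, which is the behavior the static routes alone could not provide.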

The table below shows the two BFD neighbors for static routing (interface VLAN160 and VLAN161 respectively). The BFD neighbor on interface Eth1/49 is the BFD peer towards the Spine switch and is registered for OSPF. NX-OS does not show "static routing" as the registered protocol; it shows "netstack" (reason unknown).
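
For a quick check across several switches, the summary output of "show bfd neighbors" can be parsed with a few lines of Python. The sketch below is only an illustration: the function names are hypothetical, the sample text is copied from the NY-N3K-LEAF-10 output in Table 7, and the parser assumes the eight-column layout seen there, which may differ on other NX-OS releases.

```python
SAMPLE = """\
OurAddr         NeighAddr       LD/RD                 RH/RS           Holdown(mult)     State       Int                   Vrf
172.16.160.254  172.16.160.20   1090519041/2635291218 Up              1099(3)           Up          Vlan160               default
172.16.160.254  172.16.160.21   1090519042/3842218904 Up              1413(3)           Up          Vlan160               default
172.16.3.18     172.16.3.17     1090519043/1090519041 Up              5629(3)           Up          Eth1/49               default
"""


def parse_bfd_neighbors(text):
    """Parse the summary rows; assumes eight whitespace-separated columns."""
    sessions = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, prompts and blank lines: data rows start with an IPv4 address.
        if len(fields) != 8 or fields[0].count(".") != 3:
            continue
        sessions.append({
            "local": fields[0],
            "neighbor": fields[1],
            "state": fields[5],
            "interface": fields[6],
            "vrf": fields[7],
        })
    return sessions


def neighbors_not_up(sessions, interface):
    """Return the neighbors on the given interface whose session is not Up."""
    return [s["neighbor"] for s in sessions
            if s["interface"] == interface and s["state"] != "Up"]


sessions = parse_bfd_neighbors(SAMPLE)
print(neighbors_not_up(sessions, "Vlan160"))  # empty list when all sessions are Up
```

An empty list for the uplink VLAN means that both Edge Node sessions on that ToR switch are up.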

Table 7 - Nexus ToR Switches BFD for Static Routing Configuration and Verification

NY-N3K-LEAF-10/11

NY-N3K-LEAF-10# show bfd neighbors

 

OurAddr         NeighAddr       LD/RD                 RH/RS           Holdown(mult)     State       Int                   Vrf                 

172.16.160.254  172.16.160.20   1090519041/2635291218 Up              1099(3)           Up          Vlan160               default                      

172.16.160.254  172.16.160.21   1090519042/3842218904 Up              1413(3)           Up          Vlan160               default               

172.16.3.18     172.16.3.17     1090519043/1090519041 Up              5629(3)           Up          Eth1/49               default             

NY-N3K-LEAF-10#

NY-N3K-LEAF-11# show bfd neighbors

 

OurAddr         NeighAddr       LD/RD                 RH/RS           Holdown(mult)     State       Int                   Vrf                 

172.16.161.254  172.16.161.20   1090519041/591227029  Up              1384(3)           Up          Vlan161               default                      

172.16.161.254  172.16.161.21   1090519042/2646176019 Up              1385(3)           Up          Vlan161               default              

172.16.3.22     172.16.3.21     1090519043/1090519042 Up              4696(3)           Up          Eth1/49               default             

NY-N3K-LEAF-11#

NY-N3K-LEAF-10# show bfd neighbors details

 

OurAddr         NeighAddr       LD/RD                 RH/RS           Holdown(mult)     State       Int                   Vrf                   

172.16.160.254  172.16.160.20   1090519041/2635291218 Up              1151(3)           Up          Vlan160               default                        

 

Session state is Up and not using echo function

Local Diag: 0, Demand mode: 0, Poll bit: 0, Authentication: None

MinTxInt: 500000 us, MinRxInt: 500000 us, Multiplier: 3

Received MinRxInt: 500000 us, Received Multiplier: 3

Holdown (hits): 1500 ms (0), Hello (hits): 500 ms (22759)

Rx Count: 20115, Rx Interval (ms) min/max/avg: 83/1921/437 last: 348 ms ago

Tx Count: 22759, Tx Interval (ms) min/max/avg: 386/386/386 last: 24 ms ago

Registered protocols:  netstack

Uptime: 0 days 2 hrs 26 mins 39 secs, Upcount: 1

Last packet: Version: 1                - Diagnostic: 0

             State bit: Up             - Demand bit: 0

             Poll bit: 0               - Final bit: 0

             Multiplier: 3             - Length: 24

             My Discr.: -1659676078    - Your Discr.: 1090519041

             Min tx interval: 500000   - Min rx interval: 500000

             Min Echo interval: 0      - Authentication bit: 0

Hosting LC: 1, Down reason: None, Reason not-hosted: None

 

 

 

OurAddr         NeighAddr       LD/RD                 RH/RS           Holdown(mult)     State       Int                   Vrf                   

172.16.160.254  172.16.160.21   1090519042/3842218904 Up              1260(3)           Up          Vlan160               default                        

 

Session state is Up and not using echo function

Local Diag: 0, Demand mode: 0, Poll bit: 0, Authentication: None

MinTxInt: 500000 us, MinRxInt: 500000 us, Multiplier: 3

Received MinRxInt: 500000 us, Received Multiplier: 3

Holdown (hits): 1500 ms (0), Hello (hits): 500 ms (22774)

Rx Count: 20105, Rx Interval (ms) min/max/avg: 0/1813/438 last: 239 ms ago

Tx Count: 22774, Tx Interval (ms) min/max/avg: 386/386/386 last: 24 ms ago

Registered protocols:  netstack

Uptime: 0 days 2 hrs 26 mins 46 secs, Upcount: 1

Last packet: Version: 1                - Diagnostic: 0

             State bit: Up             - Demand bit: 0

             Poll bit: 0               - Final bit: 0

             Multiplier: 3             - Length: 24

             My Discr.: -452748392     - Your Discr.: 1090519042

             Min tx interval: 500000   - Min rx interval: 500000

             Min Echo interval: 0      - Authentication bit: 0

Hosting LC: 1, Down reason: None, Reason not-hosted: None

 

 

 

OurAddr         NeighAddr       LD/RD                 RH/RS           Holdown(mult)     State       Int                   Vrf                   

172.16.3.18     172.16.3.17     1090519043/1090519041 Up              5600(3)           Up          Eth1/49               default               

 

Session state is Up and using echo function with 500 ms interval

Local Diag: 0, Demand mode: 0, Poll bit: 0, Authentication: None

MinTxInt: 500000 us, MinRxInt: 2000000 us, Multiplier: 3

Received MinRxInt: 2000000 us, Received Multiplier: 3

Holdown (hits): 6000 ms (0), Hello (hits): 2000 ms (5309)

Rx Count: 5309, Rx Interval (ms) min/max/avg: 7/2101/1690 last: 399 ms ago

Tx Count: 5309, Tx Interval (ms) min/max/avg: 1689/1689/1689 last: 249 ms ago

Registered protocols:  ospf

Uptime: 0 days 2 hrs 29 mins 29 secs, Upcount: 1

Last packet: Version: 1                - Diagnostic: 0

             State bit: Up             - Demand bit: 0

             Poll bit: 0               - Final bit: 0

             Multiplier: 3             - Length: 24

             My Discr.: 1090519041     - Your Discr.: 1090519043

             Min tx interval: 500000   - Min rx interval: 2000000

             Min Echo interval: 500000 - Authentication bit: 0

Hosting LC: 1, Down reason: None, Reason not-hosted: None

 

NY-N3K-LEAF-10#

NY-N3K-LEAF-11# show bfd neighbors details

 

OurAddr         NeighAddr       LD/RD                 RH/RS           Holdown(mult)     State       Int                   Vrf                   

172.16.161.254  172.16.161.20   1090519041/591227029  Up              1235(3)           Up          Vlan161               default                        

 

Session state is Up and not using echo function

Local Diag: 0, Demand mode: 0, Poll bit: 0, Authentication: None

MinTxInt: 500000 us, MinRxInt: 500000 us, Multiplier: 3

Received MinRxInt: 500000 us, Received Multiplier: 3

Holdown (hits): 1500 ms (0), Hello (hits): 500 ms (22634)

Rx Count: 19972, Rx Interval (ms) min/max/avg: 93/1659/438 last: 264 ms ago

Tx Count: 22634, Tx Interval (ms) min/max/avg: 386/386/386 last: 127 ms ago

Registered protocols:  netstack

Uptime: 0 days 2 hrs 25 mins 47 secs, Upcount: 1

Last packet: Version: 1                - Diagnostic: 0

             State bit: Up             - Demand bit: 0

             Poll bit: 0               - Final bit: 0

             Multiplier: 3             - Length: 24

             My Discr.: 591227029      - Your Discr.: 1090519041

             Min tx interval: 500000   - Min rx interval: 500000

             Min Echo interval: 0      - Authentication bit: 0

Hosting LC: 1, Down reason: None, Reason not-hosted: None

 

 

 

OurAddr         NeighAddr       LD/RD                 RH/RS           Holdown(mult)     State       Int                   Vrf                   

172.16.161.254  172.16.161.21   1090519042/2646176019 Up              1162(3)           Up          Vlan161               default                        

 

Session state is Up and not using echo function

Local Diag: 0, Demand mode: 0, Poll bit: 0, Authentication: None

MinTxInt: 500000 us, MinRxInt: 500000 us, Multiplier: 3

Received MinRxInt: 500000 us, Received Multiplier: 3

Holdown (hits): 1500 ms (0), Hello (hits): 500 ms (22652)

Rx Count: 20004, Rx Interval (ms) min/max/avg: 278/1799/438 last: 337 ms ago

Tx Count: 22652, Tx Interval (ms) min/max/avg: 386/386/386 last: 127 ms ago

Registered protocols:  netstack

Uptime: 0 days 2 hrs 25 mins 58 secs, Upcount: 1

Last packet: Version: 1                - Diagnostic: 0

             State bit: Up             - Demand bit: 0

             Poll bit: 0               - Final bit: 0

             Multiplier: 3             - Length: 24

             My Discr.: -1648791277    - Your Discr.: 1090519042

             Min tx interval: 500000   - Min rx interval: 500000

             Min Echo interval: 0      - Authentication bit: 0

Hosting LC: 1, Down reason: None, Reason not-hosted: None

 

 

 

OurAddr         NeighAddr       LD/RD                 RH/RS           Holdown(mult)     State       Int                   Vrf                   

172.16.3.22     172.16.3.21     1090519043/1090519042 Up              4370(3)           Up          Eth1/49               default               

 

Session state is Up and using echo function with 500 ms interval

Local Diag: 0, Demand mode: 0, Poll bit: 0, Authentication: None

MinTxInt: 500000 us, MinRxInt: 2000000 us, Multiplier: 3

Received MinRxInt: 2000000 us, Received Multiplier: 3

Holdown (hits): 6000 ms (0), Hello (hits): 2000 ms (5236)

Rx Count: 5236, Rx Interval (ms) min/max/avg: 553/1698/1690 last: 1629 ms ago

Tx Count: 5236, Tx Interval (ms) min/max/avg: 1689/1689/1689 last: 1020 ms ago

Registered protocols:  ospf

Uptime: 0 days 2 hrs 27 mins 26 secs, Upcount: 1

Last packet: Version: 1                - Diagnostic: 0

             State bit: Up             - Demand bit: 0

             Poll bit: 0               - Final bit: 0

             Multiplier: 3             - Length: 24

             My Discr.: 1090519042     - Your Discr.: 1090519043

             Min tx interval: 500000   - Min rx interval: 2000000

             Min Echo interval: 500000 - Authentication bit: 0

Hosting LC: 1, Down reason: None, Reason not-hosted: None

 

NY-N3K-LEAF-11#
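One detail in the `show bfd neighbors details` output above can be confusing: some `My Discr.` values in the `Last packet` section are printed as negative numbers, while the `LD/RD` column shows a large positive number for the same session. The two appear to be the same 32-bit discriminator, printed once as a signed and once as an unsigned integer. The small Python sketch below converts one into the other; the helper name `to_unsigned32` is my own and is not an NX-OS or NSX-T function.

```python
def to_unsigned32(value: int) -> int:
    """Interpret a possibly negative integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


# Signed "My Discr." values copied from the NX-OS output above
for signed in (-1659676078, -452748392, -1648791277):
    print(signed, "->", to_unsigned32(signed))
# -1659676078 -> 2635291218
# -452748392 -> 3842218904
# -1648791277 -> 2646176019
```

The converted values match the remote discriminators (RD) in the `LD/RD` column for the sessions towards 172.16.160.20, 172.16.160.21 and 172.16.161.21.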

 

The table below shows the BFD sessions on the Tier-0 Gateway Service Router (SR). The CLI shows the BFD peers and source IP addresses along with the state. Please note that BFD does not require both ends of a BFD session to be configured with identical interval and multiplier values, but identical parameters are recommended to simplify troubleshooting.

Table 8 - NSX-T Edge Transport Node BFD Verification

ny-nsxt-edge-node-20 (Service Router)
ny-nsxt-edge-node-21 (Service Router)

ny-nsxt-edge-node-20(tier0_sr)> get bfd-sessions

BFD Session

Dest_port                     : 3784

Diag                          : No Diagnostic

Encap                         : vlan

Forwarding                    : last true (current true)

Interface                     : 64a2e029-ad69-4ce1-a40e-def0956a9d2d

Keep-down                     : false

Last_cp_diag                  : No Diagnostic

Last_cp_rmt_diag              : No Diagnostic

Last_cp_rmt_state             : up

Last_cp_state                 : up

Last_fwd_state                : UP

Last_local_down_diag          : No Diagnostic

Last_remote_down_diag         : No Diagnostic

Last_up_time                  : 2020-07-07 15:42:23

Local_address                 : 172.16.160.20

Local_discr                   : 2635291218

Min_rx_ttl                    : 255

Multiplier                    : 3

Received_remote_diag          : No Diagnostic

Received_remote_state         : up

Remote_address                : 172.16.160.254

Remote_admin_down             : false

Remote_diag                   : No Diagnostic

Remote_discr                  : 1090519041

Remote_min_rx_interval        : 500

Remote_min_tx_interval        : 500

Remote_multiplier             : 3

Remote_state                  : up

Router                        : 1cfd7da2-f37c-4108-8f19-7725822f0552

Router_down                   : false

Rx_cfg_min                    : 500

Rx_interval                   : 500

Service-link                  : false

Session_type                  : LR_PORT

State                         : up

Tx_cfg_min                    : 500

Tx_interval                   : 500

 

 

BFD Session

Dest_port                     : 3784

Diag                          : No Diagnostic

Encap                         : vlan

Forwarding                    : last true (current true)

Interface                     : 371a9b3f-d669-493a-a46b-161d3536b261

Keep-down                     : false

Last_cp_diag                  : No Diagnostic

Last_cp_rmt_diag              : No Diagnostic

Last_cp_rmt_state             : up

Last_cp_state                 : up

Last_fwd_state                : UP

Last_local_down_diag          : No Diagnostic

Last_remote_down_diag         : No Diagnostic

Last_up_time                  : 2020-07-07 15:42:24

Local_address                 : 172.16.161.20

Local_discr                   : 591227029

Min_rx_ttl                    : 255

Multiplier                    : 3

Received_remote_diag          : No Diagnostic

Received_remote_state         : up

Remote_address                : 172.16.161.254

Remote_admin_down             : false

Remote_diag                   : No Diagnostic

Remote_discr                  : 1090519041

Remote_min_rx_interval        : 500

Remote_min_tx_interval        : 500

Remote_multiplier             : 3

Remote_state                  : up

Router                        : 1cfd7da2-f37c-4108-8f19-7725822f0552

Router_down                   : false

Rx_cfg_min                    : 500

Rx_interval                   : 500

Service-link                  : false

Session_type                  : LR_PORT

State                         : up

Tx_cfg_min                    : 500

Tx_interval                   : 500

 

ny-nsxt-edge-node-20(tier0_sr)>

ny-nsxt-edge-node-21(tier0_sr)> get bfd-sessions

BFD Session

Dest_port                     : 3784

Diag                          : No Diagnostic

Encap                         : vlan

Forwarding                    : last true (current true)

Interface                     : a5454564-ef1c-4e30-922f-9876b9df38df

Keep-down                     : false

Last_cp_diag                  : No Diagnostic

Last_cp_rmt_diag              : No Diagnostic

Last_cp_rmt_state             : up

Last_cp_state                 : up

Last_fwd_state                : UP

Last_local_down_diag          : No Diagnostic

Last_remote_down_diag         : No Diagnostic

Last_up_time                  : 2020-07-07 15:42:15

Local_address                 : 172.16.160.21

Local_discr                   : 3842218904

Min_rx_ttl                    : 255

Multiplier                    : 3

Received_remote_diag          : No Diagnostic

Received_remote_state         : up

Remote_address                : 172.16.160.254

Remote_admin_down             : false

Remote_diag                   : No Diagnostic

Remote_discr                  : 1090519042

Remote_min_rx_interval        : 500

Remote_min_tx_interval        : 500

Remote_multiplier             : 3

Remote_state                  : up

Router                        : a2ea4cbc-c486-46a1-a663-c9c5815253af

Router_down                   : false

Rx_cfg_min                    : 500

Rx_interval                   : 500

Service-link                  : false

Session_type                  : LR_PORT

State                         : up

Tx_cfg_min                    : 500

Tx_interval                   : 500

 

 

BFD Session

Dest_port                     : 3784

Diag                          : No Diagnostic

Encap                         : vlan

Forwarding                    : last true (current true)

Interface                     : 8423e83b-0a69-44f4-90d1-07d8ece4f55e

Keep-down                     : false

Last_cp_diag                  : No Diagnostic

Last_cp_rmt_diag              : No Diagnostic

Last_cp_rmt_state             : up

Last_cp_state                 : up

Last_fwd_state                : UP

Last_local_down_diag          : No Diagnostic

Last_remote_down_diag         : No Diagnostic

Last_up_time                  : 2020-07-07 15:42:15

Local_address                 : 172.16.161.21

Local_discr                   : 2646176019

Min_rx_ttl                    : 255

Multiplier                    : 3

Received_remote_diag          : No Diagnostic

Received_remote_state         : up

Remote_address                : 172.16.161.254

Remote_admin_down             : false

Remote_diag                   : No Diagnostic

Remote_discr                  : 1090519042

Remote_min_rx_interval        : 500

Remote_min_tx_interval        : 500

Remote_multiplier             : 3

Remote_state                  : up

Router                        : a2ea4cbc-c486-46a1-a663-c9c5815253af

Router_down                   : false

Rx_cfg_min                    : 500

Rx_interval                   : 500

Service-link                  : false

Session_type                  : LR_PORT

State                         : up

Tx_cfg_min                    : 500

Tx_interval                   : 500

 

ny-nsxt-edge-node-21(tier0_sr)>

 

I would really like to emphasize that static routing with NSX-T Edge Transport Nodes in A/A mode must use BFD to avoid blackholing. If BFD for static routing is not supported on the ToR switches, I highly recommend using A/S mode with HA VIP instead, or switching back to BGP.
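To get a feeling for how quickly BFD removes a dead Next Hop, the detection time can be estimated from the timers shown in the outputs above. The following Python sketch assumes BFD asynchronous mode as described in RFC 5880 (section 6.8.4): the detection time is the multiplier received from the peer, multiplied by the greater of the local required minimum RX interval and the peer's desired minimum TX interval. The function and parameter names are my own.

```python
def bfd_detection_time_ms(local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int,
                          remote_detect_mult: int) -> int:
    """Asynchronous-mode detection time following RFC 5880, section 6.8.4."""
    agreed_remote_tx_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_remote_tx_ms


# Static-route BFD sessions in this lab: 500 ms RX, 500 ms TX, multiplier 3
print(bfd_detection_time_ms(500, 500, 3))   # 1500
# OSPF session on Eth1/49: local MinRxInt 2000 ms, peer Min tx interval 500 ms
print(bfd_detection_time_ms(2000, 500, 3))  # 6000
```

Both results agree with the `Holdown (hits)` values of 1500 ms and 6000 ms reported by the Nexus switches. The sketch only covers control packets and ignores the echo function used on the OSPF session.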

 

 

 

Design Option 2 - Static Routing (A/S HA VIP)

The first step in design option 2 is the Tier-0 static route configuration for northbound traffic. The most common way is to configure a default route northbound. The diagram below shows the setup with the two default routes (in black) northbound. As already mentioned, HA VIP requires that both NSX-T Edge Transport Node interfaces belong to the same Layer 2 segment (NY-T0-VLAN-SEGMENT-160). A single default route with two different Next Hops (172.16.160.253 and 172.16.160.254) is configured on the NY-T0-GATEWAY-02. With this design we can also achieve ECMP for northbound traffic towards the ToR switches. The diagram below shows the corresponding NSX-T Tier-0 Gateway static routing configuration. Please keep in mind again that at the NSX-T Edge Transport Node level, each Edge Transport Node will have two default route entries, because only two Next Hops, not four, are configured at the Tier-0 Gateway level. This is shown in the table below.

Networking – IP StaticRouting North Diagram Combined Option 2.png

Please note that the steps to configure the Tier-1 Gateway (NY-T1-GATEWAY-BLUE) and to connect it to the Tier-0 Gateway are not shown.

 

 

Table 9 - NSX-T Edge Transport Node Routing Table for Design Option 2 (A/S HA VIP)

ny-nsxt-edge-node-22 (Service Router)
ny-nsxt-edge-node-23 (Service Router)

ny-nsxt-edge-node-22(tier0_sr)> get route 0.0.0.0/0

 

Flags: t0c - Tier0-Connected, t0s - Tier0-Static, b - BGP,

t0n - Tier0-NAT, t1s - Tier1-Static, t1c - Tier1-Connected,

t1n: Tier1-NAT, t1l: Tier1-LB VIP, t1ls: Tier1-LB SNAT,

t1d: Tier1-DNS FORWARDER, t1ipsec: Tier1-IPSec, isr: Inter-SR,

> - selected route, * - FIB route

 

Total number of routes: 1

 

t0s> * 0.0.0.0/0 [1/0] via 172.16.160.253, uplink-278, 00:00:27

t0s> * 0.0.0.0/0 [1/0] via 172.16.160.254, uplink-278, 00:00:27

ny-nsxt-edge-node-22(tier0_sr)>

ny-nsxt-edge-node-23(tier0_sr)> get route 0.0.0.0/0

 

Flags: t0c - Tier0-Connected, t0s - Tier0-Static, b - BGP,

t0n - Tier0-NAT, t1s - Tier1-Static, t1c - Tier1-Connected,

t1n: Tier1-NAT, t1l: Tier1-LB VIP, t1ls: Tier1-LB SNAT,

t1d: Tier1-DNS FORWARDER, t1ipsec: Tier1-IPSec, isr: Inter-SR,

> - selected route, * - FIB route

 

Total number of routes: 1

 

t0s> * 0.0.0.0/0 [1/0] via 172.16.160.253, uplink-279, 00:00:57

t0s> * 0.0.0.0/0 [1/0] via 172.16.160.254, uplink-279, 00:00:57

ny-nsxt-edge-node-23(tier0_sr)>
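Because HA VIP requires all uplink interfaces on the same Layer 2 segment, both Next Hops and the HA VIP must sit in the same IP subnet. A quick sanity check with Python's standard `ipaddress` module could look like the sketch below. Please note that the /24 mask for VLAN 160 is my assumption; the helper name `all_in_subnet` is also my own.

```python
import ipaddress

# Assumption: the uplink VLAN 160 subnet is a /24 (the mask is not stated here)
uplink_subnet = ipaddress.ip_network("172.16.160.0/24")

ha_vip = ipaddress.ip_address("172.16.160.24")
next_hops = [
    ipaddress.ip_address("172.16.160.253"),
    ipaddress.ip_address("172.16.160.254"),
]


def all_in_subnet(addresses, subnet):
    """Return True if every address belongs to the given subnet."""
    return all(address in subnet for address in addresses)


print(all_in_subnet([ha_vip, *next_hops], uplink_subnet))  # True
```

A Next Hop from the VLAN 161 range, such as 172.16.161.254, would fail this check and would not be reachable from the HA VIP segment.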

 

The second step is to configure static routing southbound from the ToR switches towards the NSX-T Edge Transport Nodes. Each ToR switch is configured with two static routes to forward traffic to the destination overlay networks (overlay segments 172.16.242.0/24 and 172.16.243.0/24) within NSX-T. For each of the static routes the Next Hop is the NSX-T Tier-0 Gateway HA VIP.

Networking – IP StaticRouting South Diagram Option 2.png

The table below shows the static routing configuration on the ToR switch and the resulting routing table. The Next Hop is the Tier-0 Gateway HA VIP 172.16.160.24 for all static routes.

Table 10 - Nexus ToR Switches Static Routing Configuration and Resulting Routing Table for Design Option 2 (A/S HA VIP)

NY-N3K-LEAF-10
NY-N3K-LEAF-11

ip route 172.16.242.0/24 Vlan160 172.16.160.24

ip route 172.16.243.0/24 Vlan160 172.16.160.24

ip route 172.16.242.0/24 Vlan160 172.16.160.24

ip route 172.16.243.0/24 Vlan160 172.16.160.24

NY-N3K-LEAF-10# show ip route static

IP Route Table for VRF "default"

'*' denotes best ucast next-hop

'**' denotes best mcast next-hop

'[x/y]' denotes [preference/metric]

'%<string>' in via output denotes VRF <string>

 

172.16.240.0/24, ubest/mbest: 2/0

    *via 172.16.160.20, Vlan160, [1/0], 02:51:34, static

    *via 172.16.160.21, Vlan160, [1/0], 02:51:41, static

172.16.241.0/24, ubest/mbest: 2/0

    *via 172.16.160.20, Vlan160, [1/0], 02:51:34, static

    *via 172.16.160.21, Vlan160, [1/0], 02:51:41, static

172.16.242.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 02:55:42, static

172.16.243.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 02:55:42, static

 

NY-N3K-LEAF-10#

NY-N3K-LEAF-11# show ip route static

IP Route Table for VRF "default"

'*' denotes best ucast next-hop

'**' denotes best mcast next-hop

'[x/y]' denotes [preference/metric]

'%<string>' in via output denotes VRF <string>

 

172.16.240.0/24, ubest/mbest: 2/0

    *via 172.16.161.20, Vlan161, [1/0], 02:53:04, static

    *via 172.16.161.21, Vlan161, [1/0], 02:53:12, static

172.16.241.0/24, ubest/mbest: 2/0

    *via 172.16.161.20, Vlan161, [1/0], 02:53:04, static

    *via 172.16.161.21, Vlan161, [1/0], 02:53:12, static

172.16.242.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 02:55:03, static

172.16.243.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 02:55:03, static

 

NY-N3K-LEAF-11#
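If more overlay segments are added later, the southbound static routes on the ToR switches follow the same pattern. The short Python sketch below generates the `ip route` lines in the same form as used in Table 10; the function name `nxos_static_routes` is hypothetical and only formats text, it does not talk to any switch.

```python
def nxos_static_routes(prefixes, interface, next_hop):
    """Build NX-OS 'ip route' lines in the same form as used in Table 10."""
    return [f"ip route {prefix} {interface} {next_hop}" for prefix in prefixes]


overlay_segments = ["172.16.242.0/24", "172.16.243.0/24"]
for line in nxos_static_routes(overlay_segments, "Vlan160", "172.16.160.24"):
    print(line)
# ip route 172.16.242.0/24 Vlan160 172.16.160.24
# ip route 172.16.243.0/24 Vlan160 172.16.160.24
```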

 

Failover Sanity checks

The table below shows the static routing tables on both ToR switches for each failover case.

Table 11 - Failover Sanity Check

Failover Case
NY-N3K-LEAF-10 (Routing Table)
NY-N3K-LEAF-11 (Routing Table)
Comments
All NSX-T Edge Transport Nodes are UP

NY-N3K-LEAF-10# show ip route static

IP Route Table for VRF "default"

'*' denotes best ucast next-hop

'**' denotes best mcast next-hop

'[x/y]' denotes [preference/metric]

'%<string>' in via output denotes VRF <string>

 

172.16.240.0/24, ubest/mbest: 2/0

    *via 172.16.160.20, Vlan160, [1/0], 00:58:27, static

    *via 172.16.160.21, Vlan160, [1/0], 00:58:43, static

172.16.241.0/24, ubest/mbest: 2/0

    *via 172.16.160.20, Vlan160, [1/0], 00:58:27, static

    *via 172.16.160.21, Vlan160, [1/0], 00:58:43, static

172.16.242.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:02:47, static

172.16.243.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:02:47, static

NY-N3K-LEAF-10#

NY-N3K-LEAF-11# show ip route static

IP Route Table for VRF "default"

'*' denotes best ucast next-hop

'**' denotes best mcast next-hop

'[x/y]' denotes [preference/metric]

'%<string>' in via output denotes VRF <string>

 

172.16.240.0/24, ubest/mbest: 2/0

    *via 172.16.161.20, Vlan161, [1/0], 00:59:10, static

    *via 172.16.161.21, Vlan161, [1/0], 00:59:25, static

172.16.241.0/24, ubest/mbest: 2/0

    *via 172.16.161.20, Vlan161, [1/0], 00:59:10, static

    *via 172.16.161.21, Vlan161, [1/0], 00:59:25, static

172.16.242.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:01:21, static

172.16.243.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:01:21, static

NY-N3K-LEAF-11#

NSX-T Edge Transport Node

ny-nsxt-edge-node-20 is DOWN

All other NSX-T Edge Transport Nodes are UP

NY-N3K-LEAF-10# show ip route static

IP Route Table for VRF "default"

'*' denotes best ucast next-hop

'**' denotes best mcast next-hop

'[x/y]' denotes [preference/metric]

'%<string>' in via output denotes VRF <string>

 

172.16.240.0/24, ubest/mbest: 1/0

    *via 172.16.160.21, Vlan160, [1/0], 01:01:01, static

172.16.241.0/24, ubest/mbest: 1/0

    *via 172.16.160.21, Vlan160, [1/0], 01:01:01, static

172.16.242.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:05:05, static

172.16.243.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:05:05, static

NY-N3K-LEAF-10#

NY-N3K-LEAF-11# show ip route static

IP Route Table for VRF "default"

'*' denotes best ucast next-hop

'**' denotes best mcast next-hop

'[x/y]' denotes [preference/metric]

'%<string>' in via output denotes VRF <string>

 

172.16.240.0/24, ubest/mbest: 1/0

    *via 172.16.161.21, Vlan161, [1/0], 01:01:21, static

172.16.241.0/24, ubest/mbest: 1/0

    *via 172.16.161.21, Vlan161, [1/0], 01:01:21, static

172.16.242.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:03:17, static

172.16.243.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:03:17, static

NY-N3K-LEAF-11#

Route entries with ny-nsxt-edge-node-20 (172.16.160.20 and 172.16.161.20) as Next Hop are removed by BFD

NSX-T Edge Transport Node

ny-nsxt-edge-node-21 is DOWN

All other NSX-T Edge Transport Nodes are UP

NY-N3K-LEAF-10# show ip route static

IP Route Table for VRF "default"

'*' denotes best ucast next-hop

'**' denotes best mcast next-hop

'[x/y]' denotes [preference/metric]

'%<string>' in via output denotes VRF <string>

 

172.16.240.0/24, ubest/mbest: 1/0

    *via 172.16.160.20, Vlan160, [1/0], 00:02:40, static

172.16.241.0/24, ubest/mbest: 1/0

    *via 172.16.160.20, Vlan160, [1/0], 00:02:40, static

172.16.242.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:12:13, static

172.16.243.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:12:13, static

 

NY-N3K-LEAF-10#

NY-N3K-LEAF-11# show ip route static

IP Route Table for VRF "default"

'*' denotes best ucast next-hop

'**' denotes best mcast next-hop

'[x/y]' denotes [preference/metric]

'%<string>' in via output denotes VRF <string>

 

172.16.240.0/24, ubest/mbest: 1/0

    *via 172.16.161.20, Vlan161, [1/0], 00:03:04, static

172.16.241.0/24, ubest/mbest: 1/0

    *via 172.16.161.20, Vlan161, [1/0], 00:03:04, static

172.16.242.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:10:28, static

172.16.243.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:10:28, static

 

NY-N3K-LEAF-11#

Route entries with ny-nsxt-edge-node-21 (172.16.160.21 and 172.16.161.21) as Next Hop are removed by BFD

NSX-T Edge Transport Node

ny-nsxt-edge-node-22 is DOWN

All other NSX-T Edge Transport Nodes are UP

NY-N3K-LEAF-10# show ip route static

IP Route Table for VRF "default"

'*' denotes best ucast next-hop

'**' denotes best mcast next-hop

'[x/y]' denotes [preference/metric]

'%<string>' in via output denotes VRF <string>

 

172.16.240.0/24, ubest/mbest: 2/0

    *via 172.16.160.20, Vlan160, [1/0], 00:06:55, static

    *via 172.16.160.21, Vlan160, [1/0], 00:00:09, static

172.16.241.0/24, ubest/mbest: 2/0

    *via 172.16.160.20, Vlan160, [1/0], 00:06:55, static

    *via 172.16.160.21, Vlan160, [1/0], 00:00:09, static

172.16.242.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:16:28, static

172.16.243.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:16:28, static

 

NY-N3K-LEAF-10#

NY-N3K-LEAF-11# show ip route static

IP Route Table for VRF "default"

'*' denotes best ucast next-hop

'**' denotes best mcast next-hop

'[x/y]' denotes [preference/metric]

'%<string>' in via output denotes VRF <string>

 

172.16.240.0/24, ubest/mbest: 2/0

    *via 172.16.161.20, Vlan161, [1/0], 00:07:01, static

    *via 172.16.161.21, Vlan161, [1/0], 00:00:16, static

172.16.241.0/24, ubest/mbest: 2/0

    *via 172.16.161.20, Vlan161, [1/0], 00:07:01, static

    *via 172.16.161.21, Vlan161, [1/0], 00:00:16, static

172.16.242.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:14:25, static

172.16.243.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:14:25, static

 

NY-N3K-LEAF-11#

The failure of a single NSX-T Edge Transport Node used for HA VIP does not change the routing table

NSX-T Edge Transport Node

ny-nsxt-edge-node-23 is DOWN

All other NSX-T Edge Transport Nodes are UP

NY-N3K-LEAF-10# show ip route static

IP Route Table for VRF "default"

'*' denotes best ucast next-hop

'**' denotes best mcast next-hop

'[x/y]' denotes [preference/metric]

'%<string>' in via output denotes VRF <string>

 

172.16.240.0/24, ubest/mbest: 2/0

    *via 172.16.160.20, Vlan160, [1/0], 00:10:58, static

    *via 172.16.160.21, Vlan160, [1/0], 00:04:12, static

172.16.241.0/24, ubest/mbest: 2/0

    *via 172.16.160.20, Vlan160, [1/0], 00:10:58, static

    *via 172.16.160.21, Vlan160, [1/0], 00:04:12, static

172.16.242.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:20:31, static

172.16.243.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:20:31, static

 

NY-N3K-LEAF-10#

NY-N3K-LEAF-11# show ip route static

IP Route Table for VRF "default"

'*' denotes best ucast next-hop

'**' denotes best mcast next-hop

'[x/y]' denotes [preference/metric]

'%<string>' in via output denotes VRF <string>

 

172.16.240.0/24, ubest/mbest: 2/0

    *via 172.16.161.20, Vlan161, [1/0], 00:11:30, static

    *via 172.16.161.21, Vlan161, [1/0], 00:04:45, static

172.16.241.0/24, ubest/mbest: 2/0

    *via 172.16.161.20, Vlan161, [1/0], 00:11:30, static

    *via 172.16.161.21, Vlan161, [1/0], 00:04:45, static

172.16.242.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:18:54, static

172.16.243.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:18:54, static

 

NY-N3K-LEAF-11#

The failure of a single NSX-T Edge Transport Node used for HA VIP does not change the routing table

NSX-T Edge Transport Node

ny-nsxt-edge-node-20 and

ny-nsxt-edge-node-21 are DOWN

All other NSX-T Edge Transport Nodes are UP

NY-N3K-LEAF-10# show ip route static

IP Route Table for VRF "default"

'*' denotes best ucast next-hop

'**' denotes best mcast next-hop

'[x/y]' denotes [preference/metric]

'%<string>' in via output denotes VRF <string>

 

172.16.242.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:24:06, static

172.16.243.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:24:06, static

 

NY-N3K-LEAF-10#

NY-N3K-LEAF-11# show ip route static

IP Route Table for VRF "default"

'*' denotes best ucast next-hop

'**' denotes best mcast next-hop

'[x/y]' denotes [preference/metric]

'%<string>' in via output denotes VRF <string>

 

172.16.242.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:22:54, static

172.16.243.0/24, ubest/mbest: 1/0

    *via 172.16.160.24, Vlan160, [1/0], 01:22:54, static

 

NY-N3K-LEAF-11#

All route entries related to design option 1 are removed by BFD
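The failover checks above can also be automated by parsing the `show ip route static` output and comparing the Next Hops per prefix. The Python sketch below is only a minimal illustration: the function name `parse_static_routes` and the regular expressions are my own and only cover the output format shown in this post.

```python
import re
from collections import defaultdict

PREFIX_RE = re.compile(r"^(\d+\.\d+\.\d+\.\d+/\d+), ubest/mbest:")
VIA_RE = re.compile(r"^\s*\*via (\d+\.\d+\.\d+\.\d+), ([^,\s]+),")


def parse_static_routes(output: str) -> dict:
    """Map each prefix to the list of (next hop, interface) pairs found."""
    routes = defaultdict(list)
    current_prefix = None
    for line in output.splitlines():
        prefix_match = PREFIX_RE.match(line)
        if prefix_match:
            current_prefix = prefix_match.group(1)
            continue
        via_match = VIA_RE.match(line)
        if via_match and current_prefix:
            routes[current_prefix].append((via_match.group(1), via_match.group(2)))
    return dict(routes)


# Excerpt from NY-N3K-LEAF-10 while ny-nsxt-edge-node-20 is DOWN
sample = """\
172.16.240.0/24, ubest/mbest: 1/0
    *via 172.16.160.21, Vlan160, [1/0], 01:01:01, static
172.16.242.0/24, ubest/mbest: 1/0
    *via 172.16.160.24, Vlan160, [1/0], 01:05:05, static
"""

for prefix, hops in parse_static_routes(sample).items():
    print(prefix, hops)
# 172.16.240.0/24 [('172.16.160.21', 'Vlan160')]
# 172.16.242.0/24 [('172.16.160.24', 'Vlan160')]
```

With ny-nsxt-edge-node-20 down, the prefix 172.16.240.0/24 only keeps 172.16.160.21 as Next Hop, while the HA VIP route for 172.16.242.0/24 stays unchanged.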

 

I hope you had a little bit of fun reading this blog post about static routing with NSX-T. Now, with the knowledge of how to achieve ECMP with static routing, you might have a new and interesting design option for your customers' NSX-T deployments.

 

Software Inventory:

vSphere version: VMware ESXi, 6.5.0, 15256549

vCenter version: 6.5.0, 10964411

NSX-T version: 3.0.0.0.0.15946738 (GA)

Cisco Nexus 3048 NX-OS version: 7.0(3)I7(6)

 

 

Blog history

Version 1.0 - 08.07.2020 - first published version

Version 1.1 - 09.07.2020 - minor changes

Version 1.2 - 30.07.2020 - grammar updates - thanks to James Lepthien :-)