6 Replies Latest reply on May 17, 2020 5:32 AM by WarlockArg

    NSX-T 3.0 Tier-0 and Tier-1 Gateways don't route between downlinks and uplinks

    WarlockArg Enthusiast

      Hi everybody,

     I'm doing my first implementation of NSX-T and have an issue with the Tier-0 and Tier-1 Gateways that I think is due to a mistake of mine.

       

     I deployed a Tier-0 Gateway with a single uplink to a physical core switch over a VLAN segment (192.168.16.0/24). For downlinks it has two LIFs on overlay segments (10.10.101.0/24 and 192.168.17.0/24), plus a third link connected to a Tier-1 Gateway (the auto-plumbed NSX network).

     The Tier-1 Gateway has the auto-plumbed network as its uplink and two overlay segments (10.10.102.0/24 and 10.10.103.0/24) as its downlinks.

       

     The problem I'm facing: both gateways route traffic between their downlinks without problems. That is to say, hosts on 192.168.17.0/24 can ping hosts on 10.10.101.0/24 and also the auto-plumbed network (100.64.112.0/31). The same happens on the Tier-1 Gateway: hosts on 10.10.102.0/24 can ping hosts on 10.10.103.0/24. But nothing can ping hosts or gateway interfaces that require traversing the other gateway. For example, hosts on overlay segments connected to the Tier-1 Gateway cannot ping hosts on the Tier-0 Gateway's overlay segment, and they cannot ping the inter-gateway interfaces either.
      The question is: could this be happening because each gateway is not actually a single device but two, the DR and the SR, and the DRs don't know how to reach their SR halves?

     One more thing: I'm not using BGP or static routes. What I have observed is that when I choose to redistribute the connected subnets on the Tier-1 Gateway, from the VRF CLI of the Tier-0 SR I can see those subnets in the "get route" output flagged as "t1c" (Tier-1 Connected). But from that same CLI on the Tier-0 SR, if I ping those LIFs on the Tier-1 DR, I cannot reach them.
      It seems that there is no connectivity between the DR and SR entities, although there is a subnet (169.254.0.0/24) between them.
      The other problem I have is that I don't know what the Tier-1 Gateway's routing table is, because the "get route" command is only available on the Tier-0 Gateway and not on the Tier-1. This might be a conceptual error on my part; I don't understand why it happens.
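For reference, the per-component tables can be reached by entering each component's VRF from the edge node CLI. A sketch of the sequence, using the same commands that appear in the transcripts below (the VRF IDs come from "get logical-router" and differ per deployment):

```
nsxtesg0> get logical-router      # lists DR/SR instances with their VRF IDs
nsxtesg0> vrf 8                   # enter the Tier-1 DR's VRF (ID from the list above)
nsxtesg0(vrf)> get forwarding     # per-VRF forwarding table; "get route" only exists on the Tier-0 SR
```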

      All the documents and videos I have read and watched show this type of implementation done with BGP enabled, and perhaps that would solve all the issues I'm having. But because the customer doesn't currently have a core switch that supports BGP, I preferred not to use a dynamic routing protocol. If it turns out the problem is that I'm not using BGP, I could enable it just for the internal virtual networking without advertising any routes to the physical network. That is to say, I could use BGP for the NSX networking and static routes between the Edge and the physical network (I know it's not optimal). For the moment, production will have only one or two NSX segments, which is manageable through static routes. In the future the customer will replace the core with a BGP-capable one.

     I'd appreciate any help with figuring out why this is going on.

         I attach a network diagram below.

       

       

      Thanks in advance.

      Guido.

        • 1. Re: NSX-T 3.0 Tier-0 and Tier-1 Gateways don't route between downlinks and uplinks
          Chris Mentjox Enthusiast

          Hi,

           

          I am not seeing this behavior.

          When I look on the T0-SR, I see:

          t1c> * 10.1.192.0/24 [3/0] via 100.64.32.7, downlink-500, 00:00:02

           

          When I move the segment to the T0, I see:

          t0c> * 10.1.192.0/24 is directly connected, downlink-529, 00:01:11

           

          In both cases I can ping a host in the segment from behind a T1.

           

          You can see some routes on the T1-DR using "get forwarding"

           

          T1-DR > get forwarding

          Logical Router

          UUID                                   VRF    LR-ID  Name                              Type

          b5c5f619-4368-4f1f-b316-7a1fd6c1f92d   10     8      DR-T1-test                        DISTRIBUTED_ROUTER_TIER1

          IPv4 Forwarding Table

          IP Prefix          Gateway IP      Type     UUID                                   Gateway MAC

          0.0.0.0/0          169.254.0.2     route    d58d74b5-6537-4c68-9293-6ad10fd97a4f   02:50:56:56:53:00

          169.254.0.0/28                     route    d58d74b5-6537-4c68-9293-6ad10fd97a4f

          169.254.0.1/32                     route    fc58ab32-deee-5d9d-b1f3-8c7aedc10a98

           

          Are all MTUs OK? Do you see Geneve tunnels?
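The MTU question matters because Geneve encapsulation adds fixed outer headers on top of each inner frame. A rough sketch of the arithmetic (header sizes per RFC 8926 plus standard IPv4/UDP/Ethernet; treat the exact NSX sizing recommendation as something to verify in VMware's docs):

```python
# Why the overlay transport network needs a larger MTU than the 1500-byte segments.
INNER_MTU      = 1500  # MTU of the workload segment
INNER_ETHERNET = 14    # inner Ethernet header carried inside the tunnel
OUTER_IPV4     = 20    # outer IPv4 header
OUTER_UDP      = 8     # outer UDP header
GENEVE_BASE    = 8     # base Geneve header (RFC 8926); options add 4-byte multiples

def min_transport_mtu(inner_mtu=INNER_MTU):
    """Smallest underlay MTU that carries a full inner frame without fragmentation."""
    return inner_mtu + INNER_ETHERNET + OUTER_IPV4 + OUTER_UDP + GENEVE_BASE

print(min_transport_mtu())  # 1550 -> hence the usual "1600 or more" guidance
```

Note that a default 56-byte ping will sail through even a broken MTU path; only near-full-size packets with DF set expose the problem.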

          • 2. Re: NSX-T 3.0 Tier-0 and Tier-1 Gateways don't route between downlinks and uplinks
            WarlockArg Enthusiast

            Chris,

               First of all, thank you for your answer.

             

               I'll paste the output of some commands:

             

            From Tier-0 SR:

             

             

            nsxtesg0(tier0_sr)> get route

            Flags: t0c - Tier0-Connected, t0s - Tier0-Static, b - BGP,

            t0n - Tier0-NAT, t1s - Tier1-Static, t1c - Tier1-Connected,

            t1n: Tier1-NAT, t1l: Tier1-LB VIP, t1ls: Tier1-LB SNAT,

            t1d: Tier1-DNS FORWARDER, t1ipsec: Tier1-IPSec, isr: Inter-SR,

            > - selected route, * - FIB route

             

            Total number of routes: 9

             

            t0c> * 10.10.101.0/24 is directly connected, downlink-337, 1d16h22m

            t1c> * 10.10.102.0/24 [3/0] via 100.64.112.1, downlink-294, 1d06h53m

            t1c> * 10.10.103.0/24 [3/0] via 100.64.112.1, downlink-294, 1d06h53m

            t0c> * 100.64.112.0/31 is directly connected, downlink-294, 1d16h22m

            t0c> * 169.254.0.0/24 is directly connected, downlink-344, 1d16h22m

            t0c> * 192.168.16.0/24 is directly connected, uplink-346, 1d16h22m

            t0c> * 192.168.17.0/24 is directly connected, downlink-328, 1d16h22m

            t0c> * fcef:2313:2800:800::/64 is directly connected, downlink-294, 1d16h22m

            t0c> * fe80::/64 is directly connected, downlink-344, 1d16h22m

             

            The two t1c routes appeared when I selected to redistribute connected routes on the Tier-1 Gateway. Before that, I had only been seeing the t0c routes.

             

            If I ping from this Tier-0 SR to 192.168.16.1, the uplink interface that connects to the physical routers (on a VLAN-backed segment), I reach the destination (which is logical, because it is a directly connected interface):

             

            nsxtesg0(tier0_sr)> ping 192.168.16.1

            PING 192.168.16.1 (192.168.16.1): 56 data bytes

            64 bytes from 192.168.16.1: icmp_seq=0 ttl=255 time=7.479 ms

            64 bytes from 192.168.16.1: icmp_seq=1 ttl=255 time=1.731 ms

            64 bytes from 192.168.16.1: icmp_seq=2 ttl=255 time=1.995 ms

            64 bytes from 192.168.16.1: icmp_seq=3 ttl=255 time=1.234 ms

             

            Moreover, I also reach the core switch's interface (192.168.16.2):

             

            nsxtesg0(tier0_sr)> ping 192.168.16.2

            PING 192.168.16.2 (192.168.16.2): 56 data bytes

            64 bytes from 192.168.16.2: icmp_seq=0 ttl=64 time=0.635 ms

            64 bytes from 192.168.16.2: icmp_seq=1 ttl=64 time=0.549 ms

             

            But if I ping one of the LIFs that is also directly connected, I cannot reach it. For example, 10.10.101.1:

             

            nsxtesg0(tier0_sr)> ping 10.10.101.1

            PING 10.10.101.1 (10.10.101.1): 56 data bytes

            ^C

            --- 10.10.101.1 ping statistics ---

            5 packets transmitted, 0 packets received, 100.0% packet loss

             

            At this point I thought this was happening because that overlay segment is actually directly connected to the T0 DR, not to the SR. So, I moved to the Tier-0 DR:

             

            From Tier-0 DR:

             

            nsxtesg0> get logical-router

            Logical Router

            UUID                                   VRF    LR-ID  Name                              Type                        Ports

            736a80e3-23f6-5a2d-81d6-bbefb2786666   0      0                                        TUNNEL                      4

            b958c9bb-0d97-4c73-b39b-73b045e5ed76   3      1026   SR-RosarioT-1-Gateway             SERVICE_ROUTER_TIER1        5

            224c5db1-d92f-488f-9787-cb736b2cd396   4      2      DR-RosarioT-0-Gateway             DISTRIBUTED_ROUTER_TIER0    6

            002b63f3-688b-4fc7-84f1-bed9b00cc035   6      4      SR-RosarioT-0-Gateway             SERVICE_ROUTER_TIER0        5

            e6eb4986-7e44-4dfa-b1d7-1da0c8ac3d41   8      1025   DR-RosarioT-1-Gateway             DISTRIBUTED_ROUTER_TIER1    5

             

            nsxtesg0> vrf 4

            nsxtesg0(vrf)> ping 10.10.101.1

            PING 10.10.101.1 (10.10.101.1): 56 data bytes

            64 bytes from 10.10.101.1: icmp_seq=0 ttl=64 time=0.677 ms

            64 bytes from 10.10.101.1: icmp_seq=1 ttl=64 time=1.041 ms

            64 bytes from 10.10.101.1: icmp_seq=2 ttl=64 time=0.665 ms

             

            And from the DR I can reach the Tier-0 LIFs. So I thought it might be a problem in the connection between the SR and DR. For some reason, the SR knows it has a directly connected LIF but doesn't know how to reach it. Very strange behaviour, because on a normal router, if an interface is directly connected, you can always reach it. (I don't have much experience working with VRFs; perhaps this is normal behaviour, though I doubt it, because you don't usually have a router split in two parts, a DR and an SR.)

             

            The last test on the Tier-0 Gateway: if I ping from host 10.10.101.11 to host 192.168.17.11 (two segments connected by the Tier-0 Gateway), it works OK.

             

             

            From Tier-1 SR:

             

            If I move to Tier-1 SR:

             

            nsxtesg0> vrf 3

            nsxtesg0(tier1_sr)> get forwarding

            Logical Router

            UUID                                   VRF    LR-ID  Name                              Type

            b958c9bb-0d97-4c73-b39b-73b045e5ed76   3      1026   SR-RosarioT-1-Gateway             SERVICE_ROUTER_TIER1

            IPv4 Forwarding Table

            IP Prefix          Gateway IP      Type     UUID                                   Gateway MAC

            0.0.0.0/0          100.64.112.0    route    ad8ae731-1476-4b0f-b6f7-772b7cfdeb23

            10.10.102.0/24                     route    b9ba19cf-af79-4c88-b6a3-8749d767708a

            10.10.102.1/32                     route    ccb76982-83e8-5ef4-8d6f-067542430bab

            10.10.103.0/24                     route    b4d4b235-70f1-4c25-99cf-4c205ecb5044

            10.10.103.1/32                     route    ccb76982-83e8-5ef4-8d6f-067542430bab

            100.64.112.0/31                    route    ad8ae731-1476-4b0f-b6f7-772b7cfdeb23

            100.64.112.1/32                    route    15f2f498-b4ad-59de-a31c-1d165cd3ecff

            127.0.0.1/32                       route    2e52e6b9-d924-45fb-aa80-7568525e3630

            169.254.0.0/28                     route    fe859aa2-5095-42d7-973c-2b8dfccad54c

            169.254.0.1/32                     route    ccb76982-83e8-5ef4-8d6f-067542430bab

            169.254.0.2/32                     route    15f2f498-b4ad-59de-a31c-1d165cd3ecff

            IPv6 Forwarding Table

            IP Prefix                                     Gateway IP                                Type        UUID                                   Gateway MAC

            ::/0                                          fcef:2313:2800:800::1                     route       ad8ae731-1476-4b0f-b6f7-772b7cfdeb23

            ::1/128                                                                                 route       2e52e6b9-d924-45fb-aa80-7568525e3630

            fcef:2313:2800:800::/64                                                                 route       ad8ae731-1476-4b0f-b6f7-772b7cfdeb23

            fcef:2313:2800:800::2/128                                                               route       15f2f498-b4ad-59de-a31c-1d165cd3ecff

             

            As you can see, I don't have any entries for the segments directly connected to the Tier-0 DR or the Tier-0 SR, only a default route. Something strange here. I can ping the LIF of the overlay segment directly connected to the Tier-0 DR, but I cannot ping a host on that segment:

             

            nsxtesg0(tier1_sr)> ping 10.10.101.1

            PING 10.10.101.1 (10.10.101.1): 56 data bytes

            64 bytes from 10.10.101.1: icmp_seq=0 ttl=64 time=1.428 ms

            64 bytes from 10.10.101.1: icmp_seq=1 ttl=64 time=1.262 ms

            64 bytes from 10.10.101.1: icmp_seq=2 ttl=64 time=1.439 ms

             

            nsxtesg0(tier1_sr)> ping 10.10.101.11

            PING 10.10.101.11 (10.10.101.11): 56 data bytes

            --- 10.10.101.11 ping statistics ---

            4 packets transmitted, 0 packets received, 100.0% packet loss

             

            And I also cannot ping its own LIF, although it is directly connected (the same behaviour as on the Tier-0 SR):

             

            nsxtesg0(tier1_sr)> ping 10.10.103.1

            PING 10.10.103.1 (10.10.103.1): 56 data bytes

             

            --- 10.10.103.1 ping statistics ---

            5 packets transmitted, 0 packets received, 100.0% packet loss

             

            But I can ping the uplink interface that connects to the Tier-0 Gateway:

             

            nsxtesg0(tier1_sr)> ping 100.64.112.1

            PING 100.64.112.1 (100.64.112.1): 56 data bytes

            64 bytes from 100.64.112.1: icmp_seq=0 ttl=64 time=0.708 ms

            64 bytes from 100.64.112.1: icmp_seq=1 ttl=64 time=1.432 ms

            64 bytes from 100.64.112.1: icmp_seq=2 ttl=64 time=1.384 ms

             

            From Tier-1 DR:

             

            Lastly, if I move to the Tier-1 DR I get:

             

            nsxtesg0> vrf 8

            nsxtesg0(vrf)> get forwarding

            Logical Router

            UUID                                   VRF    LR-ID  Name                              Type

            e6eb4986-7e44-4dfa-b1d7-1da0c8ac3d41   8      1025   DR-RosarioT-1-Gateway             DISTRIBUTED_ROUTER_TIER1

            IPv4 Forwarding Table

            IP Prefix          Gateway IP      Type     UUID                                   Gateway MAC

            0.0.0.0/0          100.64.112.0    route    ad8ae731-1476-4b0f-b6f7-772b7cfdeb23   02:50:56:56:44:52

            10.10.102.0/24                     route    b9ba19cf-af79-4c88-b6a3-8749d767708a

            10.10.102.1/32                     route    ccb76982-83e8-5ef4-8d6f-067542430bab

            10.10.103.0/24                     route    b4d4b235-70f1-4c25-99cf-4c205ecb5044

            10.10.103.1/32                     route    ccb76982-83e8-5ef4-8d6f-067542430bab

            100.64.112.0/31                    route    ad8ae731-1476-4b0f-b6f7-772b7cfdeb23

            100.64.112.1/32                    route    15f2f498-b4ad-59de-a31c-1d165cd3ecff

            127.0.0.1/32                       route    2e52e6b9-d924-45fb-aa80-7568525e3630

            169.254.0.0/28                     route    fe859aa2-5095-42d7-973c-2b8dfccad54c

            169.254.0.1/32                     route    ccb76982-83e8-5ef4-8d6f-067542430bab

            169.254.0.2/32                     route    15f2f498-b4ad-59de-a31c-1d165cd3ecff

            IPv6 Forwarding Table

            IP Prefix                                     Gateway IP                                Type        UUID                                   Gateway MAC

            ::/0                                          fcef:2313:2800:800::1                     route       ad8ae731-1476-4b0f-b6f7-772b7cfdeb23

            ::1/128                                                                                 route       2e52e6b9-d924-45fb-aa80-7568525e3630

            fcef:2313:2800:800::/64                                                                 route       ad8ae731-1476-4b0f-b6f7-772b7cfdeb23

            fcef:2313:2800:800::2/128                                                               route       15f2f498-b4ad-59de-a31c-1d165cd3ecff

             

            As you can see, it is the same forwarding table as on the Tier-1 SR (which is logical).

             

            But from here I can ping its LIFs:

             

            nsxtesg0(vrf)> ping 10.10.103.1

            PING 10.10.103.1 (10.10.103.1): 56 data bytes

            64 bytes from 10.10.103.1: icmp_seq=0 ttl=64 time=0.742 ms

            64 bytes from 10.10.103.1: icmp_seq=1 ttl=64 time=0.885 ms

            64 bytes from 10.10.103.1: icmp_seq=2 ttl=64 time=1.233 ms

            --- 10.10.103.1 ping statistics ---

            4 packets transmitted, 3 packets received, 25.0% packet loss

            round-trip min/avg/max/stddev = 0.742/0.953/1.233/0.206 ms

             

            But I cannot ping its uplink interface that connects to the Tier-0 Gateway:

             

            nsxtesg0(vrf)> ping 100.64.112.1

            PING 100.64.112.1 (100.64.112.1): 56 data bytes

             

            --- 100.64.112.1 ping statistics ---

            6 packets transmitted, 0 packets received, 100.0% packet loss

             

            So, as I said in my first post, it seems that there is something broken in the connection between the gateways' DR and SR components.

             

            I think the MTUs are OK, everything at 1600 except the VLAN segments. I don't think this would cause issues right now, because I'm testing with standard pings that have 56 bytes of payload.

             

             

            Some tunnels are down, but I think they are in that state because I don't have any traffic on them; when I create new VMs or generate traffic, they come up.

             

            One question, Chris, since you say you don't observe this behaviour in your infrastructure: are you using BGP or just static routes? At the moment I haven't activated any routing protocol in my infrastructure. Perhaps activating BGP and redistributing routes on both the Tier-0 and Tier-1 Gateways would solve the interconnection problem I'm having between the gateways' SR and DR components. I don't know, just an idea.

             

            Thank you.

            Guido.

            • 3. Re: NSX-T 3.0 Tier-0 and Tier-1 Gateways don't route between downlinks and uplinks
              Chris Mentjox Enthusiast

              Hi,

               

              I indeed have a Tier-0 with BGP running.

              But for testing purposes I created a second Tier-0 without BGP enabled.

              I connected a segment to that Tier-0, and also connected the T1, with its own segment, to that Tier-0.

               

              I am using physical edge nodes, but that should not make any difference.

               

              I am able to ping without any problems.

               

              Could it be a firewall rule on your VM? Windows especially blocks things by default.

              Double-check your MTU settings, including the setting on the vSwitch (when using virtual edge nodes).

              • 4. Re: NSX-T 3.0 Tier-0 and Tier-1 Gateways don't route between downlinks and uplinks
                WarlockArg Enthusiast

                I'll check the uplink profiles of the edge. In my case the edge node is virtual. I'm using two N-VDS switches in the edge node, one for overlay traffic and one for the VLAN-backed segment. Perhaps there's something wrong there.

                What I'm testing is a straightforward configuration; I shouldn't be having these problems. And the edge node configuration might have something to do with it, because my problems are with the connection between the gateways' DR and SR components, and those connections take place inside the edge node.

                 

                All the VMs I'm testing with run Lubuntu, a light version of Ubuntu, with no firewall activated. On the other hand, I'm also having problems pinging the routers' interfaces from inside the gateway itself!

                 

                I'll let you know if I find the problem.

                 

                Thanks for everything Chris.

                 

                Guido.

                • 5. Re: NSX-T 3.0 Tier-0 and Tier-1 Gateways don't route between downlinks and uplinks
                  Chris Mentjox Enthusiast

                  Check the MTU on the vSwitch the virtual edge is connected to (and also on your physical switch).
                  A lot of issues appear when MTU settings are wrong.

                  • 6. Re: NSX-T 3.0 Tier-0 and Tier-1 Gateways don't route between downlinks and uplinks
                    WarlockArg Enthusiast

                    I think I found the problem. There are some Geneve tunnels between the transport nodes and edge nodes that are not coming up.

                     

                    You asked me to check the tunnel status, and I saw that some tunnels were down. But I had read somewhere that those tunnels come up and down depending on whether the transport nodes have VMs connected to logical switches. That is to say, if no VMs are connected to a logical switch (segment) on a transport node, that transport node doesn't bring up the Geneve tunnel to other transport nodes, since it makes no sense to maintain a tunnel that won't carry any traffic.

                     

                    What I didn't notice was that two of my transport nodes had never brought up their tunnels to the edge nodes (to other transport nodes yes, but not to the edge ones). When I used the trace tool in the NSX GUI, I realized the traffic was getting stuck at the tunnel that didn't exist!

                     

                    My infrastructure for now is three fully collapsed nodes, which I use as transport, management, and edge nodes. What I don't understand is why only one of them has its Geneve tunnels up to the edge transport node while the other two don't. All I've read about sharing a host between edge and transport node roles is that you cannot use the same VLAN ID for overlay traffic when the host transport node VTEPs and the edge transport node VTEPs share the same virtual switch. I am using the same VLAN ID and layer-3 segment for all the VTEPs, but I do NOT use the same virtual switch for them inside the host: the host transport node uses an N-VDS with two physical vmnics (vmnic2 and vmnic3), and the edge transport node VM is connected to a separate vSphere VDS that uses two other physical vmnics (vmnic0 and vmnic1). I'll open another thread to see if someone knows what could be happening.
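Tunnel state like this can also be checked centrally instead of node by node. A minimal sketch that flags tunnels stuck down via the NSX-T Manager REST API; the endpoint `GET /api/v1/transport-nodes/<node-id>/tunnels` and the `tunnels`/`status`/`remote_ip` field names are assumptions from the NSX-T 3.0 API and should be verified against the API guide, and the sample payload and IPs below are purely hypothetical:

```python
import json

def down_tunnels(payload):
    """Return (remote_ip, status) for every tunnel that is not UP.

    `payload` is the parsed JSON body of the (assumed) endpoint
    GET /api/v1/transport-nodes/<node-id>/tunnels on the NSX Manager.
    """
    return [(t.get("remote_ip"), t.get("status"))
            for t in payload.get("tunnels", [])
            if t.get("status") != "UP"]

# Hypothetical sample of what the endpoint might return, for illustration only.
sample = json.loads("""
{"tunnels": [
  {"remote_ip": "172.16.31.11", "status": "UP",   "encap": "GENEVE"},
  {"remote_ip": "172.16.31.12", "status": "DOWN", "encap": "GENEVE"}
]}
""")
print(down_tunnels(sample))  # only the DOWN peer is reported
```

Run against every transport node in a loop, this surfaces the "two hosts never tunneled to the edge" situation without clicking through the GUI.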

                     

                     

                    Thank you very much.
                    Guido.