RagsRachamadugu
Contributor

NSX-T north-south traffic not working due to dropped ingress packets on Tier-0 router uplink


I'm using NSX-T 2.4. I have two segments, web and app, connected to a tier-1 router, 'bstier1gw', which is connected to the tier-0 router 'bs-tier0'. The tier-0 router runs on an edge VM with a single-uplink uplink profile, deployed on an ESXi host that has a single physical NIC connected to a ToR switch. I've set up a NAT service on the tier-0 to translate the web overlay network (192.168.2.0/24) to the IP 10.4.11.154, which is part of 10.4.11.0/24, a VLAN network on the physical fabric. I also added a static route on the tier-0 for the default prefix (0.0.0.0/0) with the next hop set to 10.4.11.1, which is the SVI IP on the ToR leaf.

When I log in to a web VM with IP 192.168.2.4 and ping 8.8.8.8, it doesn't work. I see dropped ingress packets on the uplink port of the tier-0 router, as shown in the attachment.

How do I debug this further, and am I missing any steps?

Thanks Rags

daphnissov
Immortal

From inside your network if you do a traceroute to 10.4.11.154, what is your result? Can you ping from your web VM (192.168.2.4) to an IP that is inside your LAN (but outside the scope of NSX-T)? Those are the first two things to check.

RagsRachamadugu
Contributor

Thank you for looking into this.

Here is the output from a bare-metal server that is in the same subnet as the underlay VLAN 11.

root@bs101-01l:~# ip addr show dev eno1
4: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 10000
    link/ether 0c:c4:7a:33:5c:d0 brd ff:ff:ff:ff:ff:ff
    inet 10.4.11.101/24 brd 10.4.11.255 scope global eno1
       valid_lft forever preferred_lft forever
    inet6 fe80::ec4:7aff:fe33:5cd0/64 scope link
       valid_lft forever preferred_lft forever

root@bs101-01l:~# traceroute 10.4.11.154
traceroute to 10.4.11.154 (10.4.11.154), 30 hops max, 60 byte packets
1  10.4.11.101 (10.4.11.101)  3077.627 ms !H  3077.538 ms !H  3077.515 ms !H

root@bs101-01l:~# ping 10.4.11.154
PING 10.4.11.154 (10.4.11.154) 56(84) bytes of data.
From 10.4.11.101 icmp_seq=1 Destination Host Unreachable
From 10.4.11.101 icmp_seq=2 Destination Host Unreachable
From 10.4.11.101 icmp_seq=3 Destination Host Unreachable
From 10.4.11.101 icmp_seq=4 Destination Host Unreachable
From 10.4.11.101 icmp_seq=5 Destination Host Unreachable
From 10.4.11.101 icmp_seq=6 Destination Host Unreachable
^C
--- 10.4.11.154 ping statistics ---
7 packets transmitted, 0 received, +6 errors, 100% packet loss, time 6148ms
pipe 4

From a web VM on the overlay network, I'm unable to ping 10.4.11.101 (a sample host on the underlay network). A few other things about my environment:

  1. The ESXi host on which the edge VMs are running is not configured for NSX-T.
  2. The edge VM's eth0 and fp-eth0 are both connected to a single VSS on ESXi, which is configured to send untagged traffic to the ToR through the host's single vmnic0. eth0 and fp-eth0 have IP addresses 10.4.11.151 and 10.4.11.152 respectively. The remaining edge VM interfaces, fp-eth1 and fp-eth2, are disconnected, and the uplink profile I used for the edge VM has a single uplink mapped to fp-eth0.
  3. On the tier-0 router, when I added an uplink (which was not possible from the GUI, by the way; I had to hack the HTML to re-enable the Type dropdown :( ), I assigned it a subnet (a bad name, in my opinion, for an IP address): 10.4.11.153/24. Strangely, even this IP, 10.4.11.153, is not pingable from anywhere.

Thank you for your help again. I'll also try using tcpdump etc. on the ToR to see what's going on.

Thanks Rags

RagsRachamadugu
Contributor

I spent a lot more time on this but still no luck. I have now started using fp-eth1 as well on the edge VM (nsxtedge02) hosting the tier-0 router, and mapped it to a different VLAN than the transport VLAN. I'm sharing some output from my own debugging below; hopefully it will reveal something that lets someone help me.

nsxtedge02(tier0_sr)> get interfaces

Logical Router

UUID                                   VRF    LR-ID  Name                              Type                      

e5421544-631e-41f5-bd47-1105da236e4f   2      3      DR--bs-tier0                DISTRIBUTED_ROUTER_TIER0  

Interfaces

    Interface     : b035990b-de81-58b2-86fc-1ec2fcdec173

    Ifuid         : 277

    Mode          : blackhole

    Interface     : b8fac27f-1adb-4adb-abef-c2d7a48785f4

    Ifuid         : 278

    Name          : bp-dr-port

    Mode          : lif

    IP/Mask       : 169.254.0.1/28;fe80::50:56ff:fe56:4452/64

    MAC           : 02:50:56:56:44:52

    VNI           : 71683

    LS port       : 9a0a8564-4ae8-4b3a-a0f1-44319390c79e

    Urpf-mode     : PORT_CHECK

    Admin         : up

    Op_state      : up

    MTU           : 1500

    Interface     : db4c0c0a-b342-49d8-905c-6b09db7ee9c8

    Ifuid         : 279

    Name          : -bs-tier0-bstier1gw-t0_lr

    Internal name : downlink-279

    Mode          : lif

    IP/Mask       : 100.64.240.0/31;fc21:3e51:80e3:b000::1/64;fe80::50:56ff:fe56:4452/64

    MAC           : 02:50:56:56:44:52

    VNI           : 71682

    LS port       : c2c3a0e6-594d-48fa-b302-70a1732587ac

    Urpf-mode     : PORT_CHECK

    Admin         : up

    Op_state      : up

    MTU           : 1500

    Interface     : fe3788d9-3597-575a-b8ef-845882c6e2e0

    Ifuid         : 276

    Mode          : cpu

Logical Router

UUID                                   VRF    LR-ID  Name                              Type                      

d8942a8a-b52d-4f9d-a170-d1ed79f51457   1      5      SR--bs-tier0                SERVICE_ROUTER_TIER0      

Interfaces

    Interface     : 4feca6e7-da2e-59b8-9717-b0b99b32df6c

    Ifuid         : 269

    Mode          : cpu

    Interface     : 3a5c3f9e-8f07-4aec-96d0-87a79f9792b3

    Ifuid         : 275

    Mode          : loopback

    IP/Mask       : 127.0.0.1/8;10.4.14.11/32;::1/128

    Interface     : 9d000121-d24b-5c3e-8324-cfe6356dfc12

    Ifuid         : 270

    Mode          : blackhole

    Interface     : 200603e9-5ded-49fd-9485-4a07b9531049

    Ifuid         : 273

    Name          : bp-sr0-port

    Mode          : lif

    IP/Mask       : 169.254.0.2/28;fe80::50:56ff:fe56:5300/64

    MAC           : 02:50:56:56:53:00

    VNI           : 71683

    LS port       : 312ce065-db04-4d35-8da9-a448fe281825

    Urpf-mode     : NONE

    Admin         : up

    Op_state      : up

    MTU           : 1500

    Interface     : b6f466c5-4a85-4c06-9a57-22f1366f5643

    Ifuid         : 272

    Name          : uplink-ns-exit

    Internal name : uplink-272

    Mode          : lif

    IP/Mask       : 10.4.14.10/24

    MAC           : 00:50:56:a5:7d:85

    LS port       : 9ff79c75-69e8-45c0-b561-11a39dbbcb76

    Urpf-mode     : STRICT_MODE

    Admin         : up

    Op_state      : up

    MTU           : 1600

nsxtedge02(tier0_sr)> ping 8.8.8.8 repeat 3

PING 8.8.8.8 (8.8.8.8): 56 data bytes

64 bytes from 8.8.8.8: icmp_seq=0 ttl=52 time=2.320 ms

64 bytes from 8.8.8.8: icmp_seq=1 ttl=52 time=2.599 ms

64 bytes from 8.8.8.8: icmp_seq=2 ttl=52 time=2.139 ms

--- 8.8.8.8 ping statistics ---

3 packets transmitted, 3 packets received, 0.0% packet loss

round-trip min/avg/max/stddev = 2.139/2.353/2.599/0.189 ms

nsxtedge02(tier0_sr)> exit

nsxtedge02> vrf 2

nsxtedge02(vrf)> ping 8.8.8.8 repeat 3

PING 8.8.8.8 (8.8.8.8): 56 data bytes

--- 8.8.8.8 ping statistics ---

3 packets transmitted, 0 packets received, 100.0% packet loss

nsxtedge02(vrf)> get forwarding

Logical Router

UUID                                   VRF    LR-ID  Name                              Type                      

e5421544-631e-41f5-bd47-1105da236e4f   2      3      DR--bs-tier0                DISTRIBUTED_ROUTER_TIER0  

IPv4 Forwarding Table

IP Prefix          Gateway IP      Type     UUID                                   Gateway MAC     

0.0.0.0/0          10.4.14.1       route    b6f466c5-4a85-4c06-9a57-22f1366f5643   64:00:6a:d6:4b:a1

10.4.14.0/24                       route    b6f466c5-4a85-4c06-9a57-22f1366f5643                   

10.4.14.10/32                      route    4feca6e7-da2e-59b8-9717-b0b99b32df6c                   

10.4.14.11/32                      route    3a5c3f9e-8f07-4aec-96d0-87a79f9792b3                   

100.64.240.0/32                    route    fe3788d9-3597-575a-b8ef-845882c6e2e0                   

100.64.240.0/31                    route    db4c0c0a-b342-49d8-905c-6b09db7ee9c8                   

127.0.0.1/32                       route    3a5c3f9e-8f07-4aec-96d0-87a79f9792b3                   

169.254.0.0/28                     route    200603e9-5ded-49fd-9485-4a07b9531049                   

169.254.0.1/32                     route    fe3788d9-3597-575a-b8ef-845882c6e2e0                   

169.254.0.2/32                     route    4feca6e7-da2e-59b8-9717-b0b99b32df6c                   

192.168.2.0/24     100.64.240.1    route    db4c0c0a-b342-49d8-905c-6b09db7ee9c8                   

192.168.3.0/24     100.64.240.1    route    db4c0c0a-b342-49d8-905c-6b09db7ee9c8                   

IPv6 Forwarding Table

IP Prefix                                     Gateway IP                                Type        UUID                                   Gateway MAC     

::1/128                                                                                 route       3a5c3f9e-8f07-4aec-96d0-87a79f9792b3                   

fc21:3e51:80e3:b000::/64                                                                route       db4c0c0a-b342-49d8-905c-6b09db7ee9c8                   

fc21:3e51:80e3:b000::1/128                                                              route       fe3788d9-3597-575a-b8ef-845882c6e2e0                   

fe80::/64                                                                               route       200603e9-5ded-49fd-9485-4a07b9531049

nsxtedge02(vrf)> ping 100.64.240.1 repeat 3

PING 100.64.240.1 (100.64.240.1): 56 data bytes

36 bytes from 100.64.240.1: Destination Host Unreachable

Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst

4  5  00 0054 0000   0 0000  40  01 d226 100.64.240.0  100.64.240.1

--- 100.64.240.1 ping statistics ---

3 packets transmitted, 0 packets received, 100.0% packet loss

nsxtedge02(vrf)> get neighbor

Logical Router

UUID        : e5421544-631e-41f5-bd47-1105da236e4f

VRF         : 2

LR-ID       : 3

Name        : DR--bs-tier0

Type        : DISTRIBUTED_ROUTER_TIER0

Neighbor

    Interface   : b8fac27f-1adb-4adb-abef-c2d7a48785f4

    IP          : fe80::50:56ff:fe56:5300

    MAC         : 02:50:56:56:53:00

    State       : perm

    Interface   : b8fac27f-1adb-4adb-abef-c2d7a48785f4

    IP          : 169.254.0.2

    MAC         : 02:50:56:56:53:00

    State       : perm

Logical Router

UUID        : d8942a8a-b52d-4f9d-a170-d1ed79f51457

VRF         : 1

LR-ID       : 5

Name        : SR--bs-tier0

Type        : SERVICE_ROUTER_TIER0

Neighbor

    Interface   : b6f466c5-4a85-4c06-9a57-22f1366f5643

    IP          : 10.4.14.1

    MAC         : 64:00:6a:d6:4b:a1

    State       : reach

    Timeout     : 804

nsxtedge02(vrf)> ping 10.4.14.1 repeat 2

PING 10.4.14.1 (10.4.14.1): 56 data bytes

--- 10.4.14.1 ping statistics ---

2 packets transmitted, 0 packets received, 100.0% packet loss
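
As a reading aid for the forwarding table above, here is a minimal Python sketch of how a destination would be resolved, assuming plain longest-prefix matching. The route list is a hand-copied subset of the `get forwarding` output, and `lookup` is a hypothetical helper written for illustration, not an NSX-T command or API.

```python
import ipaddress

# Subset of the DR--bs-tier0 IPv4 forwarding table from the output above.
# Each entry is (prefix, gateway); gateway is None for connected routes.
ROUTES = [
    ("0.0.0.0/0", "10.4.14.1"),
    ("10.4.14.0/24", None),
    ("100.64.240.0/31", None),
    ("192.168.2.0/24", "100.64.240.1"),
    ("192.168.3.0/24", "100.64.240.1"),
]


def lookup(destination, routes=ROUTES):
    """Return the (prefix, gateway) entry with the longest matching prefix."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), gateway)
        for prefix, gateway in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    net, gateway = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), gateway


for dest in ("8.8.8.8", "192.168.2.4", "10.4.14.1"):
    print(dest, "->", lookup(dest))
```

Under these assumptions, 8.8.8.8 only matches the default route via 10.4.14.1, while 192.168.2.4 resolves to the tier-1 next hop 100.64.240.1.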

daphnissov
Immortal

From your web VM (or any VM connected to a LS) can you:

  1. Ping its gateway (downlink on T1)
  2. Ping the downlink on T0
  3. Ping the uplink on T0

Can you also show your NAT rules on your T0?

RagsRachamadugu
Contributor

From a web VM:

  1. I'm able to ping its gateway (192.168.2.1) and even a VM in the app network with IP 192.168.3.4 (distributed routing).
  2. Ping to the uplink on T1 connecting to T0, 100.64.240.1, works. Ping to the downlink on T0 connecting to T1, 100.64.240.0, also works.
  3. Ping to the uplink on T0, 10.4.14.10, does NOT work.
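
The hop-by-hop check above can be scripted from the web VM. The following is a rough Python sketch, assuming a Linux guest with the iputils `ping` command (the `-c` and `-W` flags); the hop labels and the `first_unreachable` helper are made up for illustration, and the addresses are the ones quoted in this thread.

```python
import subprocess

# Hops from the web VM outward, using the addresses quoted in this thread.
HOPS = [
    ("T1 downlink (web gateway)", "192.168.2.1"),
    ("T1 uplink to T0", "100.64.240.1"),
    ("T0 downlink to T1", "100.64.240.0"),
    ("T0 uplink", "10.4.14.10"),
]


def ping(ip, count=2, timeout_s=2):
    """Return True if the host answers ICMP echo (assumes Linux iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_unreachable(hops, probe=ping):
    """Probe hops in order; return the first (name, ip) that fails, or None."""
    for name, ip in hops:
        if not probe(ip):
            return name, ip
    return None


# Example, run on the web VM:
#   print(first_unreachable(HOPS))
```

Passing the probe as a parameter keeps the ordering logic testable without network access; with the results reported above, the first failing hop would be the T0 uplink.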

Details from relevant API responses

    Details of the T0 uplink port

        {
            "subnets": [
                {
                    "ip_addresses": [
                        "10.4.14.10"
                    ],
                    "prefix_length": 24
                }
            ],
            "edge_cluster_member_index": [
                1
            ],
            "linked_logical_switch_port_id": {
                "target_id": "9ff79c75-69e8-45c0-b561-11a39dbbcb76",
                "target_display_name": "9ff79c75-69e8-45c0-b561-11a39dbbcb76",
                "target_type": "LogicalPort",
                "is_valid": true
            },
            "urpf_mode": "STRICT",
            "mtu": 1600,
            "mac_address": "00:50:56:a5:7d:85",
            "resource_type": "LogicalRouterUpLinkPort",
            "id": "b6f466c5-4a85-4c06-9a57-22f1366f5643",
            "display_name": "uplink-ns-exit",
            "logical_router_id": "e5421544-631e-41f5-bd47-1105da236e4f",
            "_create_user": "admin",
            "_create_time": 1558050667180,
            "_last_modified_user": "admin",
            "_last_modified_time": 1558158219773,
            "_system_owned": false,
            "_protection": "NOT_PROTECTED",
            "_revision": 6
        }

NAT rules on tier-0 gateway

{
    "results": [
        {
            "rule_priority": 1124,
            "action": "SNAT",
            "match_source_network": "192.168.2.0/24",
            "translated_network": "10.4.14.11",
            "enabled": true,
            "logging": true,
            "logical_router_id": "e5421544-631e-41f5-bd47-1105da236e4f",
            "nat_pass": false,
            "firewall_match": "MATCH_INTERNAL_ADDRESS",
            "internal_rule_id": "01003000-0000-0402-0000-000000000003",
            "resource_type": "NatRule",
            "id": "1026",
            "display_name": "web-tier",
            "tags": [
                {
                    "scope": "policyPath",
                    "tag": "/infra/tier-0s/apstra-bs-tier0/nat/USER/nat-rules/76822df0-783a-11e9-b88f-ef8f85d7f602"
                }
            ],
            "_create_user": "nsx_policy",
            "_create_time": 1558052893072,
            "_last_modified_user": "nsx_policy",
            "_last_modified_time": 1558158288113,
            "_system_owned": false,
            "_protection": "REQUIRE_OVERRIDE",
            "_revision": 1
        }
    ],
    "result_count": 1,
    "sort_by": "rule_priority"
}
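
A quick sanity check on the NAT rule above can be done offline. This Python sketch assumes the rule's `translated_network` is a single address (as it is here) and uses only fields copied from the two API responses; the helper names `snat_applies` and `translated_on_uplink` are hypothetical and not part of any NSX-T SDK.

```python
import ipaddress

# Fields copied from the NAT rule and uplink port responses above.
nat_rule = {
    "action": "SNAT",
    "match_source_network": "192.168.2.0/24",
    "translated_network": "10.4.14.11",
    "enabled": True,
}
# Uplink port address 10.4.14.10 with prefix_length 24.
uplink_subnet = ipaddress.ip_network("10.4.14.10/24", strict=False)


def snat_applies(rule, source_ip):
    """True if an enabled SNAT rule matches the given source address."""
    return (
        rule["enabled"]
        and rule["action"] == "SNAT"
        and ipaddress.ip_address(source_ip)
        in ipaddress.ip_network(rule["match_source_network"])
    )


def translated_on_uplink(rule, subnet):
    """True if the translated address sits inside the uplink subnet."""
    return ipaddress.ip_address(rule["translated_network"]) in subnet


print(snat_applies(nat_rule, "192.168.2.4"))          # web VM
print(snat_applies(nat_rule, "192.168.3.4"))          # app VM
print(translated_on_uplink(nat_rule, uplink_subnet))
```

With the values shown, the rule covers the web VM but not the app VM, and the translated address falls inside the uplink subnet, so the rule itself looks consistent with the uplink configuration.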

daphnissov
Immortal

And are you able to ping the T0 uplink interface (10.4.14.10) from:

  1. Another host on the same segment (10.4.14.0/24)?
  2. A different host on a different segment?

RagsRachamadugu
Contributor

I'm able to ping 10.4.14.10 from my laptop on the LAN and from the ToR. Even a ping to 10.4.14.11 (the NAT-translated IP) works.

By the way, I'm observing that the tunnel status on the edge hosting the tier-0 router shows as down. It was up last night (i.e., 8+ hours ago).

api/v1/transport-nodes/8e7ef61e-775d-11e9-9ad6-005056a55d96/tunnels

{
    "tunnels": [
        {
            "name": "geneve168037270",
            "status": "DOWN",
            "egress_interface": "fp-eth0",
            "local_ip": "10.4.11.152",
            "remote_ip": "10.4.11.150",
            "remote_node_id": "a6291406-72a4-11e9-a5a9-005056a55d90",
            "remote_node_display_name": "nsxtedge01",
            "encap": "GENEVE",
            "bfd": {
                "state": "DOWN",
                "active": true,
                "forwarding": false,
                "diagnostic": "CONTROL_DETECTION_TIME_EXPIRED",
                "remote_state": "DOWN",
                "remote_diagnostic": "NO_DIAGNOSTIC"
            },
            "last_updated_time": 1558201153510
        }
    ],
    "result_count": 1,
    "sort_by": "tunnelName",
    "sort_ascending": true
}

But the interface itself is up

root@nsxtedge02:~# nsxcli

NSX CLI (Edge 2.4.0.0.0.12454265). Press ? for command list or enter: help

nsxtedge02> vrf 0

nsxtedge02(vrf)> get interfaces

Logical Router

UUID                                   VRF    LR-ID  Name                              Type                      

736a80e3-23f6-5a2d-81d6-bbefb2786666   0      0                                        TUNNEL                    

Interfaces

    Interface     : 9fd3c667-32db-5921-aaad-7a88c80b5e9f

    Ifuid         : 261

    Mode          : blackhole

    Interface     : 31f2f7f3-0c5c-579a-8306-42968882cb0e

    Ifuid         : 288

    Name          :

    Mode          : lif

    IP/Mask       : 10.4.11.152/24

    MAC           : 00:50:56:a5:fa:fd

    LS port       : 1f3fbeae-b02f-55a5-95d8-5f388421767e

    Urpf-mode     : PORT_CHECK

    Admin         : up

    Op_state      : up

    MTU           : 1600

    Interface     : f322c6ca-4298-568b-81c7-a006ba6e6c88

    Ifuid         : 260

    Mode          : cpu

nsxtedge02(vrf)>

The VTEP IP on this edge, 10.4.11.152, is not pingable from anywhere except vrf 0 on the edge itself.
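
When a transport node has many tunnels, the response from the tunnels endpoint above is easier to scan once summarized. Below is a small Python sketch that works on a trimmed copy of that response; it assumes healthy tunnels report a `status` of `"UP"`, and `down_tunnels` is a hypothetical helper rather than anything shipped with NSX-T.

```python
import json

# Trimmed copy of the tunnels response shown above.
RESPONSE = """
{
  "tunnels": [
    {
      "name": "geneve168037270",
      "status": "DOWN",
      "egress_interface": "fp-eth0",
      "local_ip": "10.4.11.152",
      "remote_ip": "10.4.11.150",
      "remote_node_display_name": "nsxtedge01",
      "bfd": {"state": "DOWN", "diagnostic": "CONTROL_DETECTION_TIME_EXPIRED"}
    }
  ],
  "result_count": 1
}
"""


def down_tunnels(response_text):
    """Return one summary line per tunnel whose status is not UP (assumed healthy value)."""
    data = json.loads(response_text)
    return [
        "{name}: {local_ip} -> {remote_ip} ({peer}), BFD {diag}".format(
            name=t["name"],
            local_ip=t["local_ip"],
            remote_ip=t["remote_ip"],
            peer=t.get("remote_node_display_name", "unknown"),
            diag=t.get("bfd", {}).get("diagnostic", "n/a"),
        )
        for t in data.get("tunnels", [])
        if t.get("status") != "UP"
    ]


for line in down_tunnels(RESPONSE):
    print(line)
```

For the response in this post, the summary points at the single GENEVE tunnel between 10.4.11.152 and nsxtedge01 (10.4.11.150), with BFD reporting an expired detection time.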

daphnissov
Immortal

Hang on here. So you have your TEP network on the same segment as your VLAN uplink for the T0? If so, that's not going to work. They need to be on separate networks. The underlay and VLAN transport zone cannot be the same.

RagsRachamadugu
Contributor

My TEP network on the tier-0 edge is 10.4.11.0/24 and my uplink network is 10.4.14.0/24. They are different. Where did you see them being the same?
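
The separation being discussed here can be checked mechanically. This is a minimal Python sketch using the addresses from the thread; `same_segment` is a hypothetical helper, and it only compares IP ranges, so it says nothing about VLAN tagging.

```python
import ipaddress

tep_network = ipaddress.ip_network("10.4.11.0/24")     # TEP / overlay transport
uplink_network = ipaddress.ip_network("10.4.14.0/24")  # current tier-0 uplink
# Uplink address from the first attempt earlier in the thread.
original_uplink = ipaddress.ip_interface("10.4.11.153/24").network


def same_segment(a, b):
    """True if the two networks share any addresses."""
    return a.overlaps(b)


print(same_segment(tep_network, uplink_network))   # current setup
print(same_segment(tep_network, original_uplink))  # first attempt
```

The current pair does not overlap, while the first attempt (uplink 10.4.11.153/24) shared the TEP network.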

daphnissov
Immortal

My mistake, you are correct.

On your VLAN TZ, are you configuring a VLAN ID? Can you show your edge interface assignments?

RagsRachamadugu
Contributor

OK, things are working now. Following your suggestion about edge interface assignments, I looked at the uplink profile used for the overlay transport zone, which had a transport VLAN of 11. I switched to a different uplink profile with a transport VLAN of 0. This brought the VTEP on the edge back up, and I'm now able to ping external and LAN endpoints just fine from the web VM!

Summarizing what I learned:

  1. When I started, the edge hosting the tier-0 router had only two interfaces, eth0 and fp-eth0. I was planning to use fp-eth0 for both transport/overlay and uplink traffic on the same VLAN. I expected the system to raise an error if using the same VLAN for both was not going to work.
  2. On a hunch (perhaps I had read or heard about unique VLANs somewhere), I added a new VLAN 14 on the ToR and enabled a new vNIC on the edge VM, which gave me fp-eth1 for uplink traffic. While doing this, I also changed the VSS port group (to which all edge vNICs connect) to use VLAN 4095 (trunk, allowing all VLANs). Then I changed the uplink profile for the overlay transport zone to one that explicitly tags packets with VLAN 11 (for the transport network). I expected this to work, as the ToR is configured to accept both untagged packets and packets tagged with VLAN 11. Anyway, after your valuable suggestions, I went back and changed this to VLAN 0, which made things work!

Overall this was a good learning experience, but oh boy, there are way too many concepts and screens/steps, plus the broken GUI for tier-0 router ports. Thanks to a good community forum like this, I'm out of the woods (for now)!

Thanks Rags
