VMware Networking Community
WarlockArg
Enthusiast

NSX-T 3.0 Tier-0 and Tier-1 Gateways don't route between downlinks and uplinks

Hi everybody,

   I'm doing my first NSX-T implementation and I have an issue with the Tier-0 and Tier-1 gateways that I think is due to a mistake of mine.

   I deployed a Tier-0 Gateway with a single uplink to a physical core switch over a VLAN-backed segment (192.168.16.0/24). As downlinks it has two LIFs on overlay segments (10.10.101.0/24 and 192.168.17.0/24) plus a third link connected to a Tier-1 Gateway (the auto-plumbed NSX network).

   On the Tier-1 Gateway I have the auto-plumbed network as its uplink and two overlay segments (10.10.102.0/24 and 10.10.103.0/24) as its downlinks.

   The problem I'm facing: both gateways route traffic between their own downlinks without problems. That is, hosts on 192.168.17.0/24 can ping hosts on 10.10.101.0/24 and also the auto-plumbed network (100.64.112.0/31). The same happens on the Tier-1 Gateway: hosts on 10.10.102.0/24 can ping hosts on 10.10.103.0/24. But nothing can ping hosts or gateway interfaces that require traversing the other gateway. For example, hosts on overlay segments connected to the Tier-1 Gateway cannot ping hosts on the Tier-0 Gateway's overlay segments, nor can they ping the inter-gateway interfaces.
    The question is: could this be happening because each gateway is not really a single device but actually two components, the DR and the SR, and the DRs don't know how to reach their SR counterparts?

   One more thing: I'm using neither BGP nor static routes. What I have observed is that when I choose to advertise the connected subnets on the Tier-1 Gateway, from the VRF CLI of the Tier-0 SR I can see those subnets in the "get route" output flagged as "t1c" (Tier-1 connected). But from that same Tier-0 SR CLI, if I ping those LIFs on the Tier-1 DR, I cannot reach them.
    It seems there is no connectivity between the DR and SR entities, even though there is a transit subnet (169.254.0.0/24) between them.
    The other problem is that I don't know what the Tier-1 Gateway's routing table looks like, because the "get route" command is only available on the Tier-0 Gateway, not on the Tier-1. This might be a conceptual error on my part; I don't understand why that happens.

    All the documents and videos I have read and watched show these kinds of implementations with BGP enabled, which perhaps would solve all the issues I'm having. But because the customer's current core switch doesn't support BGP, I preferred not to use a dynamic routing protocol. If the problem turns out to be the lack of BGP, I could enable it just for the internal virtual networking, without advertising any routes to the physical network. That is, I could use BGP inside NSX and static routes between the Edge and the physical network (I know it's not optimal). For now, production will have only one or two NSX segments, which is manageable through static routes. In the future the customer will replace the core with a BGP-capable one.
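    For reference, my understanding is that the static default route on the Tier-0 can be pushed through the Policy API as well as the GUI. A rough sketch of what I have in mind (the manager address is a placeholder, the route ID "default-to-core" is just a label I made up, I'm assuming the Tier-0 ID matches its display name, and 192.168.16.2 is the core switch):

curl -k -u admin -X PATCH \
  'https://<nsx-manager>/policy/api/v1/infra/tier-0s/RosarioT-0-Gateway/static-routes/default-to-core' \
  -H 'Content-Type: application/json' \
  -d '{"network": "0.0.0.0/0", "next_hops": [{"ip_address": "192.168.16.2", "admin_distance": 1}]}'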

   I'd appreciate any help figuring out why this is going on.

   I attach a network diagram below.

pastedImage_3.png

Thanks in advance.

Guido.

10 Replies
p0wertje
Hot Shot

Hi,

I am not seeing this behavior.

When I look on the T0 SR, I see:

t1c> * 10.1.192.0/24 [3/0] via 100.64.32.7, downlink-500, 00:00:02

When I move the segment to the T0, I see:

t0c> * 10.1.192.0/24 is directly connected, downlink-529, 00:01:11

In both cases I can ping a host in the segment from behind a T1.

You can see some routes on the T1-DR using "get forwarding"

T1-DR > get forwarding

Logical Router

UUID                                   VRF    LR-ID  Name                              Type

b5c5f619-4368-4f1f-b316-7a1fd6c1f92d   10     8      DR-T1-test                        DISTRIBUTED_ROUTER_TIER1

IPv4 Forwarding Table

IP Prefix          Gateway IP      Type     UUID                                   Gateway MAC

0.0.0.0/0          169.254.0.2     route    d58d74b5-6537-4c68-9293-6ad10fd97a4f   02:50:56:56:53:00

169.254.0.0/28                     route    d58d74b5-6537-4c68-9293-6ad10fd97a4f

169.254.0.1/32                     route    fc58ab32-deee-5d9d-b1f3-8c7aedc10a98

Are all MTUs OK? Do you see Geneve tunnels?

pastedImage_0.png
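If you want to rule out MTU on the TEP path quickly, a large ping with the DF bit set from the ESXi host usually tells you right away (the address is an example, use the TEP IP of your edge node; on my hosts the TEP vmks live in the vxlan netstack, and 1572 bytes of payload plus 28 bytes of headers makes the full 1600-byte packet):

vmkping ++netstack=vxlan -d -s 1572 <edge-tep-ip>

If that fails while a small vmkping works, it is an MTU problem somewhere on the path.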

Cheers,
p0wertje | VCIX6-NV | JNCIS-ENT | vExpert
Please kudo helpful posts and mark the thread as solved if solved
WarlockArg
Enthusiast

Chris,

   First of all, thank you for your answer.

   I'll paste the output of some commands:

From Tier-0 SR:

nsxtesg0(tier0_sr)> get route

Flags: t0c - Tier0-Connected, t0s - Tier0-Static, b - BGP,

t0n - Tier0-NAT, t1s - Tier1-Static, t1c - Tier1-Connected,

t1n: Tier1-NAT, t1l: Tier1-LB VIP, t1ls: Tier1-LB SNAT,

t1d: Tier1-DNS FORWARDER, t1ipsec: Tier1-IPSec, isr: Inter-SR,

> - selected route, * - FIB route

Total number of routes: 9

t0c> * 10.10.101.0/24 is directly connected, downlink-337, 1d16h22m

t1c> * 10.10.102.0/24 [3/0] via 100.64.112.1, downlink-294, 1d06h53m

t1c> * 10.10.103.0/24 [3/0] via 100.64.112.1, downlink-294, 1d06h53m

t0c> * 100.64.112.0/31 is directly connected, downlink-294, 1d16h22m

t0c> * 169.254.0.0/24 is directly connected, downlink-344, 1d16h22m

t0c> * 192.168.16.0/24 is directly connected, uplink-346, 1d16h22m

t0c> * 192.168.17.0/24 is directly connected, downlink-328, 1d16h22m

t0c> * fcef:2313:2800:800::/64 is directly connected, downlink-294, 1d16h22m

t0c> * fe80::/64 is directly connected, downlink-344, 1d16h22m

The two routes flagged as t1c appeared when I selected to advertise connected routes on the Tier-1 Gateway. Before that, I had only been seeing the "t0c" routes.

If I ping from this Tier-0 SR to 192.168.16.1, the uplink interface that connects to the physical routers (on a VLAN-backed segment), I reach the destination (which is logical, since it's a directly connected interface):

nsxtesg0(tier0_sr)> ping 192.168.16.1

PING 192.168.16.1 (192.168.16.1): 56 data bytes

64 bytes from 192.168.16.1: icmp_seq=0 ttl=255 time=7.479 ms

64 bytes from 192.168.16.1: icmp_seq=1 ttl=255 time=1.731 ms

64 bytes from 192.168.16.1: icmp_seq=2 ttl=255 time=1.995 ms

64 bytes from 192.168.16.1: icmp_seq=3 ttl=255 time=1.234 ms

Moreover, I can also reach the core switch's interface (192.168.16.2):

nsxtesg0(tier0_sr)> ping 192.168.16.2

PING 192.168.16.2 (192.168.16.2): 56 data bytes

64 bytes from 192.168.16.2: icmp_seq=0 ttl=64 time=0.635 ms

64 bytes from 192.168.16.2: icmp_seq=1 ttl=64 time=0.549 ms

But if I ping one of the LIFs that is also directly connected, I cannot reach it. For example 10.10.101.1:

nsxtesg0(tier0_sr)> ping 10.10.101.1

PING 10.10.101.1 (10.10.101.1): 56 data bytes

^C

--- 10.10.101.1 ping statistics ---

5 packets transmitted, 0 packets received, 100.0% packet loss

At this point I thought this was happening because that overlay segment is actually directly connected to the T0 DR, not to the SR. So I moved to the Tier-0 DR:

From Tier-0 DR:

nsxtesg0> get logical-router

Logical Router

UUID                                   VRF    LR-ID  Name                              Type                        Ports

736a80e3-23f6-5a2d-81d6-bbefb2786666   0      0                                        TUNNEL                      4

b958c9bb-0d97-4c73-b39b-73b045e5ed76   3      1026   SR-RosarioT-1-Gateway             SERVICE_ROUTER_TIER1        5

224c5db1-d92f-488f-9787-cb736b2cd396   4      2      DR-RosarioT-0-Gateway             DISTRIBUTED_ROUTER_TIER0    6

002b63f3-688b-4fc7-84f1-bed9b00cc035   6      4      SR-RosarioT-0-Gateway             SERVICE_ROUTER_TIER0        5

e6eb4986-7e44-4dfa-b1d7-1da0c8ac3d41   8      1025   DR-RosarioT-1-Gateway             DISTRIBUTED_ROUTER_TIER1    5

nsxtesg0> vrf 4

nsxtesg0(vrf)> ping 10.10.101.1

PING 10.10.101.1 (10.10.101.1): 56 data bytes

64 bytes from 10.10.101.1: icmp_seq=0 ttl=64 time=0.677 ms

64 bytes from 10.10.101.1: icmp_seq=1 ttl=64 time=1.041 ms

64 bytes from 10.10.101.1: icmp_seq=2 ttl=64 time=0.665 ms

And from the DR I can reach the Tier-0 LIFs. So I thought it might be a problem in the connection between the SR and DR. For some reason, the SR knows it has a directly connected LIF but doesn't know how to reach it. Very strange behaviour, because on a normal router, if an interface is directly connected, the router can always reach it. (I don't have much experience working with VRFs, so perhaps this is normal behaviour, but I don't think so, because you don't usually have a router split into two parts, one the DR and one the SR.)

The last test on the Tier-0 Gateway: if I ping from host 10.10.101.11 to host 192.168.17.11 (two segments connected by the Tier-0 Gateway), it works OK:

pastedImage_18.png

pastedImage_19.png

From Tier-1 SR:

If I move to Tier-1 SR:

nsxtesg0> vrf 3

nsxtesg0(tier1_sr)> get forwarding

Logical Router

UUID                                   VRF    LR-ID  Name                              Type

b958c9bb-0d97-4c73-b39b-73b045e5ed76   3      1026   SR-RosarioT-1-Gateway             SERVICE_ROUTER_TIER1

IPv4 Forwarding Table

IP Prefix          Gateway IP      Type     UUID                                   Gateway MAC

0.0.0.0/0          100.64.112.0    route    ad8ae731-1476-4b0f-b6f7-772b7cfdeb23

10.10.102.0/24                     route    b9ba19cf-af79-4c88-b6a3-8749d767708a

10.10.102.1/32                     route    ccb76982-83e8-5ef4-8d6f-067542430bab

10.10.103.0/24                     route    b4d4b235-70f1-4c25-99cf-4c205ecb5044

10.10.103.1/32                     route    ccb76982-83e8-5ef4-8d6f-067542430bab

100.64.112.0/31                    route    ad8ae731-1476-4b0f-b6f7-772b7cfdeb23

100.64.112.1/32                    route    15f2f498-b4ad-59de-a31c-1d165cd3ecff

127.0.0.1/32                       route    2e52e6b9-d924-45fb-aa80-7568525e3630

169.254.0.0/28                     route    fe859aa2-5095-42d7-973c-2b8dfccad54c

169.254.0.1/32                     route    ccb76982-83e8-5ef4-8d6f-067542430bab

169.254.0.2/32                     route    15f2f498-b4ad-59de-a31c-1d165cd3ecff

IPv6 Forwarding Table

IP Prefix                                     Gateway IP                                Type        UUID                                   Gateway MAC

::/0                                          fcef:2313:2800:800::1                     route       ad8ae731-1476-4b0f-b6f7-772b7cfdeb23

::1/128                                                                                 route       2e52e6b9-d924-45fb-aa80-7568525e3630

fcef:2313:2800:800::/64                                                                 route       ad8ae731-1476-4b0f-b6f7-772b7cfdeb23

fcef:2313:2800:800::2/128                                                               route       15f2f498-b4ad-59de-a31c-1d165cd3ecff

As you can see, I don't have any entries for the segments directly connected to the Tier-0 DR or the Tier-0 SR, just a default route. Something strange here: I can ping the LIF of the overlay segment directly connected to the Tier-0 DR, but I cannot ping a host on that segment:

nsxtesg0(tier1_sr)> ping 10.10.101.1

PING 10.10.101.1 (10.10.101.1): 56 data bytes

64 bytes from 10.10.101.1: icmp_seq=0 ttl=64 time=1.428 ms

64 bytes from 10.10.101.1: icmp_seq=1 ttl=64 time=1.262 ms

64 bytes from 10.10.101.1: icmp_seq=2 ttl=64 time=1.439 ms

nsxtesg0(tier1_sr)> ping 10.10.101.11

PING 10.10.101.11 (10.10.101.11): 56 data bytes

--- 10.10.101.11 ping statistics ---

4 packets transmitted, 0 packets received, 100.0% packet loss

And I also cannot ping one of its own LIFs, even though it is directly connected (the same behaviour as on the Tier-0 SR):

nsxtesg0(tier1_sr)> ping 10.10.103.1

PING 10.10.103.1 (10.10.103.1): 56 data bytes

--- 10.10.103.1 ping statistics ---

5 packets transmitted, 0 packets received, 100.0% packet loss

But I can ping the uplink interface that connects to the Tier-0 Gateway:

nsxtesg0(tier1_sr)> ping 100.64.112.1

PING 100.64.112.1 (100.64.112.1): 56 data bytes

64 bytes from 100.64.112.1: icmp_seq=0 ttl=64 time=0.708 ms

64 bytes from 100.64.112.1: icmp_seq=1 ttl=64 time=1.432 ms

64 bytes from 100.64.112.1: icmp_seq=2 ttl=64 time=1.384 ms

From Tier-1 DR:

Lastly, if I move to the Tier-1 DR I get:

nsxtesg0> vrf 8

nsxtesg0(vrf)> get forwarding

Logical Router

UUID                                   VRF    LR-ID  Name                              Type

e6eb4986-7e44-4dfa-b1d7-1da0c8ac3d41   8      1025   DR-RosarioT-1-Gateway             DISTRIBUTED_ROUTER_TIER1

IPv4 Forwarding Table

IP Prefix          Gateway IP      Type     UUID                                   Gateway MAC

0.0.0.0/0          100.64.112.0    route    ad8ae731-1476-4b0f-b6f7-772b7cfdeb23   02:50:56:56:44:52

10.10.102.0/24                     route    b9ba19cf-af79-4c88-b6a3-8749d767708a

10.10.102.1/32                     route    ccb76982-83e8-5ef4-8d6f-067542430bab

10.10.103.0/24                     route    b4d4b235-70f1-4c25-99cf-4c205ecb5044

10.10.103.1/32                     route    ccb76982-83e8-5ef4-8d6f-067542430bab

100.64.112.0/31                    route    ad8ae731-1476-4b0f-b6f7-772b7cfdeb23

100.64.112.1/32                    route    15f2f498-b4ad-59de-a31c-1d165cd3ecff

127.0.0.1/32                       route    2e52e6b9-d924-45fb-aa80-7568525e3630

169.254.0.0/28                     route    fe859aa2-5095-42d7-973c-2b8dfccad54c

169.254.0.1/32                     route    ccb76982-83e8-5ef4-8d6f-067542430bab

169.254.0.2/32                     route    15f2f498-b4ad-59de-a31c-1d165cd3ecff

IPv6 Forwarding Table

IP Prefix                                     Gateway IP                                Type        UUID                                   Gateway MAC

::/0                                          fcef:2313:2800:800::1                     route       ad8ae731-1476-4b0f-b6f7-772b7cfdeb23

::1/128                                                                                 route       2e52e6b9-d924-45fb-aa80-7568525e3630

fcef:2313:2800:800::/64                                                                 route       ad8ae731-1476-4b0f-b6f7-772b7cfdeb23

fcef:2313:2800:800::2/128                                                               route       15f2f498-b4ad-59de-a31c-1d165cd3ecff

As you can see, it is the same forwarding table as on the Tier-1 SR (which is logical).

But from here I can ping its LIFs:

nsxtesg0(vrf)> ping 10.10.103.1

PING 10.10.103.1 (10.10.103.1): 56 data bytes

64 bytes from 10.10.103.1: icmp_seq=0 ttl=64 time=0.742 ms

64 bytes from 10.10.103.1: icmp_seq=1 ttl=64 time=0.885 ms

64 bytes from 10.10.103.1: icmp_seq=2 ttl=64 time=1.233 ms

--- 10.10.103.1 ping statistics ---

4 packets transmitted, 3 packets received, 25.0% packet loss

round-trip min/avg/max/stddev = 0.742/0.953/1.233/0.206 ms

But I cannot ping its uplink interface that connects to the Tier-0 Gateway:

nsxtesg0(vrf)> ping 100.64.112.1

PING 100.64.112.1 (100.64.112.1): 56 data bytes

--- 100.64.112.1 ping statistics ---

6 packets transmitted, 0 packets received, 100.0% packet loss

So, as I said in my first post, it seems that there is something broken in the connection between the gateways' DR and SR components.

I think the MTUs are OK; everything is set to 1600 except the VLAN segments. I don't think this would generate issues right now, because I'm testing with standard pings that carry only 56 bytes of payload.
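Still, to rule MTU out properly once basic reachability is back, I suppose I should repeat the pings with a full-size payload and the DF bit set, something like the following (assuming the edge CLI accepts the size/dfbit options the way I read the command reference; 1472 bytes of payload plus 28 bytes of headers makes a 1500-byte inner packet that has to fit through the Geneve tunnel):

nsxtesg0(tier0_sr)> ping 192.168.17.11 size 1472 dfbit enable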

pastedImage_0.png

Some tunnels are down, but I think they are in that state because there is no traffic on them: when I create new VMs or generate traffic, they come up.

One question, Chris, since you say you don't observe this behaviour in your infrastructure: are you using BGP or just static routes? At the moment I haven't activated any routing protocol in my infrastructure. Perhaps activating BGP and redistributing routes on both the Tier-0 and Tier-1 Gateways would solve the interconnection problem I'm having between the gateways' SR and DR components. I don't know, just an idea.

Thank you.

Guido.

p0wertje
Hot Shot

Hi,

I do indeed have a Tier-0 with BGP running.

But for testing purposes I created a second Tier-0 without BGP enabled.

I connected a segment to that Tier-0, and also connected a T1 with a segment to that Tier-0.

I am using physical edge nodes, but that should not make any difference.

I am able to ping without any problems.

pastedImage_0.png

pastedImage_1.png

pastedImage_2.png

Could it be a firewall rule on your VM? Windows especially blocks things by default.

Double-check your MTU settings, including the setting on the vSwitch (when using virtual edge nodes).
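On the ESXi host these standard esxcli commands show the configured MTUs at a glance (check the switch itself and the physical uplinks):

esxcli network vswitch standard list
esxcli network vswitch dvs vmware list
esxcli network nic list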

Cheers,
p0wertje | VCIX6-NV | JNCIS-ENT | vExpert
Please kudo helpful posts and mark the thread as solved if solved
WarlockArg
Enthusiast

I'll check the uplink profiles of the edge. In my case the edge node is virtual. I'm using two N-VDS switches in the edge node, one for the overlay traffic and one for the VLAN-backed segment. Perhaps there's something wrong there.

What I'm testing is a straightforward configuration; I shouldn't be having these problems. And the edge node configuration might have something to do with it, because my problems are with the connection between the gateways' DR and SR components, and those connections take place inside the edge node.

All the VMs I'm testing with are Lubuntu, a light version of Ubuntu. They have no firewall enabled. On the other hand, I'm also having problems pinging the routers' interfaces from inside the gateways themselves!

I'll let you know if I find the problem.

Thanks for everything Chris.

Guido.

p0wertje
Hot Shot

Check the MTU on the vSwitch the virtual edge is connected to (and also on your physical switch).
A lot of issues appear when MTU settings are wrong.

Cheers,
p0wertje | VCIX6-NV | JNCIS-ENT | vExpert
Please kudo helpful posts and mark the thread as solved if solved
WarlockArg
Enthusiast

I think I found the problem. There are some Geneve tunnels between the transport nodes and edge nodes that are not coming up.

You asked me to check the tunnel status, and I saw that some of them were down, but I had read somewhere that those tunnels come up and down depending on whether the transport nodes have VMs connected to logical switches or not. That is, if there are no VMs connected to a logical switch (segment) on a transport node, that node doesn't bring up the Geneve tunnel to the other transport nodes, since it makes no sense to maintain an active tunnel that won't carry any traffic.

What I hadn't noticed was that two transport nodes had never brought up their tunnels to the edge nodes (their tunnels to the other transport nodes were fine, just not the ones to the edge nodes). When I used the trace tool that comes with the NSX GUI, I realized the traffic was getting stuck at the tunnel that didn't exist!
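For the record, instead of trusting the GUI I started checking the tunnels from the edge node CLI. As far as I understand, tunnel liveness in NSX-T is driven by BFD between the TEPs, so these two commands should show the real state:

nsxtesg0> get tunnel-ports
nsxtesg0> get bfd-sessions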

My infrastructure for now is a fully collapsed 3-node cluster, where the same hosts act as transport, management, and edge nodes. What I don't understand is why only one of them has the Geneve tunnels up to the edge transport node while the other two don't. All I have read about sharing a host between an edge node and a host transport node is that you cannot share the same VLAN ID for overlay traffic when the host transport node VTEPs and the edge transport node VTEPs use the same virtual switch. I'm using the same VLAN ID and layer-3 segment for all the VTEPs, but I do NOT use the same virtual switch for them inside the host: the host transport node uses an N-VDS with two physical vmnics (vmnic2 and vmnic3), while the edge transport node VM is connected to a separate vSphere VDS that uses another two physical vmnics (vmnic0 and vmnic1). I can verify that split from the host as shown below. I'll open another thread to see if someone knows what could be happening.
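From an ESXi shell, these standard commands show where each vmkernel NIC lives and its MTU (the TEP vmks appear in the vxlan netstack):

esxcfg-vmknic -l
esxcli network ip interface list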

Thank you very much.
Guido.

WarlockArg
Enthusiast

I finally solved the problem. I had to open a support ticket with VMware.

The problem was that the tunnels between the Host Transport Nodes and the Edge Nodes were not up and running. However, in the NSX Manager GUI they appeared as up and green! There is a bug where the GUI shows tunnels as up and running when they are actually down, with no connectivity between the VTEPs.

Those tunnels were down because there was no layer-3 connectivity between the Host Transport Node VTEPs and the Edge Node VTEP. This infrastructure runs on an HP C7000 chassis with two Virtual Connect switches, and there was a problem in the configuration of the core switch's LAG ports: the core ports connected to the HP Virtual Connect switches should not have been configured as a LAG (port-channel) but as individual ports.
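In the end the giveaway was that a plain TEP-to-TEP ping from a host failed even while the GUI showed the tunnels green; it's the same test suggested earlier in the thread (substitute your edge node's TEP IP):

vmkping ++netstack=vxlan <edge-tep-ip>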

We removed the LAG configuration and everything began to work.

Thanks for your help.

Guido.

grantd
Contributor

Where are you getting the icons/stencils for the drawings?

p0wertje
Hot Shot

Hi,

It is in the NSX-T GUI itself.

Cheers,
p0wertje | VCIX6-NV | JNCIS-ENT | vExpert
Please kudo helpful posts and mark the thread as solved if solved
grantd
Contributor

Yeah, I know; I was hoping there were external icons for concept diagrams.
