VMware Networking Community
WarlockArg
Enthusiast

NSX-T 3.0: Geneve tunnels from the host VTEPs to the Edge Transport Nodes don't come up

Hi,

    I have a fully collapsed 3-host cluster (the same 3 hosts are used for transport, management and edge). I noticed that the Geneve tunnels to the edge node don't come up from two of the host transport nodes, but they do come up from the third one!!!
    I use the same layer-3 segment and the same VLAN ID for the VTEPs of both the host transport nodes and the edge transport node. I have read a lot of documentation, community threads and blogs, and they all say that you cannot share the same VLAN ID between the host and edge transport nodes IF the host's own VTEPs and the Edge VM's VTEPs use the SAME virtual switch on the host. But that is not my case. Each of my three hosts has 4 physical vmnics (0 to 3). An N-VDS uses two of them (vmnic2 and vmnic3) for the host transport node's VTEPs, and a separate vSphere DVS uses the other two (vmnic0 and vmnic1), which is where the Edge VM is connected. So I thought I could use the same VLAN ID for all the VTEPs. I'm not sure whether this is actually the reason why the Geneve tunnels don't come up from two of the host transport nodes while they do from the third. I would expect that, if sharing the VLAN ID between the VTEPs were the problem, the tunnels would fail from all three hosts rather than come up from one and fail from the other two. Am I correct?

Here are two screenshots showing that the tunnels come up from one host and do not come up from another.

pastedImage_1.png

pastedImage_4.png

Does anyone know why this could be happening?

Thank you in advance,

Guido.

1 Solution

Accepted Solutions
WarlockArg
Enthusiast

I finally solved the problem. I had to open a support ticket with VMware.

The problem was that the tunnels between the Host Transport Nodes and the Edge Nodes were not actually up. However, the NSX Manager GUI showed them as up, in green!!! There is a bug whereby the GUI shows tunnels as UP when they are actually down and there is no connectivity between the VTEPs. (In the image you can see that the tunnel to IP 192.168.12.17 is shown as up when it is actually down.)

Those tunnels were down because there was no layer-3 connectivity between the Host Transport Node VTEPs and the Edge Node VTEP. This infrastructure was running on an HP C7000 chassis with two Virtual Connect switches. The problem was in the configuration of the core switch ports: the core ports connected to the HP Virtual Connect switches should not have been configured as a LAG (port-channel) but as individual ports.

We removed the LAG configuration and everything started to work.

Thanks for your help.

Guido.


9 Replies
daphnissov
Immortal

You can use the same VLAN if you are spreading the host nodes and the edge nodes across switches. More than likely you have an MTU or trunking problem, so I'd check end to end that the MTU is at least 1600 and that the proper VLAN IDs are trunked.
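
To check a 1600-byte MTU end to end with a don't-fragment ping, the ICMP payload size has to leave room for the IPv4 and ICMP headers. A minimal sketch of that arithmetic, assuming IPv4 with no IP options (the function and constant names are only illustrative):

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo request/reply header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


if __name__ == "__main__":
    print(max_ping_payload(1600))  # 1572
    print(max_ping_payload(1500))  # 1472
```

For example, `vmkping -s 1572 -d ...` on ESXi sends a packet of exactly that size with the don't-fragment bit set, so it only gets through if every hop carries at least 1600 bytes.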

WarlockArg
Enthusiast

I ran a number of tests to rule out MTU and connectivity problems, and got the following results:

From the CLI of each ESXi host I ran the following ping test:

vmkping -s 1572 -S vxlan -I vmk10 -d 192.168.12.x

where "x" is, in turn, the vxlan vmkernel interface (VTEP) of each of the other two ESXi hosts and, finally, the VTEP IP of the Edge Node.

From the two hosts whose Geneve tunnels to the Edge Node are down (hosts 5 and 6) I can reach the VTEPs of all the other host transport nodes (ESXi hosts), but not the edge one. That is, I cannot ping the Edge Node's VTEP, which matches what I see in the GUI: the Geneve tunnel to it is down from those two hosts, while the tunnels to the other two transport nodes are up.

As another test, I removed the "size" argument from the ping to the Edge Node's VTEP IP to see whether it was an MTU problem, and I could not reach it either. It is as if the Edge Node's VTEP didn't exist for those hosts. No ping response.

But from the only one of the three ESXi hosts that has its Geneve tunnel to the Edge Node up and running, I can ping all the VTEPs (those of the host transport nodes and that of the edge node). This is also consistent with what I see in the GUI: that ESXi host (host 14) has its tunnel to the Edge Node up.

Strange thing:

Just to check whether something was wrong in the configuration of the VDS that the Edge VM is connected to, I created a vmkernel port on hosts 5 and 6 in the same Distributed Port Group (with the defaults, without associating it with any IP stack). I gave it an IP address in the same subnet as the VTEPs. Remember that these are the two hosts whose tunnels to the Edge Node are down.

I ran the same ping using that vmkernel interface as the source, and I could not reach any of the other hosts' VTEPs (the ones I could reach before, when using the VTEPs as the source interface), but I could reach the Edge Node's VTEP!!! That is, for some reason, from a vmkernel port in the same VLAN and subnet as the VTEPs I cannot reach the other hosts' VTEPs, but I can reach the very VTEP that is unreachable from the VTEPs created on each host's N-VDS.

I don't know whether it has anything to do with the IP stack used.

But from the point of view of the Edge Node's VTEP, what is the difference between the ping request it receives from the VTEP of host 5 or 6 and the same ping request it receives from the test vmkernel port of host 5?? Just the source IP address. Nothing in the ping packet identifies the ESXi interface that originated the request, nor the IP stack. It is a standard ICMP packet!!!
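
The results described above can be laid out as a small table to see which paths fail. This is only a sketch: the entries are transcribed from the tests in this post, and the switch labels (N-VDS on vmnic2/3 for host VTEPs, VDS on vmnic0/1 for the Edge VM and the test vmkernel port) come from the setup described in the first post. The grouping function is hypothetical and is not part of any NSX tooling.

```python
# Ping results transcribed from the tests described in this post.
# Each entry: (source, source switch, destination, destination switch, answered)
OBSERVATIONS = [
    ("host 5/6 VTEP",     "N-VDS", "other host VTEPs", "N-VDS", True),
    ("host 5/6 VTEP",     "N-VDS", "Edge VTEP",        "VDS",   False),
    ("host 14 VTEP",      "N-VDS", "other host VTEPs", "N-VDS", True),
    ("host 14 VTEP",      "N-VDS", "Edge VTEP",        "VDS",   True),
    ("host 5/6 test vmk", "VDS",   "other host VTEPs", "N-VDS", False),
    ("host 5/6 test vmk", "VDS",   "Edge VTEP",        "VDS",   True),
]


def failures_by_path(observations):
    """Group failed pings by whether they cross from one switch to the other."""
    result = {"same switch": [], "cross switch": []}
    for src, src_sw, dst, dst_sw, answered in observations:
        if not answered:
            key = "same switch" if src_sw == dst_sw else "cross switch"
            result[key].append((src, dst))
    return result


if __name__ == "__main__":
    for path, failed in failures_by_path(OBSERVATIONS).items():
        print(path, "->", failed or "no failures")
```

Laid out this way, every failed ping is on a path that crosses between the two uplink pairs, which would be consistent with a problem in the physical path between them rather than with the source interface or the IP stack.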

One thing I don't know how to test:

I don't know how to generate, from the Edge Node, the same ping requests that I send from the ESXi CLI. From the Edge Node CLI the only interface I see (if I issue the "get interfaces" command) is the management one, not the VTEP it has configured. I can see the VTEP by issuing the command "get logical switches", but is there any way to generate a ping request with that IP address as the source?

Thank you,

Guido

p0wertje
Hot Shot

You can ping from the edge:

get tunnel-port

For example:

Tunnel : 49905f4e-a4ed-52bf-a596-70958395d223
IFUID  : 266
LOCAL  : 10.0.9.106
REMOTE : 10.0.9.104
ENCAP  : GENEVE

ping 10.0.9.104 source 10.0.9.106 vrfid 0 size 1572 dfbit enable

PING 10.0.9.104 (10.0.9.104) from 10.0.9.106: 1572 data bytes

1580 bytes from 10.0.9.104: icmp_seq=0 ttl=64 time=0.329 ms

1580 bytes from 10.0.9.104: icmp_seq=1 ttl=64 time=0.323 ms

1580 bytes from 10.0.9.104: icmp_seq=2 ttl=64 time=0.278 ms

If the MTU is incorrect, you see something like:

ping 10.0.9.104 source 10.0.9.106 vrfid 0 size 1800 dfbit enable

PING 10.0.9.104 (10.0.9.104) from 10.0.9.106: 1800 data bytes

36 bytes from 10.0.9.107: frag needed and DF set (MTU 1600)

Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst

4  5  00 0724 0000   0 0000  40  01 0d08 10.0.9.106  10.0.9.104

Or

vrf 0

get neighbor

Interface   : f75fb918-a629-5f62-83d0-ff98e832d553

    IP          : 10.0.9.102

    MAC         : 00:50:56:6a:bf:cf

    State       : reach

    Timeout     : 615

And then ping

ping 10.0.9.102 size 1572 dfbit enable

PING 10.0.9.102 (10.0.9.102): 1572 data bytes

1580 bytes from 10.0.9.102: icmp_seq=0 ttl=64 time=0.454 ms

Cheers,
p0wertje | VCIX6-NV | JNCIS-ENT | vExpert
WarlockArg
Enthusiast

I get the following output from the commands you suggested, Chris:

nsxtesg0(vrf)> get neighbor

Logical Router

UUID        : 736a80e3-23f6-5a2d-81d6-bbefb2786666

VRF         : 0

LR-ID       : 0

Name        :

Type        : TUNNEL

Neighbor

    Interface   : fdb716f0-4a7a-50a4-9cff-6872f19c73de

    IP          : 192.168.12.12

    MAC         : 00:50:56:62:f7:92

    State       : reach

    Timeout     : 341

    Interface   : fdb716f0-4a7a-50a4-9cff-6872f19c73de

    IP          : 192.168.12.13

    MAC         : 00:50:56:68:8d:54

    State       : reach

    Timeout     : 921

    Interface   : fdb716f0-4a7a-50a4-9cff-6872f19c73de

    IP          : 192.168.12.15

    MAC         : 00:50:56:68:35:3c

    State       : reach

    Timeout     : 656

    Interface   : fdb716f0-4a7a-50a4-9cff-6872f19c73de

    IP          : 192.168.12.16

    MAC         : 00:50:56:60:d5:db

    State       : reach

    Timeout     : 1083

    Interface   : fdb716f0-4a7a-50a4-9cff-6872f19c73de

    IP          : 192.168.12.19

    MAC         : 00:50:56:94:1f:42

    State       : reach

    Timeout     : 563

    Interface   : fdb716f0-4a7a-50a4-9cff-6872f19c73de

    IP          : 192.168.12.14

    MAC         : 00:50:56:6e:9e:ae

    State       : reach

    Timeout     : 296

    Interface   : fdb716f0-4a7a-50a4-9cff-6872f19c73de

    IP          : 192.168.12.11

    MAC         : 00:50:56:66:fc:79

    State       : reach

    Timeout     : 321

As you can see, all the neighbors show "reach" (I don't know whether that means that the edge node can actually reach that neighbor), but the only tunnels I have up are to 192.168.12.11 and 192.168.12.12, which are the only ones that answer a ping:

nsxtesg0(vrf)> ping 192.168.12.13

PING 192.168.12.13 (192.168.12.13): 56 data bytes

--- 192.168.12.13 ping statistics ---

3 packets transmitted, 0 packets received, 100.0% packet loss

nsxtesg0(vrf)> ping 192.168.12.11

PING 192.168.12.11 (192.168.12.11): 56 data bytes

64 bytes from 192.168.12.11: icmp_seq=0 ttl=64 time=1.156 ms

64 bytes from 192.168.12.11: icmp_seq=1 ttl=64 time=1.707 ms

64 bytes from 192.168.12.11: icmp_seq=2 ttl=64 time=2.186 ms

--- 192.168.12.11 ping statistics ---

4 packets transmitted, 3 packets received, 25.0% packet loss

round-trip min/avg/max/stddev = 1.156/1.683/2.186/0.421 ms

nsxtesg0(vrf)> ping 192.168.12.11 size 1572 dfbit enable    //Forcing the packet size and don't fragment bit

PING 192.168.12.11 (192.168.12.11): 1572 data bytes

1580 bytes from 192.168.12.11: icmp_seq=0 ttl=64 time=1.525 ms

1580 bytes from 192.168.12.11: icmp_seq=1 ttl=64 time=1.693 ms

--- 192.168.12.11 ping statistics ---

3 packets transmitted, 2 packets received, 33.3% packet loss

round-trip min/avg/max/stddev = 1.525/1.609/1.693/0.084 ms
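
The mismatch between the neighbor table and the ping results above can be checked mechanically. A minimal sketch, assuming the `get neighbor` output has been pasted into a string (only two of the entries are reproduced here) and that the ping results are recorded by hand; `parse_neighbors` is a hypothetical helper, not an NSX command. The "reach" state is probably a neighbor-cache (ARP) state rather than a statement about tunnel health, but that reading is an assumption.

```python
import re

# Two neighbor entries copied from the output above.
SAMPLE = """\
    Interface   : fdb716f0-4a7a-50a4-9cff-6872f19c73de
    IP          : 192.168.12.13
    MAC         : 00:50:56:68:8d:54
    State       : reach
    Timeout     : 921
    Interface   : fdb716f0-4a7a-50a4-9cff-6872f19c73de
    IP          : 192.168.12.11
    MAC         : 00:50:56:66:fc:79
    State       : reach
    Timeout     : 321
"""

# Ping results from the same CLI session, recorded by hand.
PING_OK = {"192.168.12.11": True, "192.168.12.13": False}


def parse_neighbors(text):
    """Split 'Key : value' lines into one dict per neighbor entry."""
    neighbors, current = [], {}
    for line in text.splitlines():
        match = re.match(r"\s*(\w+)\s*:\s*(\S*)\s*$", line)
        if not match:
            continue
        key, value = match.group(1), match.group(2)
        if key == "Interface" and current:  # a new entry starts here
            neighbors.append(current)
            current = {}
        current[key] = value
    if current:
        neighbors.append(current)
    return neighbors


def reach_but_no_ping(neighbors, ping_ok):
    """IPs listed as 'reach' that nevertheless did not answer a ping."""
    return [n["IP"] for n in neighbors
            if n.get("State") == "reach" and not ping_ok.get(n["IP"], False)]


if __name__ == "__main__":
    print(reach_but_no_ping(parse_neighbors(SAMPLE), PING_OK))
```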

Very strange. I don't know what else I can test. Tomorrow perhaps I'll change the IP subnet and VLAN ID for the Edge Node's VTEPs, although I don't think this is the problem, because if it were, I couldn't have the tunnel to one host up. There should be no tunnels up at all if the problem were the VTEP VLAN ID.

Thanks.

Guido.

macgaver2
Contributor

Not helping, just saying that I am really looking forward to ideas on this issue. I am having the same problem.

It's a lab, for learning. To try to fix the issue I reinstalled everything: fresh ESXi, vCenter, NSX-T and NSX Edge, all latest releases.

Both are configured on the same VLAN (vlan-0). I tried setting my vDS to trunk or to a single VLAN, with no effect (it's all on the same host here anyway).

nsx-edge-1a> get bfd-sessions

BFD Session

Dest_port                     : 3784

Diag                          : No Diagnostic

Encap                         : geneve

Forwarding                    : last false (current false)

Interface                     : 3a89989f-22a8-5673-8d52-12a1e0a91925

Keep-down                     : false

Last_cp_diag                  : No Diagnostic

Last_cp_rmt_diag              : No Diagnostic

Last_cp_rmt_state             : down

Last_cp_state                 : down

Last_fwd_state                : NONE

Last_local_down_diag          : No Diagnostic

Last_remote_down_diag         : No Diagnostic

Local_address                 : 10.129.255.11

Local_discr                   : 2377231423

Min_rx_ttl                    : 255

Multiplier                    : 3

Received_remote_diag          : No Diagnostic

Received_remote_state         : down

Remote_address                : 10.129.255.10

Remote_admin_down             : false

Remote_diag                   : No Diagnostic

Remote_discr                  : 0

Remote_min_rx_interval        : 0

Remote_min_tx_interval        : 0

Remote_multiplier             : 0

Remote_state                  : down

Router_down                   : false

Rx_cfg_min                    : 1000

Rx_interval                   : 1000

Session_type                  : TUNNEL

State                         : down

Tx_cfg_min                    : 100

Tx_interval                   : 1000

nsxedge1(vrf)> get neighbor

Logical Router

UUID        : 736a80e3-23f6-5a2d-81d6-bbefb2786666

VRF         : 0

LR-ID       : 0

Name        :

Type        : TUNNEL

Neighbor

    Interface   : d843afab-ea93-540b-a8a4-766dc9c89e9f

    IP          : 10.129.255.10

    MAC         : 00:50:56:63:fb:f4

    State       : reach

    Timeout     : 208

   

nsxedge1(vrf)> ping 10.129.255.10 source 10.129.255.11 size 1572

PING 10.129.255.10 (10.129.255.10) from 10.129.255.11: 1572 data bytes

1580 bytes from 10.129.255.10: icmp_seq=0 ttl=64 time=0.594 ms

1580 bytes from 10.129.255.10: icmp_seq=1 ttl=64 time=0.440 ms

1580 bytes from 10.129.255.10: icmp_seq=2 ttl=64 time=0.596 ms
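
The BFD session output above can be reduced to a one-line summary. This is only a sketch: the sample keeps a handful of the fields shown above, `parse_bfd` and `summarize` are hypothetical helpers, and reading `Remote_discr` as the RFC 5880 remote discriminator (which stays at 0 while no valid BFD packet from the peer has been accepted) is an assumption about this CLI field.

```python
# A few fields copied from the "get bfd-sessions" output above.
SAMPLE = """\
BFD Session
Encap                         : geneve
Local_address                 : 10.129.255.11
Remote_address                : 10.129.255.10
Remote_discr                  : 0
Remote_state                  : down
State                         : down
"""


def parse_bfd(text):
    """Turn 'Key : value' lines into a dict; lines without a colon are skipped."""
    session = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            session[key.strip()] = value.strip()
    return session


def summarize(session):
    """Describe the session state in one line."""
    pair = f"{session['Local_address']} -> {session['Remote_address']}"
    if session["State"] != "down":
        return f"{pair}: BFD state is {session['State']}"
    if session.get("Remote_discr") == "0":
        return f"{pair}: BFD down, no remote discriminator learned (peer not heard)"
    return f"{pair}: BFD down, but the peer has been heard"


if __name__ == "__main__":
    print(summarize(parse_bfd(SAMPLE)))
```

Under that assumption, the session here is down because no BFD packet from 10.129.255.10 is arriving over the Geneve tunnel, even though a plain ICMP ping between the two TEP addresses succeeds.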

macgaver2
Contributor

Ok I think someone documented our problem and solution here: https://www.spillthensxt.com/nsx-t-tep-ip-addressing/

It looks like it is mandatory for the traffic to leave the distributed switch. I was hoping a 2-node cluster could have a built-in edge without involving external L3 routing ... I don't see how now.

I will try moving the edge outside the cluster and report back here soon.

macgaver2
Contributor

FYI - Moving the edge outside the cluster worked. I did a vMotion of my edge to a nearby cluster and connected its second NIC (the one for overlay) to a dedicated standard switch using a NIC directly connected to that remote distributed switch.

I find this to be a big limitation. We want a full software stack, but for this we need to have traffic going out on a physical NIC and then back in on that same NIC.

Hopefully someone can provide a better solution, but the previously shared URL is very good at explaining the 3 workarounds.

HassanAlKak88
Expert

Hello,

First of all it looks like a challenging issue,

As I understand it, and please correct me if I am wrong, you have 4*pNics on each server. You used them as follows:

2*pNics for vDS (For vSphere management, vMotion, Edge TEP, Edge Uplinks,  .....etc)

2*pNics for NVDS (For host TEP)

If that is the scenario, it is mandatory to use two different VLANs for TEP (One for Hosts and one for Edges). I know and I see that one server can communicate with edges within the same subnet.

But think about it from the networking and tagging side: with the same VLAN, the traffic sometimes has to use the first two uplinks, for communication between hosts through the overlay tunnel, and sometimes the other two uplinks, for communication between hosts and edges.

Therefore, I recommend using two different VLANs for the TEPs, and being careful with the tagging, especially when using a single N-VDS deployment for the edges (rather than three N-VDS: TEP, Uplink1 and Uplink2).

I am glad to help if you need more information,



Regards,
Hassan Alkak