encorele
Contributor

Geneve tunnel down between Edge and Compute N-VDS

Hi,

My setup is a shared cluster for Edge and Compute hosts, running NSX-T 2.4. The Geneve tunnel between the Edge and Compute nodes is reporting down. I can ping the T0's uplink but cannot ping the prefixes in the T1. On the T0, I can see all the prefixes advertised by the T1. For the Edge transport logical switch, I've tried tagging it with both the same VLAN as the Compute TEP and a different one. What could be the issue here?

15 Replies
RaymundoEC
VMware Employee

The Edge VM's TEP should be in a different VLAN than the TEP of the transport node (the ESXi host) it runs on, assuming you are using a logical segment to connect the Edge VM's TEP.

+vRay
encorele
Contributor

The Edge VM's TEP is on a different VLAN than the Compute Transport Node's TEP. The funny thing is that from the upstream switch, I can ping the Edge VM's TEP but not the Compute Transport Nodes' TEPs. The Compute Transport Nodes' TEPs can see each other, and VMs in the N-VDS overlay can communicate with each other.

encorele
Contributor

I think the issue is that I cannot ping the Compute Transport Nodes' VTEPs from the upstream switches, so the two VTEPs cannot route to each other. When I configured the VTEP pool for the Compute Transport Nodes, I configured the default gateway, but I don't remember anywhere I could configure VLAN tagging. So when VTEP traffic goes upstream, how does the switch know which VLAN to put it in?

RaymundoEC
VMware Employee

So, for example, the TEP of the Transport Node/Edge Transport Node (the ESXi host with the NSX VIBs) will be in VLAN 200, while the Edge VM's eth1, the second interface used for TEP purposes, sits on a logical switch (LS) backed by the VLAN transport zone (TZ) with a tag of VLAN 400. The reason is that the ESXi host will drop Geneve traffic if it is not passing through the host's vmnic. Please check the VMware® NSX-T Reference Design document, starting on page 32.

+vRay
encorele
Contributor

In my environment, the Transport Node/Edge Transport Node's TEP is in VLAN 100. The Edge VM's eth1 is attached to an LS in the VLAN TZ with a tag of VLAN 11, so they are on different VLANs. The default gateways for both VLANs are on the upstream Core switch. From the Core switch, I am able to ping the Edge VM's TEP IP address but not the Transport Node/Edge Transport Node's TEPs. That means the Edge VM's TEP cannot route to the Transport Node's TEP, and thus they cannot establish the tunnel. So my concern is: why can't I ping the Transport Node's TEP from the Core? Can you ping them in your environment? I figure the Transport Node's TEPs must be tagged with a VLAN tag (VLAN 100), but that process happens automatically, so I don't know how to check whether or not it's there.

daphnissov
Immortal

It sounds like you have a routing issue. Just because both SVIs or gateways live on the same core switch doesn't by default mean you can route across them. You need to explicitly configure the routing decision across those networks.

encorele
Contributor

I cannot even ping the VTEP from the Core switch, where the default gateway is located, even though it is in the same VLAN. The ARP table shows INCOMPLETE for the IPs, so I am thinking it's an ARP issue. Any clue on how to troubleshoot ARP?

daphnissov
Immortal

Check your MTU size first, as it may be that simple. 1600 is the minimum, and it must be set across the board.

encorele
Contributor

I use the default uplink profile with the Global MTU, which I believe is 1600. What about the upstream switches? Do they have to support jumbo frames on the ports connecting to the host?

encorele
Contributor

Are you supposed to be able to ping the TEP IP from your upstream switch?

daphnissov
Immortal

There's no "supposed to" or not. Depends on your networking infra. An uplink profile in NSX-T defines the MTU, but you have to have at least that MTU or greater available in your physical network. Don't guess. Go and look at the switch configuration and ensure you have at least 1600 configured. 9000+ is better.
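As a rough illustration of why the physical network needs more than a 1500-byte MTU, here is a minimal Python sketch of the encapsulation arithmetic. It assumes an IPv4 underlay and uses the standard header sizes (14-byte inner Ethernet header, 8-byte Geneve base header, 8-byte UDP, 20-byte outer IPv4). The Geneve option length varies, so it is a parameter, and the function names are made up for this example.

```python
# Header sizes in bytes. The Geneve base header is 8 bytes (RFC 8926);
# Geneve options are variable length, so they are passed in as a parameter.
INNER_ETHERNET = 14
GENEVE_BASE = 8
UDP = 8
OUTER_IPV4 = 20
ICMP = 8


def required_underlay_mtu(inner_mtu=1500, geneve_options=0, inner_vlan_tag=False):
    """Smallest outer IP MTU that carries an inner packet of inner_mtu bytes."""
    inner_frame = inner_mtu + INNER_ETHERNET + (4 if inner_vlan_tag else 0)
    return inner_frame + GENEVE_BASE + geneve_options + UDP + OUTER_IPV4


def ping_payload_for_mtu(mtu):
    """ICMP payload size that exactly fills one unfragmented IPv4 packet."""
    return mtu - OUTER_IPV4 - ICMP


if __name__ == "__main__":
    print(required_underlay_mtu())                   # 1550
    print(required_underlay_mtu(geneve_options=32))  # 1582
    print(ping_payload_for_mtu(1600))                # 1572
```

The last value is the payload size typically used for a don't-fragment ping between TEPs on ESXi, for example `vmkping ++netstack=vxlan -d -s 1572 <remote TEP IP>`. If that ping fails while a small ping succeeds, something in the path has an MTU below 1600.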

encorele
Contributor

I have checked the MTU settings, and everything is at least 1600.

encorele
Contributor

I got the issue resolved. I had configured the upstream switch ports connecting to the ESXi hosts as trunk ports. Once I reconfigured them as access ports in the transport VLAN, it worked.

RagsRachamadugu
Contributor

Good to see the issue resolved! Trying to understand the root cause better: was this a case of a VLAN mismatch between NSX-T (the transport VLAN value in the uplink profile) and the VLAN/trunk configuration on the ToR port facing the transport node? NSX-T does support specifying a non-zero VLAN, in which case packets leaving the transport node will be tagged with that VLAN, and this does NOT require the ToR port to have an access/native VLAN.
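To make the suspected mismatch concrete, here is a simplified Python model of how a switch port classifies a frame arriving from the host. It is only a sketch: the `SwitchPort` class and `ingress_vlan` function are hypothetical, the native VLAN of 1 and the allowed-VLAN list are assumptions, and VLAN 100 is the TEP VLAN mentioned earlier in this thread. Real switches have more modes and vendor-specific behavior.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class SwitchPort:
    mode: str                      # "access" or "trunk"
    access_vlan: int = 1           # used in access mode
    native_vlan: int = 1           # trunk mode: VLAN for untagged frames
    allowed_vlans: Set[int] = field(default_factory=set)  # empty = allow all


def ingress_vlan(port: SwitchPort, frame_tag: Optional[int]) -> Optional[int]:
    """Return the VLAN the switch places the frame in, or None if dropped.

    frame_tag is the 802.1Q tag sent by the host. None means untagged,
    which is what a transport VLAN of 0 in the uplink profile produces.
    """
    if port.mode == "access":
        # Simplification: an access port accepts only untagged frames.
        return port.access_vlan if frame_tag is None else None
    vlan = port.native_vlan if frame_tag is None else frame_tag
    if port.allowed_vlans and vlan not in port.allowed_vlans:
        return None
    return vlan


if __name__ == "__main__":
    trunk = SwitchPort(mode="trunk", native_vlan=1, allowed_vlans={1, 11, 100})
    access = SwitchPort(mode="access", access_vlan=100)

    print(ingress_vlan(trunk, None))   # 1: untagged TEP traffic lands in the native VLAN
    print(ingress_vlan(access, None))  # 100: untagged TEP traffic lands in the TEP VLAN
    print(ingress_vlan(trunk, 100))    # 100: tagged TEP traffic on a trunk also works
```

Under these assumptions, untagged TEP frames on a trunk end up in the native VLAN rather than VLAN 100, which would explain the incomplete ARP entries on the gateway. Either an access port in VLAN 100 or a transport VLAN of 100 in the uplink profile puts the frames in the right VLAN.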

Thanks Rags

dougsco01
Contributor

>Compute Transport Node, I configure the default-gateway but don't remember anywhere I could configure any Vlan tagging

I was able to use the uplink profile to apply a VLAN tag. The default uplink profiles do not provide a VLAN tag. I copied the nsx-default-uplink-hostswitch-profile, added a tag for my TEP VLAN, then applied the new custom uplink profile in my Transport Node Profile.

I was experiencing a similar issue with a failed Geneve tunnel. My edges established a tunnel just fine (edge to edge), so I was confident the MTU, VLAN tag, and trunk were correct. My transport nodes would not establish tunnels to each other, nor to the edges, until I tagged the VLAN via the uplink profile.
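For anyone who prefers to script this rather than use the UI, the following Python sketch builds what a request body for such a custom uplink profile might look like. The field names (`resource_type`, `transport_vlan`, `teaming`, and so on) are my assumption of the NSX-T Manager API's uplink host switch profile schema and should be checked against the API guide for your version. The profile name, the uplink name, and VLAN 100 are placeholders, and the sketch deliberately stops short of sending anything to NSX Manager.

```python
import json


def uplink_profile_payload(name, transport_vlan, mtu=1600, uplink="uplink-1"):
    """Build a request body for a custom uplink profile.

    Field names are assumed from the NSX-T Manager API schema; verify them
    against the API reference for your NSX-T version before use.
    """
    return {
        "resource_type": "UplinkHostSwitchProfile",
        "display_name": name,
        "mtu": mtu,
        # A non-zero value means TEP traffic leaves the host tagged with this VLAN.
        "transport_vlan": transport_vlan,
        "teaming": {
            "policy": "FAILOVER_ORDER",
            "active_list": [{"uplink_name": uplink, "uplink_type": "PNIC"}],
        },
    }


if __name__ == "__main__":
    # Hypothetical profile name and TEP VLAN, for illustration only.
    payload = uplink_profile_payload("tep-vlan-100-uplink-profile", 100)
    print(json.dumps(payload, indent=2))
```

With a tagged transport VLAN like this, the upstream switch port has to be a trunk that allows that VLAN.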
