My setup is a shared cluster between Edge and Compute hosts, running NSX-T 2.4. The Geneve tunnel between the Edge and Compute nodes is reporting down. I can ping the T0's uplink but cannot ping the prefixes in the T1. In the T0, I can see all the prefixes advertised by the T1. For the Edge transport logical switch, I've tried VLAN tags both the same as and different from the Compute TEP VLAN. What could be the issue here?
The Edge VM's TEP is on a different VLAN than the Compute Transport Node's TEP. The funny thing is that from the upstream switch I can ping the Edge VM's TEP but not the Compute Transport Nodes' TEPs. The Compute Transport Nodes' TEPs can see each other, and VMs on the N-VDS overlay can communicate with each other.
I think the issue is that I cannot ping the Compute Transport Node's VTEPs from the upstream switches, so the two VTEPs cannot route to each other. When I configured the VTEP pool for the Compute Transport Node, I set the default gateway, but I don't remember anywhere I could configure VLAN tagging. So when VTEP traffic goes upstream, how does the switch know which VLAN to put it in?
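On the VLAN question in general terms: a switch port places an untagged frame in its access or native VLAN, and a tagged frame in the VLAN named by its 802.1Q tag, provided the trunk allows that VLAN. A minimal Python sketch of that rule follows. It is a simplified model, not switch code, and the function name and VLAN numbers are hypothetical, chosen only for illustration.

```python
from typing import Optional, Set


def ingress_vlan(frame_tag: Optional[int], native_vlan: int,
                 allowed_vlans: Set[int]) -> Optional[int]:
    """Return the VLAN a trunk port assigns to an incoming frame, or None if dropped.

    Simplified 802.1Q ingress model: untagged frames (frame_tag is None or 0)
    fall into the native VLAN; tagged frames keep their tag if the trunk allows it.
    """
    vlan = native_vlan if frame_tag in (None, 0) else frame_tag
    return vlan if vlan in allowed_vlans else None


# Hypothetical trunk: native VLAN 1, with VLANs 1, 11 and 100 allowed.
print(ingress_vlan(None, 1, {1, 11, 100}))  # untagged TEP traffic lands in VLAN 1
print(ingress_vlan(100, 1, {1, 11, 100}))   # tagged TEP traffic lands in VLAN 100
```

Under this model, TEP traffic that leaves the host untagged only reaches the TEP gateway if the TEP VLAN happens to be the port's native VLAN.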
So, for example, the TEP of the Transport Node / Edge Transport Node (the ESXi host with the NSX VIBs) would be in VLAN 200, and the Edge VM's eth1, its second interface used for TEP purposes, would sit on an LS backed by the VLAN TZ with a tag of VLAN 400. The reason is that the ESXi host will drop Geneve traffic if it does not pass through the host's vmnic. Please see the VMware® NSX-T Reference Design, starting on page 32.
In my environment, the Transport Node/Edge Transport Node's TEP is in VLAN 100. The Edge VM's eth1 is attached to an LS in the VLAN TZ with a tag of VLAN 11, so they are on different VLANs. The default gateways for both VLANs are on the upstream Core switch. From the Core switch, I can ping the Edge VM's TEP IP address but not the Transport Node/Edge Transport Node's TEPs. That means the Edge VM's TEP cannot route to the Transport Node's TEP, so they cannot establish the tunnel. So my concern is: why can't I ping the Transport Node's TEP from the Core? Can you ping them in your environment? I figure the Transport Node's TEPs must be tagged with a VLAN tag (VLAN 100), but that process happens automatically, so I don't know how to check whether it's there.
It sounds like you have a routing issue. Just because both SVIs or gateways live on the same core switch doesn't by default mean you can route across them. You need to explicitly configure the routing decision across those networks.
I cannot even ping the VTEP from the Core switch, where the default gateway is located and which is in the same VLAN. The ARP table shows INCOMPLETE for the IPs, so I am thinking it's an ARP issue. Any clue on how to troubleshoot ARP?
I use the default uplink profile with the global MTU, which I believe is 1600. What about the upstream switches: do they have to support jumbo frames on the ports connecting to the host?
There's no "supposed to" or not. Depends on your networking infra. An uplink profile in NSX-T defines the MTU, but you have to have at least that MTU or greater available in your physical network. Don't guess. Go and look at the switch configuration and ensure you have at least 1600 configured. 9000+ is better.
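A quick way to sanity-check the numbers: every hop between TEPs has to carry at least the uplink profile MTU, and a don't-fragment ping sized to that MTU must leave room for the IPv4 and ICMP headers. The Python sketch below just does that arithmetic. The hop names and MTU values are made up for illustration, so substitute what your switches actually report.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def undersized_hops(required_mtu: int, hop_mtus: dict) -> list:
    """Return the names of hops whose MTU is below the required value."""
    return [name for name, mtu in hop_mtus.items() if mtu < required_mtu]


# Hypothetical path between two TEPs.
path = {"host vmnic": 1600, "ToR port": 1500, "core SVI": 9216}
print(max_ping_payload(1600))       # 1572
print(undersized_hops(1600, path))  # ['ToR port']
```

The 1572 figure is the payload size to use in a don't-fragment ping between TEPs, for example `vmkping ++netstack=vxlan -d -s 1572 <remote TEP IP>` on an ESXi host.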
Good to see the issue getting resolved! Trying to understand the root cause better: was this a case of a VLAN mismatch between NSX-T (the transport VLAN value in the uplink profile) and the VLAN/trunk configuration on the ToR port facing the transport node? NSX-T does support specifying a non-zero VLAN, in which case packets leaving the transport node will be tagged with that VLAN, and this does NOT require the ToR port to have an access/native VLAN.
>Compute Transport Node, I configure the default-gateway but don't remember anywhere I could configure any Vlan tagging
I was able to use the uplink profile to apply a VLAN tag. The default uplink profiles do not provide a VLAN tag. I copied the nsx-default-uplink-hostswitch-profile, added a tag for my TEP VLAN, then applied the new custom uplink profile in my Transport Node Profile.
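For anyone scripting this, the same change can be expressed as an uplink profile body for the NSX-T Manager API. The Python sketch below only builds the JSON and does not call the API. The field names follow the `UplinkHostSwitchProfile` schema as I understand it and should be checked against the API guide for your NSX-T version. The profile name, the uplink name `uplink-1`, and VLAN 100 are assumptions for illustration.

```python
import json


def build_uplink_profile(name: str, transport_vlan: int, mtu: int = 1600) -> dict:
    """Build an uplink profile body that tags TEP traffic with transport_vlan."""
    if not 0 <= transport_vlan <= 4094:
        raise ValueError("transport_vlan must be between 0 and 4094")
    return {
        "resource_type": "UplinkHostSwitchProfile",
        "display_name": name,
        "mtu": mtu,
        "transport_vlan": transport_vlan,  # 0 leaves TEP traffic untagged
        "teaming": {
            "policy": "FAILOVER_ORDER",
            # Hypothetical uplink name; use the names from your own profile.
            "active_list": [{"uplink_name": "uplink-1", "uplink_type": "PNIC"}],
        },
    }


profile = build_uplink_profile("tep-vlan-100-uplink-profile", 100)
print(json.dumps(profile, indent=2))
```

With a non-zero `transport_vlan`, the ToR port facing the host needs to trunk that VLAN rather than carry it as the native VLAN.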
I was experiencing a similar issue with a failed Geneve tunnel. My edges established a tunnel just fine (edge to edge), so I was confident the MTU, VLAN tag, and trunk were correct. My transport nodes would not establish tunnels to each other, or to the edges, until I tagged the VLAN via the uplink profile.