Today the Geneve tunnels between the NSX-T Edges and the ESXi host suddenly stopped working.
Some changes were made since it last worked, but I don't know which one caused the problem. I reverted as much as I could remember, but it still isn't working.
Here is a screenshot of the tunnels on one of the NSX-T Edges:
And the tunnels on the ESXi host (the tunnels to every NSX-T Edge are down):
I logged into the ESXi host to check the vmknics and routes (showing both the default stack routes and the vxlan stack routes):
I also tried to ping the addresses from a system connected to the 192.168.20.0/24 network. I can ping the Edges' tunnel IPs (192.168.20.200 and 192.168.20.201), but not the ESXi tunnel addresses (192.168.20.202 and 192.168.20.203).
I'm at a total loss.
All help would be appreciated. I really don't want to remove NSX-T and redeploy everything, as deploying it the first time was an adventure (though if it's broken beyond repair, I have no choice).
You need to run a test ping with the proper packet size between the TEPs on your ESXi hosts. You'll probably find that it isn't working properly, which means you have either an MTU issue or some other physical networking problem. Use the command vmkping -S vxlan <TEP> -d -s 1572 -c 10 to test.
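For reference, here's where 1572 comes from: with -d (don't fragment) set, the -s payload plus the 20-byte IPv4 header and the 8-byte ICMP header has to fit in the MTU, so a 1572-byte payload tests a 1600-byte MTU end to end. A quick sketch of that arithmetic, assuming IPv4 with no IP options:

```python
# Header sizes for an IPv4 ICMP echo (assumption: IPv4, no IP options).
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def max_df_ping_payload(mtu: int) -> int:
    """Largest ICMP payload (the vmkping -s value) that fits in one unfragmented packet."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError(f"MTU {mtu} is too small for an ICMP echo")
    return payload


if __name__ == "__main__":
    # Print the -s value to use with -d for a few common MTUs.
    for mtu in (1500, 1600, 9000):
        print(f"MTU {mtu}: vmkping -d -s {max_df_ping_payload(mtu)}")
```

If the 1572-byte ping fails while a 1472-byte one succeeds, some hop between the TEPs has an MTU below 1600.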
I've found the issue. The dvPortGroup used for NSX-T had its VLAN set to 2; it used to have no VLAN.
When I set it to VLAN 0, the tunnels work again. However, when I do that, traffic gets mixed up between two different port groups, both of which have no VLAN set.
Is there a way to make it work with the VLAN set to 2? Or a way to keep traffic from one dvPortGroup from ending up on the other?
Edit: When I create an uplink profile for the NSX Edges with VLAN 2, the tunnel between the Edges won't connect.
You have to be careful; setting up Edges can be confusing. If you set a VLAN on the uplink profile, the Edge VM will tag packets going out the N-VDS uplink. That works if the uplink is connected to a dvPG configured as a trunk, but if the dvPG is on a single VLAN, you should not set a VLAN on the Edge's uplink. If you send us a drawing of your setup, we can better understand what you're trying to achieve with this design and help you get there.
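To make the tagging rule concrete, here is a simplified model of the two cases described in that answer. PortGroup and wire_vlan are hypothetical names for illustration only, not part of any vSphere or NSX-T API, and the model ignores everything except where the 802.1Q tag gets added.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PortGroup:
    """Hypothetical, simplified view of a dvPortGroup's VLAN setting.

    Only the two cases discussed here are modelled:
    - trunk=True: the guest's VLAN tags pass through unchanged.
    - trunk=False with vlan_id set: the virtual switch tags frames itself.
    """
    trunk: bool
    vlan_id: Optional[int] = None


def wire_vlan(pg: PortGroup, uplink_profile_vlan: int) -> Optional[int]:
    """VLAN tag carried on the physical uplink for a frame sent by the Edge VM.

    uplink_profile_vlan is the transport VLAN in the Edge's uplink profile;
    0 means the Edge sends untagged frames. Returns None for an untagged frame.
    Raises ValueError for the combination the answer advises against.
    """
    edge_tag = uplink_profile_vlan if uplink_profile_vlan != 0 else None
    if pg.trunk:
        # Trunk: whatever tag the Edge put on the frame goes out unchanged.
        return edge_tag
    if pg.vlan_id is None:
        raise ValueError("non-trunk port groups need a vlan_id in this sketch")
    if edge_tag is not None:
        # Single-VLAN port group: the Edge should not tag as well.
        raise ValueError("do not set a VLAN on the uplink profile for a single-VLAN port group")
    # Single-VLAN port group: the virtual switch adds its own tag.
    return pg.vlan_id


if __name__ == "__main__":
    print(wire_vlan(PortGroup(trunk=True), 2))              # Edge tags, trunk passes it
    print(wire_vlan(PortGroup(trunk=False, vlan_id=2), 0))  # port group tags instead
    try:
        wire_vlan(PortGroup(trunk=False, vlan_id=2), 2)
    except ValueError as err:
        print(f"invalid: {err}")
```

Both valid combinations put the Edge's TEP traffic on VLAN 2. Note that a trunk port group with VLAN 0 in the uplink profile sends the frames untagged, so they land on whatever untagged (native) network the physical switch port is configured for.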
That is confusing indeed.
Simply put, there's a main LAN on VLAN 0 (192.168.254.0/24) and a separate LAN on VLAN 2 (192.168.20.0/24).
The NSX-T Manager, Edges, etc. are all on VLAN 2. There's only one ESXi host, and it's connected to a switch port that is tagged on VLAN 2 and untagged on the main LAN.
There are two transport zones: an overlay one and a VLAN one. I just noticed that the ESXi host is only connected to the overlay one (in terms of the NSX configuration in the NSX-T Manager).
Okay. Could you explain what I have to change so that it would work?
I've set the network for NSX-T on VLAN 20, but I have no idea what I need to change in NSX-T to make it use that VLAN.
I have gotten the tunnels between the ESXi server and Edge1 and Edge2 working again.
I set the VLAN on the ESXi host's overlay N-VDS (I may have set the wrong VLAN initially here as well, as it was VLAN 2, not 20) and moved the Edges back to VLAN 0. Now it works again.
Now I need to find out how to make Edge3 create its tunnels and communicate with the NSX Manager on the 192.168.20.0/24 network, while the uplink for the Tier-0 routers attached to Edge3 goes to a completely different port group in a different subnet. Is that possible? And if so, how?