Last week I wanted to migrate workloads from the N-VDS to the VDS 7, following a vSphere 7 upgrade.
The ESXi hosts in my test environment have 4 NICs each; the initial configuration was vmnic0 + vmnic1 on the VDS and vmnic2 + vmnic3 on the N-VDS.
The migration and new uplink assignment completed successfully; at least I did not find an error in the uplink profile configuration of the transport nodes. First vmnic0 and vmnic1 were used in the profile, and after the new VMkernels were online I assigned the hosts' vmnic2 and vmnic3 to the VDS.
What immediately became obvious was that the overlay tunnel status switched to degraded. Falling back to the N-VDS configuration remediated the issue. I checked my configuration and looked for clues in the documentation, but could not find a solution.
Finally I came upon this blog post, where the summary explains the behaviour well:
"So when you're using and N-VDS or VDS for NSX-T and you're placing an Edge on the same switch you have to put the Edge overlay in a different subnet. The Geneve traffic that originates from the Edge is not allowed to pass a switch that's hosting a tunnel endpoint for ESXi (VMK10)."
Following this advice, I created a new VDS with only one port group for the overlay traffic and connected two vmnics of the host to it. After this, the NSX Edge's NIC dedicated to the overlay tunnel connection was attached to this new port group, and the tunnel was re-established.
This poses a problem, however, because in production I have hosts with only two NICs. That means I would have to split the vmnics between two distributed switches, which would result in a non-redundant setup.
Is there a way to separate the overlay and endpoint traffic while still placing all vmkernels on the same VDS?
This is resolved in NSX-T 3.1.
Ah, nice, I've just downloaded the upgrade bundle and will see if it works.
If you would like a bit more understanding of what caused this and why it was an issue in the past, you may find this article useful as well.
NSX-T 3.1 Tunnel Endpoints | Inter TEP
VCIX-NV 2022 | VCP-DCV2019 | CCNP Specialist
https://lab2prod.com.au
LinkedIn https://www.linkedin.com/in/shankmohan/
Twitter @ShankMohan
Author of NSX-T Logical Routing: https://link.springer.com/book/10.1007/978-1-4842-7458-3
I've deployed a fresh 3.1 environment, configured my transport nodes, Tier-0 and Tier-1, and attached a few segments, but as soon as I add a VM to a segment, the tunnel becomes degraded.
I want to run a "collapsed" environment now, where the TEP VMkernels run on the VDS too, and getting this configuration to work was my goal with the 3.1 update.
I guess the error is somewhere in the transport node (TEP) configuration, which I have to figure out now...
It sounds like you want to configure inter-TEP; the article above walks through that, showing how to configure it and how to test that your tunnels are working.
Have you been through it?
VCIX-NV 2022 | VCP-DCV2019 | CCNP Specialist
https://lab2prod.com.au
LinkedIn https://www.linkedin.com/in/shankmohan/
Twitter @ShankMohan
Author of NSX-T Logical Routing: https://link.springer.com/book/10.1007/978-1-4842-7458-3
I have followed another guide, yes; your link returns a 404.
Interesting. If you still want some details around this, try this one; it should work.
https://www.lab2prod.com.au/2020/11/nsx-t-inter-tep.html
VCIX-NV 2022 | VCP-DCV2019 | CCNP Specialist
https://lab2prod.com.au
LinkedIn https://www.linkedin.com/in/shankmohan/
Twitter @ShankMohan
Author of NSX-T Logical Routing: https://link.springer.com/book/10.1007/978-1-4842-7458-3
What do your BFD statuses report under the edge cluster?
"0 - No Diagnostic"
If you would like some more assistance, I'm happy to look at it over Zoom; just let me know.
We can look at the steps you've tried and work forward from there.
VCIX-NV 2022 | VCP-DCV2019 | CCNP Specialist
https://lab2prod.com.au
LinkedIn https://www.linkedin.com/in/shankmohan/
Twitter @ShankMohan
Author of NSX-T Logical Routing: https://link.springer.com/book/10.1007/978-1-4842-7458-3
Thank you for the offer; however, this is not something that would be allowed in my environment. :) I'll have to wait for GSS to finally respond to my SR.
Have you done all the vmkpings to check it all works?
VCIX-NV 2022 | VCP-DCV2019 | CCNP Specialist
https://lab2prod.com.au
LinkedIn https://www.linkedin.com/in/shankmohan/
Twitter @ShankMohan
Author of NSX-T Logical Routing: https://link.springer.com/book/10.1007/978-1-4842-7458-3
So, normally this would indicate that the tunnel source and destination could not communicate at the MTU specified on the N-VDS. This is pretty common in new builds, because the actual MTU check doesn't occur until the tunnel is needed.
It'd be really cool if we could have some kind of "post-implementation network test"!
Anyhow, for the tunnel that is failing, run a vmkping against it with the DF (don't fragment) bit set and a payload that exercises the 1600-byte MTU (1572 bytes of payload plus 28 bytes of IP and ICMP headers):
vmkping -d -I vmk10 -s 1572 <the other end>
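On the payload arithmetic: vmkping's -s flag sets the ICMP payload, so the on-wire IPv4 packet is payload + 8 bytes of ICMP header + 20 bytes of IP header. A quick sketch of that calculation (the helper name is mine, not a VMware tool):

```python
# vmkping -s sets the ICMP payload size; with the DF bit set (-d),
# the whole IPv4 packet (payload + headers) must fit the interface MTU.
IPV4_HEADER = 20
ICMP_HEADER = 8

def vmkping_payload(target_mtu: int) -> int:
    """Largest -s value that still fits the given vmk interface MTU
    when fragmentation is forbidden by -d."""
    return target_mtu - IPV4_HEADER - ICMP_HEADER

# 1600-byte TEP MTU -> -s 1572; jumbo 9000 -> -s 8972.
```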
That will work; I generally use vmkping ++netstack=vxlan <dstIP> -s 8972 -d
VCIX-NV 2022 | VCP-DCV2019 | CCNP Specialist
https://lab2prod.com.au
LinkedIn https://www.linkedin.com/in/shankmohan/
Twitter @ShankMohan
Author of NSX-T Logical Routing: https://link.springer.com/book/10.1007/978-1-4842-7458-3
Yep - the use of `vxlan` in this case feels...distasteful...so I don't use it
The issue is resolved for me: when placing the Edges on NSX-prepared hosts, the Geneve tunnel traffic has to be placed on a separate NSX segment. So, if someone has the same issue: create a new VLAN NSX segment, add it to a VLAN transport zone, and add that transport zone to your Edges. This way you will be able to select this segment for your tunnel traffic.
The NSX design reference guide has only recently been updated to match NSX 3.0 (still no 3.1...); I hope the business unit steps up in this regard.
edit: I have no idea how to mark the post as "resolved" after the recent VMTN update, so I'll just mark my answer as the correct one.
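For anyone scripting those steps, a VLAN-backed segment can also be created through the NSX-T Policy API with a PUT or PATCH to /policy/api/v1/infra/segments/<segment-id>. A minimal request body might look like the sketch below; the display name, VLAN ID, and transport zone path are placeholders for your environment, not values from this thread:

```json
{
  "display_name": "edge-overlay-vlan",
  "vlan_ids": ["160"],
  "transport_zone_path": "/infra/sites/default/enforcement-points/default/transport-zones/<vlan-tz-id>"
}
```

Once the segment exists, it shows up as a selectable network for the Edge's overlay interface, same as in the UI flow above.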
You're hitting the same issue the rest of us always hit when getting started with this platform: Host and Edge TEPs cannot coexist on the same switch with the same physical network adapters.
VMware won't admit this is a bug, but they "fixed" it in 3.1.
One workaround we were using was to put the Edge TEP on a Standard vSwitch with separate physical NICs and leave the Host TEPs on the VDS. This works fine, and you can then have a single Geneve VLAN, but traffic between Host and Edge TEP still needs to go through the top-of-rack switch, even if both are on the same physical host.
If you can't afford the extra physical NICs, then you had to use an external router to route between your Host TEP and Edge TEP networks, which was, of course, ridiculous. After all, we're building software-defined networks here!
Again, however, VMware has "fixed" this in 3.1, and you should be able to put everything on a single network now.
Technically not a bug; however, as you mentioned, the feature didn't exist in earlier versions.
The explanation can be seen in the link that I posted earlier.
VCIX-NV 2022 | VCP-DCV2019 | CCNP Specialist
https://lab2prod.com.au
LinkedIn https://www.linkedin.com/in/shankmohan/
Twitter @ShankMohan
Author of NSX-T Logical Routing: https://link.springer.com/book/10.1007/978-1-4842-7458-3