Last week I wanted to migrate workloads from the N-VDS to the VDS7, following a vSphere 7 upgrade.
The ESXi hosts in my test environment have four NICs each; the initial configuration was vmnic0 + vmnic1 on the VDS and vmnic2 + vmnic3 on the N-VDS.
The migration and the new uplink assignment went through successfully; at least I did not find any error in the uplink profile configuration of the transport nodes. At first, vmnic0 and vmnic1 were used in the profile; after the new VMkernel interfaces were online, I assigned the hosts' vmnic2 and vmnic3 to the VDS.
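(As a quick sanity check after this step, the transport node configuration can be pulled from the NSX-T Manager REST API to confirm which physical NICs and uplinks ended up mapped to the VDS. The sketch below is only a rough illustration in Python; the manager address, the credentials, and the exact shape of the returned host_switch_spec are assumptions based on the NSX-T 3.x API, so adjust them for your version.)

```python
import json
import requests

# Hypothetical manager address and credentials - replace with your own.
NSX_MANAGER = "https://nsx-manager.lab.local"

session = requests.Session()
session.auth = ("admin", "password")
session.verify = False  # lab only; use a proper CA bundle in production

# List the transport nodes and dump each node's host switch configuration,
# which should show the pNIC/uplink mapping after the migration.
nodes = session.get(f"{NSX_MANAGER}/api/v1/transport-nodes").json()
for node in nodes.get("results", []):
    print(node.get("display_name", node["id"]))
    print(json.dumps(node.get("host_switch_spec", {}), indent=2))
```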
What immediately became obvious was that the overlay tunnel status switched to degraded. Falling back to the N-VDS configuration remediated the issue. I checked my configuration and looked for clues in the documentation, but could not find a solution.
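For anyone who prefers to watch the tunnel state outside the UI, it can also be queried per transport node through the NSX-T Manager REST API. This is only a minimal sketch; the /api/v1/transport-nodes/&lt;node-id&gt;/tunnels endpoint and the field names are assumptions based on the NSX-T 3.x API and may differ in your version.

```python
import requests

# Hypothetical manager address and credentials - replace with your own.
NSX_MANAGER = "https://nsx-manager.lab.local"

session = requests.Session()
session.auth = ("admin", "password")
session.verify = False  # lab only; use a proper CA bundle in production

# Print the status of every overlay tunnel reported by each transport node.
nodes = session.get(f"{NSX_MANAGER}/api/v1/transport-nodes").json()
for node in nodes.get("results", []):
    node_id = node["id"]
    tunnels = session.get(
        f"{NSX_MANAGER}/api/v1/transport-nodes/{node_id}/tunnels"
    ).json()
    for tunnel in tunnels.get("tunnels", []):
        print(node.get("display_name", node_id),
              tunnel.get("remote_ip"), tunnel.get("status"))
```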
Finally I came upon this blog post, where the summary explains the behaviour well:
"So when you're using and N-VDS or VDS for NSX-T and you're placing an Edge on the same switch you have to put the Edge overlay in a different subnet. The Geneve traffic that originates from the Edge is not allowed to pass a switch that's hosting a tunnel endpoint for ESXi (VMK10)."
Following this advice, I created a new VDS with only one port group for the overlay traffic and connected two vmnics of the host to it. After this, the NSX Edge's NIC dedicated to the overlay tunnel connection was attached to this new port group, and the tunnel was re-established.
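One way to confirm that the tunnel really came back is to test TEP-to-TEP reachability directly from the host. The following sketch simply runs vmkping over the overlay netstack via SSH; the host name, credentials, vmk10 interface, packet size, and the remote TEP address are assumptions to adapt to your environment.

```python
import paramiko

# Hypothetical host and credentials - adjust to your environment.
ESXI_HOST = "esxi01.lab.local"
REMOTE_TEP = "172.16.20.21"  # TEP address of the Edge (placeholder)

# Ping from the host TEP interface (vmk10) over the overlay netstack
# (still named "vxlan" even though NSX-T uses Geneve); -d -s 1572 also
# verifies that the 1600-byte MTU needed for Geneve is in place.
command = f"vmkping ++netstack=vxlan -I vmk10 -d -s 1572 {REMOTE_TEP}"

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(ESXI_HOST, username="root", password="password")
_, stdout, stderr = client.exec_command(command)
print(stdout.read().decode())
print(stderr.read().decode())
client.close()
```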
This poses a problem, however, because in production I have hosts with only two NICs. This means that I would have to split the vmnics somehow between two distributed switches, which would result in a non-redundant setup.
Is there a way to separate the overlay and tunnel endpoint traffic while still placing all VMkernel interfaces on the same VDS?
If you would like a better understanding of what this issue is and why it was a problem in the past, you may find this article useful as well.