The problem is that an NSX-T transport node (TN) offloads IP checksum calculation to hardware by default (here the UCS VIC M81KR CNA firmware). Unfortunately, the CNA for some reason cannot calculate a correct outer IP checksum for Geneve-encapsulated packets. Incoming Geneve packets from TZ A to TZ B are therefore received on the uplink interface of TZ B with a bad outer IP checksum (the inner IP checksum inside the Geneve payload is fine), so the system discards them.
One can verify this by capturing incoming packets on the TN via nsxcli: start capture interface _uplink1_ direction input file xyz.pcap. After transferring xyz.pcap from /tmp/ (via WinSCP or another utility) and loading it into Wireshark, the outer Geneve IP checksums will show as incorrect (enable the IPv4 protocol preference "Validate the IPv4 checksum if possible").
There is little to no chance that Cisco will fix this for the old M81KR CNA, so it must be worked around on the ESXi side...
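For reference, the outer checksum that Wireshark flags can be double-checked by hand: IPv4 uses the RFC 1071 ones'-complement sum over the header. Below is a minimal Python sketch; the sample header bytes are illustrative (made up for this example), not taken from an actual Geneve capture.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """RFC 1071 ones'-complement checksum over an IPv4 header."""
    if len(header) % 2:
        header += b"\x00"            # pad to a 16-bit boundary
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:               # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Build an illustrative 20-byte IPv4 header with the checksum field zeroed,
# then fill the computed checksum in at offset 10.
hdr = bytearray(struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 20 + 8,   # version/IHL, TOS, total length (header + payload)
    0x1234, 0,         # identification, flags/fragment offset
    64, 17, 0,         # TTL, protocol 17 (UDP, as Geneve uses), checksum = 0
    bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]),  # src, dst (test addresses)
))
csum = ipv4_checksum(bytes(hdr))
struct.pack_into("!H", hdr, 10, csum)

# Recomputing over a header that carries a valid checksum must yield 0;
# any other result means the checksum on the wire is bad.
print(ipv4_checksum(bytes(hdr)))  # → 0
```

This is exactly the check Wireshark performs on the outer header when checksum validation is enabled.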
Workaround: turn off IP checksum HW offloading for all NSX-T vmnics on all TNs that use Cisco VICs (in this case vmnicX and vmnicY):
esxcli network nic software set --ipv4cso=1 -n vmnicX
esxcli network nic software set --ipv4cso=1 -n vmnicY
The parameter --ipv4cso=1 means the IP checksum is computed in SW; --ipv4cso=0 means it is HW offloaded.
The settings persist across reboots.
To verify that IP checksum calculation is done in SW (vmkernel), run:
esxcli network nic software list
IPv4 CSO = on means the IP checksum is computed in SW.
Once SW IP checksumming is active on the NSX-T vmnics, the Geneve uplinks should come UP almost instantly (verify with "nsxdp-cli bfd sessions list").
PS: It seems that if you are testing a nested ESXi deployment that uses vmxnet3 with DirectPath I/O enabled, the same workaround must be applied to the virtual vmxnet3 vmnics if they are backed by Cisco VICs (presumably vmxnet3 offloads the IP checksum calculation to the VIC?).
Regarding performance concerns with SW IP checksum calculation: VM-to-VM throughput is similar (VMs residing on different B200 M1 blades):
- 9.67 Gbits/sec with DSwitch vs. 9.13 Gbits/sec with NSX-T SDN.
- NSX-T DR L3 routing: 8.04 Gbits/sec.
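Working out the relative overhead from the figures above (a quick back-of-the-envelope calculation, nothing more):

```python
# Relative slowdown vs. plain DSwitch, from the throughput figures quoted above.
dswitch = 9.67          # Gbit/s, DSwitch baseline
nsxt    = 9.13          # Gbit/s, NSX-T SDN overlay with SW checksumming
nsxt_dr = 8.04          # Gbit/s, NSX-T DR L3 routing

overlay_loss = (1 - nsxt / dswitch) * 100
routing_loss = (1 - nsxt_dr / dswitch) * 100
print(f"overlay: {overlay_loss:.1f}% slower, DR routing: {routing_loss:.1f}% slower")
# → overlay: 5.6% slower, DR routing: 16.9% slower
```

So the SW-checksum overlay path costs only a few percent on this hardware; the larger DR gap includes the routing hop as well, not just checksumming.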
With this workaround we have successfully tested both NSX-T 2.5 and 3.0 using:
- Cisco B200 M1 blades with M81KR CNA/VIC in 5108 blade chassis
- FI 6100 with UCSM 2.2(8i)
- ESXi 6.5u3
(edge nodes must be on a different cluster with newer servers due to the AES-NI CPU requirement)
IMHO newer VIC cards like the VIC 1200 / VIC 1300 have (or had) similar problems with Geneve packets, because we were previously unable to run NSX-T 2.4 on a C240 M4 with a VIC 1300 (Geneve tunnels down).
Lastly, I can confirm that NIC HW offloading of Geneve encapsulation is not a requirement for NSX-T 3.0.