Is anyone else experiencing this known issue, 2587257?
I think I might be suffering from this in a fresh installation... but no workaround is provided by VMware right now.
The release notes just say "in some cases"... which cases? Where is the detailed information?
Any information appreciated.
TCP-based communication between a VM in NSX-T overlay and an external computer in the physical network results in the following, in both directions:
- SYN packet received.
- No more packets exchanged... the application fails to connect (tested RDP, CIFS/SMB, HTTP, HTTPS).
- Only Ping is successful.
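In case it helps anyone reproduce this, the symptom shows up with a plain TCP connect-with-timeout from the external machine (host and port below are placeholders; a local listener stands in for the sanity check):

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a full TCP handshake completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Sanity check against a local listener; point host/port at the overlay VM
# instead -- in the broken setup the SYN arrives but the handshake never
# completes, so the connect times out and this returns False.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
print(tcp_reachable("127.0.0.1", srv.getsockname()[1]))  # True locally
srv.close()
```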
We experimented with several connectivity options for the NSX Edge node: single N-VDS, multiple N-VDS, same port groups for overlay/VLAN, distinct port groups... we even deleted and re-deployed the Edge node.
We can successfully bring up an IPsec VPN tunnel (UDP, right?) between NSX-T and a physical Fortigate FW, and even against an external virtual Sophos FW, but traffic going through the tunnel behaves exactly the same: only ping succeeds.
We disabled DFW just in case (although it is allowing Any/Any by default) and reviewed every possible firewall function in the path... every policy is allowing all or FW disabled.
Also tried creating explicit policies allowing desired traffic... same result.
License is NSX-T Data Center Advanced, so there is no IDS/IPS.
To rule out external FW issues, we tested bringing up an IPSec tunnel between a virtual Sophos FW appliance inside the cluster and the external Fortigate without changing configuration, same subnets, same tunnel settings, same computer.
Everything works fine there.
We are opening a support case with HPE VMware team... all VMware licenses were bought through HPE.
The issue is related to Checksum Offload.
1. We discovered, through a Wireshark capture via port mirroring on the physical switches, that the checksum of TCP and UDP packets leaving NSX-T for the physical network is incorrect.
The switches deliver the frames to the router, but at the destination the packets are discarded because of the bad checksum in the transport-layer header.
ICMP works because the network-layer checksum (the IPv4 header checksum) is calculated correctly.
2. To confirm the issue, we disabled TSO and CSO for the two external pNICs in one of the ESXi hosts, rebooted the host, and then in the test Virtual Machine we disabled all Offload functions for the VMXNET3 ethernet card in Windows.
After doing this, all traffic works OK!!!
This is only a workaround, of course... but we're happy to have found the cause.
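For anyone wondering why ping got through while TCP/UDP died: the IPv4 header checksum and the TCP/UDP checksum are filled in by different paths (the latter is what CSO hands off to the NIC), but both use the same one's-complement algorithm from RFC 1071. A toy sketch of what the receiving stack does (made-up byte layout, not a real TCP header):

```python
# RFC 1071 Internet checksum -- the same algorithm is used for the IPv4
# header checksum (which was fine here) and the TCP/UDP checksum (which
# the NIC was supposed to fill in, and got wrong).
def internet_checksum(data: bytes) -> int:
    if len(data) % 2:                 # pad odd-length input with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF

# Sender side: compute with the checksum field zeroed, then write it in.
segment = bytearray(b"\x12\x34\x00\x50\x00\x00some payload")  # bytes 4-5 = checksum field
csum = internet_checksum(bytes(segment))
segment[4:6] = csum.to_bytes(2, "big")

# Receiver side: recompute over the whole segment; 0 means valid,
# anything else and the stack silently drops the packet.
print(internet_checksum(bytes(segment)))  # 0
```

With CSO enabled, the hypervisor hands the segment to the NIC with that field unfinished, expecting the hardware to compute it, which is exactly the step that was going wrong here.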
@montybeato So you had to disable both?
a) TSO and CSO for all physical NICs on the ESXi hosts
b) disable all Offload functions for the VMXNET3 ethernet card in Windows.
Would a) be enough for the workaround?
I just did a fresh NSX-T 3.1 install and we are having the same issue! ICMP worked fine but all TCP/UDP connections were failing.
We disabled all offloading options on the NIC inside the VM (without changing anything on the ESXi host) and everything is working now...
What I understand is that disabling offload makes the VMs use more CPU to calculate the checksum for every TCP and UDP packet in software... which is why offload is enabled by default.
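As a rough, pure-Python illustration of the per-packet work that moves back onto the CPU when offload is off (toy packet count and sizes, nothing like an optimized kernel path or NIC hardware):

```python
import time

def csum16(data: bytes) -> int:
    # One's-complement 16-bit sum (RFC 1071), done naively in Python.
    # Assumes even-length input, like a typical full-MTU payload.
    s = sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    while s >> 16:
        s = (s & 0xFFFF) + (s >> 16)
    return ~s & 0xFFFF

payloads = [bytes([i & 0xFF]) * 1472 for i in range(1000)]  # ~1000 MTU-sized packets
t0 = time.perf_counter()
sums = [csum16(p) for p in payloads]
elapsed = time.perf_counter() - t0
print(f"{len(payloads)} packets checksummed in software in {elapsed * 1000:.1f} ms")
```

Multiply that by real traffic rates and it's clear why this stays on the NIC whenever the NIC gets it right.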
We are looking at the firmware versions of the NICs as well as driver/firmware combinations... the minimum supported versions in the VMware Compatibility Guide can contain known issues/bugs, and there are more recent versions available.
Please take a look at https://kb.vmware.com/s/article/2030818 and look for your NIC's manufacturer.
I have just updated our ESXi host with the latest Mellanox firmware and it made no difference. I am not certain it is firmware related.
Did you have an update from vmware support? I also opened a support ticket.
Have you configured your NSX-T ESXi host transport nodes with a standard N-VDS or with Enhanced Datapath?
For what it's worth, the documentation on checksum offload says that in almost all cases traffic is not forwarded when the offloaded checksum is wrong: when TCP/IP receives a packet with an invalid checksum, it discards it. Capturing in promiscuous mode lets you trace the traffic and see the packets with bad checksums.
@dlapointe Unfortunately we opened the ticket with the NSX-T team... and they won't acknowledge any issue with NSX-T until you prove that everything else in the hosts and ESXi is OK. So we closed that ticket.
We've upgraded driver and firmware, but the issue persists.
Our NICs are all copper, Intel and Broadcom... it doesn't make a difference.
We have vSphere distributed switch (vDS). No Enhanced Data Path.
I heard the issue is resolved in NSX-T 3.1.2 (that's what I was told on the support ticket I have open with VMware).
I have not tested it yet, but if you haven't upgraded, it might be worth testing on your end as well.