VMware Networking Community
Enter123
Enthusiast

NSX-T Version 3.2.2.1.0.21487560 - packet fragmentation

Hi everyone,

We migrated VMs from one VCF stack running NSX-T version 3.1.3.5.0.19068434 to another VCF stack running NSX-T version 3.2.2.1.0.21487560.

We noticed that we cannot open the URLs of iLO management interfaces from VMs that have IPs in the new VCF stack.

We get an error in the browser: ERR_SSL_SERVER_CERT_BAD_FORMAT.

If we try the same from the same VM but change its IP to one from a traditional VLAN-based network, there is no issue.

If I move the VM back to the old VCF stack and keep using an NSX-T segment there, there is also no issue.

We did a bit of digging, ran tests with the ping command, and noticed that packets get fragmented, but not always.

For example:

• I tried to ping with packet size 1472 – fragmented
• Tried to reduce packet size to 1400 – worked
• Increased packet size to 1450 – worked
• Increased to 1470 – fragmented
• Decreased to 1468 – fragmented
• Decreased to 1462 – worked
• Increased to 1472 – failed 2x and 3rd time it started working.
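For reference, the arithmetic behind these sizes can be sketched as follows. This is only an illustration under stated assumptions: IPv4 without options, ICMP echo, a 1500-byte MTU inside the VM, and a Geneve header without options (real NSX-T traffic may carry options, so the actual overhead can be larger). As far as I know, the NSX documentation asks for an underlay MTU of at least 1600 for this reason.

```python
# Assumed header sizes (not measured in this environment).
IPV4_HEADER = 20   # IPv4 header without options
ICMP_HEADER = 8    # ICMP echo header
# Outer IPv4 (20) + UDP (8) + Geneve base header (8) + inner Ethernet (14).
# Geneve options, if present, add to this figure.
GENEVE_MIN_OVERHEAD = 20 + 8 + 8 + 14


def ip_packet_size(ping_payload: int) -> int:
    """Size of the IP packet the VM sends for a given ping payload."""
    return ping_payload + ICMP_HEADER + IPV4_HEADER


def fits_without_fragmentation(ping_payload: int, mtu: int = 1500) -> bool:
    """True if the VM's own packet fits into the given MTU."""
    return ip_packet_size(ping_payload) <= mtu


def underlay_packet_size(ping_payload: int) -> int:
    """Minimum size of the encapsulated packet on the physical network."""
    return ip_packet_size(ping_payload) + GENEVE_MIN_OVERHEAD


if __name__ == "__main__":
    for size in (1400, 1450, 1462, 1468, 1470, 1472):
        print(size, ip_packet_size(size), underlay_packet_size(size))
```

Under these assumptions every size in the list above fits into a 1500-byte MTU inside the VM (1472 + 8 + 20 = 1500), so the VM itself has no reason to fragment; the encapsulated packet, however, needs more than 1500 bytes on the underlay.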

This sometimes happens even when we ping the VM's default gateway with packet size 1472. Then it starts working, and packets get fragmented at the next hop; after that it starts working again, and packets get fragmented at the third hop, and so on.

Of course we opened a ticket with VMware, but they didn't find anything in the logs. The network team says the physical network devices look fine.

The moment the VM is not using an NSX-T segment, even if it stays in the same cluster and runs on the same host, we don't see the problem.

Has anyone seen or heard about a similar issue?

Any ideas on how to troubleshoot this?

Thanks for any suggestions or ideas on what to try in order to identify the root cause.

sguadamu1
Enthusiast

Hello Enter123.

I'd like to ask the following:

Are you using DFW in your infrastructure? 

If you create a new VM, does it fail too?

Are the issues happening in both north-south and east-west traffic?

As I understand it, if you connect your VM to an overlay segment it fails, but when you connect it to a VLAN-based network it works?

I would test the following:

- Put two VMs on the same host, using the same segment, and ping between them.

- vMotion one VM to a different host and test the ping again.

- Ping a different VM, or reuse one of the previous VMs, after putting it on a different NSX overlay segment.

By checking this we can determine whether you have issues with your T1.
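To make these tests repeatable, a small script along the following lines could run the same ping sizes with the "don't fragment" bit set from each VM. It is a rough sketch, not a supported tool: `build_ping_cmd` and `sweep` are names made up for illustration, the flags assume the stock Windows `ping` or Linux iputils `ping`, and the exit code is only a rough success indicator.

```python
import platform
import subprocess


def build_ping_cmd(host, payload, system=None):
    """Build a single-packet ping command with the don't-fragment bit set."""
    system = system or platform.system()
    if system == "Windows":
        # -n count, -f sets don't-fragment, -l payload size
        return ["ping", "-n", "1", "-f", "-l", str(payload), host]
    # Linux iputils: -c count, -M do prohibits fragmentation, -s payload size
    return ["ping", "-c", "1", "-M", "do", "-s", str(payload), host]


def sweep(host, sizes, system=None):
    """Ping host once per payload size; map size -> True if ping exited with 0."""
    results = {}
    for size in sizes:
        proc = subprocess.run(
            build_ping_cmd(host, size, system),
            capture_output=True,
            text=True,
        )
        results[size] = proc.returncode == 0
    return results


if __name__ == "__main__":
    # 192.0.2.1 is a documentation address; replace it with the VM or gateway under test.
    for size, ok in sweep("192.0.2.1", [1400, 1450, 1462, 1468, 1470, 1472]).items():
        print(size, "ok" if ok else "failed")
```

Running it for each of the three scenarios above (same host, different hosts, different segment) gives results that are easy to compare side by side.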

Best Regards.

Enter123
Enthusiast

After many packet capture sessions at various points, all we could see is that the certificate information arrives at the destination in "a broken state". For VMs in VLAN-based networks a packet retransmission then happens, so everything "looks" fine. For VMs on NSX-T segments no retransmission happens at all.

I hope we find out soon what this is all about.

Enter123
Enthusiast

It looks like the issue happens on NSX-T segments where the ESXi hosts have the checksum offloading feature enabled on their NICs.
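For anyone who wants to check captured packets for this, here is a minimal sketch of the standard Internet checksum (RFC 1071) applied to an IPv4 header. It assumes you have the raw header bytes from a capture; the function names are made up for illustration. TCP and UDP checksums use the same algorithm but also cover a pseudo-header and the payload, which this sketch does not build. Keep in mind that a capture taken on the sending host with offloading enabled often shows wrong checksums simply because the NIC fills them in later, so captures at the receiving side are more telling.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header_is_valid(header: bytes) -> bool:
    """A header whose checksum field is correct sums to zero."""
    return internet_checksum(header) == 0


if __name__ == "__main__":
    # Sample 20-byte IPv4 header (192.168.0.1 -> 192.168.0.199, UDP).
    sample = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
    print(ipv4_header_is_valid(sample))
```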
