I am managing a small NSX-T test environment, consisting of 3 ESXi hosts as transport nodes and an overlay transport zone, where a few overlay networks are configured.
Each host has 4 x 1 Gbps physical uplinks (no 10 Gbps switches available at the moment): two are connected to the VDS and the other two to the N-VDS.
The uplink profile applied to the ESXi hosts uses both N-VDS uplinks in a load-balancing (LB) configuration. It uses the global MTU setting of 1600.
Two Edge transport nodes are grouped in an Edge cluster. On the edges, uplink-1 is connected to the transport VLAN for the overlay networks, while uplinks 2 and 3 are used for communication with the physical switches.
There is almost no load on the environment currently, be it compute, storage or network. The edge nodes are deployed in the Medium sizing configuration.
The problem I am facing is that transfer rates between VMs placed on the VDS in the physical VLAN network and VMs placed in the overlay network are extremely bad. I can only get something with low throughput requirements, like an interactive SSH session, to work. During file transfers over SSH or HTTP the rate drops as low as 30 kB/s before the connection is lost. In the vCenter performance views I did not notice any packet drops on either the hosts' vmnics or the VMs' NICs. Ping between components returns round-trip times of <1 ms.
Transfer rates between VMs inside the overlay network are fine, even when they are placed on different ESXi hosts in the cluster. I've also tried different source systems from the physical VLAN to do the transfer tests.
All VMs placed in the overlay, regardless of the ESXi host, seem to be affected. Transfers between systems placed in VLANs on the VDS, on the other hand, are not affected.
All switchports connecting the transport nodes are configured consistently with regard to VLAN trunks, with an MTU size of 9216 bytes.
I've used https://spillthensxt.com/how-to-validate-mtu-in-an-nsx-t-environment/ as a guide to check MTU consistency across all components and could not find an issue. I use 1600 on the VDS and as the default value in the NSX-T uplink profiles used on the transport nodes as well as the edge nodes.
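For reference, the arithmetic behind such an MTU check can be sketched roughly as below. This is only an illustration, not something taken from the linked article: the function names are made up, and the header sizes assume IPv4 without options and a Geneve header without any option TLVs (NSX-T may add some, so the real overhead can be larger).

```python
# Header sizes in bytes.
# Assumptions: IPv4 without options, Geneve base header only (no option TLVs).
IPV4_HEADER = 20
ICMP_HEADER = 8
UDP_HEADER = 8
GENEVE_BASE_HEADER = 8
INNER_ETHERNET_HEADER = 14


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits into one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def min_geneve_packet_size(inner_mtu: int) -> int:
    """Outer IPv4 packet size needed to carry a full-size inner packet.

    Hypothetical helper: counts the inner Ethernet header plus the Geneve,
    UDP and outer IPv4 headers, and ignores Geneve options.
    """
    return (
        inner_mtu
        + INNER_ETHERNET_HEADER
        + GENEVE_BASE_HEADER
        + UDP_HEADER
        + IPV4_HEADER
    )


if __name__ == "__main__":
    print("don't-fragment ping payload at MTU 1600:", max_ping_payload(1600))
    print("don't-fragment ping payload at MTU 1500:", max_ping_payload(1500))
    print("outer packet for a 1500-byte inner packet:", min_geneve_packet_size(1500))
```

With an MTU of 1600 this gives a payload of 1572 bytes for a don't-fragment ping between tunnel endpoints, and a full 1500-byte VM packet needs at least 1550 bytes once encapsulated, which leaves 50 bytes of headroom for Geneve options under these assumptions.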
I'm kind of at a loss as to what to troubleshoot next, and any tips are most welcome :)