VMware Networking Community
ChrisFD2
VMware Employee

NSX performance issues

Hi all, I have an NSX home lab running. Here's a basic overview of the setup:

My PC is on the 192.168.1.x/24 network, with a UniFi USG as the gateway.

The ESXi hosts, vCSA, NSX Manager and NSX Controller cluster are on the 10.0.0.0/24 network, tagged VLAN 2, on a 10Gbit switch. Also on this network (and switch) is a QNAP NAS. Again, the UniFi USG is the gateway.

I don't think it matters, but I am running vSAN, and the hosts use a directly connected network for vSAN and vMotion. Witness traffic is tagged on the 10.0.0.0/24 VMkernel interface, where the witness appliance resides.

I have two logical networks, 5001 and 5002, 172.16.0.0/24 and 172.16.10.0/24 respectively.

I have one Edge gateway. It has an interface on 5001 and another on 5002, plus an interface on the VLAN 2 port group for external traffic.

The VMs on the 5001 and 5002 networks use the Edge as their gateway. The Edge uses the UniFi USG as its gateway.

I then have a static route on the UniFi USG which directs traffic for the 5001 and 5002 subnets to the Edge's interface on the VLAN 2 port group.

Not the most complex of setups, I think. I wasn't sure if I needed an Edge for each logical network, but it's working fine with just the single one.

Running iPerf tests from host to host I get the expected 10Gbps speed.

Running iPerf tests from the host to the QNAP NAS I get 10Gbps.

Running it from my PC to a VM on a logical network I get 1Gbps (PC is only 1Gbit, as is the UniFi USG).

The issue I am having is that RDP performance from my PC to a VM is poor; it feels like it's running at 10 frames a second. It does this whether the VM is in the VLAN 2 port group or connected to either logical switch.

I'm guessing that it's the UniFi USG causing that issue? I do have a pfSense appliance I could try, I guess.

The second issue I am having is that iPerf tests between VMs, either on the same logical network or on separate networks, appear limited: throughput peaks around 5Gbps but averages around 2-3Gbps.

This leads me to believe that the issue is in the edge configuration somehow, or is this normal behaviour? I'd have thought I would see the full 10Gbps.
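
In case it's useful, the VM-to-VM tests are just plain iperf3 with default settings (the IPs below are my lab VMs); a parallel-stream run is on the list to see whether the ~5Gbps is a per-flow or an aggregate limit:

iperf3 -s                           # on the receiving VM
iperf3 -c 172.16.10.60 -t 30        # on the sending VM, a single TCP flow for 30 seconds
iperf3 -c 172.16.10.60 -t 30 -P 4   # four parallel streams, to see if the cap is per-flow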

Thanks!

Regards,
Chris
VCIX-DCV 2023 | VCIX-NV 2023 | vExpert *** | CCNA R&S
vLingle
VMware Employee

ChrisFD2,

Thanks for the detailed breakdown of the environment. A couple of things stand out:

-It appears you are attaching the LS's directly to the ESG; a more common approach would be to deploy a DLR and attach the LS's there, so that routing can be distributed in the hypervisors.

          Add a Distributed Logical Router

          Pg 69, 4.3.5 Enterprise Routing Topology - https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/nsx/vmw-nsx-network-virtu...

-Consider the ESG appliance size in use and the resources of the Host it is running on.

          Add an Edge Services Gateway

-Flows to and from the PC will be handcuffed due to the PC’s 1 Gbps connection.

-Please review the VMworld performance session for more details on the recommended configuration (i.e. jumbo frames, RSS, etc.) - https://www.youtube.com/watch?v=_icR_L5PcYs&t=0s&index=15&list=PLBMoYohMQ37d2TcBGMoO49K5GdnMNHumT
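
As a starting point, a few host-side checks along these lines will confirm the transport MTU and NIC details (vmkernel/NIC names and the VTEP IP are placeholders; adjust to your lab):

esxcli network vswitch dvs vmware list    # MTU configured on each distributed switch (NSX host prep sets 1600 for VXLAN by default)
esxcli network ip interface list          # MTU on each vmkernel interface, including the VTEP vmk
esxcli network nic list                   # physical NICs, link speed and driver in use
vmkping ++netstack=vxlan -d -s 1572 <other-host-VTEP-IP>   # don't-fragment ping between VTEPs; 1572 bytes of payload fits a 1600 MTU path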

Please KUDO helpful posts and mark the thread as solved if answered.

Regards,
Jeffrey Lingle
ChrisFD2
VMware Employee

Thank you vLingle.

I have now shut down the Edge and replaced it with a DLR. This is how I had it configured at first, but I couldn't get it to work for some unknown reason.

It is now routing properly, but iperf testing between VMs still sits at about 5Gbit/s - an improvement, but nowhere near the 10 I would expect.

I'm going to build a couple of Linux VMs and test on there to see that it isn't the Microsoft effect.

Regards,
Chris
VCIX-DCV 2023 | VCIX-NV 2023 | vExpert *** | CCNA R&S
ChrisFD2
VMware Employee

Okay, some testing now in Ubuntu.

Two tests: the top one is with the VMs on the same logical switch but on different hosts; the second is with them on the same host.

chris@ubnt2:~$ iperf3 -c 172.16.10.60
Connecting to host 172.16.10.60, port 5201
[  4] local 172.16.10.61 port 39676 connected to 172.16.10.60 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   575 MBytes  4.82 Gbits/sec  640   1.22 MBytes
[  4]   1.00-2.00   sec   595 MBytes  4.99 Gbits/sec    2   1.07 MBytes
[  4]   2.00-3.00   sec   605 MBytes  5.07 Gbits/sec    0   1.43 MBytes
[  4]   3.00-4.00   sec   605 MBytes  5.07 Gbits/sec    4   1.31 MBytes
[  4]   4.00-5.00   sec   592 MBytes  4.96 Gbits/sec    4   1.18 MBytes
[  4]   5.00-6.00   sec   592 MBytes  4.97 Gbits/sec    0   1.51 MBytes
[  4]   6.00-7.00   sec   594 MBytes  4.99 Gbits/sec    6   1.38 MBytes
[  4]   7.00-8.00   sec   590 MBytes  4.95 Gbits/sec    2   1.25 MBytes
[  4]   8.00-9.00   sec   591 MBytes  4.96 Gbits/sec    1   1.10 MBytes
[  4]   9.00-10.00  sec   601 MBytes  5.04 Gbits/sec    0   1.45 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  5.80 GBytes  4.98 Gbits/sec  659             sender
[  4]   0.00-10.00  sec  5.80 GBytes  4.98 Gbits/sec                  receiver
iperf Done.

chris@ubnt2:~$ iperf3 -c 172.16.10.60
Connecting to host 172.16.10.60, port 5201
[  4] local 172.16.10.61 port 39680 connected to 172.16.10.60 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  2.07 GBytes  17.7 Gbits/sec  524   1.11 MBytes
[  4]   1.00-2.00   sec  1.85 GBytes  15.9 Gbits/sec    0   1.12 MBytes
[  4]   2.00-3.00   sec  1.58 GBytes  13.6 Gbits/sec    0   1.12 MBytes
[  4]   3.00-4.00   sec  1.85 GBytes  15.9 Gbits/sec    0   1.12 MBytes
[  4]   4.00-5.00   sec  1.71 GBytes  14.7 Gbits/sec    0   1.12 MBytes
[  4]   5.00-6.00   sec  1.66 GBytes  14.2 Gbits/sec    0   1.12 MBytes
[  4]   6.00-7.00   sec  1.85 GBytes  15.9 Gbits/sec    0   1.12 MBytes
[  4]   7.00-8.00   sec  1.94 GBytes  16.6 Gbits/sec    0   1.12 MBytes
[  4]   8.00-9.00   sec  1.87 GBytes  16.1 Gbits/sec    0   1.12 MBytes
[  4]   9.00-10.00  sec  1.82 GBytes  15.6 Gbits/sec    0   1.12 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  18.2 GBytes  15.6 Gbits/sec  524             sender
[  4]   0.00-10.00  sec  18.2 GBytes  15.6 Gbits/sec                  receiver

Windows does not behave like this; it seems to max out around 5Gbps regardless of which hosts the VMs reside on. Windows to Linux, however, gets to 7-8Gbps. So it looks like it's a Windows 'feature', or the TCP window sizing or similar with the default iperf settings.
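
For anyone else following along, these are the kind of iperf3 flags I'm using to rule out the Windows defaults (the 2M window is just a guess, not a recommendation):

iperf3 -c 172.16.10.60 -t 30 -w 2M    # larger TCP window than the Windows default
iperf3 -c 172.16.10.60 -t 30 -R       # reverse the direction, to rule out an asymmetric path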

I would still like to get Linux traffic between hosts above 5Gbps. Where can I look to do more testing - on the hosts, maybe?

Regards,
Chris
VCIX-DCV 2023 | VCIX-NV 2023 | vExpert *** | CCNA R&S
Sreec
VMware Employee

I'm not sure if you tested standard VLAN performance before the VXLAN bandwidth check? If not, I would recommend a quick test on non-VXLAN port groups, placing the VMs on the same host and then on different hosts, to make sure you get line-rate bandwidth there. For optimal performance, everything from VM sizing, virtual machine network adapters and drivers through to the server and physical switch matters (what type of drivers, and whether they are up to date), in addition to other tuning such as RSS, VXLAN offload, TSO and LRO (TSO and receive checksum offload are extremely important for line-rate performance). A few quick ways to check those offloads are sketched below.
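
Something along these lines should show what is actually enabled (interface and NIC names are examples; NIC-specific features such as VXLAN offload depend on the pNIC and its driver documentation):

# in a Linux guest
ethtool -k ens192 | grep -E 'tcp-segmentation-offload|generic-receive-offload|checksum'

# on the ESXi host
esxcli network nic get -n vmnic0                            # driver name and version for the pNIC
esxcli system settings advanced list -o /Net/UseHwTSO       # hardware TSO enabled? (1 = enabled)
esxcli system settings advanced list -o /Net/Vmxnet3SwLRO   # software LRO for vmxnet3 guests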

Cheers,
Sree | VCIX-5X| VCAP-5X| VExpert 6x|Cisco Certified Specialist
Please KUDO helpful posts and mark the thread as solved if answered
ChrisFD2
VMware Employee

Sreec, yes, I have tested performance VLAN to VLAN on a regular port group hanging off both a distributed and a standard switch, and I get the full 10Gbps. I get the 5Gbps in both Windows and Linux, so it's definitely an NSX-related issue.

Regards,
Chris
VCIX-DCV 2023 | VCIX-NV 2023 | vExpert *** | CCNA R&S
Sreec
VMware Employee

Sorry for the late reply. I hope the test you did was a simple L2 test with no routing involved - connecting the VMs to the same logical switch and checking the results with them on the same host and then on different hosts? Of the performance features (RSS, LSO, etc.) shared earlier in the thread, may I know which ones you are leveraging?

Cheers,
Sree | VCIX-5X| VCAP-5X| VExpert 6x|Cisco Certified Specialist
Please KUDO helpful posts and mark the thread as solved if answered
ChrisFD2
VMware Employee

Hi Sreec - as per my post above, two VMs on a distributed switch port group on different hosts but in the same subnet get 10Gb/s.

If I do the same test with the same VMs on the same logical switch but still two different hosts it drops by 50%.

Surely that is pointing to an issue with NSX if I am not mistaken?

Regards,
Chris
VCIX-DCV 2023 | VCIX-NV 2023 | vExpert *** | CCNA R&S
Sreec
VMware Employee

Yes, I agree with your point. To be honest, I have not seen anyone get line-rate throughput without following the best practices. I would also change the MTU to the maximum supported value (9000) instead of the minimum recommended value, set 8900 in the guests as well, and then test the performance - you should see a difference.
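
Roughly, and from memory (menu paths can differ slightly between vSphere Client versions), the change and the verification look like this:

# 1. Raise the MTU on the distributed switch carrying VXLAN to 9000
#    (vSphere Client: select the DVS > Settings > Edit Settings > Advanced > MTU),
#    and make sure jumbo frames are also enabled on the physical 10Gbit switch.

# 2. Verify from each host (the VTEP vmk MTU comes from the NSX VXLAN transport settings, so check it matches)
esxcli network vswitch dvs vmware list       # DVS MTU should now report 9000
esxcli network ip interface list             # vmkernel interface MTUs, including the VTEP vmk

# 3. Confirm the jumbo path between VTEPs with a don't-fragment ping
vmkping ++netstack=vxlan -d -s 8972 <other-host-VTEP-IP>   # 8972 bytes of payload + headers fits a 9000 MTU path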

Cheers,
Sree | VCIX-5X| VCAP-5X| VExpert 6x|Cisco Certified Specialist
Please KUDO helpful posts and mark the thread as solved if answered
spirest
VMware Employee

I agree too - this seems to be VXLAN related. Since a VXLAN logical switch is really just a dvPortgroup whose packets are encapsulated with a VXLAN header, it's possible you're losing performance to the encapsulation overhead.

-Can you verify that your VXLAN traffic takes the same path on the physical network as your VLAN traffic? (A quick way to check is sketched at the end of this post.)
-I was going to recommend checking your VTEP load balancing settings, but that SHOULDN'T matter if you're measuring a flow between a single source and destination. It's still important for overall NSX performance, though. (DVS NIC teaming load balancing options and NSX interoperability - Iwan's wiki)

-Are your VXLANs on the same VDS as your VLAN-backed portgroups?

If/when you increase the MTU to 8900+, could you post an update with the difference in iperf? I'm interested in the results. There are a lot of articles claiming that the increased MTU can sometimes double performance, since the host and NICs burn fewer cycles on encapsulation.

VMware doesn't publish a maximum VXLAN throughput for a single flow (probably because there are so many variables that can affect it). This article (https://vswitchzero.com/2018/08/02/jumbo-frames-and-vxlan-performance/) shows someone increasing the MTU and more than doubling their VXLAN performance. It also notes that VXLAN throughput is generally around 4-7Gbps when using a 1500-1600 MTU.

The articles above aren't "official VMware", but they seem like they could help.
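
To answer my own first question about the flows, a low-tech way to compare them (no guarantees on the exact esxcli namespace, this is from memory) is:

# while the iperf test runs, watch which uplink carries the traffic
esxtop    # press 'n' for the network view; compare the busy vmnic during the VLAN test vs. the VXLAN test

# on an NSX-prepared host, list the VXLAN configuration (VDS, VTEP vmknics, MTU)
esxcli network vswitch dvs vmware vxlan list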

ChrisFD2
VMware Employee

My home lab has been off for a few days, hence the lack of updates here. I should have it all running again by the weekend and hope to do some testing so that I can share the results.

Regards,
Chris
VCIX-DCV 2023 | VCIX-NV 2023 | vExpert *** | CCNA R&S
mdac
Enthusiast

There is a lot of overhead associated with VXLAN, unfortunately. Some pNICs have VXLAN offloading capability, which can help, but using large frames (8900 MTU) can make a big difference by reducing the packet rate. It's not necessarily the amount of traffic that's the issue; it's the processing of large numbers of frames for encapsulation/de-encapsulation at 1500 MTU. You can also test with the DFW disabled in the cluster to see if that improves performance. With an 8900 MTU, you won't have any difficulty hitting line rate.
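
Once the guest MTU is raised, a quick sanity check before re-running iperf (sizes assume an 8900 guest MTU; 8872 = 8900 minus 28 bytes of IP/ICMP headers, and the target IP is just the test VM from earlier in the thread):

# Linux guest: don't-fragment ping at near-MTU size
ping -M do -s 8872 172.16.10.60

# Windows guest: same idea
ping -f -l 8872 172.16.10.60

For the DFW test, temporarily adding the two test VMs to the NSX Exclusion List is the least invasive way to take the firewall out of the path.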

This post may help:

https://vswitchzero.com/2018/08/02/jumbo-frames-and-vxlan-performance/

My blog: https://vswitchzero.com Follow me on Twitter: @vswitchzero
ChrisFD2
VMware Employee

It's been a while since I looked into this, and I found some time this morning. I thought I'd update to say it was the MTU on the VMs - once it was changed to 8900, I got near line-rate throughput.
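
For completeness, the guest-side change was along these lines (interface names will differ, and the Linux command as written doesn't persist across reboots):

# Ubuntu VM
sudo ip link set dev ens192 mtu 8900
ip link show dev ens192                   # confirm the new MTU

# Windows VM
netsh interface ipv4 set subinterface "Ethernet0" mtu=8900 store=persistent
netsh interface ipv4 show subinterfaces   # confirm the new MTU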

I wrote a blog on it here:

https://cadooks.com/testing-nsx-network-throughput/

Regards,
Chris
VCIX-DCV 2023 | VCIX-NV 2023 | vExpert *** | CCNA R&S
hicall
Contributor

Hi Chris,

Your website cannot be accessed from my side - is it still available?

Thanks,

Haikal S.

ChrisFD2
VMware Employee

Hi, apologies, but I moved the domain and apparently the redirect is broken. The post is now at:

https://chrisdooks.com/2019/03/09/testing-nsx-network-throughput/

Regards,
Chris
VCIX-DCV 2023 | VCIX-NV 2023 | vExpert *** | CCNA R&S