Hi all, I have an NSX home lab running. Here is a basic overview of the setup:
My PC is on the 192.168.1.x/24 network. UniFi USG as the gateway
ESXi Hosts, vCSA, NSX Manager, NSX Control cluster are on the 10.0.0.0/24 network, tagged VLAN 2, on a 10Gbit switch. Also on this network (and switch) is a QNAP NAS. Again, using the UniFi USG as the gateway.
I don't think it matters, but I am running vSAN, and the hosts use a directly connected network for vSAN and vMotion. Witness traffic is tagged on the 10.0.0.0/24 VMkernel interface where the witness appliance resides.
I have two logical networks, 5001 and 5002, using 172.16.0.0/24 and 172.16.10.0/24 respectively.
I have one Edge gateway. This has an interface for 5001 and 5002. It also has an interface on the VLAN 2 port group for external traffic.
The VMs on the 5001 and 5002 networks use the Edge as their gateway, and the Edge uses the UniFi USG as its gateway.
I then have a static route on the UniFi USG which directs traffic for the 5001 and 5002 subnets to the Edge's interface on the VLAN 2 port group.
Not the most complex of setups, I don't think. I wasn't sure if I needed an Edge for each logical network, but it's working fine with just the single one.
Running iPerf tests from host to host I get the expected 10Gbps speed.
Running iPerf tests from the host to the QNAP NAS I get 10Gbps.
Running it from my PC to a VM on a logical network I get 1Gbps (PC is only 1Gbit, as is the UniFi USG).
The issue that I am having is that RDP performance from my PC to a VM is poor; it's like it's running at 10 frames a second. It does this whether the VM is in the VLAN 2 port group or connected to either logical switch.
I'm guessing here that it's the UniFi USG causing the issues? I do have a pfSense appliance I could try I guess.
The second issue I am having is that if I do an iPerf test between VMs, either on the same logical network or on separate networks, traffic appears limited, peaking around 5Gbps but averaging around 2-3.
This leads me to believe that the issue is in the edge configuration somehow, or is this normal behaviour? I'd have thought I would see the full 10Gbps.
Thanks!
There is a lot of overhead associated with VXLAN, unfortunately. Some pNICs have VXLAN offloading capability, which can help, but using large frames (8900 MTU) can make a big difference by reducing packet rate. It's not necessarily the amount of traffic that's the issue, it's the processing of large numbers of frames for encapsulation/de-encapsulation at 1500 MTU. You can also test with the DFW disabled in the cluster to see if that improves performance as well. With an 8900 MTU, you won't have any difficulty hitting line rate.
This post may help:
https://vswitchzero.com/2018/08/02/jumbo-frames-and-vxlan-performance/
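The packet-rate argument above can be quantified with some back-of-the-envelope arithmetic. This is an illustrative sketch, not a measurement from this lab; the header sizes are the standard VXLAN encapsulation ones (RFC 7348):

```python
# Rough numbers for VXLAN encapsulation overhead and packet rate.
# Header sizes are the standard VXLAN stack (RFC 7348), not thread data.

VXLAN_OVERHEAD = 14 + 20 + 8 + 8  # outer Ethernet + outer IPv4 + UDP + VXLAN = 50 bytes

def packets_per_second(throughput_bps, mtu_bytes):
    """Approximate packet rate needed to carry `throughput_bps` with full-MTU packets."""
    return throughput_bps / (mtu_bytes * 8)

line_rate = 10e9  # 10 Gbps
pps_1500 = packets_per_second(line_rate, 1500)
pps_8900 = packets_per_second(line_rate, 8900)

print(f"VXLAN adds ~{VXLAN_OVERHEAD} bytes per frame")
print(f"10 Gbps at 1500 MTU: ~{pps_1500:,.0f} packets/sec to encapsulate")
print(f"10 Gbps at 8900 MTU: ~{pps_8900:,.0f} packets/sec to encapsulate")
print(f"Jumbo frames cut the per-packet work by ~{pps_1500 / pps_8900:.1f}x")
```

At 1500 MTU the host has to encapsulate roughly 830k packets per second to sustain 10 Gbps; at 8900 MTU it's about a sixth of that, which is why jumbo frames make such a difference.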
ChrisFD2,
Thanks for the detailed breakdown of the environment. A couple of things stand out:
-It appears you are attaching LSs directly to the ESG; a more common approach would be to deploy a DLR and attach the LSs there, so routing can be distributed in the hypervisors.
Add a Distributed Logical Router
Pg 69, 4.3.5 Enterprise Routing Topology - https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/nsx/vmw-nsx-network-virtu...
-Consider the ESG appliance size in use and the resources of the Host it is running on.
-Flows to and from the PC will be handcuffed due to the PC’s 1 Gbps connection.
-Please review the VMworld performance session for more details on the recommended configuration (i.e. jumbo frames, RSS, etc.) - https://www.youtube.com/watch?v=_icR_L5PcYs&t=0s&index=15&list=PLBMoYohMQ37d2TcBGMoO49K5GdnMNHumT
Thank you vLingle.
I have now shut down the edge and replaced it with a DLR. This is how I had it configured at first but I couldn't get it to work for an unknown reason.
It is now routing properly, but iPerf testing between VMs shows about 5Gbit/s, so an improvement, but not near the 10 I would expect.
I'm going to build a couple of Linux VMs and test on there to see that it isn't the Microsoft effect.
Okay, some testing now in Ubuntu.
Two tests: the top one is with the VMs on the same LS but on different hosts; the second is with them on the same host.
chris@ubnt2:~$ iperf3 -c 172.16.10.60
Connecting to host 172.16.10.60, port 5201
[ 4] local 172.16.10.61 port 39676 connected to 172.16.10.60 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 575 MBytes 4.82 Gbits/sec 640 1.22 MBytes
[ 4] 1.00-2.00 sec 595 MBytes 4.99 Gbits/sec 2 1.07 MBytes
[ 4] 2.00-3.00 sec 605 MBytes 5.07 Gbits/sec 0 1.43 MBytes
[ 4] 3.00-4.00 sec 605 MBytes 5.07 Gbits/sec 4 1.31 MBytes
[ 4] 4.00-5.00 sec 592 MBytes 4.96 Gbits/sec 4 1.18 MBytes
[ 4] 5.00-6.00 sec 592 MBytes 4.97 Gbits/sec 0 1.51 MBytes
[ 4] 6.00-7.00 sec 594 MBytes 4.99 Gbits/sec 6 1.38 MBytes
[ 4] 7.00-8.00 sec 590 MBytes 4.95 Gbits/sec 2 1.25 MBytes
[ 4] 8.00-9.00 sec 591 MBytes 4.96 Gbits/sec 1 1.10 MBytes
[ 4] 9.00-10.00 sec 601 MBytes 5.04 Gbits/sec 0 1.45 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 5.80 GBytes 4.98 Gbits/sec 659 sender
[ 4] 0.00-10.00 sec 5.80 GBytes 4.98 Gbits/sec receiver
iperf Done.
chris@ubnt2:~$ iperf3 -c 172.16.10.60
Connecting to host 172.16.10.60, port 5201
[ 4] local 172.16.10.61 port 39680 connected to 172.16.10.60 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 2.07 GBytes 17.7 Gbits/sec 524 1.11 MBytes
[ 4] 1.00-2.00 sec 1.85 GBytes 15.9 Gbits/sec 0 1.12 MBytes
[ 4] 2.00-3.00 sec 1.58 GBytes 13.6 Gbits/sec 0 1.12 MBytes
[ 4] 3.00-4.00 sec 1.85 GBytes 15.9 Gbits/sec 0 1.12 MBytes
[ 4] 4.00-5.00 sec 1.71 GBytes 14.7 Gbits/sec 0 1.12 MBytes
[ 4] 5.00-6.00 sec 1.66 GBytes 14.2 Gbits/sec 0 1.12 MBytes
[ 4] 6.00-7.00 sec 1.85 GBytes 15.9 Gbits/sec 0 1.12 MBytes
[ 4] 7.00-8.00 sec 1.94 GBytes 16.6 Gbits/sec 0 1.12 MBytes
[ 4] 8.00-9.00 sec 1.87 GBytes 16.1 Gbits/sec 0 1.12 MBytes
[ 4] 9.00-10.00 sec 1.82 GBytes 15.6 Gbits/sec 0 1.12 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 18.2 GBytes 15.6 Gbits/sec 524 sender
[ 4] 0.00-10.00 sec 18.2 GBytes 15.6 Gbits/sec receiver
Windows does not behave like this; it seems to max out around 5Gbps regardless of which hosts the VMs reside on. But Windows to Linux gets 7-8 Gbps. So it looks like it's a Windows 'feature', or the window sizing or similar with Windows' default iperf settings.
I would still like to get Linux traffic between hosts above 5Gbps. Where can I look to do more tests, on the hosts maybe?
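The window-sizing hunch above can be sanity-checked with bandwidth-delay arithmetic: a single TCP stream can deliver at most one window per round trip. The 64 KiB window and 0.1 ms RTT below are assumptions for illustration, not values measured in this thread:

```python
# Single-stream TCP throughput is bounded by window_size / RTT.
# 64 KiB is the classic default window without scaling; the 0.1 ms RTT
# is an assumed host-to-host latency, not a measurement from this lab.

def max_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on single-stream TCP throughput: one full window per round trip."""
    return window_bytes * 8 / rtt_seconds

window = 64 * 1024   # 64 KiB, assumed default
rtt = 0.0001         # 100 microseconds, assumed
gbps = max_throughput_bps(window, rtt) / 1e9
print(f"64 KiB window at 0.1 ms RTT caps a single stream at ~{gbps:.2f} Gbps")
```

Under those assumptions the cap works out to roughly 5.2 Gbps, which is suspiciously close to the plateau seen in the Windows tests; a larger window or parallel streams would raise it.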
I'm not sure if you tested standard VLAN performance before the VXLAN bandwidth check? If not, I would recommend performing a quick test on non-VXLAN port groups, placing VMs on the same host and on different hosts, to ensure you are getting line-rate bandwidth. For optimal performance, everything from VM sizing, virtual machine network adapters and drivers, all the way to the server and switch, matters a lot (what type of drivers, and whether they are up to date), in addition to other performance tuning tips like RSS/VXLAN offload/TSO/LRO (TSO and receive checksum offload are extremely important for line-rate performance).
Sreec, yes I have tested performance VLAN to VLAN on a regular port group hanging off both a distributed and a standard switch, and I get the full 10Gbps. I get the 5Gbps in both Windows and Linux, so it's definitely an NSX-related issue.
Sorry for the late reply. I hope the test you did was a simple L2 test with no routing involved? Connecting the VMs to the same logical switch and checking the numbers with them on the same host and on different hosts? Among the performance features (RSS, LSO, etc.) shared earlier in the thread, may I know which you are leveraging?
Hi Sreec - as per my post above, two VMs on a distributed switch, on different hosts but the same subnet, get 10Gb/s.
If I do the same test with the same VMs on the same logical switch but still two different hosts it drops by 50%.
Surely that is pointing to an issue with NSX if I am not mistaken?
Yes, I agree with your point. To be honest, I have not seen anyone get line-rate throughput without following the best practices. I would also change the MTU to the maximum supported value (9000) instead of the minimum recommended value, set 8900 in the guest, and test the performance; you should see a difference.
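As a quick check that those values fit together (the header sizes are the standard VXLAN ones; the 8900 guest / 9000 transport split is the recommendation above, not an official sizing table):

```python
# Verify that a guest MTU of 8900 leaves room for VXLAN encapsulation
# inside a 9000-byte transport (VTEP) MTU. Standard VXLAN header sizes.

INNER_ETH = 14   # inner Ethernet header carried inside the VXLAN payload
VXLAN_HDR = 8
OUTER_UDP = 8
OUTER_IPV4 = 20

def outer_ip_packet_size(guest_mtu):
    """Size of the outer IP packet the VTEP sends for a full-MTU guest packet."""
    return guest_mtu + INNER_ETH + VXLAN_HDR + OUTER_UDP + OUTER_IPV4

transport_mtu = 9000
guest_mtu = 8900
needed = outer_ip_packet_size(guest_mtu)
print(f"guest MTU {guest_mtu} -> outer packet {needed} bytes "
      f"({'fits' if needed <= transport_mtu else 'too big'} in transport MTU {transport_mtu})")
```

A full 8900-byte guest packet becomes an 8950-byte outer packet, comfortably inside the 9000-byte transport MTU; a 1500-byte guest packet becomes 1550 bytes, which is why VXLAN transport networks need at least a ~1600 MTU even without jumbo frames.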
I agree too; it seems to be VXLAN related. Since a VXLAN logical switch is really just a dvPortgroup whose packets get encapsulated with the VXLAN header, it's possible you're losing performance to the encapsulation overhead.
-Can you verify that your VXLAN traffic takes the same path through the physical network as your VLAN traffic?
-I was going to recommend checking your VTEP load balancing settings, but that SHOULDN'T matter if you're measuring flows between a single source>dest. Still important for overall NSX performance though. (DVS NIC teaming load balancing options and NSX interoperability - Iwan’s wiki )
-Are your VXLANs on the same VDS as your VLAN-backed portgroups?
If/when you increase the MTU to 8900+, could you post an update with the difference in iperf? I'm interested in the results. There are a lot of articles claiming the increased MTU can sometimes double performance, since the host and NICs burn fewer cycles on encapsulation.
VMware doesn't publish max VXLAN throughput for a single flow (probably because there are so many variables that can impact throughput). This article (https://vswitchzero.com/2018/08/02/jumbo-frames-and-vxlan-performance/) shows someone increasing MTU and more than doubling their VXLAN performance. They also claim that VXLAN performance is generally seen at around 4-7Gbps when using 1500-1600 MTU.
The articles above aren't "official VMware", but they seem like they could help.
My home lab has been off for a few days now, hence not updating this post. I should have it all running again by the weekend and hope to do some testing so that I can share the results.
It's been a while since I looked into this, and I found some time this morning. I thought I'd update to say it was the MTU on the VMs - once changed to 8900, I got near line-rate throughput.
I wrote a blog on it here:
https://cadooks.com/testing-nsx-network-throughput/
Hi Chris,
I can't access your website from my side; is it still available?
Thanks,
Haikal S.
Hi, apologies but I moved the domain and apparently the redirect is broken.
https://chrisdooks.com/2019/03/09/testing-nsx-network-throughput/