VMware Cloud Community
skalaxc
Contributor

VMware VM with a passthrough NIC is not scaling

Hello,

Greetings. I have a standalone ESXi host running version 5.5, with two Intel 10GbE network cards in it. Here is what I tried:

1. Created a virtual switch with one of the 10G NICs connected to it, and a VM with a vSwitch-backed vNIC - Setup A.

2. Configured the other 10G NIC for passthrough and created a VM with that PCI device (the passthrough NIC) - Setup B.

In Setup B I can see the ixgbe NIC driver loaded inside the VM. I am sending data to the application running in both setups, and I expected higher performance in passthrough mode than with the vSwitch setup. Unfortunately, I see fewer connections per second in passthrough mode than in the vSwitch setup. While the application is under load I monitored esxtop on the ESXi host, and I can see one PCPU at 100% (which PCPU varies, but only one at a time), with the corresponding VM at the top of the list with a high %USED.

Could anyone shed some light on why my PCI passthrough is not scaling well? Is there any tuning I should do?

5 Replies
Linjo
Leadership

What kind of OS, Application and data?

Are you sure that the NIC is the bottleneck and not CPU for example?

What is the overall goal you are trying to achieve?

// Linjo

Best regards, Linjo. Please follow me on Twitter: @viewgeek. If you find this information useful, please award points for "correct" or "helpful".
skalaxc
Contributor

Sorry for not giving clearer information. The guest OS is Ubuntu 14.04, 3.13 kernel, 64-bit.

Basically, what I want to achieve is better performance with the PCI passthrough NIC (more connections per second than the vSwitch NIC setup gives).

I suspect the NIC because when I use the same application and data in the vSwitch NIC setup I get 40K connections per second, but with the passthrough NIC I only get 25K (same number of CPUs and memory in both setups). How could it be a CPU or other problem? (TBH, it was a big surprise for me.)

MKguy
Virtuoso

First off, can you give some more details on the type of traffic you're testing with? Is it TCP or UDP connections? What application layer protocol? Have you tested other network benchmarking tools such as iperf as well?

Does the VM only have a single vCPU, or did you try to increase it? When you pass through a NIC to a guest VM, all the work related to handling network packets that is not directly offloaded to the NIC hardware needs to be done inside the guest OS with its limited computing resources.

If you use a vmxnet3 vNIC, however, most of this is offloaded from the VM to the host, independently of the computing resources assigned to the VM.

Therefore, make sure the VM has enough CPU resources and assign additional vCPUs. Also make sure the VM's virtual hardware version is up-to-date.

I assume you may also need to tune the ixgbe driver settings to make sure it's using its hardware offloading capabilities and that multiple CPU interrupt queues/receive side scaling (RSS) are enabled, etc.
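
For instance, if the driver and kernel are new enough to support ethtool's channels interface, you can check and raise the number of RSS queues roughly like this (eth0 and a count of 4 are just examples, match the count to your vCPUs):

# ethtool --show-channels eth0
# ethtool --set-channels eth0 combined 4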

For example, on this 4 vCPU VM with a vmxnet3 vNIC you can see that all CPUs handle interrupts for this single network interface with multiple receive and transmit queues:

# cat /proc/interrupts | egrep -i 'eth|cpu'
           CPU0       CPU1       CPU2       CPU3
57:   76566551   70446837   68928406   61311955   PCI-MSI-edge      eth0-rxtx-0
58:   80703330   66993836   64197758   56587761   PCI-MSI-edge      eth0-rxtx-1
59:   52852203   67779552   74430390   81477134   PCI-MSI-edge      eth0-rxtx-2
60:   85093782   65899469   55469451   61844401   PCI-MSI-edge      eth0-rxtx-3
61:          0          0          0          0   PCI-MSI-edge      eth0-event-4

Also check your top CPU stats for a high number of hard (%hi) or soft (%si) interrupts when you run the test.
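
If you want a per-CPU breakdown instead of the summary line in top, mpstat from the sysstat package shows hardware (%irq) and software (%soft) interrupt time per core; run it while the load test is active, e.g. with a 1-second interval:

# mpstat -P ALL 1

If one core sits high in %soft/%irq while the others are mostly idle, the interrupt load is not being spread.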

Next you should examine the NIC offloading settings and make sure at least checksumming and LRO are enabled, or enable others if needed:

# ethtool --show-offload eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: on
large-receive-offload: on
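
Anything that shows "off" and is supported by the hardware can usually be switched on with ethtool's offload command, roughly like this (your driver may not support every feature listed):

# ethtool --offload eth0 tso on gso on gro on lro on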

The interrupt coalescing and flow control (pause frame) settings are worth checking as well:

# ethtool --show-coalesce eth0
# ethtool --show-pause eth0

From my experience tuning heavy-traffic firewalls, increasing the NIC ring buffer sizes also helps reduce CPU interrupts and gain higher throughput. Increase the values if needed (personally I found 1024 to be a good value, but your mileage may vary):

# ethtool --show-ring eth0
Ring parameters for eth0:
Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             4096
Current hardware settings:
RX:             256
RX Mini:        0
RX Jumbo:       0
TX:             512
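
To raise them towards the pre-set maximums shown above, something along these lines should do it (1024 is the value I mentioned, adjust as needed):

# ethtool --set-ring eth0 rx 1024 tx 1024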

-- http://alpacapowered.wordpress.com
skalaxc
Contributor

Thanks for the reply, it's really useful.

The VM does not have just a single vCPU; I already tried increasing the vCPU count. And yes, I agree with what you are saying:

"

When you pass through a NIC to a guest VM, all the work related to handling network packets that is not directly offloaded to the NIC hardware needs to be done inside the guest OS with its limited computing resources.

If you use a vmxnet3 vNIC, however, most of this is offloaded from the VM to the host, independently of the computing resources assigned to the VM.

"

So I tried with 4 vCPUs (not hyperthreaded), 16 GB of memory and one 10G Ethernet card. Checksum offload is on, LRO is on, the ring buffer size is set to 1024, and RX and TX pause are on.

The weird thing is that when I send requests to the application I get 95K connections per second when all the software interrupts are handled by one CPU (CPU 0). When I pin the interrupts to spread them across all the CPUs, I can see all the CPUs nicely handling software interrupts, but I only get 83K connections per second. Definitely something strange is happening.
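
For reference, spreading the interrupts manually boils down to stopping irqbalance and writing per-CPU masks to smp_affinity, roughly like below (the IRQ numbers are only placeholders taken from your /proc/interrupts example above, mine differ):

# service irqbalance stop
# echo 1 > /proc/irq/57/smp_affinity
# echo 2 > /proc/irq/58/smp_affinity
# echo 4 > /proc/irq/59/smp_affinity
# echo 8 > /proc/irq/60/smp_affinity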

MKguy
Virtuoso

If you get better performance with everything pinned to a single CPU, then I suppose in your case CPU cache locality could be more crucial than raw processing power.

This is also mentioned in the first article here, among other tuning points you can try:

http://timetobleed.com/useful-kernel-and-driver-performance-tweaks-for-your-linux-server/

http://dak1n1.com/blog/7-performance-tuning-intel-10gbe/
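
If you want to test the cache locality theory directly, one simple (hypothetical) experiment is to keep the NIC interrupts and the benchmark application on the same core, for example:

# echo 1 > /proc/irq/57/smp_affinity
# taskset -c 0 ./your_application

(The IRQ number and application name are placeholders.) If connections per second go back up, locality rather than raw CPU power is what's limiting you.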

There are other general performance recommendations, like disabling physical host power saving in the BIOS, enabling the latency-sensitive VM option, and a lot more that you can find in this guide:

https://www.vmware.com/files/pdf/techpaper/VMware-PerfBest-Practices-vSphere6-0.pdf
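
The latency-sensitivity option is a per-VM setting; on 5.5 you set it in the vSphere Web Client, and as far as I know it corresponds to this advanced VMX parameter (verify against the guide above, and note it expects a full memory reservation for the VM):

sched.cpu.latencySensitivity = "high"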

-- http://alpacapowered.wordpress.com