I am trying to get a clear view of the typical CPU overhead that network load causes on a virtualized system. Our testing shows an overhead well above the typical CPU overhead usually quoted for virtualization (2-10%).
Test setup:
phy ---> phy
phy --> ESXi 4.1.0/5 + single VM with all resources assigned to it
Test tools:
netperf
Systems:
HP bl460g7, 2 x 5620, 48 GB, 3 Gbps flex Ethernet NIC
VM, 8 (no HT)/16 (HT) vCPUs, 48 GB, 3 Gbps flex Ethernet NIC, running RHEL 5u7
In the test we compared the CPU load on a physical server (2 x 5620, 48 GB) under maximum network load (3 Gbps) with that of a single VM running on an ESXi 4.1.0 and an ESXi 5 server.
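For anyone who wants to reproduce the comparison, here is a minimal sketch of how the CPU load could be sampled on the physical box and inside the RHEL guest while netperf is running. It assumes the load is derived from the aggregate `cpu` line of `/proc/stat`; the function names are my own, and this is not necessarily how the figures in this post were collected.

```python
import time


def parse_cpu_line(line):
    """Return the first eight counters of the aggregate 'cpu' line.

    Field order: user, nice, system, idle, iowait, irq, softirq, steal.
    Later fields (guest, guest_nice) are ignored because newer kernels
    already account for them inside 'user' and 'nice'.
    """
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(value) for value in fields[1:9]]


def busy_percent(before, after):
    """Percentage of non-idle CPU time between two 'cpu' line snapshots."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # Treat idle and iowait (when present) as idle time.
    idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
    return 100.0 * (total - idle) / total


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate 'cpu' line (the first line of /proc/stat)."""
    with open(path) as stat_file:
        return stat_file.readline()


def sample(interval_seconds=5.0):
    """Take two snapshots interval_seconds apart and return the busy percentage."""
    before = read_cpu_line()
    time.sleep(interval_seconds)
    return busy_percent(before, read_cpu_line())
```

Calling `sample(60)` during a netperf run returns the average busy percentage over that minute. Note that a figure measured inside the guest does not include work done by the hypervisor on the VM's behalf, so the host-side number has to come from the ESXi side.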
Under full load the CPU usage of the physical server was about 8%, while the load on the ESXi server was 19%, of which 16% was due to the VM. The VM's load is therefore about twice that of the physical server. The machine was configured in the BIOS with VT support and HT enabled and power management disabled, and the VM used the VMware paravirtualised network driver vmxnet3. Switching HT on or off did not make a big difference.
An additional test we performed was to use DirectPath I/O, where we assigned the NIC directly to the VM under test. This improved the load slightly, to around 14%, which is still almost double the load of the physical machine. In this case the VM uses the same network driver as the physical system (be2net). To me these figures seem (too) high.
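To put the figures above side by side, here is a small sketch that expresses each measurement as a multiple of the physical baseline. The percentages are the ones reported in this post; the labels and the helper name are only for illustration.

```python
# CPU load (%) at 3 Gbps, as reported in this post.
PHYSICAL_PCT = 8.0
MEASUREMENTS = {
    "ESXi host total (vmxnet3)": 19.0,
    "VM share (vmxnet3)": 16.0,
    "DirectPath I/O (be2net)": 14.0,
}


def relative_load(measured_pct, baseline_pct):
    """Return the measured load as a multiple of the baseline load."""
    return measured_pct / baseline_pct


RATIOS = {name: relative_load(pct, PHYSICAL_PCT) for name, pct in MEASUREMENTS.items()}

for name, ratio in RATIOS.items():
    extra_points = MEASUREMENTS[name] - PHYSICAL_PCT
    print(f"{name}: {ratio:.2f}x physical (+{extra_points:.0f} percentage points)")
```

This gives roughly 2.4x for the host as a whole, 2.0x for the VM with vmxnet3 and 1.75x with DirectPath I/O.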
Has anyone got an answer to the following questions:
1. What is the typical overhead for the CPU load due to network load?
2. What can be causing the significant overhead in the case of using DirectPath I/O?
3. What can be causing the significant overhead in the case of using the vmxnet3 driver?