VirtualRookie
Contributor

Single-threaded performance test runs slower with 2 vCPUs than with 1 vCPU

Ok, here's the setup:

  • 1 x HP DL360 gen 9 with:
  • 2 x Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz CPUs
  • vCenter server Version 5.5.0
  • VMware ESXi, 5.5.0, 3568722, HPE Customized version
  • HT disabled on the Host!

1 VM running Windows Server 2012 R2, tested first with 1 vCPU and then with 2 vCPUs.

If I run the PassMark PerformanceTest single-thread benchmark, I get the following scores (average of 5 runs):

1 vCPU          1915

2 vCPU          1178

The bare-metal single-thread performance of an E5-2667 v4 CPU should be around 1951 PassMark points.

Why on earth does the VM run slower when it has 2 vCPUs?

I would assume that a single-threaded application would utilize one CPU (from the guest OS's perspective) at 100% and the other CPU at 0% (or whatever the OS and other apps consume).
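To illustrate the assumption, here is a trivial single-threaded busy loop (plain C, purely illustrative and not part of my actual tests) that should peg exactly one logical CPU at roughly 100% in Task Manager while any other vCPU stays close to idle:

#include <stdio.h>

/* Trivial single-threaded CPU burner: only one thread, so only one logical
   CPU should be busy regardless of how many vCPUs the VM has. */
int main(void)
{
    volatile unsigned long long counter = 0;
    unsigned long long i;

    for (i = 0; i < 5000000000ULL; i++) {
        counter += i;
    }

    printf("counter = %llu\n", counter);
    return 0;
}

Compile it with any C compiler and watch the per-CPU graphs in Task Manager or Resource Monitor while it runs.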

Other benchmarks such as Cinebench R15 show similar results (a score of 123 drops to 59 when running on the 2 vCPU VM).

Thanks for any input!

Br, Bjorn Dirchsen

IT Professional

ConstantinGhioc
Enthusiast

This is expected. There is a dedicated lab on this topic in VMware's "Optimize and Scale" course.

This is also documented in Performance Best Practices for VMware vSphere® 5.5:

Configuring a virtual machine with more virtual CPUs (vCPUs) than its workload can use might cause slightly increased resource usage, potentially impacting performance on very heavily loaded systems. Common examples of this include a single-threaded workload running in a multiple-vCPU virtual machine or a multi-threaded workload in a virtual machine with more vCPUs than the workload can effectively use.

Even if the guest operating system doesn’t use some of its vCPUs, configuring virtual machines with those vCPUs still imposes some small resource requirements on ESXi that translate to real CPU consumption on the host. For example:

  • Unused vCPUs still consume timer interrupts in some guest operating systems. (Though this is not true with “tickless timer” kernels, described in “Guest Operating System CPU Considerations” on page 43.)

  • Maintaining a consistent memory view among multiple vCPUs can consume additional resources, both in the guest operating system and in ESXi. (Though hardware-assisted MMU virtualization significantly reduces this cost.)

  • Most guest operating systems execute an idle loop during periods of inactivity. Within this loop, most of these guest operating systems halt by executing the HLT or MWAIT instructions. Some older guest operating systems (including Windows 2000 (with certain HALs), Solaris 8 and 9, and MS-DOS), however, use busy-waiting within their idle loops. This results in the consumption of resources that might otherwise be available for other uses (other virtual machines, the VMkernel, and so on).

ESXi automatically detects these loops and de-schedules the idle vCPU. Though this reduces the CPU overhead, it can also reduce the performance of some I/O-heavy workloads. For additional information see VMware KB articles 1077 and 2231.

  • The guest operating system’s scheduler might migrate a single-threaded workload amongst multiple vCPUs, thereby losing cache locality.

These resource requirements translate to real CPU consumption on the host.
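That last bullet is probably the most relevant one for your benchmark: the guest scheduler keeps moving the single benchmark thread between the two vCPUs, and every migration throws away warm caches. If you want to see how much of the gap that accounts for, you can pin a test workload to one logical CPU inside the Windows guest. Below is a minimal sketch using the Win32 SetThreadAffinityMask call; the busy loop is just a stand-in for a real workload, not a replacement for your benchmark.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Restrict the current thread to logical CPU 0 (affinity mask bit 0),
       so the guest scheduler cannot migrate it to the other vCPU. */
    DWORD_PTR previous = SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)1);
    if (previous == 0) {
        fprintf(stderr, "SetThreadAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }

    /* Stand-in for the single-threaded workload; with the mask in place,
       all of this work stays on one vCPU and keeps its caches warm. */
    volatile unsigned long long counter = 0;
    unsigned long long i;
    for (i = 0; i < 2000000000ULL; i++) {
        counter += i;
    }

    printf("done, counter = %llu\n", counter);
    return 0;
}

If pinning recovers most of the score, the loss is mostly guest-side thread migration; whatever remains is the hypervisor-side overhead described above. Either way, for a workload that only ever runs one thread, the right-sizing answer is simply to leave the VM at 1 vCPU.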

You can also check the VM Right-Sizing Best Practice Guide, in particular the chapter on vCPU SMP.

Constantin