Solved: Multiple vCPUs are very slow

icando9 · ‎12-15-2009

I've found that whenever I configure a guest with more than one core, the guest is extremely slow. Does anybody know the reason? The host is a Core i7, which has 4 cores, each with Hyper-Threading, i.e. 8 logical CPUs.

This phenomenon occurs on both OpenSolaris and Windows Server 2008 R2.

More specifically, I once wrote a program that is purely CPU-bound (no I/O and very little memory access) to test CPU performance. When I compiled and ran it in the OpenSolaris guest with one thread, it ran as fast as it does on the host. However, when I used 2 threads, each thread was significantly slower than the single thread. The OpenSolaris guest is configured with 2 CPUs.

The other observation is that when I set up a RAID 5 server on a Windows Server 2008 R2 guest, it is fast (16-20 MB/s) with 1 CPU. But when I configure 2 CPUs, the performance drops significantly (3 MB/s) and the guest is so slow that it doesn't respond to my mouse clicks.

So my conclusion is that, on my machine, one vCPU is fast (nearly native), but multiple vCPUs have a performance problem. Does anyone know the reason, or know of another thread that discusses this? Thanks a lot.

BTW: my host CPU has 4 cores with HT, but I set up my guest with 2 processors and 1 core per processor. Would changing to 1 processor with 2 cores be better?

AWo · ‎12-15-2009

The reason is that symmetric multiprocessing (SMP) is used. That means whenever you assign more than one vCPU to a guest, the guest has to wait until all of those vCPUs (that is, that many host cores) are available at the same time. They have to be synchronized.

Let's assume you have two physical cores and you assign two vCPUs. Every time the guest has to schedule work on both vCPUs, it must wait until both physical cores are free (which leads to a delay in the guest). It then grabs these cores, and there are no free cores left for the host and all its applications (which leads to a delay on the host) until the guest is descheduled again.
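To make the waiting argument concrete, here is a toy model in Python. It is only a sketch and not how any real scheduler works: it assumes each host core is idle independently with the same probability (`p_free`, a made-up parameter) and asks how likely it is that enough cores are idle at the same instant.

```python
from math import comb  # requires Python 3.8+


def prob_at_least_k_free(total_cores: int, k: int, p_free: float) -> float:
    """Probability that at least k of total_cores are idle at one instant.

    Assumption (not from the thread): every core is idle independently
    with probability p_free, so the idle count is binomially distributed.
    """
    return sum(
        comb(total_cores, i) * p_free**i * (1 - p_free) ** (total_cores - i)
        for i in range(k, total_cores + 1)
    )


# Hypothetical host: 2 physical cores, each idle half of the time.
one_vcpu = prob_at_least_k_free(total_cores=2, k=1, p_free=0.5)
two_vcpu = prob_at_least_k_free(total_cores=2, k=2, p_free=0.5)

print(f"1-vCPU guest can run right now:          {one_vcpu:.2f}")  # 0.75
print(f"2-vCPU guest can be co-scheduled right now: {two_vcpu:.2f}")  # 0.25
```

Under these assumed numbers the 2-vCPU guest finds both cores free only a third as often as the 1-vCPU guest finds one, which is the delay described above.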

A rule of thumb says that you shouldn't assign a guest more vCPUs than 50% of the number of physical cores you have.
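As a quick illustration of that rule of thumb, the following Python sketch computes the suggested ceiling. The function name and the lower bound of 1 vCPU are assumptions added for illustration; the 50% figure is the only part taken from the rule itself.

```python
def max_vcpus_rule_of_thumb(host_cores: int, fraction: float = 0.5) -> int:
    """Suggested upper bound on vCPUs per guest under the 50% rule of thumb.

    Assumption: a guest always gets at least 1 vCPU, even on a 1-core host.
    """
    return max(1, int(host_cores * fraction))


# Counting the 4 physical cores of the original poster's host:
print(max_vcpus_rule_of_thumb(4))  # 2

# Counting the 8 logical CPUs (with HT) instead, which the rule does not
# explicitly endorse:
print(max_vcpus_rule_of_thumb(8))  # 4
```

Either way, a 2-vCPU guest on this host is within the rule, so the rule alone may not explain the slowdown.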

Having more than one guest with more than one vCPU leads to even more processor contention.

Read this guide: http://www.vmware.com/pdf/vsmp_best_practices.pdf

AWo

\[:o]===\[o:]

=Would you like to have this posting as a ringtone on your cell phone?=

=Send "Posting" to 911 for only $999999,99!=

Edited by AWo

vExpert 2009/10/11 [:o]===[o:] [: ]o=o[ :] = Save forests! rent firewood! =

icando9 · ‎12-15-2009

If you read my first paragraph, you will see that I have 4 physical cores and 8 logical cores (since each core has HT). According to your rule, I can assign up to 4 cores to a guest. It doesn't quite explain the reason.

BTW: does assigning 2 CPUs with 1 core each and 1 CPU with 2 cores make any difference?

AWo · ‎12-15-2009

It doesn't quite explain the reason.

Not necessarily. It is a rule of thumb, not a law. Depending on how many guests you have, how many vCPUs they have, and the load on the host, this may still apply. Even if all guests except one have only one vCPU, that one guest with two vCPUs still has to wait until two cores are free. That always takes longer than waiting for a single core.

In general, you should always start with as few vCPUs as possible. In virtualized environments, assigning more vCPUs can degrade overall performance.

In addition, depending on the guest OS, the virtual timer devices generate a heavy load of processor interrupts. Linux, for example, depending on the kernel, can generate 1000 + 1000 x n interrupts per second per guest (where n is the number of vCPUs).

Two guests with such a kernel, each with two vCPUs, generate a load of 6000 interrupts per second even while they are doing nothing else.
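The arithmetic behind the 6000 figure can be checked with a few lines of Python. The function name and the `base_hz` parameter are made up for illustration; the formula is the 1000 + 1000 x n one quoted above.

```python
def timer_interrupts_per_second(vcpus: int, base_hz: int = 1000) -> int:
    """Idle timer interrupts per second for one guest: base_hz + base_hz * n."""
    return base_hz + base_hz * vcpus


# Two guests, each with two vCPUs, as in the example above.
guest_vcpus = [2, 2]
total = sum(timer_interrupts_per_second(n) for n in guest_vcpus)

print(total)  # 6000
```

The same two guests with one vCPU each would come to 4000 under this formula, so every extra vCPU adds idle overhead.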

BTW: does assigning 2 CPUs with 1 core each and 1 CPU with 2 cores make any difference?

Yes, it can, depending on the application. If it makes intensive use of the processor caches, you would rather have both vCPUs running on the cores of one CPU, as they can then share the same cache.


popej · ‎12-15-2009

AWo, the paper you quote deals with a very different product and is already outdated. See for example: http://communities.vmware.com/docs/DOC-4960

In my opinion, Workstation scales quite well with SMP. I can measure a clear increase in performance with 2 or 4 vCPUs on a Core Quad host. Maybe there is a problem with hyperthreading? icando9, can you run some tests with HT disabled in the BIOS?

admin · ‎12-16-2009

AWo, the paper you quote deals with a very different product and is already outdated. See for example: http://communities.vmware.com/docs/DOC-4960

That document is for ESX, not Workstation. We control the scheduler for ESX (and so can do things more intelligently), but we do not control the scheduler for hosted products like Workstation. I would expect ESX to scale better than Workstation.

jayntguru · ‎02-10-2010

This document is pretty old (from ESX 2). Haven't there been changes that would mitigate the vCPU issues in later versions of ESX, along with the addition of hardware virtualization support?
