VMware Cloud Community
mlgvmware
Contributor

Build time significantly slower on multi-core VM

Hi,

Specs first:

ESXi 5.0

Host = HP DL385G7, AMD Opteron Processor 6180 SE 2.5 Ghz, 2 sockets, 12 cores each, 24 total cores, 48GB RAM

I know that the best per-vCPU performance is achieved when you run single-core VMs.  That said, the numbers I am seeing just don't make sense.

I have a very large build process which builds a significant number of C++ projects, some Java projects, and creates installers at the end of the process.  If I use a 1 CPU virtual client with 4GB memory, the total process finishes in 2.5 hours.  If I use a 4 CPU virtual client with 4GB memory, the time goes up to 6 hours.  I assume something is horribly wrong with my configuration to see this kind of performance hit.

If it is relevant, this is not a multi-threaded build.  We use a build tool that runs each step in sequence for dependency reasons.

Thanks for any ideas

Mike

7 Replies
cmacmillan
Hot Shot

Mike:

Couple of assumptions:

1) OS has a HAL to use the extra "cores" provided by the VM configuration;

2) OS shows the extra "cores" in use when target load is present;

3) No memory overcommit condition on the ESXi host;

Your performance "loss" would make sense if you already had (severe) CPU contention on the host. You could find that your quad-vCPU VM gets scheduled at 1/2 to 1/4 the frequency of a single-vCPU VM, making single-threaded applications take 2 to 4 times as long to complete. If you see the OS spreading 20-30% load across all four "cores" instead of 100% on a single core (as you would expect from a single-threaded application), then you may have a use case where a single core easily beats multi-core (because of CPU contention behavior).
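A back-of-the-envelope sketch of that scheduling-frequency effect (the numbers are illustrative, not measured on your host): a single-threaded job only makes progress while its VM holds a scheduling slot, so halving or quartering the slot frequency stretches wall-clock time by the same factor.

```python
# Illustrative only: toy model of a co-scheduling penalty under contention.
# Assumes a 1 vCPU VM gets scheduled with relative frequency 1.0, and a
# contended 4 vCPU VM gets a slot only 1/4 as often (hypothetical numbers).

def wall_clock_hours(cpu_hours_needed, schedule_frequency):
    """Wall-clock time for a single-threaded job that only runs
    when its VM holds a scheduling slot."""
    return cpu_hours_needed / schedule_frequency

job_cpu_hours = 2.5  # single-threaded build needs 2.5 CPU-hours of work

print(wall_clock_hours(job_cpu_hours, 1.0))   # 1 vCPU VM: 2.5 h
print(wall_clock_hours(job_cpu_hours, 0.25))  # contended 4 vCPU VM: 10.0 h
```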

It might also make sense if your OS was swapping to disk, but this sounds like classic contention. How loaded is your ESXi host?

Collin C. MacMillan, VCP4/VCP5, VCAP-DCD4, Cisco CCNA/CCNP, Nexenta CNE, VMware vExpert 2010-2012 | SOLORI - Solution Oriented, LLC | http://blog.solori.net | If you find this information useful, please award points for "correct" or "helpful".
mlgvmware
Contributor

Answers inline

1) OS has a HAL to use the extra "cores" provided by the VM configuration;

-Yes, Windows Server 2003.

2) OS shows the extra "cores" in use when target load is present;

-Yes, the OS correctly shows 4 cores

3) No memory overcommit condition on the ESXi host;

-The host has 48 GB of memory.  I've allocated a total of 30 GB to VMs.

The load on the server "should" be light.  I've run my tests overnight and on weekends, when the other virtual machines should be essentially idle.  I don't yet have direct access to the host to see its performance, but I will be getting that soon.

Mike

cmacmillan
Hot Shot

How many vCPUs per VM? You have 24 cores, with at least 30 vCPUs - 60 vCPUs if 2 vCPUs per VM. Either would guarantee CPU contention... Since you've indicated no memory contention, I'm assuming you have under 1.5GB of provisioned vRAM per VM...

Here's a test:

     Bump up the CPU shares of the VM from "Normal" (default) to "High" and run your test again.

It should take approximately half the time of the first run. If it does, this document will explain why:

     http://www.vmware.com/resources/techresources/10131
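To see what "High" buys you under contention, here is a minimal sketch of proportional-share entitlement (the per-vCPU share defaults of Low=500 / Normal=1000 / High=2000 are the documented vSphere values; the VM mix used below is hypothetical):

```python
# Sketch of proportional-share CPU entitlement under contention.
# vSphere default per-vCPU share values: Low=500, Normal=1000, High=2000.

def entitlement_fraction(my_shares, all_shares):
    """Fraction of contended CPU time a VM is entitled to,
    proportional to its share value versus everyone else's."""
    return my_shares / sum(all_shares)

# Hypothetical host: one 4 vCPU build VM competing with nine
# 1 vCPU VMs that are all left at Normal shares.
others = [1000] * 9
normal = entitlement_fraction(4 * 1000, [4 * 1000] + others)
high = entitlement_fraction(4 * 2000, [4 * 2000] + others)

print(round(normal, 3))  # build VM at Normal: ~0.308 of host CPU
print(round(high, 3))    # build VM at High:   ~0.471 of host CPU
```

Note that shares only matter while the host CPU is actually contended; an idle host grants every VM whatever it asks for regardless of share values.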

If you want to remove CPU contention, you can place a reservation on the VM, or modify CPU affinity to lock this VM to a specific set of cores and lock other VMs out of using them. This should guarantee "dedicated" CPU performance if done correctly.
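If you go the reservation/affinity route, the settings end up as per-VM options in the VM's .vmx file. A sketch (the parameter names are the documented vSphere scheduler options, but the values here are illustrative, and editing through the vSphere Client is the supported path):

```
# Illustrative .vmx fragment (hypothetical values):
sched.cpu.min = "2500"          # reserve 2500 MHz of CPU for this VM
sched.cpu.affinity = "0,1,2,3"  # pin this VM's vCPUs to host cores 0-3
```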

mlgvmware
Contributor

I have a total of 10 virtual machines running on the host, and they are all 1 CPU machines except when I am experimenting with 1 vs 4 vCPUs for this machine.  10 vCPUs on 24 cores makes me think the load should be light and not an issue.

An interesting development overnight: I exported this VM from the AMD host and imported it into an Intel host, and the issue went away completely.  The 1 CPU time was within 4% of the 4 CPU time.  The Intel box is an HP ProLiant DL380 G6, 2.533 GHz, 2 x 4-core with hyperthreading enabled, for a logical CPU count of 16.

I will try setting the CPU shares to High.  I've already tried a CPU reservation, and that didn't show any impact.

Right now I'm assuming it is the AMD chips.

Mike

cmacmillan
Hot Shot

Sorry, I misread your use of 30 as a VM count rather than 30GB of vRAM. CPU shares will have no effect in the absence of CPU contention.

I'm not sure I follow, but it sounds like your 4 vCPU Intel run was within 4% (time-wise) of the 1 vCPU Intel run. Is that right? Sounds single-threaded...

It is very likely that your compiler is not recognizing the AMD processor properly and is selecting optimizations that cripple performance. This has been a complaint from AMD for years against compilers based on Intel's compiler core.

To test this, run the compiler in verbose mode so you can see which options were defaulted in your previous test run, then run a 4 vCPU test on AMD with the compiler options set to strict single-threaded behavior. If a single-threaded run is within 4% of a single-core run, you might need a patch for your compiler...

mlgvmware
Contributor

You are correct that the build is single-threaded, and that is by design.  The only reason I am chasing down this multi-core performance issue is so that I can run multiple builds on one larger VM instead of many smaller VMs, for software licensing reasons.  Something fascinating to me is the two pictures of Task Manager below.  The left one is from my AMD box: even though my work is basically single-threaded, it is thrashing between virtual cores.  The one on the right is my Intel box, which is now running 8 cores since the 4-core test was good.  The Intel box properly shows one core doing the work of the build.

It is clear why the AMD one will be slower; I just still don't understand why it is doing this.

taskmanager.png

I can try testing with a newer compiler on AMD.

Mike

cmacmillan
Hot Shot

Yes, that looks like a classic case. Besides compiler updates, Microsoft has released Windows updates to handle scheduler differences on the newer AMD CPUs. AnandTech and other sites have reported on this issue. Suffice to say, this is not an ESXi issue.

Once you have the proper updates, you should find that one VM per thread on AMD beats one VM per thread on Intel when VM count = thread count and vCPU per VM = 1, measured as parallel completion time divided by VM count.
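That comparison is simple arithmetic. A sketch with illustrative numbers (24 AMD cores vs 16 Intel logical CPUs from earlier in the thread; the assumption that each parallel pass still completes in about 2.5 hours is hypothetical):

```python
# Illustrative throughput math: N single-vCPU VMs building in parallel.
# Effective time per build = parallel completion time / VM count.

def effective_hours_per_build(parallel_completion_hours, vm_count):
    return parallel_completion_hours / vm_count

# Hypothetical: one build VM per hardware thread on each host, assuming
# each parallel pass still finishes in roughly the single-VM 2.5 hours.
print(effective_hours_per_build(2.5, 24))  # AMD, 24 cores:  ~0.104 h/build
print(effective_hours_per_build(2.5, 16))  # Intel, 16 LCPUs: ~0.156 h/build
```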

Consider awarding points for this thread...
