VMware Cloud Community
cbrou
Contributor

Poor single-threaded performance with Sandy Bridge-EP and VMware?

We just added a new Dell R620 server to our cluster and noticed that the PassMark benchmarking tool reports very poor single-threaded performance compared to our older PowerEdge 1950 hosts. The PassMark tool was run in a 2 vCPU MS Server 2008 R2 VM that was simply vMotioned between the two hosts for the before-and-after testing. The older hosts have X5460 procs and the new R620 host has E5-2670 procs. When compared to other similar PassMark benchmarks, the "Our-Results x5460" figures match up pretty closely, but the "Our-Results E5-2670" figures do not. What might be going on here? Why does the E5-2670 appear to do so poorly with single-threaded calculations in my environment, yet clearly outperform the X5460 in others' (likely non-VMware) benchmarks?

See the attachment for the results. Again, I am mostly concerned with the single-threaded tests, since that is the only one that loses significantly to the somewhat ancient X5460 proc in our environment.

0 Kudos
1 Solution

Accepted Solutions
admin
Immortal

A couple of scattered thoughts...

Do you have the latest Dell BIOS for the R620 (version 1.3.6)?  Have you tried clearing NVRAM (i.e. loading setup defaults in the BIOS)?

0 Kudos
21 Replies
sparrowangelste
Virtuoso

E5-2670  2.60 GHz,  8 core/16 thread

x5460 3.16 GHz  4 cores

The 3.16 GHz part has the faster-clocked cores, so I don't think it's out of the ordinary.

--------------------- Sparrowangelstechnology : Vmware lover http://sparrowangelstechnology.blogspot.com
0 Kudos
jrmunday
Commander

What build number are your ESXi hosts on? Hopefully build number 821926.

If you're not already on this build, can you patch the hosts and re-run the tests?

vExpert 2014 - 2022 | VCP6-DCV | http://www.jonmunday.net | @JonMunday77
0 Kudos
admin
Immortal

When a VM is vMotioned between hosts with significantly different clock speeds, the instruction that reads the time stamp counter has to be intercepted on the target machine to emulate the original clock speed.  This instruction may be used particularly frequently in a benchmark program such as PassMark.

Try a cold migration of the VM, and see if the benchmark numbers improve.
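
One rough way to see whether timer reads are unusually expensive inside the guest is to time a large number of them. The following is only a sketch: it assumes Python is available in the VM, the helper name `mean_timer_read_ns` is made up for illustration, and whether `time.perf_counter_ns()` ends up executing the intercepted time stamp counter instruction depends on the guest OS, so treat the result as a hint rather than proof.

```python
import time

def mean_timer_read_ns(samples: int = 200_000) -> float:
    """Average cost in nanoseconds of one time.perf_counter_ns() call.

    The figure includes Python loop overhead, so only compare it against
    the same script run in the same VM on the other host.
    """
    start = time.perf_counter_ns()
    for _ in range(samples):
        time.perf_counter_ns()
    end = time.perf_counter_ns()
    return (end - start) / samples

if __name__ == "__main__":
    print(f"mean timer read: {mean_timer_read_ns():.0f} ns")
```

Running it once after a vMotion and once after a cold migration, on the same host, would show whether the two cases differ at all.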

0 Kudos
cbrou
Contributor

Thanks for the suggestion Jim. I just tried this after a cold migration but had the same results.

I tried one additional test that gives interesting results. I switched the testing VM to just 1 vCPU and ran the tests again on the old host and the new host. The x5460 proc (old host) gives about the same result with 1 vCPU but the E5-2670 proc (new host) scores about twice as high with 1 vCPU. See attached.

0 Kudos
cbrou
Contributor

I was fully patched with 5.0 when running the tests initially. I actually upgraded to 5.1 today and ran the tests again with the same results.

0 Kudos
cbrou
Contributor

The number of cores and threads can be removed from the equation because I am just looking at the single-threaded test. As for the GHz: as I understand it, a higher clock speed does not necessarily mean faster processing. This guy <http://www.tomshardware.com/forum/254900-28-what-processor> seems to word it well:

A Hertz is the measure of the time it takes for the CPU to cycle (IE, from a high power state to a low power state). At the start of this cycle, the CPU receives the instructions it is to perform, so a higher cycle rate will give the CPU instructions at a faster rate, theoretically increasing performance.

Instructions Per Cycle is the term used to measure how many instructions a CPU can perform in a single cycle. As such, a slower cycle rate (lower GHz) CPU can be faster than a CPU with a higher cycle rate if it can do more instructions per clock cycle. (Athlon vs Pentium 4)
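
To make that relationship concrete, here is a toy calculation of instructions per second as IPC multiplied by clock rate. It is only a sketch: the clock speeds are the ones from this thread, but the IPC figures are invented for illustration and are not measured values for either processor.

```python
# Toy model: instructions per second = instructions per cycle (IPC) x clock rate.
# The IPC values below are hypothetical; they are NOT measurements of the
# X5460 or the E5-2670.

def instructions_per_second(ipc: float, clock_ghz: float) -> float:
    """Return instructions per second for a given IPC and clock in GHz."""
    return ipc * clock_ghz * 1e9

older = instructions_per_second(ipc=1.0, clock_ghz=3.16)  # hypothetical IPC
newer = instructions_per_second(ipc=1.5, clock_ghz=2.60)  # hypothetical IPC

print(f"older core: {older:.2e} instr/s")
print(f"newer core: {newer:.2e} instr/s")
print("newer core is faster" if newer > older else "older core is faster")
```

With these made-up numbers the lower-clocked core comes out ahead, which is the point of the quote: clock speed alone does not settle which core is faster.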

0 Kudos
admin
Immortal

Can you post (attach) the vmware.log file for the VM run on the new system?

0 Kudos
EdWilts
Expert

You're not alone.

I'm currently working an issue with both HP and Intel on an X5670 to E5-2650 comparison.  Same situation as yours - the E5 proc is not as fast as it should be.  Mine are on HP BL460c G7 and Gen8 blades.

We don't have formal benchmarks yet but some users are grumbling.  We've asked both HP and Intel for benchmarks we can run against the blades and in a VM within ESXi to see who the culprit (if either) is, but neither HP nor Intel has supplied one.

.../Ed (VCP4, VCP5)
0 Kudos
cbrou
Contributor

See attached

0 Kudos
EdWilts
Expert

sparrowangelste wrote:

E5-2670  2.60 GHz,  8 core/16 thread

x5460 3.16 GHz  4 cores

the 3.16 GHz has a faster core so I dont think its out of the ordinary.

In my case, which appears to be similar, I'm comparing the 2.93 GHz X5670 with the 2 GHz E5-2650.  The clock speed specifically came up in my discussion with the local Intel rep and we dove into the SPEC benchmarks.  Although they normally report on the sum of the cores in a single processor, it's easy to divide by the number of cores to see how fast each core is.  In my case, the E5-2650 is supposed to be slightly faster than the X5670: it's simply doing more work in the same cycle, taking advantage of faster memory, on-chip PCIe, etc.

I'm grabbing my benchmarks from http://www.cpubenchmark.net/high_end_cpus.html

The E5-2670 is rated at 15,565.  Divide that by 8-cores and you're looking at 1,945 per core.

The X5460 is rated at 4,539.  Divide that by 4 cores and you're looking at 1,134 per core.
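
The division above can be reproduced in a few lines. This is only a sketch of the arithmetic: the scores and core counts are the ones quoted here, the helper name `per_core_score` is made up for illustration, and splitting an aggregate score evenly across cores is a rough estimate, not a measured single-thread result.

```python
# Rough per-core estimate from the aggregate scores quoted above.
# per_core_score is a hypothetical helper, not part of any PassMark tool.

def per_core_score(total_score: int, cores: int) -> int:
    """Divide an aggregate CPU score across cores, rounding down."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return total_score // cores

cpus = {
    "E5-2670": (15565, 8),
    "X5460": (4539, 4),
}

estimates = {name: per_core_score(score, cores) for name, (score, cores) in cpus.items()}
for name, value in estimates.items():
    print(f"{name}: about {value:,} per core")

ratio = estimates["E5-2670"] / estimates["X5460"]
print(f"E5-2670 / X5460 per-core ratio: {ratio:.2f}")
```

By this estimate an E5-2670 core should score well above an X5460 core, which is the opposite of what the VM results show.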

It would be interesting if we could see the single-stream benchmark results with hyperthreading turned off.

.../Ed (VCP4, VCP5)
0 Kudos
admin
Immortal

Aside from the indications that this was a warm migration, I don't see anything obviously amiss.

The observation you made about single vCPU performance is also hard to explain.  I wonder if ESXi could be scheduling the two vCPUs on a pair of hyper-twins.  Does it help to disable hyper-threading?

0 Kudos
cbrou
Contributor

The last migration I did was a warm migration. The test that I ran last night looked like this:

· Migrate VM back to the old environment and shut it down

· Disable EVC

· Migrate VM to new server while shut down

· Turn VM back on.

Should that have sufficed for a cold migration?

I have some new information that may be of interest.

I set CPU affinity on 0,1 and ran the test. I still had the same poor results but it is easier to see PCPU usage in ESXTOP. See the top part of the screenshot attached. See how the PCPU Used only adds up to 62!

The bottom part of screenshot shows the test with only 1 vCPU assigned to the VM. Notice that the PCPU Used is at 115. Again, with this 1 vCPU setup I get great PassMark single threaded results.

According to VMware, “PCPU UTIL(%) might differ from PCPU USED(%) due to power management technologies or hyper-threading.”

http://pubs.vmware.com/vsphere-4-esx-vcenter/index.jsp?topic=/com.vmware.vsphere.resourcemanagement....

I have confirmed with Dell though that all power saving features are turned off for the processor (we actually tested with them off and on with the same results). More specifically we modified the C1E and C-States. We tested with them enabled and with them disabled.

0 Kudos
admin
Immortal

cbrou wrote:

The last migration I did was a warm migration.  The test that I ran last night looked like this:

· Migrate VM back to the old environment and shut it down

· Disable EVC

· Migrate VM to new server while shut down

· Turn VM back on.

Should that have sufficed for a cold migration?

Yes.

I set CPU affinity on 0,1 and ran the test.   I still had the same poor results but it is easier to see PCPU usage in ESXTOP.  See the top part of the screenshot attached.   See how the PCPU Used only adds up to 62!

CPUs 0 and 1 may be hyper-twins.  What if you set the CPU affinity to 0,2?

0 Kudos
cbrou
Contributor

I get the same results with affinity set to 0,2.

0 Kudos
admin
Immortal

Can you try changing the CPU/MMU virtualization settings to "Use hardware support for CPU virtualization (VT/AMD-V) only"?

0 Kudos
cbrou
Contributor

I don’t have that option but I do have some others. I just tried them all but it did not make a difference.

0 Kudos
admin
Immortal

Please enable statistics gathering (under advanced general options, I think), power on the VM, run the single-threaded PassMark benchmarks, and power off the VM.  This will create a 'stats' directory under the VM's directory.  Tar and compress the stats directory and mail it to me, and I'll see if I can find anything.

0 Kudos
admin
Immortal

A couple of scattered thoughts...

Do you have the latest Dell BIOS for the R620 (version 1.3.6)?  Have you tried clearing NVRAM (i.e. loading setup defaults in the BIOS)?

0 Kudos
cbrou
Contributor

Ok – I will send those out soon. I just found out that if I put additional load on the VM by running other applications it seems to “wake up” the processor and I get the good results I am looking for. This doesn’t fix my problem or answer my question but it is interesting.
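
For anyone who wants to check the same "wake up" effect without PassMark, a fixed single-threaded workload can be timed while the VM is otherwise idle and again while other applications are running. This is only a sketch under assumptions: it assumes Python is installed in the guest, the function names are made up for illustration, and it is not a substitute for the PassMark tests.

```python
import time

def fixed_workload(iterations: int = 1_000_000) -> int:
    """Single-threaded integer loop; returns a checksum so the work is real."""
    total = 0
    for i in range(iterations):
        total = (total + i * i) % 1_000_003
    return total

def time_workload(runs: int = 3, iterations: int = 1_000_000) -> list:
    """Run the same workload several times and return seconds per run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fixed_workload(iterations)
        timings.append(time.perf_counter() - start)
    return timings

if __name__ == "__main__":
    for run_number, seconds in enumerate(time_workload(), start=1):
        print(f"run {run_number}: {seconds:.3f} s")
```

If the timings drop noticeably once other load is added, that would match what I am seeing with PassMark.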

0 Kudos