Performance issue using virtualization

1. Performance issue using virtualization

sispeo
Posted Oct 16, 2009 02:39 PM
When comparing our software installed on a physical host and in a comparable VM (same CPU and memory), we notice that the product is two times slower when running in the VM. (The guest OS is Windows 2003 on an ESXi 4.0 host. The software uses only one CPU.)
As we are not in a production environment, we tested with all other VMs powered off. We tested our VM with and without a CPU reservation (best results with), with and without a memory reservation (almost no difference), and with 1, 2 and 4 vCPUs.
After several tests, it seems that the problem comes from the use of semaphores: when we replace them with critical sections (but we can't replace all of them), performance is nearly the same on the physical host and in the VM. All other code executes with similar performance, but when using semaphores, the VM consumes CPU for longer than the physical host.
Has anyone already heard of such a problem? Is there a reason that explains poor performance when using semaphores under Windows hosted by ESXi 4.0?
For example, we wrote a simple program to benchmark semaphores under Windows hosted by ESX (in our lab, it took 10 seconds on a physical host and 22 seconds in a VM):
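#include "stdafx.h"   // Visual Studio precompiled header
#include <windows.h>  // CreateSemaphore, WaitForSingleObject, GetTickCount
#include <tchar.h>    // _tmain, _TCHAR
#include <stdio.h>    // printf

int _tmain(int argc, _TCHAR* argv[])
{
    unsigned __int64 nCount;
    DWORD nTickCount = ::GetTickCount();
    // Initial count 1, maximum count 1: every wait succeeds without blocking.
    HANDLE hSemaphoreBridgets = CreateSemaphore(NULL, 1, 1, NULL);
    if (hSemaphoreBridgets == NULL)
        return 1;
    // Acquire and release the uncontended semaphore 10 million times.
    for (nCount = 0; nCount < 10000000; ++nCount)
    {
        WaitForSingleObject(hSemaphoreBridgets, INFINITE);
        ReleaseSemaphore(hSemaphoreBridgets, 1, NULL);
    }
    // DWORD is an unsigned long, so print it with %lu.
    printf("Duration %lu s\r\n", (::GetTickCount() - nTickCount) / 1000);
    CloseHandle(hSemaphoreBridgets);
    return 0;
}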
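GetTickCount typically has a resolution of 10 to 16 ms, and the program above truncates the result to whole seconds, so small differences between runs are lost. A minimal, portable sketch of a finer-grained measurement follows. It is not the original benchmark: the helper name nanos_per_iteration is hypothetical, and it times an arbitrary callable with std::chrono::steady_clock rather than a Win32 semaphore directly.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: runs `body` `iterations` times and returns the
// average wall-clock cost of one call in nanoseconds.
template <typename Body>
double nanos_per_iteration(std::uint64_t iterations, Body body)
{
    assert(iterations > 0);  // avoid dividing by zero below
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i)
        body();
    const auto stop = std::chrono::steady_clock::now();
    // Convert the elapsed interval to fractional nanoseconds.
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

On Windows, the callable would wrap the WaitForSingleObject/ReleaseSemaphore pair from the program above. With the figures quoted in this post, 10 seconds over 10 million iterations corresponds to about 1000 ns per pair on the physical host, and 22 seconds to about 2200 ns in the VM.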
2. RE: Performance issue using virtualization

admin
Posted Oct 16, 2009 04:29 PM
The default execution mode for Windows 2003 is binary translation. You may be measuring system call overheads, though it is not clear to me why a semaphore implementation would require system calls.
If ESX supports VT-x or AMD-V on your hardware and you have SP2 installed in the guest, I would recommend changing the execution mode to 'VT-x or AMD-V.' Then try the experiment again.
3. RE: Performance issue using virtualization

sispeo
Posted Oct 19, 2009 11:33 AM
Unfortunately, VT-x mode has already been set...
4. RE: Performance issue using virtualization

admin
Posted Oct 19, 2009 02:37 PM
Can you upload your benchmark program?
5. RE: Performance issue using virtualization

sispeo
Posted Oct 20, 2009 08:15 AM
Here it is... it is a 64-bit binary.

Attachment: CPUBenchSem.zip (25 KB)
6. RE: Performance issue using virtualization

admin
Posted Oct 20, 2009 04:30 PM
Ah. So you are running Windows 2003 x64? If so, you can ignore what I said about the default execution mode; I was assuming you were running 32-bit Windows 2003.
7. RE: Performance issue using virtualization

sispeo
Posted Oct 20, 2009 05:34 PM
Sorry, I should have mentioned it before.
8. RE: Performance issue using virtualization

admin
Posted Oct 20, 2009 10:41 PM
This seems to be a well-behaved benchmark with low virtualization overheads. I can't really explain your 2x slowdown. Can you tell me which CPU you are using and exactly which Windows release you are testing?
9. RE: Performance issue using virtualization

sispeo
Posted Oct 21, 2009 06:45 AM
Our ESX hosts are "small" servers, as they are used for tests. The one used for this benchmark is a Xeon 5130 running ESXi 4.0. The guest OS is Windows Server 2003 64-bit, Enterprise Edition, Service Pack 2.
We are asking for the environment details of our client, who has the same problem under ESX 3.5.

10. RE: Performance issue using virtualization

sispeo
Posted Oct 21, 2009 09:38 AM
Here is the configuration of our client's ESX environment:

VMware installed		yes
OS		Windows Server 2003 EE SP2
OS bits		32
Server type		HP ProLiant DL585 G5
Processor type		AMD Opteron
Cores (available cores in parentheses)		4 (of 16)
CPU clock frequency (GHz)		2.3
Main memory (GB)		8 (of 64)

11. RE: Performance issue using virtualization

admin
Posted Oct 21, 2009 06:29 PM
So, the problem occurs on both AMD and Intel processors, with both 32-bit and 64-bit versions of Windows 2003, on ESX 3.5 and ESX 4? That sounds pretty widespread. I'm surprised that nothing jumped out at me. I'll file a bug report with our performance team.
12. RE: Performance issue using virtualization

sispeo
Posted Oct 28, 2009 08:06 AM
Maybe I was drunk...
Both guest OSes are 32-bit Windows Server 2003 Enterprise (one running on ESXi 4.0/Intel, the other on ESX 3.5/AMD).
13. RE: Performance issue using virtualization

sispeo
Posted Nov 17, 2009 08:03 AM
Do you have any news on this subject? Is there something we can do?
14. RE: Performance issue using virtualization

admin
Posted Nov 17, 2009 03:40 PM
I was unable to replicate your results with the 64-bit benchmark you sent, using Windows 2003 x64. If you package up a 32-bit version of your benchmark, I'll have another look.
15. RE: Performance issue using virtualization

sispeo
Posted Nov 17, 2009 04:00 PM
"This is it !"

Attachment: CPUBenchSem-32.zip (25 KB)
16. RE: Performance issue using virtualization
Best Answer

admin
Posted Nov 18, 2009 01:19 AM
I profiled your benchmark and found that it spends most of its time in these three Windows HAL functions:
39.83% hal!KfLowerIrql
19.82% hal!KeRaiseIrqlToDpcLevel
19.07% hal!KeRaiseIrqlToSynchLevel
The hot spots in each function are TPR accesses (0FFFE0080h is the address of the TPR in the local APIC):
hal!KfLowerIrql:
807168e4 890d8000feff mov dword ptr ds:[0FFFE0080h],ecx
807168ea a18000feff mov eax,dword ptr ds:[FFFE0080h]
hal!KeRaiseIrqlToDpcLevel:
807168a0 8b158000feff mov edx,dword ptr ds:[0FFFE0080h]
807168a6 c7058000feff41000000 mov dword ptr ds:[0FFFE0080h],41h
hal!KeRaiseIrqlToSynchLevel:
807168bc 8b158000feff mov edx,dword ptr ds:[0FFFE0080h]
807168c2 c7058000feff41000000 mov dword ptr ds:[0FFFE0080h],41h
Since the local APIC is virtualized, a TPR access typically causes a VM-Exit under hardware virtualization. However, Intel has introduced FlexPriority, which avoids the VM-Exit for all TPR reads and for some TPR writes. Because of this, ESX 4.0 defaults to VT-x for 32-bit Windows 2003 on Intel chips with FlexPriority. Unfortunately, FlexPriority is not a panacea. On native hardware, TPR accesses generally take only a few cycles. With FlexPriority, TPR accesses that do not cause a VM-Exit may still take several hundred cycles. TPR accesses that do cause VM-Exits take several thousand cycles. Fortunately, we still have the option of using binary translation. Under binary translation, TPR accesses generally take tens of cycles.
For this particular workload, you should configure your guest to use binary translation. On my Penryn system, the benchmark runs in 22 seconds using VT-x (with FlexPriority), but it only takes 13 seconds using binary translation. (For completeness, it takes 90 seconds using VT-x without FlexPriority).
Your client's situation is different. AMD has never introduced a technology equivalent to FlexPriority. However, if your client has configured their VM to use hardware MMU support, then the VM will be using AMD-V, which suffers from the same problems as VT-x without FlexPriority. Make sure that they have configured the VM to use software MMU support so that it will execute using binary translation. (The default execution mode for this guest under ESX 3.5 is binary translation.)
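To put the timings above on a per-operation footing, the following sketch divides each reported run time by the 10,000,000 iterations of the benchmark loop. It is plain arithmetic on the figures quoted in this post. The function names micros_per_iteration and slowdown are hypothetical, and the result covers a whole wait/release pair, not an individual TPR access.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: average cost of one loop iteration in microseconds,
// given the total run time in seconds.
double micros_per_iteration(double total_seconds, std::uint64_t iterations)
{
    assert(iterations > 0);
    return total_seconds * 1e6 / static_cast<double>(iterations);
}

// Hypothetical helper: how many times slower `seconds` is than
// `baseline_seconds`.
double slowdown(double seconds, double baseline_seconds)
{
    assert(baseline_seconds > 0.0);
    return seconds / baseline_seconds;
}
```

With the timings above, micros_per_iteration gives 1.3 microseconds per iteration for binary translation (13 seconds), 2.2 for VT-x with FlexPriority (22 seconds), and 9.0 for VT-x without FlexPriority (90 seconds). Relative to binary translation, slowdown(22.0, 13.0) is roughly 1.7 and slowdown(90.0, 13.0) is roughly 6.9.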
17. RE: Performance issue using virtualization

Scissor
Posted Nov 18, 2009 06:44 AM
jmattson,
I just want to say how impressed I am with the level of technical detail you provided in your post. Even if your reply doesn't help the original poster, posts like this are the reason why these forums are such a great resource.
Thank you!
18. RE: Performance issue using virtualization

sispeo
Posted Nov 18, 2009 11:10 AM
I am really impressed too! :-D

I thought I had to select binary translation by setting monitor.virtual_exec to software, but the hardware value made our benchmark run in 10 seconds rather than the initial 22 seconds.
For our client using an AMD-based ESX host, will we just need to adjust monitor.virtual_exec and monitor.virtual_mmu?
19. RE: Performance issue using virtualization

admin
Posted Nov 18, 2009 03:18 PM
Thanks. I hope you found this information helpful.
ESX 3.5 does not respect monitor.virtual_exec. It only supports hardware virtualization on AMD CPUs with RVI, and you get both AMD-V and RVI by requesting RVI:
monitor.virtual_mmu = "hardware"
You can specifically request binary translation on ESX 3.5 by requesting a software MMU:
monitor.virtual_mmu = "software"
Note that this has changed slightly with ESX 4.0. To specifically request binary translation on ESX 4.0, you need to specify:
monitor.virtual_exec = "software"
20. RE: Performance issue using virtualization

admin
Posted Nov 19, 2009 12:17 AM
After the kudos, it's embarrassing to admit this, but I did all of this testing with Windows 2003 RTM. Windows 2003 SP2 has addressed this particular issue. See this Microsoft TechNet article.
After installing SP2, my new timings are 16 seconds for binary translation and only 6 seconds for VT-x (with or without FlexPriority).
To summarize all of these findings: if you are running this kind of workload on Windows 2003 pre-SP2, you should use binary translation, but on Windows 2003 SP2, you should use hardware virtualization.
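The rule of thumb above, combined with the .vmx options quoted earlier in this thread, can be sketched as a small lookup. This is an illustration only: the enum and function names are hypothetical, the option names and values are the ones mentioned in the thread, and on ESX 3.5 the hardware setting only applies to AMD CPUs with RVI.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum for the two ESX releases discussed in this thread.
enum class EsxVersion { Esx35, Esx40 };

// Rule of thumb from the thread for this kind of workload: pre-SP2
// Windows 2003 guests do better with binary translation, SP2 guests
// with hardware virtualization.
bool prefer_hardware_virtualization(int service_pack)
{
    assert(service_pack >= 0);
    return service_pack >= 2;
}

// Returns the .vmx line quoted in the thread for the requested mode.
std::string vmx_setting(EsxVersion esx, bool hardware)
{
    const std::string value = hardware ? "hardware" : "software";
    // ESX 3.5 does not respect monitor.virtual_exec; the MMU setting
    // selects the execution mode instead.
    if (esx == EsxVersion::Esx35)
        return "monitor.virtual_mmu = \"" + value + "\"";
    // ESX 4.0: monitor.virtual_exec selects the execution mode.
    return "monitor.virtual_exec = \"" + value + "\"";
}
```

For a Windows 2003 RTM guest on ESX 4.0, prefer_hardware_virtualization(0) is false, so vmx_setting(EsxVersion::Esx40, false) yields the line monitor.virtual_exec = "software".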
21. RE: Performance issue using virtualization

AlbertWT
Posted Jan 13, 2010 12:37 PM
Great thread. So for Windows Server 2003 x64 R2 SP2 and above, we can enable the MMU optimization according to the processor type (anything other than binary translation/software)?
Kind Regards,
AWT
22. RE: Performance issue using virtualization

admin
Posted Jan 13, 2010 03:29 PM
"Great thread. So for Windows Server 2003 x64 R2 SP2 and above, we can enable the MMU optimization according to the processor type (anything other than binary translation/software)?"
Yes, for both Intel and AMD hardware.
23. RE: Performance issue using virtualization

AlbertWT
Posted Jan 16, 2010 12:12 PM
Thank you, Mr. Mattson :-)
Cheers.
Kind Regards,
AWT
