
Performance issue using virtualization

  • 1.  Performance issue using virtualization

    Posted Oct 16, 2009 02:39 PM

    When comparing our software installed on a physical host and in a comparable VM (same CPU, same memory), we notice that the product takes about twice as long to run in the VM. (The guest OS is Windows 2003 on an ESXi 4.0 host. The software uses only one CPU.)

    As we are not in a production environment, we tried with all other VMs powered off. We tested our VM with and without CPU reservation (best results with), with and without memory reservation (almost no difference), and with 1, 2 and 4 vCPUs.

    After several tests, it seems that the problem comes from the use of semaphores: when we replace them with critical sections (we can't replace all of them), performance is about the same on the physical host and in the VM. All the other code runs with similar performance, but when using semaphores, the VM consumes CPU for longer than the physical host.

    Has anyone already heard of such a problem? Is there a reason that would explain poor performance when using semaphores under Windows hosted by ESXi 4.0?

    For example, we wrote a simple program to benchmark semaphores under Windows hosted by ESX (in our lab, it took 10 seconds on a physical host and 22 seconds in a VM):

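    #include "stdafx.h"
    #include <windows.h>   // CreateSemaphore, WaitForSingleObject, GetTickCount
    #include <tchar.h>     // _tmain, _TCHAR
    #include <stdio.h>     // printf

    int _tmain(int argc, _TCHAR* argv[])
    {
        DWORD nTickCount = ::GetTickCount();

        // Semaphore with initial count 1 and maximum count 1: the wait below never blocks.
        HANDLE hSemaphoreBridgets = CreateSemaphore(NULL, 1, 1, NULL);
        if (hSemaphoreBridgets == NULL)
            return 1;

        // 10 million uncontended wait/release pairs.
        for (unsigned __int64 nCount = 0; nCount < 10000000; ++nCount)
        {
            WaitForSingleObject(hSemaphoreBridgets, INFINITE);
            ReleaseSemaphore(hSemaphoreBridgets, 1, NULL);
        }

        // DWORD is an unsigned long, hence %lu.
        printf("Duration %lu s\r\n", (::GetTickCount() - nTickCount) / 1000);
        CloseHandle(hSemaphoreBridgets);
        return 0;
    }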
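
    Since each loop iteration is one wait/release pair, the two durations can be turned into a rough per-pair cost. The helper below is only a sketch: the name MicrosecondsPerIteration is made up for illustration and is not part of the benchmark.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the original benchmark): average cost of
// one wait/release pair, in microseconds, given the total run time.
double MicrosecondsPerIteration(double totalSeconds, std::uint64_t iterations)
{
    assert(iterations > 0);  // an empty run has no per-iteration cost
    return totalSeconds * 1e6 / static_cast<double>(iterations);
}
```

    With the figures above, MicrosecondsPerIteration(10.0, 10000000) comes to about 1.0 microsecond on the physical host and MicrosecondsPerIteration(22.0, 10000000) to about 2.2 microseconds in the VM, so the VM adds roughly 1.2 microseconds per pair.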



  • 2.  RE: Performance issue using virtualization

    Posted Oct 16, 2009 04:29 PM

    The default execution mode for Windows 2003 is binary translation. You may be measuring system call overheads, though it is not clear to me why a semaphore implementation would require system calls.

    If ESX supports VT-x or AMD-V on your hardware and you have SP2 installed in the guest, I would recommend changing the execution mode to 'VT-x or AMD-V.' Then try the experiment again.



  • 3.  RE: Performance issue using virtualization

    Posted Oct 19, 2009 11:33 AM

    Unfortunately, VT-x mode has already been set...



  • 4.  RE: Performance issue using virtualization

    Posted Oct 19, 2009 02:37 PM

    Can you upload your benchmark program?



  • 5.  RE: Performance issue using virtualization

    Posted Oct 20, 2009 08:15 AM

    Here it is. It is a 64-bit binary.

    Attachment: CPUBenchSem.zip (25 KB)


  • 6.  RE: Performance issue using virtualization

    Posted Oct 20, 2009 04:30 PM

    Ah. So you are running Windows 2003 x64? If so, you can ignore what I said about the default execution mode; I was assuming you were running 32-bit Windows 2003.



  • 7.  RE: Performance issue using virtualization

    Posted Oct 20, 2009 05:34 PM

    Sorry, I should have mentioned it before.



  • 8.  RE: Performance issue using virtualization

    Posted Oct 20, 2009 10:41 PM

    This seems to be a well-behaved benchmark with low virtualization overheads. I can't really explain your 2x slowdown. Can you tell me which CPU you are using and exactly which Windows release you are testing?



  • 9.  RE: Performance issue using virtualization

    Posted Oct 21, 2009 06:45 AM

    Our ESX hosts are "small" servers, as they are used for tests. The one used for this benchmark is a Xeon 5130 running ESXi 4.0. The guest OS is Windows Server 2003 64-bit, Enterprise Edition, Service Pack 2.

    We are asking our client, who has the same problem under ESX 3.5, for details of their environment.



  • 10.  RE: Performance issue using virtualization

    Posted Oct 21, 2009 09:38 AM

    Here is our client's ESX configuration:

    VMware installed:                        yes
    OS:                                      Windows Server 2003 EE SP2
    Bits per OS:                             32
    Server type:                             HP ProLiant DL585 G5
    Processor type:                          AMD Opteron
    Cores (available cores in parentheses):  4 (of 16)
    CPU clock frequency:                     2.3
    Main memory:                             8 (of 64)



  • 11.  RE: Performance issue using virtualization

    Posted Oct 21, 2009 06:29 PM

    So, the problem occurs on both AMD and Intel processors, with both 32-bit and 64-bit versions of Windows 2003, on ESX 3.5 and ESX 4? That sounds pretty widespread. I'm surprised that nothing jumped out at me. I'll file a bug report with our performance team.



  • 12.  RE: Performance issue using virtualization

    Posted Oct 28, 2009 08:06 AM

    Maybe I was drunk...

    Both guest OSes are 32-bit Windows Server 2003 Enterprise (one running on ESXi 4.0/Intel, the other on ESX 3.5/AMD).



  • 13.  RE: Performance issue using virtualization

    Posted Nov 17, 2009 08:03 AM

    Do you have any news on this subject? Is there something we can do?



  • 14.  RE: Performance issue using virtualization

    Posted Nov 17, 2009 03:40 PM

    I was unable to replicate your results with the 64-bit benchmark you sent, using Windows 2003 x64. If you package up a 32-bit version of your benchmark, I'll have another look.



  • 15.  RE: Performance issue using virtualization

    Posted Nov 17, 2009 04:00 PM

    "This is it !"

    Attachment: CPUBenchSem-32.zip (25 KB)


  • 16.  RE: Performance issue using virtualization
    Best Answer

    Posted Nov 18, 2009 01:19 AM

    I profiled your benchmark and found that it spends most of its time in these three Windows HAL functions:

    39.83% hal!KfLowerIrql
    19.82% hal!KeRaiseIrqlToDpcLevel
    19.07% hal!KeRaiseIrqlToSynchLevel

    The hot spots in each function are TPR accesses (0FFFE0080h is the address of the TPR in the local APIC):

    hal!KfLowerIrql:
      807168e4 890d8000feff          mov dword ptr ds:[0FFFE0080h],ecx
      807168ea a18000feff            mov eax,dword ptr ds:[FFFE0080h]

    hal!KeRaiseIrqlToDpcLevel:
      807168a0 8b158000feff          mov edx,dword ptr ds:[0FFFE0080h]
      807168a6 c7058000feff41000000  mov dword ptr ds:[0FFFE0080h],41h

    hal!KeRaiseIrqlToSynchLevel:
      807168bc 8b158000feff          mov edx,dword ptr ds:[0FFFE0080h]
      807168c2 c7058000feff41000000  mov dword ptr ds:[0FFFE0080h],41h

    Since the local APIC is virtualized, a TPR access typically causes a VM-Exit under hardware virtualization. However, Intel has introduced FlexPriority, which avoids the VM-Exit for all TPR reads and for some TPR writes. Because of this, ESX 4.0 defaults to VT-x for 32-bit Windows 2003 on Intel chips with FlexPriority. Unfortunately, FlexPriority is not a panacea. On native hardware, TPR accesses generally take only a few cycles. With FlexPriority, TPR accesses that do not cause a VM-Exit may still take several hundred cycles. TPR accesses that do cause VM-Exits take several thousand cycles. Fortunately, we still have the option of using binary translation. Under binary translation, TPR accesses generally take tens of cycles.
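
    To get a feel for why these per-access costs matter, here is a back-of-the-envelope sketch. It multiplies the number of TPR accesses by an assumed cycle cost and divides by the clock rate. The function name is invented for illustration, and the inputs used in the note below are guesses, not measurements from this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (an assumption for illustration, not a measurement):
// time in TPR accesses = iterations * accesses per iteration
//                        * cycles per access / clock frequency.
double EstimatedTprSeconds(std::uint64_t iterations,
                           unsigned accessesPerIteration,
                           double cyclesPerAccess,
                           double cpuHz)
{
    assert(cpuHz > 0.0);
    const double totalAccesses =
        static_cast<double>(iterations) * accessesPerIteration;
    return totalAccesses * cyclesPerAccess / cpuHz;
}
```

    For example, assuming 10,000,000 iterations, 4 TPR accesses per iteration (a guess) and a 2 GHz clock (also a guess), a cost of 10 cycles per access gives 0.2 seconds, 300 cycles gives 6 seconds, and 3,000 cycles gives 60 seconds. The absolute values are only as good as those guesses, but the ordering matches the description above: tens of cycles under binary translation, hundreds with FlexPriority, thousands with a VM-Exit.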

    For this particular workload, you should configure your guest to use binary translation. On my Penryn system, the benchmark runs in 22 seconds using VT-x (with FlexPriority), but it only takes 13 seconds using binary translation. (For completeness, it takes 90 seconds using VT-x without FlexPriority).

    Your client's situation is different. AMD has never introduced a technology equivalent to FlexPriority. However, if your client has configured their VM to use hardware MMU support, then the VM will be using AMD-V, which suffers from the same problems as VT-x without FlexPriority. Make sure that they have configured the VM to use software MMU support so that it will execute using binary translation. (The default execution mode for this guest under ESX 3.5 is binary translation.)



  • 17.  RE: Performance issue using virtualization

    Posted Nov 18, 2009 06:44 AM

    jmattson,

    I just want to say how impressed I am with the level of technical detail you provided in your post. Even if your reply doesn't help the original poster, posts like this are the reason why these forums are such a great resource.

    Thank you!



  • 18.  RE: Performance issue using virtualization

    Posted Nov 18, 2009 11:10 AM

    I am really impressed too!

    I thought I had to select binary translation by setting monitor.virtual_exec to software, but it was the hardware value that made our benchmark run in 10 seconds rather than the initial 22 seconds.

    For our client using an AMD-based ESX host, will we just need to adjust monitor.virtual_exec and monitor.virtual_mmu?



  • 19.  RE: Performance issue using virtualization

    Posted Nov 18, 2009 03:18 PM

    Thanks. I hope you found this information helpful.

    ESX 3.5 does not respect monitor.virtual_exec. It only supports hardware virtualization on AMD CPUs with RVI, and you get both AMD-V and RVI by requesting RVI:

    monitor.virtual_mmu = "hardware"
    

    You can specifically request binary translation on ESX 3.5 by requesting a software MMU:

    monitor.virtual_mmu = "software"
    

    Note that this has changed slightly with ESX 4.0. To specifically request binary translation on ESX 4.0, you need to specify:

    monitor.virtual_exec = "software"
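
    The settings above can be sketched as a small lookup. This is only an illustration: the enum and function names are invented, only the two releases discussed here are covered, and the "hardware" value for monitor.virtual_exec on ESX 4.0 is the one reported earlier in this thread rather than something stated in this post.

```cpp
#include <string>

// Hypothetical names, for illustration only.
enum class EsxVersion { Esx35, Esx40 };
enum class ExecMode { BinaryTranslation, HardwareVirtualization };

// Returns the .vmx line that requests the given execution mode.
// ESX 3.5 does not respect monitor.virtual_exec, so the MMU setting selects
// the mode there (hardware only applies to AMD CPUs with RVI);
// ESX 4.0 uses monitor.virtual_exec.
std::string VmxLineFor(EsxVersion esx, ExecMode mode)
{
    const char* value =
        (mode == ExecMode::BinaryTranslation) ? "software" : "hardware";
    const char* key = (esx == EsxVersion::Esx35) ? "monitor.virtual_mmu"
                                                 : "monitor.virtual_exec";
    return std::string(key) + " = \"" + value + "\"";
}
```

    Calling VmxLineFor(EsxVersion::Esx35, ExecMode::BinaryTranslation) yields the monitor.virtual_mmu line with the value software, and VmxLineFor(EsxVersion::Esx40, ExecMode::BinaryTranslation) yields the monitor.virtual_exec line with the value software.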
    



  • 20.  RE: Performance issue using virtualization

    Posted Nov 19, 2009 12:17 AM

    After the kudos, it's embarrassing to admit this, but I did all of this testing with Windows 2003 RTM. Windows 2003 SP2 has addressed this particular issue. See this Microsoft TechNet article.

    After installing SP2, my new timings are 16 seconds for binary translation and only 6 seconds for VT-x (with or without FlexPriority).

    To summarize all of these findings: if you are running this kind of a workload on Windows 2003 pre-SP2, you should use binary translation, but on Windows 2003 SP2, you should use hardware virtualization.
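
    This summary follows directly from the timings reported in this thread. As a small sanity check, the sketch below picks whichever mode had the shorter run time. The struct and function names are made up for illustration.

```cpp
#include <cassert>

// Hypothetical types, for illustration only.
enum class ExecMode { BinaryTranslation, HardwareVirtualization };

struct Timing {
    double binaryTranslationSeconds;  // benchmark duration under binary translation
    double hardwareSeconds;           // benchmark duration under VT-x
};

// Picks the execution mode with the shorter measured duration.
// Ties go to hardware virtualization (an arbitrary choice).
ExecMode FasterMode(const Timing& t)
{
    assert(t.binaryTranslationSeconds > 0.0 && t.hardwareSeconds > 0.0);
    return (t.binaryTranslationSeconds < t.hardwareSeconds)
               ? ExecMode::BinaryTranslation
               : ExecMode::HardwareVirtualization;
}
```

    With the Windows 2003 RTM timings (13 seconds under binary translation, 22 seconds under VT-x with FlexPriority), FasterMode returns BinaryTranslation. With the SP2 timings (16 seconds and 6 seconds), it returns HardwareVirtualization.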



  • 21.  RE: Performance issue using virtualization

    Posted Jan 13, 2010 12:37 PM

    Great thread. So for Windows Server 2003 x64 R2 SP2 and above, can we enable the MMU optimization according to the processor type (anything other than binary/software)?

    Kind Regards,

    AWT



  • 22.  RE: Performance issue using virtualization

    Posted Jan 13, 2010 03:29 PM

    Great thread. So for Windows Server 2003 x64 R2 SP2 and above, can we enable the MMU optimization according to the processor type (anything other than binary/software)?

    Yes, for both Intel and AMD hardware.



  • 23.  RE: Performance issue using virtualization

    Posted Jan 16, 2010 12:12 PM

    Thank you, Mr. Mattson.

    Cheers.

    Kind Regards,

    AWT