
    ESX4 + Nehalem Host + vMMU = Broken TPS !

    mcwill Expert

      Since upgrading our 2-host lab environment from 3.5 to 4.0, we are seeing poor Transparent Page Sharing performance on our new Nehalem-based HP ML350 G6 host.

       

      Host A : ML350 G6 - 1 x Intel E5504, 18GB RAM

      Host B : Whitebox - 2 x Intel 5130, 8GB RAM

       

      Under ESX 3.5 TPS worked correctly on both hosts, but on ESX 4.0 only the older Intel 5130 based host appears to be able to scavenge inactive memory from the VMs.

       

      To test this out I created a new VM from an existing Win2k3 system disk. (Just to ensure it wasn't an old option in the .vmx file that was causing the issue.) The VM was configured as virtual hardware version 7 and was installed with the latest tools from the 4.0 release.

       

      During the test the VM was idle, reporting only 156MB of its 768MB as in use. The VM was vMotioned between the two hosts, and as can be seen from the attached performance graph there is a very big difference in active memory usage.

       

      I've also come across an article by Duncan Epping at yellow-bricks.com that may point to the cause being vMMU...

      MMU article

       

      If vMMU is turned off in the VM settings and the VM restarted then TPS operates as expected on both hosts. (See second image)
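      For reference, the same per-VM change can be made directly in the .vmx file; a minimal sketch, assuming the standard ESX 4.0 monitor option name (the vSphere Client's CPU/MMU Virtualization setting writes the same key):

          # Force software MMU virtualization for this VM, i.e. switch vMMU off
          monitor.virtual_mmu = "software"
          # "automatic" restores the default; "hardware" forces EPT/RVI use
          # monitor.virtual_mmu = "automatic"

      Note the VM needs a full power-off and power-on, not just a guest reboot, to pick up a .vmx change.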

       

      So if it comes down to choosing between the two, would you choose TPS over vMMU or vice versa?

        • 1. Re: ESX4 + Nehalem Host + vMMU = Broken TPS !
          depping Champion
          User Moderators, VMware Employees

          Well, it's not broken. When memory is scarce, ESX will start breaking the large pages up into small pages, which are then TPS'ed after a while. It's not only Nehalem, by the way; AMD RVI has the same side effect. I've already raised this internally and the developers are looking into it.
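          A side note: the break-up only begins once the host's free memory actually becomes scarce, and that threshold can be watched live; a quick sketch using the stock esxtop memory screen:

              # From the service console, start esxtop and press 'm' for the memory view;
              # the VMKMEM line ends with the free-memory state (high/soft/hard/low), and
              # large pages are only broken back into 4 KB pages as that state degrades.
              esxtop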

           

           

           

          Duncan
          VMware Communities User Moderator | VCP | VCDX
          Blogging: http://www.yellow-bricks.com
          Twitter: http://www.twitter.com/depping

          If you find this information useful, please award points for "correct" or "helpful".

          • 2. Re: ESX4 + Nehalem Host + vMMU = Broken TPS !
            mcwill Expert

            Duncan,

             

            Thanks for the response, and I'll bow to your experience as to whether TPS is still functional in the presence of vMMU, but I'd argue that from a user's perspective something certainly appears broken...

             

            What led me to investigate this was that I have a number of VMs currently alarming due to 95% memory usage; however, on investigation within the VM itself, Windows is reporting:

             

            Physical Mem = 1024MB

            In Use = 471MB

            Available = 525MB

            Sys Cache = 630MB

             

            Which can in no way be construed as memory starved.

             

            Regards,

            Iain

            • 3. Re: ESX4 + Nehalem Host + vMMU = Broken TPS !
              depping Champion
              User Moderators, VMware Employees

              I know; as far as I know it's being investigated. It seems like vCenter reports this info incorrectly. I will contact the developers again, and I'd like to ask you to call support and have them escalate this to development.

               

               

               

               

              Duncan

              • 4. Re: ESX4 + Nehalem Host + vMMU = Broken TPS !
                mcwill Expert

                Thanks Duncan, I've raised SR#1303220991 and referenced this thread.

                 

                Regards,

                Iain

                • 5. Re: ESX4 + Nehalem Host + vMMU = Broken TPS !
                  joergriether Hot Shot
                  vExpert

                  I just ran into exactly the same behaviour on a Dell R710 equipped with two Xeon 5520 quad-cores and 36 GB RAM. I created some new W2003 machines, all showing 95-98% guest memory usage in the vSphere Client, but inside the guest everything looks pretty normal. AND (and now it becomes bad) when comparing the subjective speed inside the guest, it is much slower than the same machine on my previous ESX 3.5 U4.

                   

                  This is not good.

                   

                  regards,

                  Joerg

                  • 6. Re: ESX4 + Nehalem Host + vMMU = Broken TPS !
                    depping Champion
                    User Moderators, VMware Employees

                    Keep me posted!

                     

                     

                     

                     

                    Duncan

                    • 7. Re: ESX4 + Nehalem Host + vMMU = Broken TPS !
                      kichaonline Enthusiast

                      Transparent page sharing works only for small pages (we are investigating an efficient way to implement it for large pages). On EPT/NPT-capable systems, using large pages offers better MMU performance, so ESX takes advantage of large pages transparently. It is possible that you are not getting the same level of TPS benefits on EPT/NPT systems for this reason.
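                      To put rough numbers on that (an illustrative calculation, not from the original post):

                          # TPS matches whole pages, so page size sets the sharing granularity:
                          #   4 KB pages: 1 GB of guest RAM = 262,144 candidates; any two
                          #               identical 4 KB pages on the host can be collapsed.
                          #   2 MB pages: the same 1 GB = only 512 candidates, and a page is
                          #               shared only if the entire 2 MB matches exactly,
                          #               which is far less likely in practice.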

                      • 8. Re: ESX4 + Nehalem Host + vMMU = Broken TPS !
                        kichaonline Enthusiast

                        If you want, you can also try disabling the use of large pages: go to the Advanced Settings dialog box, choose Mem, and set Mem.AllocGuestLargePage to 0.

                         

                        This should improve TPS.
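                        The same change can also be scripted from the service console; a sketch, assuming the standard esxcfg-advcfg tool on classic ESX 4.0:

                            # Show the current value (1 = guest memory may be backed by 2 MB large pages)
                            esxcfg-advcfg -g /Mem/AllocGuestLargePage
                            # Disable large-page backing host-wide, restoring 4 KB granularity for TPS
                            esxcfg-advcfg -s 0 /Mem/AllocGuestLargePage

                        The setting appears to affect only newly allocated pages, which would explain why existing VMs need a power-cycle or a vMotion off and back (as confirmed below) before TPS recovers.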

                        • 9. Re: ESX4 + Nehalem Host + vMMU = Broken TPS !
                          joergriether Hot Shot
                          vExpert

                           

                          Hmmm, I must strongly object. "The same level of benefits" is highly understated. Let me tell you this one: yesterday I tried to start the VMware Tools installer on a freshly installed W2003 guest with 1 GB and 1 CPU, out of the box. I did it simultaneously on an ESX 3.5 host (Dell R710) and on an ESX 4 host (Dell R710), with EXACTLY the same machine. Now, the ESX 3.5 machine did it in 23 seconds; the ESX 4 machine took 320 seconds. Does that sound good? I repeat: this is NOT good. This has to be fixed ASAP.

                          best,

                          Joerg

                           

                           

                          • 10. Re: ESX4 + Nehalem Host + vMMU = Broken TPS !
                            mcwill Expert

                            Thanks, I can confirm that after setting Mem.AllocGuestLargePage to 0 and vMotioning the VMs off and back onto the Nehalem host, TPS is again operating, with active memory now down below 20% for all VMs.
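                            For anyone verifying the same fix, the TPS savings can be read on the host itself; a quick check, assuming the stock esxtop counters:

                                # Start esxtop and press 'm' for the memory screen; the PSHARE/MB
                                # line shows the shared, common and saving totals for TPS, which
                                # should climb within minutes once large pages are disabled.
                                esxtop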

                             

                            Can you confirm whether the above setting still uses hardware assist for the MMU but with the smaller (TPS-friendly) page size, or does it have the effect of turning off hardware-assisted MMU altogether?

                            • 11. Re: ESX4 + Nehalem Host + vMMU = Broken TPS !
                              mcwill Expert

                              Joerg,

                               

                              I'm not experiencing the same performance hit that you are seeing.

                               

                              Performance has been good, and the TPS problem has been the only issue so far that would stop me pushing ESX4 onto our production environment.

                               

                              Regards,

                              Iain

                              • 12. Re: ESX4 + Nehalem Host + vMMU = Broken TPS !
                                ufo8mydog Enthusiast

                                Hi there

                                 

                                Perhaps I am a bit slow, but I do not understand the full extent of the problem.

                                 

                                1) Are all VMs (32-bit, 64-bit, Windows, Linux) affected by this vMMU bug on Nehalem hardware?

                                2) Currently it seems that all VMs have vMMU set to "Automatic".

                                 

                                 

                                • When I move to our Nehalem infrastructure should I be setting vMMU to "forbid" and then rebooting?

                                • Or, is the better solution to set Mem.AllocGuestLargePage to 0 (and rebooting), as kichaonline suggested? (See the side-by-side sketch below.)
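                                For reference, a rough side-by-side of the two knobs being discussed, assuming the standard ESX 4.0 option names (the thread doesn't settle which is preferable):

                                    # Per-VM (.vmx, needs a power-cycle): forbid hardware MMU for this
                                    # VM only, giving up the vMMU performance benefit for that VM.
                                    monitor.virtual_mmu = "software"

                                    # Host-wide (Advanced Settings, affects all VMs on the host): keep
                                    # hardware MMU (EPT/RVI) but back guest memory with 4 KB pages,
                                    # so TPS can still find matches.
                                    Mem.AllocGuestLargePage = 0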

                                 

                                • 13. Re: ESX4 + Nehalem Host + vMMU = Broken TPS !
                                  mcwill Expert

                                  I can confirm all VM types are affected.

                                   

                                  As to what is the best solution... I'll leave that to the more knowledgeable members of this community.

                                   

                                  Regards,

                                  Iain

                                  • 14. Re: ESX4 + Nehalem Host + vMMU = Broken TPS !
                                    neyz Novice

                                    Hello everyone,

                                     

                                    I have successfully managed to upgrade our little farm to vSphere 4. The problem is that since then, all my guests have started to get little red exclamation points. Memory usage goes up to 2 GB and just stays there no matter what, even if the reported usage in the guest is 50%. I am used to guest memory usage going to the max, but then it usually went down; now it just seems stuck.

                                     

                                     

                                    • I have upgraded vmware tools on the guests

                                    • I have upgraded the vmware virtual hardware

                                    • I have forced the CPU/MMU Virtualization to use Intel VT

                                     

                                    Guest is Win2K8 with 2 GB of RAM and 2 vCPUs.
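                                    For what it's worth, the CPU/MMU Virtualization radio buttons map to a pair of .vmx keys; a sketch of what forcing Intel VT is believed to write (key names assumed from ESX 4.0 conventions):

                                        monitor.virtual_exec = "hardware"   # hardware-assisted CPU virtualization (VT-x)
                                        monitor.virtual_mmu  = "hardware"   # hardware-assisted MMU virtualization (EPT)

                                    With the MMU forced to hardware, the guest is backed by large pages, so a stuck-at-maximum guest memory usage reading is the same symptom described above.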

                                     

                                    I am not sure if this is the same issue you guys are having, but it seems kinda weird to me since I didn't have this behavior before the upgrade.

                                     

                                    Cheers !
