mcwill
Expert

ESX4 + Nehalem Host + vMMU = Broken TPS !

Since upgrading our 2 host lab environment from 3.5 to 4.0 we are seeing poor Transparent Page Sharing performance on our new Nehalem based HP ML350 G6 host.

Host A : ML350 G6 - 1 x Intel E5504, 18GB RAM

Host B : Whitebox - 2 x Intel 5130, 8GB RAM

Under ESX 3.5 TPS worked correctly on both hosts, but on ESX 4.0 only the older Intel 5130 based host appears to be able to scavenge inactive memory from the VMs.

To test this out I created a new VM from an existing Win2k3 system disk. (Just to ensure it wasn't an old option in the .vmx file that was causing the issue.) The VM was configured as hardware type 7 and was installed with the latest tools from the 4.0 release.

During the test the VM was idle and reporting only 156MB of the 768MB as in use. The VM was vMotioned between the two hosts and, as can be seen from the attached performance graph, there is a very big difference in active memory usage.

I've also come across an article by Duncan Epping at yellow-bricks.com that may point to vMMU as the cause...

MMU article

If vMMU is turned off in the VM settings and the VM restarted then TPS operates as expected on both hosts. (See second image)

So if it comes down to choosing between the two, would you choose TPS over vMMU or vice versa?

Accepted Solutions
admin
Immortal

>>You talk of nesting page tables which I assume means s/w MMU is still running.

No, nested paging is a hardware MMU feature. The page table structures are maintained by software, but the hardware (the MMU) performs the page table walk to fetch the translation from those structures (or takes it from the TLB if it is already cached). Software MMU does not use nested page tables; instead it uses shadow page tables, which the hardware walks directly, so there is no additional cost for TLB misses.
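As a rough illustration of why a TLB miss costs more under nested paging than with shadow page tables, the sketch below counts worst-case memory references for a single page walk. It assumes 4-level guest and host page tables and no paging-structure caches; the function names are made up for this example and are not part of any VMware tool.

```python
def direct_walk_refs(levels: int) -> int:
    """References for a one-dimensional walk (native or shadow page tables):
    the hardware reads one page-table entry per level."""
    return levels


def nested_walk_refs(guest_levels: int, host_levels: int) -> int:
    """Worst-case references for a two-dimensional (nested) walk.

    Every guest page-table pointer is a guest-physical address, so each of
    the guest_levels reads first needs a full host walk, and the final
    guest-physical address needs one more host walk.
    """
    host_walks = guest_levels + 1          # one per guest level + final address
    return host_walks * host_levels + guest_levels


if __name__ == "__main__":
    print("shadow/native walk:", direct_walk_refs(4))
    print("nested walk:       ", nested_walk_refs(4, 4))
```

Under these assumptions a nested walk touches 24 entries against 4 for a shadow walk, which is why reducing the number of TLB misses matters more on EPT/NPT hardware.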

>>is there any benefit in running hardware MMU with small pages?

Oh yes absolutely.

>>The "Large Page Performance" article talks of configuring the guest to use Large Pages, this isn't something I had done for any of our guests. Do the tools handle that now or is it still a required step?

There are different levels of large page support. Applications inside the guest can request large pages from the OS, and the OS can assign them if it has contiguous free memory. But large pages mapped by the guest OS (i.e. physical pages) may or may not be backed by the hypervisor with actual large pages (machine pages). For instance, the guest may think it has 2M chunks while the hypervisor uses 4K chunks to map each 2M chunk to the guest; in this case the hypervisor has to access multiple 4K pages, so there is no performance benefit even though the guest uses large pages. In ESX 3.5 we introduced support for large pages in the hypervisor: whenever the guest maps large pages, we explicitly go and find large machine pages to back them. This helps performance, as demonstrated by the whitepaper.

In addition to this, on NPT/EPT machines the hypervisor also opportunistically tries to back all guest pages (small or large) with large pages. For instance, even if the guest is not mapping large pages, contiguous 4K regions of the guest can be mapped by a single large page by the hypervisor, which helps performance (by reducing TLB misses). This is why you see large pages in use even though the guest is not explicitly requesting them.
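To put numbers on the TLB argument, here is a small sketch that compares how much memory a TLB can cover with 4K versus 2M pages. The entry count of 512 is a hypothetical figure chosen for illustration, not the specification of any particular processor.

```python
SMALL_PAGE = 4 * 1024          # 4K page
LARGE_PAGE = 2 * 1024 * 1024   # 2M page


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of memory that can be translated without a TLB miss."""
    return entries * page_size


def mappings_needed(working_set: int, page_size: int) -> int:
    """Number of page mappings required to cover a working set (rounded up)."""
    return -(-working_set // page_size)


if __name__ == "__main__":
    entries = 512                      # hypothetical TLB size
    working_set = 768 * 1024 * 1024    # the 768MB test VM from this thread
    for name, size in (("4K", SMALL_PAGE), ("2M", LARGE_PAGE)):
        print(name, "reach:", tlb_reach(entries, size) // (1024 * 1024), "MB,",
              "mappings for 768MB:", mappings_needed(working_set, size))
```

With the same number of entries, 2M pages cover 512 times more memory, so far fewer accesses fall outside the TLB.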

>>I have to disagree as to whether this is a bug or not. Large pages make 80% of our VMs alarm due to excess memory usage, and negate one of the main differences between ESX and Hyper-V.

There are two issues here. 1) TPS not achieving sufficient memory savings when large pages are being used: this is not a bug. It is a tradeoff that you have to make, and the workaround I suggested lets you choose which side of the tradeoff you want on NPT/EPT hardware. 2) The VM memory alarm: this is a separate issue and does not depend on page sharing. The VM memory usage alarm turns red whenever the guest active memory usage goes high. Guest active memory is estimated through random statistical sampling, and the algorithm that the hypervisor uses overestimates active memory when the guest's small pages are backed by large pages (since the active memory estimate is done with reference to machine pages). This is a bug. For now you can simply ignore this alarm (it is a false alarm); I was told that we will be fixing this pretty soon. However, note that this will only fix the alarm; the memory usage of the VM will remain the same.
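The overestimation described in point 2 can be mimicked with a toy model. This is not the ESX algorithm, whose details are not given in this thread; it only assumes that a sampled page counts as active if anything in its backing machine page was touched, so with 2M backing one touched 4K page marks all 512 of its neighbours as active. All names and parameters below are hypothetical.

```python
import random

PAGES_PER_LARGE = 512  # 2M / 4K


def estimate_active_fraction(active, total_pages, samples, large_backing, rng):
    """Sample random guest pages and return the fraction judged active.

    active        -- set of 4K page numbers the guest really touched
    large_backing -- if True, activity is only visible per 2M machine page
    """
    active_large = {page // PAGES_PER_LARGE for page in active}
    hits = 0
    for _ in range(samples):
        page = rng.randrange(total_pages)
        if large_backing:
            hits += (page // PAGES_PER_LARGE) in active_large
        else:
            hits += page in active
    return hits / samples


def demo(seed=0):
    """Return (small-page estimate, large-page estimate) for a 768MB guest
    whose truly active pages are 20% of memory, scattered at random."""
    rng = random.Random(seed)
    total_pages = 768 * 256  # 768MB in 4K pages
    active = set(rng.sample(range(total_pages), total_pages // 5))
    small = estimate_active_fraction(active, total_pages, 1000, False, rng)
    large = estimate_active_fraction(active, total_pages, 1000, True, rng)
    return small, large


if __name__ == "__main__":
    small, large = demo()
    print(f"estimate with 4K backing: {small:.0%}")
    print(f"estimate with 2M backing: {large:.0%}")
```

In this model the 4K estimate stays close to the true 20%, while the 2M estimate climbs towards 100% because almost every large page contains at least one touched small page. That matches the symptom of idle guests alarming at 95% or more.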

123 Replies
depping
Leadership

Well, it's not broken. When memory is scarce it apparently will start breaking up the large pages into small pages, which will be TPS'ed after a while. It's not only Nehalem, by the way; AMD RVI has the same side effect. I've already raised this internally and the developers are looking into it.

Duncan

VMware Communities User Moderator | VCP | VCDX

If you find this information useful, please award points for "correct" or "helpful".

mcwill
Expert

Duncan,

Thanks for the response, and I'll bow to your experience as to whether TPS is still functional in the presence of vMMU, but I'd argue that from a user's perspective something certainly appears broken...

What led me to investigate this was that I have a number of VMs currently alarming due to 95% memory usage; however, within the VM itself Windows is reporting:

Physical Mem = 1024MB

In Use = 471MB

Available = 525MB

Sys Cache = 630MB

Which can in no way be construed as memory starved.

Regards,

Iain

depping
Leadership

I know, as far as I know it's something that's being investigated.... It seems like vCenter reports this info incorrectly. I will contact the developers again and would like to ask you to call support! Have them escalate this to development.

Duncan

mcwill
Expert

Thanks Duncan, I've raised SR#1303220991 and referenced this thread.

Regards,

Iain

joergriether
Hot Shot

I just came across exactly the same behaviour on a Dell R710 equipped with two quad-core Xeon 5520s and 36GB of memory. I created some new W2003 machines, all showing 95-98% guest memory usage in the vSphere Client, while inside the guest it looks pretty normal. AND (now it gets bad) the subjective speed inside the guest is much slower than the same machine on my previous ESX 3.5u4.

This is not good.

regards,

Joerg

depping
Leadership

Keep me posted!

Duncan

admin
Immortal

Transparent page sharing works only for small pages (we are investigating an efficient way to implement it for large pages). On EPT/NPT-capable systems, using large pages offers better MMU performance, so ESX takes advantage of large pages transparently. It is possible that you are not getting the same level of TPS benefit on EPT/NPT systems for this reason.

admin
Immortal

If you want, you can also try disabling the use of large pages:

Go to the Advanced Settings dialog box and choose Mem.

Set Mem.AllocGuestLargePage to 0.

This should improve TPS.

joergriether
Hot Shot

Hmmm, I must strongly disagree. "The same level of benefits" is a big understatement. Let me tell you this: yesterday I tried to start the VMware Tools installer on a freshly installed W2003 with 1 GB and 1 CPU, out of the box. I did it simultaneously on ESX 3.5 (Dell R710) and on ESX 4 (Dell R710), EXACTLY the same machine. The ESX 3.5 machine did it in 23 seconds; the ESX 4 machine did it in 320 seconds. Does that sound good? I repeat: this is NOT good. This has to be fixed asap.

best,

Joerg

mcwill
Expert

Thanks. I can confirm that after setting Mem.AllocGuestLargePage to 0 and vMotioning the VMs off and then back onto the Nehalem host, TPS is again operating, with active memory now down at less than 20% for all VMs.

Can you confirm whether the above setting still uses hardware assist for MMU but with the smaller (TPS-friendly) page size, or does it have the effect of turning off hardware-assisted MMU?

mcwill
Expert

Joerg,

I'm not experiencing the same performance hit that you are seeing.

Performance has been good, and the TPS problem has been the only issue so far that would stop me pushing ESX4 onto our production environment.

Regards,

Iain

ufo8mydog
Enthusiast

Hi there

Perhaps I am a bit slow, but I do not understand the full extent of the problem.

1) Are all VMs (32-bit, 64-bit, Windows, Linux) affected by this vMMU bug on Nehalem hardware?

2) Currently it seems that all VMs have vMMU set to "Automatic".

  • When I move to our Nehalem infrastructure should I be setting vMMU to "forbid" and then rebooting?

  • Or, is the better solution to set Mem.AllocGuestLargePage to 0 (and rebooting) as kichaonline suggested?

mcwill
Expert

I can confirm all VM types are affected.

As to what is the best solution... I'll leave that to the more knowledgeable members of this community.

Regards,

Iain

neyz
Contributor

Hello everyone,

I have successfully managed to upgrade our little farm to vSphere 4. The problem is that since then, all my guests have started to get little red exclamation points. Memory usage goes up to 2GB and just stays there no matter what, even if the usage reported in the guest is 50%. I am used to the guest memory usage going to the max, but it usually then went down; now it just seems stuck.

  • I have upgraded vmware tools on the guests

  • I have upgraded the vmware virtual hardware

  • I have forced the CPU/MMU Virtualization to use Intel VT

Guest is Win2K8 with 2GB of ram and 2vCPU.

I am not sure if this is the same issue as you guys have but it seems kinda weird to me since i didn't have this behavior before the upgrade.

Cheers !

mcwill
Expert

neyz,

Yes, it sounds as though you are hitting the same problem. (As a matter of interest, what hardware is your host?)

You should be able to solve the problem by using the workaround provided earlier...

In your host's Advanced Settings, set Mem.AllocGuestLargePage to 0 and either restart the VM or migrate the VM off and then back onto the host.

Regards,

Iain

neyz
Contributor

Hello,

Changing this option seems to have resolved the problem. Does it affect the performance of the VMs?

We are running with the new Dell R610 with 32GB of RAM and 5500's inside.

Thanks !

depping
Leadership

Yes, it will have an impact on performance. One of the reasons there is a performance increase with virtualized MMU is large pages.

Duncan

admin
Immortal

Disabling large pages for the guest does not disable hardware MMU. It can, however, have a performance impact if the workload has a lot of TLB misses (this usually happens when the workload accesses a large number of memory pages and the TLB is not big enough to hold all those translations). The additional nesting of page tables with hardware MMU makes a TLB miss more expensive (this is by hardware design and is not induced by software). Using large pages reduces the TLB miss count, and that's why ESX transparently tries to use large pages for VMs on EPT/NPT hardware. This is an optimization that ESX does to maximize performance when using hardware MMU. The only problem is that when large pages are used, TPS needs to find identical 2M chunks (as compared to 4K chunks when small pages are used), and the likelihood of finding these is low (unless the guest writes all zeroes to a 2M chunk). So ESX does not attempt to collapse large pages, and that's why memory savings due to TPS go down when all the guest pages are mapped with large pages by the hypervisor. Internally we are investigating better ways to address this issue.

So this is not a bug; it's a tradeoff between better memory savings and slightly better performance. Also, this issue is not new to ESX 4.0: it should happen with ESX 3.5 as well if you have an NPT-capable processor. If you have an Intel EPT-capable processor you will see this issue only with ESX 4.0, as ESX 3.5 does not use hardware MMU on Intel EPT processors (it only supports NPT).

For more information on large pages and their impact on NPT, see http://www.vmware.com/files/pdf/large_pg_performance.pdf. The artifact that you are noticing is also described in Appendix B. The option that I provided disables large pages for all guests and is also documented in the whitepaper. I will find out if there is a per-VM config option to selectively enable large pages for workloads that might benefit from them.
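To make the 4K versus 2M sharing argument in this reply concrete, here is a minimal sketch of content-based page sharing at both granularities. It is an illustration only: a hash stands in for the full page comparison, the two "guests" are synthetic 4MB images holding the same 4K pages at shifted offsets, and none of the names correspond to ESX internals.

```python
import hashlib

SMALL = 4 * 1024          # 4K page
LARGE = 2 * 1024 * 1024   # 2M page


def shareable_bytes(memory: bytes, page_size: int) -> int:
    """Bytes that could be reclaimed by keeping one copy of each distinct page."""
    seen = set()
    saved = 0
    for offset in range(0, len(memory), page_size):
        digest = hashlib.sha1(memory[offset:offset + page_size]).digest()
        if digest in seen:
            saved += page_size   # duplicate page: could share the first copy
        else:
            seen.add(digest)
    return saved


def make_page(tag: int) -> bytes:
    """A 4K page whose content is determined by its tag."""
    return tag.to_bytes(4, "little") * (SMALL // 4)


def build_two_guests() -> bytes:
    """Two 4MB guests with identical 4K pages, the second shifted by one page."""
    pages = [make_page(i) for i in range(1024)]
    guest_a = b"".join(pages)
    guest_b = b"".join(pages[1:] + pages[:1])
    return guest_a + guest_b


if __name__ == "__main__":
    memory = build_two_guests()
    print("shareable at 4K:", shareable_bytes(memory, SMALL) // 1024, "KB")
    print("shareable at 2M:", shareable_bytes(memory, LARGE) // 1024, "KB")
```

In this synthetic case every 4K page in the second guest has a twin in the first, so half of the memory is shareable at 4K granularity, yet no two 2M chunks are identical and nothing can be shared at 2M granularity.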

mcwill
Expert

Sorry for all the questions, but I'm afraid your response raised more queries...

1) I had assumed hardware MMU replaced software MMU, but this may not be the case? You talk of nesting page tables which I assume means s/w MMU is still running. If this is the case, is there any benefit in running hardware MMU with small pages?

2) The "Large Page Performance" article talks of configuring the guest to use Large Pages; this isn't something I had done for any of our guests. Do the tools handle that now or is it still a required step?

3) I have to disagree as to whether this is a bug or not. Large pages make 80% of our VMs alarm due to excess memory usage, and negate one of the main differences between ESX and Hyper-V.
