Yes, it sounds as though you are hitting the same problem. (As a matter of interest, what hardware is your host?)
You should be able to solve the problem by using the workaround provided earlier...
In your host's Advanced Settings, set Mem.AllocGuestLargePage to 0, then either restart the VM or migrate the VM off the host and back onto it.
Yes, it will have an impact on performance. One of the reasons there is a performance increase with virtualized MMU is large pages.
Disabling large pages for guests does not disable hardware MMU. It can, however, have a performance impact if the workload has a lot of TLB misses (this usually happens when the workload accesses a large number of memory pages and the TLB is not large enough to cache all of those translations). The additional nesting of page tables with hardware MMU makes a TLB miss more expensive (this is by hardware design and is not induced by software). Using large pages reduces the TLB miss count, and that's why ESX transparently tries to use large pages for VMs on EPT/NPT hardware. This is an optimization that ESX does to maximize performance when using hardware MMU. The only problem is that when large pages are used, TPS needs to find identical 2M chunks (as compared to 4K chunks when small pages are used), and the likelihood of finding these is lower (unless the guest writes all zeroes to a 2M chunk), so ESX does not attempt to collapse large pages. That's why memory savings due to TPS go down when all the guest pages are mapped by large pages by the hypervisor. Internally we are investigating better ways to address this issue.
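To illustrate why matching at 2M granularity finds fewer duplicates than matching at 4K, here is a toy Python model. It is not ESX code: the page contents, the region layout, and the function name `shareable_chunks` are all made up for illustration. It simply counts how many chunks duplicate an earlier chunk at each granularity.

```python
from collections import Counter

SMALL_PAGES_PER_LARGE = 512  # 2 MiB / 4 KiB

def shareable_chunks(pages, chunk_len):
    """Count chunks that duplicate an earlier chunk of the same size.

    `pages` is a list of per-4K-page content tags (stand-ins for page hashes).
    A chunk is `chunk_len` consecutive pages; two chunks can be shared only
    if every page in them matches.
    """
    chunks = [tuple(pages[i:i + chunk_len])
              for i in range(0, len(pages), chunk_len)]
    counts = Counter(chunks)
    # Every copy beyond the first of a given content could be collapsed.
    return sum(n - 1 for n in counts.values())

# Hypothetical guest memory: 4 large-page regions (2048 small pages).
# Region 0 is all zeroes; each other region is zero pages plus one unique page.
pages = ["zero"] * (4 * SMALL_PAGES_PER_LARGE)
for region in (1, 2, 3):
    pages[region * SMALL_PAGES_PER_LARGE + 7] = f"unique-{region}"

small_saved = shareable_chunks(pages, 1)
large_saved = shareable_chunks(pages, SMALL_PAGES_PER_LARGE)
print(small_saved, "small pages shareable at 4K granularity")
print(large_saved, "large pages shareable at 2M granularity")
```

In this made-up layout almost every small page is a duplicate, yet a single differing 4K page is enough to make each 2M region unique, so nothing is shareable at large-page granularity.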
So this is not a bug; it's a tradeoff between better memory savings and slightly better performance. Also, this issue is not new to ESX 4.0; it should happen with ESX 3.5 as well if you have an NPT-capable processor. If you have an Intel EPT-capable processor, you will see this issue only with ESX 4.0, as ESX 3.5 does not use hardware MMU on Intel EPT processors (it only supports NPT).
For more information on large pages and their impact on NPT, see http://www.vmware.com/files/pdf/large_pg_performance.pdf. The artifact that you are noticing is also described in Appendix B. The option that I provided disables large pages for all guests and is also documented in the whitepaper. I will find out if there is a per-VM config option to selectively enable large pages for workloads that might benefit from them.
Sorry for all the questions, but I'm afraid your response raised more queries...
1) I had assumed hardware MMU replaced software MMU, but this may not be the case? You talk of nesting page tables which I assume means s/w MMU is still running. If this is the case, is there any benefit in running hardware MMU with small pages?
2) The "Large Page Performance" article talks of configuring the guest to use Large Pages; this isn't something I had done for any of our guests. Do the tools handle that now or is it still a required step?
3) I have to disagree as to whether this is a bug or not. Large pages make 80% of our VMs alarm due to excess memory usage, and negate one of the main differences between ESX and Hyper-V.
>>You talk of nesting page tables which I assume means s/w MMU is still running.
No. Nested paging is a hardware MMU feature. The page table structures are maintained by software, but the hardware (the MMU) does the page table walk to fetch the information from the page table structures (or fetches it from the TLB if it is already cached). Software MMU does not use nested page tables; instead it uses shadow page tables, and the hardware walks the shadow page tables directly (so there is no additional cost for TLB misses).
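As a rough worked example of why a TLB miss costs more with nested paging, you can count the memory references in a page walk. The formula below is the commonly cited worst case for a two-dimensional walk with no page-walk caching. The 4-level table depth is an assumption (typical for x86-64), the function names are made up, and real processors cache parts of the walk, so treat the numbers as an upper bound rather than measured behaviour.

```python
def native_walk_refs(levels):
    """Memory references for a walk over a single page table (native or shadow)."""
    return levels

def nested_walk_refs(guest_levels, host_levels):
    """Worst-case references for a nested (2D) walk with no walk caches.

    Each guest page-table pointer, plus the final guest physical address,
    is itself a guest-physical address that must be translated through the
    host table (host_levels references each). On top of that come the
    guest_levels reads of the guest's own table entries.
    """
    return (guest_levels + 1) * host_levels + guest_levels

print("shadow/native walk:", native_walk_refs(4))
print("nested, 4K host pages:", nested_walk_refs(4, 4))
# A 2M host mapping ends one level earlier, so the host walk is 3 levels.
print("nested, 2M host pages:", nested_walk_refs(4, 3))
```

Under these assumptions a shadow page table walk touches 4 entries, a nested walk up to 24, and backing the guest with 2M machine pages brings that down to 19, on top of needing fewer TLB entries in the first place.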
>>is there any benefit in running hardware MMU with small pages?
Oh yes, absolutely.
>>The "Large Page Performance" article talks of configuring the guest to use Large Pages, this isn't something I had done for any of our guests. Do the tools handle that now or is it still a required step?
There are different levels of large page support. Applications inside the guest can ask the OS for large pages, and the OS can assign large pages if it has contiguous free memory. But OS-mapped large pages (i.e., physical pages) may or may not be backed by the hypervisor with actual large pages (machine pages). For instance, the guest may think it has 2M chunks, but the hypervisor may use 4K chunks to map the 2M chunk to the guest; in this case multiple 4K pages have to be accessed by the hypervisor, so there is no performance benefit even though the guest uses large pages. In ESX 3.5 we introduced support for large pages in the hypervisor; with this support, whenever the guest uses large pages we explicitly go and find machine large pages to back the guest large pages. This helps performance, as demonstrated by the whitepaper.
In addition to this, on NPT/EPT machines the hypervisor also opportunistically tries to back all guest pages (small or large) with large pages. For instance, even if the guest is not mapping large pages, contiguous 4K regions of the guest can be mapped by a single large page by the hypervisor; this helps performance (by reducing TLB misses). This is the reason why you see the use of large pages even though the guest is not explicitly requesting them.
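The TLB-miss reduction can be seen with simple arithmetic: the amount of memory a TLB can cover (its "reach") is the number of entries times the page size. The entry count below is a hypothetical round number chosen for illustration, not a figure for any specific processor.

```python
KIB = 1024
MIB = 1024 * KIB

def tlb_reach_bytes(entries, page_size):
    """Bytes of address space covered when every TLB entry maps one page."""
    return entries * page_size

ENTRIES = 512  # hypothetical TLB size, for illustration only

small = tlb_reach_bytes(ENTRIES, 4 * KIB)
large = tlb_reach_bytes(ENTRIES, 2 * MIB)
print(f"4K pages: {small // MIB} MiB reach")
print(f"2M pages: {large // MIB} MiB reach")
print(f"ratio: {large // small}x")
```

Whatever the real entry count is, the ratio stays at 512, because one 2M mapping replaces 512 contiguous 4K mappings.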
>>I have to disagree as to whether this is a bug or not. Large pages make 80% of our VMs alarm due to excess memory usage, and negate one of the main differences between ESX and Hyper-V.
There are two issues here. 1) TPS not achieving sufficient memory savings when large pages are being used: this is not a bug. It is a tradeoff that you have to make. The workaround I suggested will help you choose which tradeoff you want to make on NPT/EPT hardware. 2) The VM memory alarm: this is a separate issue and it is not dependent on page sharing. The VM memory usage alarm turns red whenever the guest active memory usage goes high. Guest active memory is estimated through random statistical sampling, and the algorithm that the hypervisor uses to estimate the active memory usage of a VM overestimates active memory when the guest small pages are backed by large pages (since the active memory estimate is done with reference to machine pages). This is a bug. For now you could simply ignore this alarm (since it is a false alarm); I was told that we will be fixing this pretty soon. However, note that this will only fix the alarm; the memory usage of the VM will still remain the same.
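To show how a sampling estimate can overshoot when access is observed per machine page, here is a toy Python simulation. It is only a sketch of the general effect and not the actual hypervisor algorithm: the guest size, the access pattern, the sample count, and the function name are all assumptions.

```python
import random

SMALL_PER_LARGE = 512  # 4K pages per 2M page

def estimate_active_fraction(touched, total_small, samples, large_backed, rng):
    """Estimate the active fraction by sampling random small pages.

    With large_backed=True, a sample counts as active if ANY small page in
    the same 2M region was touched, mimicking access tracking done at
    machine-page (large page) granularity.
    """
    touched_regions = {p // SMALL_PER_LARGE for p in touched}
    hits = 0
    for _ in range(samples):
        page = rng.randrange(total_small)
        if large_backed:
            hits += (page // SMALL_PER_LARGE) in touched_regions
        else:
            hits += page in touched
    return hits / samples

rng = random.Random(0)  # fixed seed so the sketch is repeatable
total_small = 64 * SMALL_PER_LARGE  # a 128 MiB toy guest
# The guest touches a single small page in every other 2M region.
touched = {r * SMALL_PER_LARGE for r in range(0, 64, 2)}

est_small = estimate_active_fraction(touched, total_small, 2000, False, rng)
est_large = estimate_active_fraction(touched, total_small, 2000, True, rng)
print(f"true active fraction:    {len(touched) / total_small:.4f}")
print(f"estimate, small backing: {est_small:.4f}")
print(f"estimate, large backing: {est_large:.4f}")
```

In this toy setup the guest really touches about 0.1% of its memory, but with large-page backing roughly half of the samples land in a region that counts as touched, which is the kind of inflated "active" figure that would trip an alarm.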
Thanks Kichaonline for the accurate information. A small correction: we are currently investigating ways to fix the high memory usage issue as well. Regarding TPS, as noted earlier, this should not lead to any performance degradation. When a 2M guest memory region is backed with a machine large page, VMkernel installs page sharing hints for the 512 small (4K) pages in the region. If the system gets overcommitted at a later point, the machine large page will be broken into small pages, and the previously installed page sharing hints help to quickly share the broken-down small pages. So low TPS numbers when a system is undercommitted do not mean that we won't reap benefits from TPS when the machine gets overcommitted. Thanks.
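The hint mechanism described above could be sketched roughly as follows. This is a toy Python model under stated assumptions, not VMkernel code: the `LargePage` class, the `break_and_share` function, the use of SHA-1 as the hint hash, and the dictionary used as the shared-page table are all hypothetical.

```python
import hashlib

SMALL_PAGE = 4096
SMALL_PER_LARGE = 512

def page_hash(data):
    """Content hash used as a sharing hint (SHA-1 chosen only for the sketch)."""
    return hashlib.sha1(data).hexdigest()

class LargePage:
    """A 2M machine page that records sharing hints for its 4K pieces."""
    def __init__(self, data):
        assert len(data) == SMALL_PAGE * SMALL_PER_LARGE
        self.data = data
        # Hints are computed while the page is still intact.
        self.hints = [page_hash(data[i:i + SMALL_PAGE])
                      for i in range(0, len(data), SMALL_PAGE)]

def break_and_share(large, shared):
    """Break a large page and share pieces whose hint matches known content.

    `shared` maps hash -> page content already held once by the host.
    Returns the number of small pages that did not need their own copy.
    """
    saved = 0
    for idx, h in enumerate(large.hints):
        piece = large.data[idx * SMALL_PAGE:(idx + 1) * SMALL_PAGE]
        # A hint is only a candidate match; confirm with a full comparison.
        if h in shared and shared[h] == piece:
            saved += 1
        else:
            shared[h] = piece
    return saved

shared = {}
first = break_and_share(LargePage(bytes(SMALL_PAGE * SMALL_PER_LARGE)), shared)
second = break_and_share(LargePage(bytes(SMALL_PAGE * SMALL_PER_LARGE)), shared)
print(first, "then", second, "small pages shared")
```

The point of the design is that the hashing work is done up front, so when memory gets tight the host only has to look up existing hints rather than scan the broken-down pages from scratch.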
We are seeing the same issue on our ESX servers that have been upgraded from 3.5 to 4.0. After the upgrade, most of our VMs are in alarm for excessive memory, but inside the guest OS there is lots of free memory. For example, most of our Linux servers (CentOS x64) show 1.5GB free inside the guest, but the Infrastructure Client reports that they are using a full 2.0GB. If I shut down or reboot the VM it will go down to 1.0-1.5GB of used memory, but it eventually creeps back up and never goes down.
We are using quad-core Xeon 5500s with 48GB of RAM.
If 3.5 couldn't do hardware MMU with these processors, would disabling hardware MMU give us the same performance we were seeing in 3.5? Or are we better off using hardware MMU and setting Mem.AllocGuestLargePage to 0?
It sounds like this may be a separate issue from TPS; it seems as if ESX4 isn't reclaiming free memory that the guest isn't actively using.
Rajesh, do you have any details regarding the high memory issue that you said is being investigated?
If I'm not mistaken, the Intel Xeon 5500 is a Nehalem processor, so yes, you will notice the high memory usage alarm in 4.0 and not in ESX 3.5. As I mentioned earlier, you could ignore this alarm as it is a false positive (it happens only when large pages are used), and this issue will be fixed pretty soon.
>>are we better off using hardware MMU and setting Mem.AllocGuestLargePage to 0?
You should always use hardware MMU whenever possible. Setting Mem.AllocGuestLargePage to 0 is a workaround to get instant TPS benefits, and as a side effect it will also fix the alarm problem (since large pages get disabled with this option).
>> seems as if ESX4 isn't reclaiming free memory that the Guest isn't actively using.
That is not correct. ESX reclaims unused (but previously allocated) memory through ballooning and through TPS. The problem is that when large pages are used, TPS doesn't kick in instantly, so you wouldn't get the instant gratification of seeing TPS memory savings. However, when your system is under memory over-commitment, the vmkernel memory scheduler will transparently break large pages into small pages and collapse them with other shareable pages. This feature, called "share before swap", is new to ESX 4.0. So you would still get the same benefits from TPS, but only at the time of memory over-commitment.
To summarize, when you use ESX 4.0 on Nehalem/Barcelona/Shanghai (EPT/NPT) systems:
a) ignore the high memory usage alarm - this will be fixed pretty soon
b) don't worry about TPS - it will kick in automatically when your system is under memory over-commitment
I'm seeing this same issue now as well. We have a start-up environment: one host is a DL380 G5 running ESX 3.5 U4, and one host is a DL380 G6 (Xeon 5540) running ESXi 4. I'm seeing the high active memory issue on the 2008 and 2003 servers running on the ESXi 4 host. I'm hoping to hear of a fix soon as well, as we are about to start rolling out a large number of DL380 G6s.