Bill_Morton
Contributor

Seeing very low memory sharing with ESXi


We have recently migrated our environment from ESX 4 on HP DL385 G5s to ESXi 4.1 on HP DL380 G7s, and I am seeing what I consider very low memory sharing levels. I wanted a second opinion on this.

Today I put two of our hosts in maintenance mode to consolidate the cluster a bit, just to make sure the low sharing was not due to machines being spread out all over the place.

A quick background on the setup: the cluster has two resource pools, DevTest and Production. Both are expandable and unlimited; DevTest has low share values and Production has high share values.

When I look at the summary of the resource pools, I see pretty much what I expect with DevTest, 11 GB shared, but Production, which has 5x more machines in it, is only sharing about 1 GB.

When I run esxtop on the host, even while it is running at 88% memory with many VMs of the same operating system type, I am only seeing 1 GB of PSHARE on the counters. This strikes me as very low, and I need some direction on where to start looking.

I have attached some files of what I am seeing as well.

ESXiTopMemShare is a screenshot of the memory page in esxtop showing the low PSHARE value.

Dev_test_mem shows the summary of the dev/test resource pool.

Production_memory shows the summary of the production resource pool.

Guest_OS shows the type and number of operating systems running on the host that the esxtop memory screenshot was taken from, showing that there are indeed many similar systems running on the host.


Accepted Solutions
ChrisDearden
Expert

That will be your new CPUs using large pages, which won't share. When you hit memory contention, those pages will get converted to small ones, which will start sharing.

http://www.yellow-bricks.com/2011/01/26/re-large-pages-gabvirtualworld-frankdenneman-forbesguthrie/

If this post has been useful, please consider awarding points. @chrisdearden http://jfvi.co.uk http://vsoup.net

Bill_Morton
Contributor

Chris,

Thank you for the link.  Just to make sure I am understanding the issue presented:

The new processors (Nehalem) support large 2 MB pages rather than 4 KB pages, which ESXi takes advantage of for the guests. The good news is that 2 MB pages give the guest lower memory-access overhead; the bad news is that it is much harder to find identical 2 MB pages to share out via TPS, which is why I am seeing such low numbers for PSHARE.
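A toy simulation makes the effect easy to see (this is purely illustrative and not how the VMkernel actually works): fill a block of simulated memory with mostly unique data plus some identical 4 KB pages scattered through it, then count how many bytes are shareable at 4 KB versus 2 MB granularity. The identical small pages share nicely, but no two 2 MB regions end up byte-for-byte identical.

```python
import hashlib
import os
from collections import Counter

SMALL = 4 * 1024         # 4 KB page
LARGE = 2 * 1024 * 1024  # 2 MB large page

def shareable_bytes(memory: bytes, page_size: int) -> int:
    """Bytes reclaimable if every duplicate page collapses to one copy."""
    counts = Counter(
        hashlib.sha1(memory[i:i + page_size]).digest()
        for i in range(0, len(memory), page_size)
    )
    # For n identical pages, n - 1 copies can be shared away.
    return sum((n - 1) * page_size for n in counts.values())

# Simulated guest memory: 16 regions of 2 MB each, mostly unique random
# data, with an identical zeroed 4 KB page every 32 KB (standing in for
# zero pages or common shared-library pages across similar guests).
mem = bytearray()
for _ in range(16):
    region = bytearray(os.urandom(LARGE))
    for j in range(0, LARGE, 8 * SMALL):
        region[j:j + SMALL] = b"\x00" * SMALL
    mem += region

small_share = shareable_bytes(bytes(mem), SMALL)
large_share = shareable_bytes(bytes(mem), LARGE)
print(f"4 KB pages: {small_share / 2**20:.0f} MB shareable")
print(f"2 MB pages: {large_share / 2**20:.0f} MB shareable")
```

With 4 KB pages the simulation finds about 4 MB of duplicates to collapse; with 2 MB pages it finds none, because every large page contains some unique data.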

I guess I just got used to seeing the larger share numbers from the 4.0 and earlier releases on processors without large-page support.

We are not planning on running our hosts overcommitted unless we have a multiple-host failure, so I think I will just try to shift my thinking a bit.

ChrisDearden
Expert

That's pretty much it. I remember getting an equally large shock when I realised how much RAM my VMs were consuming compared with my old 3.5 hosts. It's only when you really load the host up that it will convert back to the small pages, which are much more TPS-friendly.

If this post has been useful, please consider awarding points. @chrisdearden http://jfvi.co.uk http://vsoup.net
VMmatty
Virtuoso

The big shock for many people is that they used to use the "Available Memory" graph on their ESX/ESXi hosts as a way to do some basic capacity planning.  Once you start using large pages that goes away, so you have to actually look at the memory usage of your guests, or use tools like CapacityIQ to do that.

Not sure if this was mentioned in the blog post you referenced, but if you look in esxtop (or resxtop from the vSphere CLI or vMA), you can see a memory counter called COWH.  That stands for Copy-On-Write Hints, and it is roughly the amount of memory that the VMkernel thinks could be shared with TPS if the large pages were broken up into small pages.  Looking at that can give you an idea of what you might expect to reclaim from TPS if you overcommit memory.  It won't be perfect, but it should be close.
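As a back-of-the-envelope sketch (the numbers here are hypothetical, not taken from the screenshots in this thread), you can treat COWH as an upper-bound credit against consumed memory when estimating headroom:

```python
def post_tps_estimate(consumed_gb: float, cowh_gb: float) -> float:
    """Rough host memory consumption if the COWH pages actually got shared.

    COWH (copy-on-write hints) approximates how much memory TPS could
    collapse once large pages are broken into 4 KB pages under pressure,
    so consumed - COWH is an optimistic post-TPS consumption figure.
    """
    return consumed_gb - cowh_gb

# Hypothetical host: 84 GB consumed, esxtop reports a COWH of 11 GB.
print(post_tps_estimate(84.0, 11.0))  # -> 73.0
```

As Matt notes, this is only an estimate: it assumes every hinted page really does share once the large pages are broken up.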

Matt

My blog: http://www.thelowercasew.com | @mattliebowitz