We are using ESX 4.0 build 164009 on some Dell R710 servers with Nehalem processors. The problem we are experiencing is that the Active Guest Memory for VMs is reported incorrectly compared to what we see when we check from within the VM. This also affects the Memory Usage on the ESX server, so we are not sure whether the server is really using all this memory or whether this is reported incorrectly as well.
We are running 9 VMs on one of our hosts, and most of them report guest memory around 90%. One of the VMs (as an example) is reporting the following:
So the Consumed Host Memory is 4175 MB, while the Guest Mem - % value is 97%.
If we look inside the VMs, the Physical Memory is as follows:
Furthermore, if we add up the Consumed Host Memory for all VMs, we get close to the Memory Usage reported for the entire ESX server.
So we see two things that are strange to us.
First, we would expect the Memory Usage value for the server to be all the Active Guest Memory values combined, plus a small overhead for ESX itself. For our server that would be 19332 MB (excluding the ESX server's own memory), while 25075 MB is reported.
Second, we would expect the amount of memory reported as Active Guest Memory for our VMs to be in the vicinity of the total amount of memory minus the free memory reported within the guest OS.
So is this just false reporting, or does the memory usage of the ESX server really include all the free memory reported in the VMs? Normally we would dare to overcommit assigned memory as long as there was enough free memory available, but this way we don't.
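To make our sanity check concrete, here is a rough script. The 19332 MB and 25075 MB totals and the 500 MB Service Console figure are the real numbers from our host; the per-VM breakdown is only an illustrative assumption, since I haven't listed every VM here:

```python
# Sanity check: compare the host's reported memory usage against the sum
# of per-VM Active Guest Memory plus the Service Console reservation.
# The per-VM figures below are hypothetical; only the totals are real.

active_guest_mb = [4050, 3900, 2500, 2200, 1900, 1700, 1300, 1000, 782]  # 9 VMs
esx_overhead_mb = 500  # Service Console reservation from the VI client

expected_usage = sum(active_guest_mb) + esx_overhead_mb
reported_usage = 25075  # Memory Usage on the host's Summary tab

print(f"Sum of Active Guest Memory: {sum(active_guest_mb)} MB")
print(f"Expected host usage:        {expected_usage} MB")
print(f"Reported host usage:        {reported_usage} MB")
print(f"Unexplained gap:            {reported_usage - expected_usage} MB")
```

The roughly 5 GB gap is what we cannot account for.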
We have looked at this thread and I see some similarities: http://communities.vmware.com/thread/211585
Any help is appreciated.
We should NOT compare the memory statistics given by the VI client for a VM with the memory statistics given by the guest OS. ESX has memory features like page sharing and ballooning which let it save memory used by the guest OS.
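As a very loose sketch of what transparent page sharing does (all names here are made up; the real implementation hashes machine pages inside the VMkernel), identical page contents across VMs collapse to a single backing copy:

```python
import hashlib

def share_pages(vm_pages):
    """Toy model of transparent page sharing: guest pages with identical
    contents across VMs are backed by one shared machine page."""
    backing = {}  # content hash -> the single shared machine page
    mapped = 0    # total guest pages mapped across all VMs
    for pages in vm_pages.values():
        for content in pages:
            digest = hashlib.sha256(content).hexdigest()
            backing.setdefault(digest, content)
            mapped += 1
    return mapped, len(backing)

# Two Windows VMs with mostly identical pages (same DLLs, zero pages).
zero = b"\x00" * 4096
dll = b"MZ" + b"\x90" * 4094
vms = {
    "win2003-a": [zero, zero, dll, b"unique-a" + b"\x00" * 4088],
    "win2003-b": [zero, dll, dll, b"unique-b" + b"\x00" * 4088],
}
mapped, machine_pages = share_pages(vms)
print(f"{mapped} guest pages backed by {machine_pages} machine pages")
```

This is why similar guest OSes consume less host memory together than the sum of what each guest thinks it uses.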
I agree that there can be a difference, but that difference would be the other way around, e.g. Windows would think it was using more memory than ESX was reporting (when using TPS). Ballooning is just a technique to let Windows decide what to swap on an ESX server where memory is scarce, which is not the case here.
Can you go to ESX -> Configuration -> Memory in the VI client and check the following:
Total = ?
System = ?
Go to VM -> Edit Settings -> Resources -> Memory and check the memory usage.
Please point out where you are finding the incorrect memory data.
Total: 32758.6 MB
Virtual Machines: 29890,0 MB
Service Console: 500,0 MB
Reservation = 0 MB
Guest is shown in my first post.
I see the memory usage in the Summary screen of the ESX server and of the VM
Nothing yet. The only link I found that described anything like it (with some additional, really interesting information that didn't lead to a solution) was http://communities.vmware.com/message/1261478
2) VM memory alarm - this is a separate issue and it is not dependent on page sharing. The VM memory usage alarm turns red whenever the guest active memory usage goes high. Guest active memory is estimated through random statistical sampling, and the algorithm the hypervisor uses to estimate a VM's active memory overestimates it when the guest's small pages are backed by large pages (since the active memory estimate is done with reference to machine pages); this is a bug. For now you could simply ignore this alarm (since it is a false alarm); I was told that we will be fixing this pretty soon. Note, however, that this will only fix the alarm; the reported memory usage of the VM will still remain the same.
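The overestimation described above can be modelled roughly: the hypervisor samples random small (4 KB) guest pages, and when the estimate is done against machine pages, a sample inside a touched 2 MB large page counts as active even if the guest touched only a fraction of that 2 MB. A toy illustration (the sampling rate, seed, and page counts are made-up assumptions, not the real algorithm):

```python
import random

SMALL_PER_LARGE = 512  # one 2 MB large page = 512 x 4 KB small pages

def estimate_active(touched_small, total_small, large_pages, samples=2000):
    """Toy active-memory estimator. With large_pages=True a sample counts
    as active if ANY small page in its 2 MB region was touched, mimicking
    an estimate done with reference to machine (large) pages."""
    rng = random.Random(42)
    touched = set(touched_small)
    hits = 0
    for _ in range(samples):
        page = rng.randrange(total_small)
        if large_pages:
            base = page - page % SMALL_PER_LARGE
            if any(p in touched for p in range(base, base + SMALL_PER_LARGE)):
                hits += 1
        elif page in touched:
            hits += 1
    return hits / samples * total_small  # estimated active small pages

total = 512 * 1024  # 2 GB worth of 4 KB pages
# Guest touches exactly one small page per large page: ~0.2% truly active.
touched = list(range(0, total, SMALL_PER_LARGE))
print("estimate, small-page backing:", round(estimate_active(touched, total, False)))
print("estimate, large-page backing:", round(estimate_active(touched, total, True)))
```

In this worst case the large-page estimate reports the entire 2 GB as active while the true working set is about 4 MB, which matches the symptom of active memory pegged near maximum.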
Not sure if this means the memory is really used or just falsely reported as being used. And I'm still hesitant to overcommit memory on the servers.
That is exactly where we are at. This is a migration from some older ESX 3.5 servers to new HP G6 servers with 30% more RAM than the old servers. The old servers did not have this issue and were way overcommitted on memory. I am thinking of opening a case with VMware, but I am concerned about wasting a lot of time with no solution. I have been there before.
So far we have installed 4.0 at 2 client sites.
The first site was a completely new VMware installation onto three IBM HS22 blade servers. Loads of memory.
Just 1 Windows 2003 VM on a blade is enough to set these alarms off.
2 of the VMs were migrated from an old 3.5 host with half the memory, and they had no memory issues prior to the migration.
The other site was an in-place upgrade of 2 x 3.5 hosts and the replacement of a third host with new hardware and twice the memory.
Some of the VMs on all three hosts are showing this memory alarm. It appears to be mostly DB servers (SQL, Exchange and AD).
The VMs themselves appear to be functioning perfectly well, but it's the hosts that are the concern, as they appear to be struggling with the memory.
If it's a faulty alarm then it would be useful to know! I really hope it is, because if it's not then there is a really big memory management problem with ESX 4.0!
> but that difference would be the other way around, eg. Windows would think it was using more memory than ESX was reporting
No, it's not the other way around. Windows expects to be the top-level OS (and is this Windows 2008?). Windows wants to consume ALL available memory, so 4 GB is what it SEES and therefore makes available, since Windows ASSUMES it's on physical hardware. ESX provides a VIRTUAL machine (hence VM hypervisor: it gives an OS a machine that emulates REAL hardware).
ESX tries to understand what the OS is using and provide only the memory the OS needs, but if Windows allocates the memory as cache, that still counts as usage. ESX can't predict how Windows will use the memory. So sometimes this is what you get: Windows grabbing the memory and deciding how to allocate it. It's free because Windows is NOT using it for services, but it could.
> report active memory at near maximum when the guest is actually using half that.
Ah, but Windows IS using the memory, for cache or whatever. It TAKES the memory and then decides what to do with it. Windows is a bare-metal OS just like ESX, so Windows wants to grab the memory and use it.
If you go inside, take a deep breath and hold it, your lungs will use the air as needed, but you gulped up as much air as you could hold, and your lungs decide how that air is to be used. Assuming you could hold your breath forever and your lungs were as big as a house, you could take in a lot of air, and if you never exhaled there would be no air left for anyone else in the house. So is that air ACTUALLY used or not? It doesn't matter; it's gone.
ESX is the house, the air is the memory, and Windows is your lungs. That's a loose interpretation, but that's really what is going on.
> then there is a really big memory management problem with ESX
None of the above. No faulty alarms, no memory management issue in ESX. Windows takes memory; that's what an OS (operating system) is designed to do: it decides what to do with the memory you gave it. You give it 4 GB, it takes 4 GB. ESX can sometimes share that memory among other Windows machines, because like kernels use the same DLLs, and duplicated memory can be shared.
If you could see the memory in a physical box, ALL of it would be consumed by Windows. That's the nature of Windows and how it works. This is normal; add some more VMs of the same type and you will see the memory usage decrease over time. Nothing to get excited about. The more similar VMs you have, and the longer those VMs run, the more the memory works itself out: it gets shared among the VMs, so ESX won't need to provide as much per VM.
So if that's the case, why has it only "suddenly" happened when we go to ESX 4.0?
It didn't alert with ESX 2.0, ESX 3.0 or 3.5! These are the same VMs. So why the different behavior?
I agree with Scottca68. Also, if this were the case, no memory overcommitment would be possible, while that is one of the USPs of VMware. And if you look at the original question, the relevant values are shown there.
Furthermore, the value "Active Guest Memory" is defined (definition taken from a session at VMworld) as "Amount of memory in megabytes actively used by guest operating system and applications", as opposed to "inactive memory", which is defined as "allocated memory not recently accessed or used".
So the statement that an OS installed in a VM does not have "empty" memory pages is correct; however, that does not make those pages Active Memory.
Finally, if you follow the thread http://communities.vmware.com/thread/211585 you can see that even VMware considers this to be a bug and is aiming to fix it in Update 1 (maybe sooner).
It looks like the question was answered with this KB article.
We have setup a baseline and look to deploy this soon. There also appears to be a patch for ESXi 4.0 as well.
I deployed the patch referenced in the KB article this morning. It fixes the vCenter reporting issues with false alarms, but DOES NOT fix the Page Sharing problems on Nehalem based servers. I quickly turned the mem.AllocGuestLargePage back to 0 after I noticed that page sharing was still not working properly.
I have two Exchange 2010 VMs that used to share close to 2 GB of RAM (Resource Allocation tab, Shared). After the patch and setting mem.AllocGuestLargePage to 1, they shared less than 50 MB.
The patch does not specifically say it was intended to fix the page sharing problems. VMware has dropped the ball on this one, in my mind. vSphere was released in April or May with a HUGE bug. VMware and Intel were jointly pushing the Nehalem architecture for virtualization. VMware support claims that since Nehalem was new they didn't have access to it. I doubt that is really true, given that Intel and VMware have been collaborating on virtualization for years.