VMware Cloud Community
nicholasroberts
Contributor

Guest OS to vCenter Performance Monitor - Memory Reporting Mismatch

Hi,

I've done quite a bit of research on my issue, but I haven't found a good answer that leads to action items or a way to resolve it.

I have many VMs that occasionally experience high memory usage, more than 90% or spiking to 100%. However, I never get an alert in VMware. In fact, if you check the performance alerts in either VMware Console or vRealize Operations Manager, they only report that I am hitting 70%-80% usage.

Windows Server 2012R2, Task Manager

MSTR_Memory1.png

Real time report in vCenter VMware Console

MSTR_Memory2.png

This is making it difficult for us because we have traditionally relied on VMware reporting for alerting on these events, not Windows reporting.

My research keeps circling back to Large Memory Paging (LMP) as the cause. From what I have read, if I turn off LMP my issue will go away and reporting will be fine. However, if I turn it off, I may see reduced performance. As you can imagine, I don't want that. Unfortunately, most of the articles I find on this 'issue' are 3-6 years old (or more); I haven't found anything current.

What I don't know is how long this has been an issue. We first noticed it about 2 weeks ago and have since seen it on multiple VMs (at least 10). The only recent change to our environment was installing the latest patches/updates/builds from VMware about 4 weeks ago. I've been working in this environment for almost 4 years (and with VMware since ESXi 3.5), and 2 weeks ago was the first time I've seen this issue. So, was there a patch released that caused this? Or is this something that we've likely always had, and I am just now noticing?

I am running VMware 5.5.0, 4345813. vCenter on Server 2008R2.

Any help or insight on this matter would be good. I haven't opened a support ticket for this, as I am not sure what, if anything they could do for this.

I'm willing to test turning off Large Paging, but I wouldn't be able to touch any production servers until after Jan 7 (we freeze production changes during the holidays). I also wouldn't know where or how to test the performance impact on our applications. Unfortunately, our application teams don't take much part in testing with the infrastructure team; they usually claim to only have time for 'projects' (new customers, new projects, etc.), not operational work.

Thoughts?

Thanks,

NR

8 Replies
Cloud_Infrastru
Contributor

Hi,

I have the same issue with a lot of virtual machines, both Windows and Linux.

I attached screenshots from one VM as an example.

Memory utilization from vCenter

from vcenter.JPG

Recommended size from vROps

from vrops.JPG

Memory use reported by the guest operating system (2012R2)

from Windows.JPG

I'm running vSphere ESXi 6.0U2.

Does this machine need more memory or not? Is it possible to reclaim memory or not?

Is it necessary to install the EPOps agent inside the virtual machine?

Thanks

Marco

vXav
Expert

The memory usage figure you are looking at can be misleading, especially on boxes that cache data in RAM, like DB boxes.

Active memory usage is actually vSphere's estimate of the size of the guest working set, obtained by sampling memory accesses without any involvement from the guest. That is why this metric doesn't seem to make sense for DB boxes: they already have a large amount of memory reserved for the DB for fast access (see the guest OS view), but that memory is not actually being accessed at time t (see the vSphere view).

So vSphere sees the guest accessing X GB of memory and says "there are X GB of active mem". But from the guest's point of view there are already Y + X GB "active" (Y being the memory not accessed during the sampling period).
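
To put numbers on the X and Y above, here is a minimal Python sketch. The figures are invented for illustration (they are not read from the screenshots in this thread), and the function name is hypothetical.

```python
def usage_percent(used_gb: float, configured_gb: float) -> float:
    """Return used memory as a percentage of the VM's configured memory."""
    if configured_gb <= 0:
        raise ValueError("configured_gb must be positive")
    return 100.0 * used_gb / configured_gb


# Hypothetical figures, for illustration only.
configured_gb = 16.0  # memory configured on the VM
x_touched_gb = 4.0    # X: memory the guest accessed during the sampling period
y_idle_gb = 11.0      # Y: memory held by the guest but not accessed in that period

# What the hypervisor-side "active" view would show (X only).
vsphere_active_pct = usage_percent(x_touched_gb, configured_gb)
# What a guest-side tool such as Task Manager would show (X + Y).
guest_in_use_pct = usage_percent(x_touched_gb + y_idle_gb, configured_gb)

print(f"vSphere active view: {vsphere_active_pct:.1f}%")
print(f"Guest in-use view:   {guest_in_use_pct:.1f}%")
```

With these made-up numbers the two views differ by a wide margin even though both are measuring the same VM at the same moment.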

vCenter will send an alert when vSphere says "90% has been active (accessed) for 5 minutes", which is unlikely given the amount of "static" stuff in memory, except maybe during heavy load periods.
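
As a rough sketch of that "90% for 5 minutes" condition, the Python below checks whether a series of samples stays at or above a threshold for a whole window. It is not vCenter's actual alarm logic; the function name, the 20-second sample interval, and the sample values are all assumptions made for illustration.

```python
from typing import Sequence


def sustained_breach(samples_pct: Sequence[float],
                     threshold_pct: float = 90.0,
                     window: int = 15) -> bool:
    """Return True if `window` consecutive samples are all >= threshold_pct."""
    run = 0
    for sample in samples_pct:
        # Extend the current run on a breach, otherwise start over.
        run = run + 1 if sample >= threshold_pct else 0
        if run >= window:
            return True
    return False


# Assumption: one sample every 20 seconds, so 5 minutes = 15 samples.
active_view = [75.0] * 30   # hypothetical hypervisor-side "active" series
guest_view = [98.0] * 30    # hypothetical guest-side "in use" series

print("alarm on active view:", sustained_breach(active_view))
print("alarm on guest view: ", sustained_breach(guest_view))
```

The same rule fires on the guest-side series and stays silent on the active series, which is the behaviour described in the original post.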

Generally the vCenter alerting is fine, but in these cases it is more accurate to use a guest-centric monitoring solution with an agent running, like Nagios or Zabbix.

Extract from VMware doc:

“ESX uses a statistical sampling approach to estimate the aggregate virtual machine working set size without any guest involvement. At the beginning of each sampling period, the hypervisor intentionally invalidates several randomly selected guest physical pages and starts to monitor the guest accesses to them. At the end of the sampling period, the fraction of actively used memory can be estimated as the fraction of the invalidated pages that are re-accessed by the guest during the epoch.”
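
The estimator in that extract can be mimicked with a toy simulation. This is only a sketch of the idea, not ESX code: the page counts, the sample size of 100, and the simple model of "the guest touches the first N pages" are assumptions chosen for illustration.

```python
import random


def estimate_active_fraction(total_pages: int,
                             active_pages: int,
                             sample_size: int,
                             rng: random.Random) -> float:
    """Estimate the active fraction from a random sample of pages.

    Toy model: pages 0..active_pages-1 are the ones the guest touches
    during the sampling period; every other page is left alone.
    """
    # Pick the pages to "invalidate" and watch during this period.
    sampled = rng.sample(range(total_pages), sample_size)
    # Count how many of the watched pages the guest re-accessed.
    touched = sum(1 for page in sampled if page < active_pages)
    return touched / sample_size


rng = random.Random(42)               # fixed seed so runs are repeatable
total_pages = 4_194_304               # 16 GB of 4 KB pages (hypothetical VM)
active_pages = total_pages // 4       # guest really touches 25% of its memory

estimate = estimate_active_fraction(total_pages, active_pages, 100, rng)
print(f"true active fraction: 0.25, estimated: {estimate:.2f}")
```

The estimate only counts pages that were touched during the period, so memory the guest holds but leaves idle never shows up in it, however full the guest says it is.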

nicholasroberts
Contributor

Hi,

I completely understand the inactive vs active memory, but in my example, all memory is active right now. The application and server are actively crashing, but VMware doesn't report that. I also understand that if the VMTools aren't up to date, or in a stopped (or crashed) state, it won't report (or report correctly). However, the VMTools are actively running and reporting.

MSTR_Memory3.png

MSTR_Memory4.png

If I put a WMI-based monitor on it (What's Up Gold, SCCM or IT360), it reports the same or similar data as the guest OS. As for the sampling method you mention: this server started spiking memory yesterday and has not dropped below 98% since. I understand that installing agents to monitor my environment may be better, but I don't like having to install an agent on every server. While your answer is very detailed and documented, I'm not convinced at this point that it applies to my exact situation. I've seen and have examples of what you describe, but I don't think it applies in this case.

vXav
Expert

From a monitoring perspective, I am pretty sure that what I described actually applies. Your WMI monitoring reports more accurate data because it is guest-centric, as opposed to vSphere, which reports what it sees going through the pipe.

If you are licensed for vROps Advanced (as mentioned there), then like you said it would be more relevant to install the vROps agent on your top-tier VMs and monitor them from there.

If this change in behaviour isn't expected (no changes have been made), then I would try to pin down the issue in the guest OS: check eventvwr for errors, look for user sessions that are disconnected but still present or for stuck processes, and maybe even give it a reboot if that's an option.

If this increase in workload can be explained (a lot of new RDP users, a new service, or whatever), then I would add an extra 5 GB of RAM to give it some slack and adjust if needed.

nicholasroberts
Contributor

I'm pretty confident that what you described is not 100% accurate for my case...

I'm really looking for someone to comment on Large Memory Paging. This VM is not a DB server and is not caching much. I agree with you 100% that the server could use more RAM. I don't need help with the monitoring aspect of Windows or VMware; I need help understanding Large Memory Paging. I understand active vs inactive memory, OS memory reserving and caching, and how they are reported in VMware.

In the case of this VM, and a few others in my environment (which are NOT DB servers), I am getting mismatched reporting from what the OS is telling us, and what VMware is telling us, without clear reasoning as to why. There is no reason for the mismatch, aside from what my research tells me about Large Memory Paging being to blame. There are articles out there from multiple C-level VMware employees regarding this issue with Large Memory Paging. I need to understand the pros / cons about changing this setting, in an easier to understand format for my simple mind.

I appreciate your help, condescending attitude and willingness to harp on the same issue repeatedly.

vXav
Expert

Sorry if I came across as condescending. I just wanted to point in that direction, as this is often the case.

You ran updates and patches recently, so have you tried upgrading the virtual hardware of the VMs (Upgrade VM Compatibility)?

Note that it's important that the VMtools are updated beforehand.

nicholasroberts
Contributor

Yes, that is all updated (and in the correct order, as mentioned).

I was reading some articles from Duncan Epping on his blog (yellow-bricks), but could not come to any final determination. I found a couple of other VMware KB articles and forum threads, but none were current (they refer to versions 4.X and 5.0, from 2009-2011). I would like to find one that is a little more current for my version and the year I live in. It is as if the topic / issue went away.

vXav
Expert

Hi, I came across this thread again from a while back. Did you ever get to the bottom of it?
