Solved: Re: Ignoring vROPS alarms for memory stress?

jengl · ‎09-16-2016

Hey all,

I am still having difficulties understanding the metrics active memory in vCenter and memory demand in vROPS and how they relate to the metrics from inside the guest OS:

As I understand it, active memory is an ESXi-level sampled estimate of the VM's actual memory usage (see: Understanding vSphere Active Memory - VMware vSphere Blog). Memory demand in vROPS is derived from this metric with some RAM added for the OS (source: Iwan Rahabok's great book: VMware Performance and Capacity Management - Second Edition).

He also says that RAM should be monitored at the guest OS level (see: How to monitor Windows RAM usage with vRealize Operations 6.1), so that is what we are doing with an Icinga agent inside the guest OS.

The VM is a Windows backup VM with 16 GB RAM that runs a full backup on Friday night:

For this VM, vROPS tells me to extend RAM to 17.34 GB because of memory stress:

And now the same timeframe for this VM from inside the guest OS through the monitoring agent:

As you can see the maximum value of memory used from inside the guest OS is approx. 9 GB.

So my question is: can I ignore the vROPS alerts regarding memory stress, and where does the big difference come from?

Does the guest OS maybe read from the Windows page cache, and is this reflected in vROPS but not counted as used memory inside the guest OS?

And if I lower the RAM for this VM, will the backups run a lot slower because of the missing page cache?

I would be really grateful for any insights!

Thanks and regards,

jengl

Iwan_Rahabok · ‎09-16-2016

jengl, I'm travelling right now. Apologies, I can't reply properly.

Do review "My VMs are running at over 90% RAM, I need MORE RIGHT NOW!!" on Virtual10 by Manny Sidhu.

Just FYI: 6.3 sports the ability to get Guest OS RAM without using an agent. We added that for exactly the problem you're having. See "What's new with vRealize Operations 6.3" on virtual red dot.

jengl · ‎09-19-2016

Iwan Rahabok: thanks for answering anyway.

Maybe you can elaborate a little more when you are back home, since the blog you mentioned doesn't exactly answer my question.

Manny describes quite the opposite situation, where vROPS shows less RAM than is actually used. In my situation vROPS shows more active RAM than the OS, and I want to understand why and whether I can ignore the vROPS alarms for that.

Regarding the new ability of 6.3: That's great, I have been waiting for that for a long time. Since we are not on vSphere 6 (yet), we can't use the feature right now, but I am looking forward to it. Do you think the stress calculation and oversized VM reports in vROPS will be changed in a future version to use the new metrics instead of active/demand? That would be way better IMHO.

Thanks!

jengl

Iwan_Rahabok · ‎10-08-2016

Different results can occur when:

- the measurement is taken at a different level.
- a different sampling technique, sample size, frequency, or simply a different algorithm is used. vCenter code and Windows code are likely not the same. I do not believe either checks every single page of RAM, as that is costly. I think they are sampling.
- the two levels do not give each other full transparency. The hypervisor has no awareness of soft page faults, hard page faults, or how Windows divides its RAM. It only sees reads and writes.

I'm scheduling a blog post for Monday, 7 am California time. I hope that explains it.

In short, I'd recommend:

- For apps that manage their own RAM, use metrics from the app.
- For others, use metrics from the Guest OS.
- Use vR Ops Demand if you have no Guest OS visibility. Do not use vCenter Active or Consumed.
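
The order of preference above can be written as a tiny decision function. This is only a sketch: the function name, its two boolean inputs, and the returned labels are made up for illustration and are not part of vR Ops or vCenter.

```python
def pick_memory_metric_source(app_manages_own_ram: bool, guest_os_visible: bool) -> str:
    """Return which memory metric to trust, in the order recommended above.

    All names and labels here are hypothetical; they only encode the rule
    of thumb from the post, not any product API.
    """
    if app_manages_own_ram:
        # e.g. an application that sizes and manages its own memory pool
        return "application"
    if guest_os_visible:
        return "guest_os"
    # Fallback when nothing inside the guest can be seen.
    # vCenter Active and Consumed are deliberately never returned.
    return "vrops_demand"


print(pick_memory_metric_source(app_manages_own_ram=False, guest_os_visible=True))
```

For the backup VM discussed in this thread (no self-managing app, agent inside the guest), the sketch would pick the guest OS metrics.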

jengl · ‎10-11-2016

Thanks Iwan Rahabok for coming back and for that blog post.

So if I understand you correctly, I can ignore or disable the alarms where vROPS tells me to upsize RAM for these VMs, if the OS tells me everything is OK (not counting cache in this instance)?

Also, do you know anything about my other question: Do you think the stress calculation and oversized VM reports in vROPS will be changed in a future version to use the new metrics instead of active/demand?

Edit: Regarding your blog post (Right sizing VM Memory without using agent - virtual red dot) I have two additional questions:

- Do you have an explanation for why VM Demand is so much higher than VM Active in Linux compared with Windows? In Windows it's almost always a few hundred MB more, and in Linux in some cases 20 GB!

- Why don't the guest metrics add up to 100%? Isn't Used + Cached + Free all there is?

Kind regards,

jengl

Iwan_Rahabok · ‎10-11-2016

Regarding the future product question, could you reach out to me via your work email, so we can get other folks involved formally? It is something I've started looking into, but I'm unable to comment meaningfully in the open.

I'm not sure why Linux demand is higher than Windows. I have started learning Linux memory management. No real result so far. If you have a good link, let me know. I have not found a really good one. I hope it's not like Windows, where the information on the various Microsoft TechNet sites is contradictory.

Yes, I spotted that the numbers don't tally, hence I highlighted it, hoping that a Windows or Linux expert would be able to tell us. All these counters are from the Guest OS now.

jengl · ‎10-12-2016

OK Iwan Rahabok, will do that.

Regarding the higher Linux demand, I thought demand is a calculated metric from vROPS derived from active + additional MBs for the OS? Why should a Linux OS need so much more than a Windows OS?

Last but not least: Can you confirm that I can ignore/disable these alarms in vROPS?

Kind regards,

jengl

Iwan_Rahabok · ‎10-15-2016

Regarding "Why should a Linux OS need so much more than a Windows OS?"

I don't know... I wish I knew. It's on my to-do list to learn about Linux memory management, but that's rather far down the list 🙂

Regarding "Can you confirm that I can ignore/disable these alarms in vROPS?"

It depends. I will answer this generically so others can benefit:

Do you have a better solution? If not, having some visibility is better than 0. If yes, then turn it off. If you have vR Ops 6.3 and vSphere 6.0U1, then turn off the vCenter Active alarm.

jengl · ‎10-18-2016

Iwan Rahabok:

Yeah, that's true, some visibility is better than nothing.

What do you mean by your last sentence: "If you have vR Ops 6.3 and vSphere 6.0U1, then turn the vCenter Active alarm"?

Regards,

jengl

Iwan_Rahabok · ‎10-18-2016

My bad. Looks like my fingers and brain had intermittent connection, and the 2 eyes supported the fingers 😉

I've corrected the post. It says "turn off" now.

Mike_Gelhar · ‎10-19-2016

I'm not a Linux expert, but one thing I've come to learn because of situations like the one you're describing here is the Linux disk cache. The Linux OS will take any free RAM and turn it into disk cache to help speed up the system. This is expected behavior and should not be messed with unless there is a very strong case for an exception. If an application needs more memory, the disk cache immediately releases the RAM and the OS gives it to the application. Because the disk cache is always "consuming" RAM for this purpose, the metrics will always show high memory consumption even though demand is much lower. This site explained this well for me: http://www.linuxatemyram.com/ Hopefully someone more knowledgeable on this topic can add to or correct my comments.
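
To make the cache effect concrete, here is a minimal Python sketch that parses /proc/meminfo-style text and reports used memory with and without the disk cache. The sample numbers are invented, the function names are hypothetical, and the "total minus free minus buffers/cache" arithmetic is the classic simplification; it is not how vROPS computes demand.

```python
# Invented sample in /proc/meminfo format (values are illustrative only).
SAMPLE_MEMINFO = """\
MemTotal:       16384000 kB
MemFree:          512000 kB
Buffers:          256000 kB
Cached:          9216000 kB
"""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def memory_breakdown(fields):
    """Return used memory counted with and without buffers/cache (same unit as input)."""
    total = fields["MemTotal"]
    free = fields["MemFree"]
    cache = fields.get("Buffers", 0) + fields.get("Cached", 0)
    return {
        "used_incl_cache_kb": total - free,          # what looks "consumed"
        "cache_kb": cache,                           # reclaimable disk cache
        "used_excl_cache_kb": total - free - cache,  # closer to real demand
    }


if __name__ == "__main__":
    for key, value in memory_breakdown(parse_meminfo(SAMPLE_MEMINFO)).items():
        print(f"{key}: {value}")
```

On a real Linux guest, the text could come from reading /proc/meminfo instead of the sample string.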

jengl · ‎10-24-2016

Thanks Mike_Gelhar for your input, but I think Windows does the same and calls it Standby.

So Standby in Windows = Cached in Linux.

Demand is a calculated metric in vROPS and I thought I understood it, but it seems it is calculated differently for Linux and Windows.

Iwan Rahabok: Please correct me if I am wrong, and thanks for helping me once again with my questions.

Kind regards,

jengl

tuscani · ‎10-25-2016

Standby RAM in Windows is not exactly like cache in Linux. Rather, standby RAM correlates more closely with available RAM in the Windows world.

Available = All physical memory which is immediately available for use by applications. It wholly includes the free RAM, but also includes most of the cached. Specifically, it includes pages in standby. These are pages that hold cached data which can be discarded, allowing the page to be zeroed and given to an application for use.

Cached = The first thing to note is that cached does not include the free portion of memory. And yet you might see that it is larger than the available area of memory. That is because cached includes cache pages on both the standby and the modified lists. Modified memory was allocated by applications and then removed from the application's working set, usually because it hasn't been used for quite some time. Cache pages on modified have been altered in memory. No process has specifically asked for this data to be in memory; it is merely present as a consequence of caching. Therefore, it can be written to disk at any time (not to the page file, but to its original file location) and reused. However, since this involves I/O, it is not considered to be available.
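
Following the definitions above, the relationship can be sketched in a few lines of Python. The function name and the megabyte figures are invented for illustration; the only assumption taken from the post is that available = free + standby and cached = standby + modified.

```python
def windows_memory_views(free_mb, standby_mb, modified_mb):
    """Combine Windows page lists into the two views described above.

    Hypothetical helper: the inputs would have to come from a tool that
    exposes the individual page lists.
    """
    available_mb = free_mb + standby_mb    # immediately usable by applications
    cached_mb = standby_mb + modified_mb   # excludes free, includes modified
    return available_mb, cached_mb


# Invented example values in MB.
available, cached = windows_memory_views(free_mb=1000, standby_mb=5000, modified_mb=7000)
print(f"available={available} MB, cached={cached} MB")
```

With these made-up numbers, cached is larger than available even though standby is counted in both, which is the situation the description above points out.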

jengl · ‎12-07-2016

Hi tuscani,

sorry for the late answer, but I have been a little busy lately.

I don't completely get your post:

- Standby RAM in Windows is part of available RAM

- Most of Cached RAM in Linux is part of available RAM, but you are right, not all of it

- There is no standby list in Linux as far as I know

- There is no cached metric in Windows as far as I know

Maybe you can clarify a little more.

Regards,

jengl