STK2
Contributor

ESXi Host Memory Swapping

Hi

I have inherited the vSphere cluster below and I'm seeing memory swapping, which I would like to resolve.

Type of cluster = HA and DRS enabled
Number of hosts = 5 x ESXi 4.1 hosts, each with 96GB of physical memory.
Number of VMs = 120


The VMs are a mix of Linux and Windows, with between 1 and 4 vCPUs and between 1GB and 16GB of configured memory each. No reservations or limits are set on individual VMs.

The HA Admission Control policy is set to 'Percentage of cluster resources reserved as failover spare capacity', configured at 20%.

From esxtop I have noted the MEM overcommit avg:
ESXHost1 = 0.00, 0.00, 0.00
ESXHost2 = 0.00, 0.00, 0.00
ESXHost3 = 0.29, 0.29, 0.29
ESXHost4 = 0.20, 0.20, 0.20
ESXHost5 = 0.40, 0.40, 0.40

In other words, ESXHost3 is overcommitted by 29%, ESXHost4 by 20% and ESXHost5 by 40%.
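For reference, a rough PowerCLI sketch along the following lines should cross-check the esxtop figure (untested; it assumes an existing Connect-VIServer session, and "overcommit" here is simply configured VM memory over physical host memory, minus one):

# A rough cross-check of esxtop's MEM overcommit figure:
# (total configured memory of powered-on VMs / host physical memory) - 1
Get-VMHost | ForEach-Object {
    $vmhost = $_
    $assignedMB = (Get-VM -Location $vmhost |
        Where-Object { $_.PowerState -eq 'PoweredOn' } |
        Measure-Object -Property MemoryMB -Sum).Sum

    New-Object PSObject -Property @{
        Host       = $vmhost.Name
        PhysicalMB = $vmhost.MemoryTotalMB
        AssignedMB = $assignedMB
        Overcommit = [math]::Round($assignedMB / $vmhost.MemoryTotalMB - 1, 2)
    }
}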

Three resource pools are configured:
01-High = High shares
02-Normal = Normal shares
03-Low = Low shares

No reservations or limits are set on any of these resource pools.

The ballooning and swapping figures per resource pool are as follows:

Cluster      ResourcePool    SwappedMemory    BalloonedMemory
---------    ------------    -------------    ---------------
Cluster01    01-High                  1205               2228
Cluster01    02-Normal                5353              15303
Cluster01    03-Low                  11024              24861
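Something like the following PowerCLI sketch should reproduce these per-pool figures (untested; the mem.swapped.average and mem.vmmemctl.average counters report KB, and 'Resources' is the hidden root pool):

# Sum current swapped and ballooned memory per resource pool.
foreach ($rp in Get-Cluster 'Cluster01' | Get-ResourcePool |
         Where-Object { $_.Name -ne 'Resources' }) {

    $vms = Get-VM -Location $rp | Where-Object { $_.PowerState -eq 'PoweredOn' }

    $swapKB    = (Get-Stat -Entity $vms -Stat 'mem.swapped.average' -Realtime -MaxSamples 1 |
                  Measure-Object Value -Sum).Sum
    $balloonKB = (Get-Stat -Entity $vms -Stat 'mem.vmmemctl.average' -Realtime -MaxSamples 1 |
                  Measure-Object Value -Sum).Sum

    New-Object PSObject -Property @{
        ResourcePool = $rp.Name
        SwappedMB    = [math]::Round($swapKB / 1024)
        BalloonedMB  = [math]::Round($balloonKB / 1024)
    }
}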

From my research it looks like I have two options to resolve the swapping problem:

  1. Right-size the VMs, i.e. use a monitoring tool to identify VMs with high memory allocation but low active memory usage and reduce their memory configuration (see the sketch after this list).
  2. Add more physical memory to the ESXi hosts.
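For option 1, this is roughly what I have in mind as a PowerCLI sketch (untested; the 25% threshold is just an arbitrary cut-off I picked, and mem.active.average is reported in KB):

# Flag powered-on VMs whose 7-day average active memory is below
# 25% of their configured memory -- right-sizing candidates.
$since = (Get-Date).AddDays(-7)
foreach ($vm in Get-VM | Where-Object { $_.PowerState -eq 'PoweredOn' }) {
    $activeKB = (Get-Stat -Entity $vm -Stat 'mem.active.average' -Start $since |
                 Measure-Object Value -Average).Average
    $activeMB = [math]::Round($activeKB / 1024)

    if ($activeMB -lt ($vm.MemoryMB * 0.25)) {
        New-Object PSObject -Property @{
            VM           = $vm.Name
            ConfiguredMB = $vm.MemoryMB
            AvgActiveMB  = $activeMB
        }
    }
}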

Both of these options are feasible in our environment although I am guessing that management will go for option 2.

I have been wondering if anyone has alternative suggestions for resolving the swapping issues right now. Reconfiguring the resource pools, perhaps? If so, what configuration would you go for?

I am also keen to understand why Active Memory on each host is low yet we still see swapping. The list below gives the average Active Memory over a one-week period.

ESXHost1 = 20GB
ESXHost2 = 10GB
ESXHost3 = 13GB
ESXHost4 = 11GB
ESXHost5 = 10GB

Memory usage, as pulled via PowerCLI (RVTools actually), shows:

Host        Memory (MB)    Memory Usage (%)
ESXHost1      98,292.00               81.00
ESXHost2      98,292.00               90.00
ESXHost3      98,292.00               79.00
ESXHost4      98,292.00               77.00
ESXHost5      98,292.00               78.00

Is the swapping caused by the ESXi hosts basing their memory-pressure decisions on the memory usage figures above rather than on the active memory figures? Obviously the 20% HA reserve and the resource pool demands need to be taken into account too.
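For completeness, here is a sketch of how the same numbers could be pulled side by side directly from PowerCLI (counter names are from the vSphere performance charts; mem.usage.average is a percentage, mem.active.average is in KB):

# Compare consumed memory ("Memory usage") against active memory per host.
Get-VMHost | ForEach-Object {
    $vmhost = $_
    $usage  = (Get-Stat -Entity $vmhost -Stat 'mem.usage.average' -Realtime -MaxSamples 1 |
               Select-Object -First 1).Value
    $active = (Get-Stat -Entity $vmhost -Stat 'mem.active.average' -Realtime -MaxSamples 1 |
               Select-Object -First 1).Value

    New-Object PSObject -Property @{
        Host     = $vmhost.Name
        TotalMB  = $vmhost.MemoryTotalMB
        UsagePct = $usage
        ActiveGB = [math]::Round($active / 1MB, 1)   # KB -> GB
    }
}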

I hope someone can shed some light on these questions for me!

Thanks very much

Sean

LuigiC
Enthusiast

What does the esxtop "m" (memory) output look like? Let's look at ESXHost5; I don't want to start looking at all of them at this point.

STK2
Contributor

Hi

Thanks very much for taking a look at my post. I have included a screenshot of esxtop as requested; if you want the output in a different format, please let me know.

I have removed the names of the VMs to comply with company policy.

Thanks again

Sean

admin
Immortal

Probably the easiest approach is to download and install the vC Ops appliance to analyse your environment and see which VMs, if any, are over-allocated resources. The free trial should be enough to give you some good insight into how your environment is running. CapacityIQ is part of vC Ops now and lets you run some nice reports on over-sized VMs; combined with the health/efficiency badges, it should tell you which VMs you can cut down, as well as providing some good data on how your hosts are performing overall.  http://www.vmware.com/products/datacenter-virtualization/vcenter-operations-management/overview.html

STK2
Contributor

Thanks Mittell

I have worked with vCOPS before and I will certainly do this. I am sure that most of the VMs are using nowhere near their configured memory allocation.

This should certainly help with the swapping issue.

Cheers

Sean

STK2
Contributor

Hi

I would like to add a few things to this thread having done some additional research.

I have been working with the 'VMware vSphere Health Check' PowerGUI PowerPack and it shows the following:

Host        TotalMemMB    TotalAssignedMemMB    TotalUsedMB    OverCommitMB    Check
ESXHost1         98292                 87040          79857               0
ESXHost2         98292                 93184          90024               0
ESXHost3         98292                121392          81188           23100    Good overcommit
ESXHost4         98292                116436          77280           18144    Good overcommit
ESXHost5         98292                132608          81624           34316    Good overcommit

The "TotalAssignedMemMB" figure is correct.

I assume that "TotalUsedMB" is a sum of 'vmk' and 'other' as reported by esxtop:

PMEM  /MB: 98292   total:  1669     vmk, 79316 other,  17305 free

Quoting http://www.gabesvirtualworld.com/health-check/memory-health-check/:

"The third column “Total Used Memory” shows how much physical RAM  is really in use by these VMs. The difference between “Total Assigned  Memory” and “Total Memory” (physical RAM) is the value in the fourth  column: “OverCommit MB”. As long as the amount of “Total Used Memory” is  lower than the total amount of physical RAM you should be safe. Even  better, keep “Total Used Memory” around 15% less than the amount of  physical RAM."

Quoting http://searchvmware.techtarget.com/tip/Using-the-esxtop-tool-to-identify-VMware-ESX-memory-use:

"The  memory used by "other" is officially described as: "everything other  than the ESX Service Console and ESX VMkernel." It is not necessarily  all memory consumed by the VM. Each VM, for example, also has memory  overhead. The amount of overhead memory depends on the type of guest  OS,the number of virtual CPUs, configured amount of guest memory and on  whether the guest is 32-bit or 64-bit. For example, a dual-CPU virtual  machine with 2,048 MB memory will have 126 MB overhead as 32-bit system  and 163 MB overhead as a 64-bit system."

Question 1: Why does "TotalUsedMB" differ from the host's Active Memory as reported by the Memory Performance graph?

Statement/Question 2: This PowerPack does not appear to take the 20% HA reserve into account. My view is that once this amount is factored in, the host begins to swap. Is this correct?
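To put rough numbers on that: 20% of 98,292 MB is 19,658 MB, which would leave 78,634 MB per host once the reserve is factored in. The TotalUsedMB figures for ESXHost3 (81,188) and ESXHost5 (81,624) are already above that line, which would fit with the swapping I am seeing.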

Thanks

Sean

STK2
Contributor

Hi

I managed to get an answer to this problem. Although the vSphere Client reports swapping, the amount shown is memory that was swapped at some point in the past, not swapping that is happening now. As long as SWR/s and SWW/s remain at 0.00 in esxtop I am fine.
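For anyone else who lands on this thread: the same check can be scripted from PowerCLI rather than esxtop. A sketch (assuming the mem.swapinRate.average and mem.swapoutRate.average counters, both reported in KBps, are available on your hosts):

# Current swap-in/swap-out rates per host; non-zero values mean
# swapping is happening right now (the equivalent of SWR/s and SWW/s).
Get-VMHost | ForEach-Object {
    $vmhost = $_
    $in  = (Get-Stat -Entity $vmhost -Stat 'mem.swapinRate.average' -Realtime -MaxSamples 1 |
            Select-Object -First 1).Value
    $out = (Get-Stat -Entity $vmhost -Stat 'mem.swapoutRate.average' -Realtime -MaxSamples 1 |
            Select-Object -First 1).Value
    "{0}: swap-in {1} KBps, swap-out {2} KBps" -f $vmhost.Name, $in, $out
}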

What is interesting to note is that I placed one of the hosts in maintenance mode, restarted the management agents and exited maintenance mode, and after completing this the Swapped counter reset to 0 (which makes sense, as all the VMs were evacuated).

I should mention that VMware support set up a meeting with one of their enterprise support engineers to help me out on this. They really went the extra mile.

jklick
Enthusiast

Darn... you beat me to it! I'm glad you figured it out though. Let me see if I can find another way to earn some points... :)

Besides the lack of active swapping, there's one other thing I noticed in that esxtop screenshot that I've seen several times before in other environments, and it will most likely result in some more swapping. Look at the MCTLSZ column. That is the amount of ballooning taking place on the respective virtual machines. In many cases this is simply caused by memory contention at the hypervisor level, but the extreme cases I'm seeing here are usually caused by a configured memory limit.

Using the 7th VM in the list as an example, it has 4 GB allocated, 1.2 GB granted, 2.6 GB ballooned and 200 MB swapped. My guess is that there's a configured memory limit right around the 1.2 GB mark. By default, a virtual machine is only allowed to balloon out 65% of its allocated memory - in this case, 2.6 GB. Anything beyond that 65% is forced into host swapping, hence the small amount of swapping to compensate for the 200 MB gap between the configured limit and what ballooning was able to accomplish. As another example, I'm guessing the 5th VM in the list has a limit configured around 512 MB. I would investigate many of the VMs in the list where you see current ballooning.
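If you want to hunt those limits down quickly, a PowerCLI sketch along these lines should list them (MemLimitMB reports -1 when no limit is set):

# List powered-on VMs that have a memory limit configured.
Get-VM | Where-Object { $_.PowerState -eq 'PoweredOn' } |
    Get-VMResourceConfiguration |
    Where-Object { $_.MemLimitMB -ne -1 } |
    Select-Object VM, MemLimitMB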

Not to step on the toes of anyone who has mentioned vC Ops already, but I highly recommend giving VKernel's vOPS a try. It's built to show you these sorts of issues right out of the box and explain what you should do to fix them, so feel free to abuse a 30-day trial and let it help you clean up your environment.

Full disclosure: since I mentioned a VKernel product in this post, I have to let everyone know I'm a VKernel employee, or the powers that be can get cranky with me. :)

@JonathanKlick | www.vkernel.com