After several in-house tuning engagements, I keep wondering how to measure the impact of memory ballooning, swapping, etc. on overall performance. Chiefly, I'm wondering how the effect of those memory management operations translates into slowness. We have swap-in and swap-out rates, and we have balloon sizes, but I'm having a hard time quantifying what they cost in response time.
It may seem that it all boils down to CPU. My humble assumption (please scratch my post for eternity if that's not the case) is that when either the guest OS or ESX is moving memory pages, CPUs are busy moving them (or perhaps sitting idle waiting for other CPUs that are doing it for them, if that's a possibility in VMware's logic).
Along that specific line, I wonder how I should estimate the effect of page moving (i.e. swapping, ballooning) on CPU activity.
Obviously this assumes that the impact on CPU activity provides a fair approximation of the effect of memory management on overall response times.
In the vSphere metrics (using the user interface client, not esxtop or the APIs) I see a multitude of performance counters for both CPU and memory (counters, rates, sizes), but practically no information about which of them contribute to one another, nor any clear indication of the time spent on all those memory management operations. Mathematically speaking, the number of free variables involved in analyzing an ESX host actively running many virtual machines makes it very questionable whether that can be deduced just from watching data that is mostly indirect.
I am talking about VMware 4, if that matters.
It would be of great service and transparency to divulge some solid details about this aspect here!
I thank you in advance for solid information on this practical aspect.
If you're tracking the CPU metrics in your OSes along with the CPU metrics for the VMs themselves, you should have your answer: the difference in the total is "cost of overhead."
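As a rough sketch of that subtraction (Python, with made-up sample numbers): the function name and its inputs are hypothetical, and it assumes both figures were sampled over the same interval and that the guest's utilization percentage is relative to all of its vCPUs running at the nominal core clock.

```python
def cpu_overhead_mhz(host_vm_mhz, guest_util_pct, vcpu_count, core_mhz):
    """Estimate per-VM overhead as host-side usage minus guest-reported usage.

    host_vm_mhz    -- CPU usage of the VM as seen by the host, in MHz
    guest_util_pct -- CPU utilization reported inside the guest OS (0-100)
    vcpu_count     -- number of vCPUs assigned to the VM
    core_mhz       -- nominal clock of one physical core, in MHz (assumption)
    """
    # Convert the guest's percentage into MHz so both sides use the same unit.
    guest_mhz = guest_util_pct / 100 * vcpu_count * core_mhz
    return host_vm_mhz - guest_mhz


# Hypothetical sample: host charges the VM 3000 MHz, guest reports 50% on 2 vCPUs.
overhead = cpu_overhead_mhz(host_vm_mhz=3000, guest_util_pct=50,
                            vcpu_count=2, core_mhz=2600)
print(f"estimated overhead: {overhead:.0f} MHz")
```

A negative result would suggest the two samples are not aligned in time, or that the guest's own accounting is skewed, so treat the number as an indication only.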
If you want to isolate a single VM, you may need to use CPU affinity to pin that VM to a group of cores/memory (NUMA) and dedicate that same group of cores/memory (NUMA) to the single VM. You would likely miss some "generalized" overhead (i.e. overhead of software iSCSI, vSwitch, NAS, etc.) but the VM-specific overhead (device emulation, virtual machine management, etc.) would be easier to determine.
This would be VERY specific to your platform, however, and would not likely result in any "rule of thumb". Still, it is helpful to know how your environment performs under targeted workloads. When you add dependent operations like memory ballooning, swapping, etc., you get into a lot of complex issues (storage performance, OS-specific swap performance, network performance for NAS/iSCSI, etc.). Just looking at the storage relation, the workload profile of heavy paging could create a "ripple effect" on other system performance that needs to be factored into the equation (if you're looking to predict future performance, etc.).
Thanks for your great response both in the details and overall. Indeed the ripple effect would be hard to isolate.
Concerning the measurement of the effect of memory management, are you sure that the plain difference is the best approximation?
It seems that some of the memory management operations of ESX would trigger the guest OS to take memory management actions of its own. Wouldn't that mean that some of the guest OS utilization is actually part of the ballooning/swapping overhead, substantially skewing the results?
My question originally arose while trying to assess the impact of memory swapping/ballooning on the overall performance of an ESX server running multiple virtual machines. As VMware Tools isn't installed on all guest OSes in this case, I assume I see more swapping than ballooning, since VMware may rely on VMware Tools for ballooning. But I think the original question is the more interesting one, i.e. how to measure the impact of memory management on response times or CPU utilization. Would you care to comment just once more?
First, VM swap within the OS takes place regardless of external memory pressures (i.e. host pRAM allocation or deficit). Most guest OSes page inactive pages out to swap to increase the memory available to active applications and to use memory for I/O caching. Applications like MS SQL can/will pre-allocate their memory (under user runtime configuration control). In the SQL case, you can allocate dynamically, allocate statically but remain swap-eligible, or allocate statically with no swapping of SQL memory. In Windows there are registry settings that offer similar choices for kernel pages.
Second, VM swap outside the OS requires memory pressure where pRAM is running into an allocation deficit (i.e. memory oversubscription). Memory compression and shared memory pages in ESXi help to reduce these external pressures on memory, but once you hit a wall, active swapping can take place with priorities similar to OS swap based on the ESXi virtual machine manager. This will NOT happen if memory for the VM is reserved.
(YellowBricks has a short article that might be of interest, including a few reference links: http://www.yellow-bricks.com/2008/06/16/swapping-esxtop-and-procvmwareschedmem/)
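To make the oversubscription condition in the second point a bit more concrete, here is a minimal sketch (Python, hypothetical numbers and names). It only compares configured and reserved memory against pRAM; it ignores page sharing, compression, per-VM overhead memory and the hypervisor's own footprint, so it indicates whether host-level swap is possible, not whether it is happening.

```python
def memory_overcommit(host_pram_mb, vms):
    """Summarize memory commitment on one host.

    host_pram_mb -- physical RAM of the host in MB
    vms          -- list of (configured_mb, reserved_mb) tuples, one per VM
    """
    configured = sum(c for c, _ in vms)
    reserved = sum(r for _, r in vms)
    return {
        # Ratio above 1.0 means more memory is promised than physically exists.
        "overcommit_ratio": configured / host_pram_mb,
        # Only unreserved memory is a candidate for ballooning or host swap.
        "unreserved_mb": configured - reserved,
        "oversubscribed": configured > host_pram_mb,
    }


# Hypothetical host with 32 GB pRAM and three VMs, one fully reserved.
summary = memory_overcommit(32768, [(16384, 16384), (12288, 0), (8192, 0)])
print(summary)
```

In this made-up case the fully reserved VM is protected, and any reclamation would have to come out of the 20 GB configured for the other two.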
Third, memory ballooning helps to resolve external memory pressure across the ESXi host by taking advantage of swap-ready memory pages within the guest OS as a means of granting-back inactive memory (pRAM) to the host for allocation by another VM. This requires the VMware Tools to be installed in every VM that will grant-back memory using ballooning. While VMs without VMware Tools cannot release their swap-ready memory back to ESXi for reallocation, they can benefit from this behavior in other VMs to reduce memory performance bottlenecks that would otherwise result from VM swap (ESXi) conditions.
So, when you add up the implications of the above you get a lot of variables to factor into your performance equation; too many for my taste. The best practice for performance is to avoid the need for VM swap (ESXi) where applications are sensitive to memory performance. Fortunately, it is easy enough to create a test environment where these conditions take place. The vSphere Memory Resource Management white paper is a great source of ideas (and explanations) here:
It might not be readily obvious, but using a dummy VM with a memory reservation can help create external memory pressure on a test VM. This can help you generate the "tests" you need in a controlled way. Likewise (as in the white paper) you can test with and without ballooning to see the impact. Simply stated, if you're running hosts where VM swap (ESXi) is a dominant factor, you're going to need more pRAM.
The great thing about vSphere's inclusion of these mechanisms is that they minimize the performance impact when they are needed. In an environment without them, your only choice would be fewer workloads or tighter guest memory provisioning...
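For sizing the dummy VM's reservation, a back-of-the-envelope sketch like the following may help (Python; the function and parameter names are hypothetical). It assumes the test VM and the dummy VM are the only unreserved consumers on the host and ignores overhead memory and page sharing, so expect to adjust the result by trial.

```python
def dummy_reservation_mb(host_pram_mb, already_reserved_mb,
                         test_vm_configured_mb, target_shortfall_mb):
    """Reservation for the dummy VM that leaves the test VM short of pRAM.

    target_shortfall_mb is how much of the test VM's configured memory
    should NOT fit in physical RAM (the pressure you want to create).
    """
    # pRAM left once existing reservations (assumed known) are set aside.
    free_for_test = host_pram_mb - already_reserved_mb
    # Leave the test VM only (configured - shortfall) MB of physical memory.
    reservation = free_for_test - (test_vm_configured_mb - target_shortfall_mb)
    if reservation < 0:
        raise ValueError("host is already too small to need a dummy VM")
    return reservation


# Hypothetical: 16 GB host, 2 GB already reserved, 8 GB test VM, 2 GB shortfall.
print(dummy_reservation_mb(16384, 2048, 8192, 2048))
```

Running the same workload at a few different shortfall values, with and without ballooning available, would give you the comparison points described above.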
Thanks again Collin,
I will go deeper into those articles you mention.
So far, the VMware metrics for memory operations are very obscure in the VMware user interface, so I can't pinpoint whether I have a memory swapping problem or not. I will be looking into it more, and will try to find a way to measure the total CPU overhead on the ESX host in question, under the partially vague assumption that most of it goes into memory management. Just curious: are there any major differences between VMware 4 and 5 concerning memory management?