VMware Cloud Community
Number774
Contributor

How much CPU should ballooning use?

This is a follow-up from my other thread "VM says CPU is maxed but host is nearly idle". I've had lots of help from Ken Cline there, but I think it's drifted far enough that I want a new title.

My system is a Dell 2950 with 8x2GHz CPUs and 20GB RAM. It's running ESX 3.5.

The workload consists of:

1xWindows 2003 with 512MB all reserved, 1 vCPU. Doesn't work heavily.

1xWindows 2003 Server, 2GB all reserved, 4xvCPU. Has peaks of load in the daytime.

2xWindows 2003 Server, 8GB, none reserved, 4xvCPU. These are software build machines building two versions of our source code; the builds take nearly an hour each, and normally run once every 24 hours.

1xWindows 2003 Server, 3GB with 1GB reserved. This is a controller for the automated test system and also acts as a network file server for the test clients. Some of the test clients are real computers, but the others are:

4xWindows XP workstation, 1GB none reserved. The automated tests take some hours, despite being distributed; they involve enough IO to soak the CPUs in the server above (peaks at 6GHz total, which is why it's multi-CPU) and also do a lot of CPU-intensive work on their own.

We want responsiveness from the first two.

Total RAM requested is therefore 25.5GB + overheads, a little over 25% more than the physical RAM.
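As a rough check of that total, here is a minimal Python sketch that adds up the configured and reserved memory from the list above. The labels and variable names are just for illustration, and virtualization overheads are not included.

```python
# Memory per VM in GB, taken from the workload list above.
# Each entry is (label, number of VMs, configured GB each, reserved GB each).
vms = [
    ("Windows 2003, light load", 1, 0.5, 0.5),
    ("Windows 2003, daytime peaks", 1, 2.0, 2.0),
    ("Build machines", 2, 8.0, 0.0),
    ("Test controller / file server", 1, 3.0, 1.0),
    ("Windows XP test clients", 4, 1.0, 0.0),
]
PHYSICAL_GB = 20.0  # physical RAM in the host

# Total memory the VMs could ask for, and the part that is guaranteed.
configured = sum(count * gb for _, count, gb, _ in vms)
reserved = sum(count * res for _, count, _, res in vms)

# Fraction by which configured memory exceeds physical RAM (before overheads).
overcommit = configured / PHYSICAL_GB - 1.0

print(f"configured: {configured} GB")
print(f"reserved:   {reserved} GB")
print(f"overcommit: {overcommit:.1%}")
```

Only 3.5GB of the 25.5GB is reserved, so nearly all of the overcommitted memory has to be recovered through sharing or ballooning.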

I expected that when a build runs, the build VM would get its 8GB and run through. It then goes idle, so I expected ballooning to cut in and reclaim its memory so that the other build machine or the automated tests can get all their requested memory. (I was surprised to find that memory sharing is capable of crawling through the memory in the system and sharing enough of it that no ballooning is required when the VMs are idle. Quite impressive really.)

When ballooning is in progress, the System process on the VM being ballooned eats lots of CPU cycles. This can be as high as 6GHz worth of CPU. I can't tell exactly what it's doing, but it's definitely linked to ballooning - if I change the reservation on one of the build VMs to 4GB while the automated tests are in progress, its CPU use drops to <100MHz. This seems an awfully high amount of CPU. Why? Is there anything I can do about it?

Thanks

5 Replies
Ken_Cline
Champion

OK, a couple things real quick.

1. At the end of your other thread you said "There's no process in there that looks like it belongs to the VMWare driver" - the balloon driver is called vmmemctl, so if ballooning were taking CPU, that's where you would see it. All vmmemctl does, when asked by the vmkernel, is allocate RAM within the VM (essentially a malloc) and release that RAM back to the vmkernel. This could put memory pressure on the guest OS and cause paging within the guest OS. Do you have the correct version of VMware Tools installed?

2. You can't trust the CPU utilization numbers provided by perfmon. The way Windows calculates CPU utilization is that when it enters its idle loop, it increments a time counter. At the end of an epoch, it subtracts that counter value from the total amount of "wall clock" time that has elapsed. This works fine in a physical world, where the entire system is dedicated to the OS and Windows' view of "wall clock time" matches reality. In a VM, when Windows enters its idle loop, it gets descheduled. This causes the CPU utilization numbers to be skewed.
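To make point 2 concrete, here is a small Python sketch of the accounting described above. It is a simplified model, not how Windows is actually implemented: it assumes that time spent descheduled is never added to the guest's idle counter, and the numbers and function names are made up for illustration.

```python
def guest_reported_utilization(wall_s, idle_counted_s):
    """Utilization as the guest computes it: wall-clock time minus counted idle time."""
    return (wall_s - idle_counted_s) / wall_s


def actual_utilization(wall_s, busy_s):
    """Utilization based on the time the guest really spent doing work."""
    return busy_s / wall_s


# Hypothetical 10-second epoch as seen from outside the VM.
wall_s = 10.0        # wall-clock time elapsed
busy_s = 2.0         # time the guest actually executed useful work
descheduled_s = 5.0  # time the hypervisor had the idle guest descheduled

# Assumption: only the idle time the guest was actually running gets counted.
idle_counted_s = wall_s - busy_s - descheduled_s

reported = guest_reported_utilization(wall_s, idle_counted_s)
actual = actual_utilization(wall_s, busy_s)

print(f"guest reports {reported:.0%}, actual is {actual:.0%}")
```

Under these assumptions the descheduled time is charged as if the guest had been busy, so the figure inside the VM overstates the real load.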

So...basically, the ballooning process should take essentially zero CPU cycles. It may cause the guest OS to become memory constrained and thus begin paging or some other processing activity caused by a perceived shortage of RAM.

Ken Cline

Technical Director, Virtualization

Wells Landers

TVAR Solutions, A Wells Landers Group Company

VMware Communities User Moderator

Ken Cline VMware vExpert 2009 VMware Communities User Moderator Blogging at: http://KensVirtualReality.wordpress.com/
Number774
Contributor

OK, a couple things real quick.

1. At the end of your other thread you said "There's no process in there that looks like it belongs to the VMWare driver" - the balloon driver is called vmmemctl, so if ballooning were taking CPU, that's where you would see it. All vmmemctl does, when asked by the vmkernel, is allocate RAM within the VM (essentially a malloc) and release that RAM back to the vmkernel. This could put memory pressure on the guest OS and cause paging within the guest OS. Do you have the correct version of VMware Tools installed?

ESX Server 3.5.0, build 64607. The tools have the same version. I do not see vmmemctl as a process; I can see VMwareService, VMwareTray and VMwareUser.exe. None of them has used much CPU (<10 minutes, as opposed to System's 3 hrs 40 min in the VM I just looked at).

2. You can't trust the CPU utilization numbers provided by perfmon. The way Windows calculates CPU utilization is that when it enters its idle loop, it increments a time counter. At the end of an epoch, it subtracts that counter value from the total amount of "wall clock" time that has elapsed. This works fine in a physical world, where the entire system is dedicated to the OS and Windows' view of "wall clock time" matches reality. In a VM, when Windows enters its idle loop, it gets descheduled. This causes the CPU utilization numbers to be skewed.

I suspected as much.

So...basically, the ballooning process should take essentially zero CPU cycles. It may cause the guest OS to become memory constrained and thus begin paging or some other processing activity caused by a perceived shortage of RAM.

And yet...

Perhaps I need to raise a formal bug report on this.

Thanks

Number774
Contributor

I've been digging.

Vmmemctl.sys is a driver, not an app, and so I wouldn't expect to see it running, not even as a service. Device manager shows "VMware server memory controller" (its other name) in the list of devices hidden in the system, so it's there all right. I don't know which process - if any - it will run in, nor what Windows will charge the time to as far as Process Explorer is concerned. I wouldn't be surprised to find it was System.

Number774
Contributor

OK, this is a botch, but it works!

I've created an application that allocates RAM and makes sure it is zeroed. As we're running 32-bit and the per-process limit is small, each process only allocates a gigabyte, and I run several copies simultaneously.

At the end of our build process, when I know the VM is going to be idle for the next 20-something hours, I ask for a total amount of memory which is about a gig less than the memory allocation for the VM. Windows happily throws away all its cached disk data so the memory can be given to these processes. The processes then quit. Windows now has a large amount of memory in its free pool, which is all zero. VMware's page-sharing scanner comes along, spots all these zero pages, and maps them all to the zero page, which means that the VM is now using only a gig or so of RAM.
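For anyone wanting to try the same thing, here is a minimal sketch of the idea in Python. It is not the actual application: the function name `zero_fill`, the chunk size and the 4KB page size are assumptions for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for a 32-bit x86 guest


def zero_fill(total_bytes, chunk_bytes=64 * 1024 * 1024):
    """Allocate total_bytes of zeroed memory in chunks and touch every page.

    All chunks are held until the function returns, so the full amount is
    resident at the same time. Returns the number of bytes allocated.
    """
    chunks = []
    allocated = 0
    remaining = total_bytes
    while remaining > 0:
        size = min(chunk_bytes, remaining)
        buf = bytearray(size)  # a new bytearray is zero-filled
        # Write to one byte in every page so the OS really commits the page
        # instead of leaving it as an untouched reservation.
        for offset in range(0, size, PAGE_SIZE):
            buf[offset] = 0
        chunks.append(buf)
        allocated += size
        remaining -= size
    return allocated
```

Each copy would call something like `zero_fill(1024 ** 3)` and then exit, and several copies are started together to get past the 32-bit per-process limit.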

Ken_Cline
Champion

An interesting (and ingenious) hack. Much more effort than your average user would go through - but it demonstrates that you've developed a much better understanding of how memory management works in an ESX environment.
