VMware Cloud Community
CatHat
Contributor

Am I running out of RAM, or is something else limiting me?

Hello again.

I have been running ESXi 4 for some weeks now, and recently I stumbled into a problem I can't explain. The machine running ESXi has 4 GB of memory and works perfectly. There are currently 3 VMs running on the machine, but when I try to add a fourth machine all hell breaks loose and I'm not sure why; I suspect it's the memory.

My current memory consumption is as follows (note that I have summed up the memory resources from my different VMs):

  • Total memory given to all machines is 3348 MB <-- I know this is the total assigned memory for all machines 🙂

  • Total active guest memory is ~730 MB <-- this is the memory currently in use by all my VMs and the applications on them?

  • Consumed host memory is ~2119 MB <-- this is memory the VMs have allocated (cached)? Is this locked, or can a VM that needs more memory allocate from it?

The total memory for my VMs is 3561 MB (under the Configuration tab), and my memory overhead is 2276 MB (under the Resource Allocation tab).

Now the main question: when I add the fourth machine (a Win2008 VM with 1 GB RAM), the currently running VMs get extremely sluggish. For instance, the latency of a ping to a VM from my workstation goes from <1 ms to about 200 ms :S. I've checked the performance monitor; disk, CPU, and system are not spiking or overloading in any way. The memory consumed is not even near the max limit in the performance monitor.

Well, I do not understand why this is happening. I thought that even if I configured my VMs with a total memory above 4 GB, I could still run them with good performance, since they are not all using their whole assigned RAM at the same time; is this wrong?
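For a rough sense of the numbers involved, simple arithmetic on the figures above shows how far the assignments would exceed physical RAM once the fourth VM starts (a sketch only; ESXi's real accounting also includes per-VM overhead, the hypervisor's own footprint, page sharing, and ballooning):

```python
# Back-of-the-envelope overcommit check using the figures from this post.
# Illustrative only: ESXi's actual memory accounting is more involved.
host_ram_mb = 4096
assigned_mb = 3348 + 1024   # the three existing guests plus the new 1 GB VM
overcommit_mb = assigned_mb - host_ram_mb
print(overcommit_mb)  # 276 MB over physical RAM, before any overhead
```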

Sorry for the many questions, but hopefully they will be easy to answer for you more experienced users out there!

10 Replies
rshondell
Contributor

Just out of curiosity, how many physical CPUs are in your ESXi host, and how many total vCPUs have you assigned to all the guests on that host? It sounds on the surface like you aren't oversubscribed on memory, but you may be oversubscribing your CPU, at least from a scheduling/interrupt perspective.

A good metric to view from the VM side for this is called CPU Ready. It is essentially the number of milliseconds that the guest's virtual CPU is ready to perform work but is waiting on the host's physical CPU to schedule it. The lower the number here, the better. There's no definitive number that is necessarily good or bad, but ready times that are in the hundreds (or thousands) of milliseconds will probably be felt as slow or sluggish behavior in the guest.
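As a rough rule of thumb, the millisecond summation value can be turned into a percentage of the sampling interval (vCenter's real-time charts sample every 20 seconds). This is an illustrative sketch using a community rule of thumb, not an official VMware formula:

```python
# Convert a CPU Ready summation (milliseconds accumulated per sample) into a
# percentage of the sample interval. Real-time charts use a 20-second
# (20,000 ms) interval; sustained values above a few percent per vCPU are
# commonly felt as sluggishness (rule of thumb, not an official threshold).
def ready_percent(ready_ms, sample_interval_ms=20_000):
    return ready_ms / sample_interval_ms * 100

print(ready_percent(1_000))  # 5.0 -> borderline
print(ready_percent(200))    # 1.0 -> mild
```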

You can view this in vCenter on the Performance tab of one of your VMs. Select the Advanced chart option, and in the 'Switch To' dropdown, select CPU. Click the Chart Options link just to the left of that dropdown. In the Counters section near the lower right, unselect one of the currently checked options and then check CPU Ready (you can only have two options selected). Click OK and you should see a real-time representation of CPU Ready time for that guest.

If this is your problem, you should either reduce the number of vCPUs in your guests (if you have any SMP guests) or increase the number of CPUs in your host. Or add another host.

CatHat
Contributor

Thanks for the reply!

I checked the CPU Ready for my VMs. The results are averages of 133, 176, and 32. Doesn't seem to be that high, so I guess it's something else 😕

My CPU is an AMD 630, a quad-core CPU. The total number of vCPUs across all VMs is 7 (3, 3, 1).

If I start the new VM and then turn it off, it takes about 1-3 minutes before latency goes back to normal levels for some of the other VMs. 😕

rshondell
Contributor

Did you check the CPU Ready time after you had started the 4th VM? As you mentioned, things are fine with 3 VMs and not so great with 4. Perhaps the 4th is pushing it over the edge.

Assuming your 4th VM has somewhere between 1 and 3 vCPUs, that's somewhere between a 2:1 and 2.5:1 vCPU:pCPU ratio (8 vCPUs to 4 pCPUs). Depending on the workload, that's right about where I've seen some performance hits. Certain applications, like terminal services, tend to be very CPU-hungry, so perhaps you just have some resource-intensive workloads. And those 3-vCPU guests will be especially "noisy" on the scheduler. As a (very) general rule, the more vCPUs you have, the more work has to be done to schedule them amongst everything else.
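The ratio above is just total vCPUs divided by physical cores; a quick sketch with the numbers from this thread (purely illustrative):

```python
# Quick vCPU:pCPU oversubscription ratio for the configuration discussed
# in this thread (numbers taken from the posts above; illustrative only).
def oversub_ratio(vcpus_per_vm, physical_cores):
    return sum(vcpus_per_vm) / physical_cores

# Three existing guests (3, 3, 1 vCPUs) plus a 1-vCPU fourth VM on 4 cores:
print(f"{oversub_ratio([3, 3, 1, 1], 4):.1f}:1")  # 2.0:1
# Worst case, a 3-vCPU fourth VM:
print(f"{oversub_ratio([3, 3, 1, 3], 4):.1f}:1")  # 2.5:1
```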

From your description of the memory in the host, it doesn't sound immediately like a memory issue, but you can check that in vCenter as well. Make sure you start up the 4th VM so you are experiencing the problem, then check both the CPU Ready and Memory Balloon counters in the advanced performance graphs for your VMs. Memory ballooning is a sign of memory shortage on your ESXi host and is in the default advanced memory graph. I'm assuming you have up-to-date VMware Tools here as well.

Those are two of the first statistics that can point to resource shortages. But definitely gather the statistics while the problem is happening.

J1mbo
Virtuoso

Worth checking the CPU, but I don't think it's your issue. In general, though, avoid using more than 1 vCPU for any guest unless it will really benefit from it; otherwise the guest will use the 'excess' cores to run the OS idle thread when it doesn't need them, instead of those cores being available for other VMs.

At the host level (i.e. click on the server name on the left in the vSphere client), go to the Performance tab, then Memory, and add the Swap used (not swap in or out), Balloon, and Shared total counters. Then start the new VM and see what happens.

Make sure you have VMware Tools installed on all guests, so that ESX can reclaim memory from them gracefully using the balloon driver.

I'd guess that vswapping is occurring, literally crucifying your disk(s) and the running VMs, in order to find the extra 1 GB.

Please award points to any useful answer.

CatHat
Contributor

Thanks for the reply both of you.

I will check what you suggested when the weekend is over, so I can look into it properly.

Thanks so far!

CatHat
Contributor

I've done some testing now, and here are the results.

With the following machines running: WinXP (1 vCPU), Win2008 (2 vCPU), and pfSense (1 vCPU), I then start Ubuntu Server (1 vCPU).

J1mbo: the "memory swap used" is quite low. When I started the fourth machine I did see a spike at 312,924 KB, and after that it is stable at 143,048 KB. Not that high, so I think that's out of the question :/?

rshondell: I've checked the CPU Ready when the 4th machine starts. My results are as follows:

  • Windows XP: CPU Ready goes from around 10 ms to 15 ms; not that much.

  • Windows 2008: CPU Ready goes from 27 ms to 60 ms.

  • Ubuntu (the newly started VM): CPU Ready averages 6 ms and topped out at 75 ms.

When all 4 VMs are running, the ping is still unstable, but not as bad as when the fourth VM has just started. When I boot the fourth VM, pings to the other two Windows VMs go to ~400 ms; after a while they settle down to between <1 and 24 ms. When I remote-control the Windows VMs through VNC, they are noticeably less responsive. Also, I ran a CPU stress test on the Win2008 VM without the pings getting worse, which makes me think it's not the CPU limiting :/.

It's really awkward; I can't find any indicator in the performance monitor that's spiking or overloading. Any more suggestions for my problem :(?

J1mbo
Virtuoso

My cpu is a AMD 630 4x cpu. The total number of vcores in all vms are 7 (3,3,1).

Sorry, I only just spotted that line. The other poster is correct; there is a massive CPU bottleneck here. A single quad-core box simply cannot support this, since it can only run one VM at a time, and this is why you have waits. Action plan:

- Reset all VMs to 1 vCPU

- Reduce the RAM assignment for each guest as much as possible, so that the total allocated is less than 3 GB

Please award points to any useful answer.

CatHat
Contributor

That's an interesting statement there!

So even if all the vCPUs of my currently running VMs are almost idle and I start a new one, the new VM can't utilize the CPU power that's not in use, since the cores are already assigned to the other VMs (or something like that)?

J1mbo
Virtuoso
(Accepted solution)

ESX cannot 'make up' CPU resource. It needs to give each guest whatever you have told it to, in this case two guests with 3 vCPUs and one with 1 vCPU. At any particular point in time (ESX works with time slices up to about 30 ms, I think) ESX can schedule time to the 1-vCPU guest, fine, and it might need a CPU for itself (vSwitch or storage processing, and other ESX functions like TPS scanning, since the RAM is so overcommitted), leaving 2 cores free. But no other guest is able to run until ESX finishes with the CPU. Now one of the 3-vCPU guests can start, OK, but then it needs some storage or network etc., and ESX has to suspend either it or the other guest. The 3rd guest hasn't even had a look-in yet...
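A toy sketch of the strict co-scheduling constraint described above (a simplification: a guest only runs when enough physical cores are free for all of its vCPUs at once; the guest names are hypothetical, and newer ESX versions relax this somewhat):

```python
def schedulable(guests, free_cores):
    """Greedily pick guests whose vCPUs all fit in the free cores at once."""
    running = []
    for name, vcpus in guests:
        if vcpus <= free_cores:
            running.append(name)
            free_cores -= vcpus
    return running

# The thread's mix: two 3-vCPU guests and two 1-vCPU guests on a quad-core
# host. If ESX itself holds one core for vSwitch/storage work, only 3 are
# free: one 3-vCPU guest fills them all and every other guest must wait
# for a later time slice.
guests = [("guest-a", 3), ("guest-b", 3), ("guest-c", 1), ("guest-d", 1)]
print(schedulable(guests, free_cores=3))  # ['guest-a']
print(schedulable(guests, free_cores=4))  # ['guest-a', 'guest-c']
```

With all VMs at 1 vCPU, all four would fit in four free cores at once, which is the motivation for the action plan above.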

Please award points to any useful answer.

CatHat
Contributor

I see, thank you for the detailed explanation 🙂

I will try limiting all my VMs to 1 vCPU and see what happens.
