Solved: Performance issues after random amounts of time

arensa · ‎02-18-2010

Hello,

I have been hunting a performance problem for quite a while now and would like to ask the community for some hints.

But first things first:

We have installed ESXi 4.0.0 (219382) on a Dell PowerEdge R710 with one Intel Xeon CPU X5560@2.80GHz and 8 GB of RAM. A PERC 6/i controller is configured with RAID 5 for storage. A variety of Ubuntu (6.06, 8.04) and Fedora installations are running as guest systems.

When a guest is freshly booted, the subjectively experienced performance is fine, and using hdparm we get disk read rates of 35-60 MB/sec. However, after an indefinite amount of time, the performance of the guest drops and hdparm reports disk read rates of about 1 MB/sec. This happens on random guests and after random times. The only working solution is to shut down the guest and boot it again (a reboot does not help).

I have already studied the available documentation, such as http://communities.vmware.com/docs/DOC-3930, and followed the recommendations, but have had no success so far. Checking the performance of the ESXi host using vSphere Client and esxtop, I cannot see any heavy load on the CPU or the disk queues. The guest file systems are not aligned, but I doubt that this would be the reason for such a huge performance hit.

What puzzles me is that the slowdown appears after random amounts of time. Could enabling VT-x in the BIOS after installing ESXi play a role here?

I would appreciate any hints.

Thanks,

Andreas

marcelo_soares · ‎02-18-2010

At the moment the issue happens, try taking a look at /var/log/messages; it may give us a hint about what is going on. The CPU spike is understandable, as the disk seems to be slow.

Marcelo Soares

VMWare Certified Professional 310/410

Technical Support Engineer

Globant Argentina

Consider awarding points for "helpful" and/or "correct" answers.

J1mbo · ‎02-18-2010

What is the total RAM allocated to running guests? During a period of poor throughput, what are the memory balloon and memory swapped figures for the affected guest?

Please award points to any useful answer.

arensa · ‎02-18-2010

The total amount of RAM allocated to the guests is 7.5 GB

A currently affected guest reports these figures:

Memory Balloon: 987156 kbytes

Memory Swap In: 7370360 kbytes
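
For scale, the two figures can be converted to GiB. This is only a minimal sketch, assuming "kbytes" here means units of 1024 bytes; the helper name kib_to_gib is made up for illustration.

```python
# Convert the reported counters from kbytes to GiB.
# Assumption: "kbytes" means KiB (1024 bytes).
KIB_PER_GIB = 1024 * 1024


def kib_to_gib(kib):
    """Return the size in GiB for a value given in KiB."""
    return kib / KIB_PER_GIB


balloon_kib = 987156    # Memory Balloon, as reported above
swap_in_kib = 7370360   # Memory Swap In, as reported above

print(f"Balloon: {kib_to_gib(balloon_kib):.2f} GiB")
print(f"Swap In: {kib_to_gib(swap_in_kib):.2f} GiB")
```

That puts the balloon at a little under 1 GiB and the swap-in figure at roughly 7 GiB.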

J1mbo · ‎02-18-2010

So there you are then, you have too much RAM allocated, too little physical RAM, and vswapping as a result.

arensa · ‎02-18-2010

Well, there is something I do not understand here:

The physical machine has 8 GB of RAM and I assigned roughly 7.5 GB to all guests in total. The slow guest in question has only 1.5 GB of RAM assigned in its settings. So how can I have too much RAM allocated?

What I have not done is set limits on the resources. Is that a possible source of the problem?

J1mbo · ‎02-19-2010

But ESX needs memory for itself, about 1GB, and there is memory overhead for each guest (for example a 1GB guest has about 128MB overhead). ESX will also try to keep some free (in an 8GB machine ESX will typically attempt to keep RAM utilisation at about 6.5GB or less in my experience).

These factors combined are why you see balloon and swap figures.

Your options are either to reduce the RAM allocated or to add physical RAM. Another option in some cases is to move the vSwap files to a decent SSD.
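
To put rough numbers on it, here is a back-of-the-envelope sketch using the approximate figures from this thread. The function name memory_budget is invented for illustration, and scaling the per-guest overhead linearly at 128MB per 1GB allocated is a simplifying assumption; real overhead depends on the guest configuration.

```python
# Rough host memory budget, using the approximate figures from this thread.
# All constants are rules of thumb, not exact values.
PHYSICAL_GB = 8.0              # RAM installed in the host
ALLOCATED_GB = 7.5             # total RAM assigned to running guests
HOST_OVERHEAD_GB = 1.0         # "about 1GB" for ESX itself
OVERHEAD_PER_GUEST_GB = 0.125  # ~128MB per 1GB of guest RAM (assumed linear)
TARGET_UTILISATION_GB = 6.5    # level ESX tends to stay under on an 8GB host


def memory_budget(allocated_gb,
                  physical_gb=PHYSICAL_GB,
                  host_overhead_gb=HOST_OVERHEAD_GB,
                  overhead_per_guest_gb=OVERHEAD_PER_GUEST_GB,
                  target_gb=TARGET_UTILISATION_GB):
    """Estimate total memory demand and how far it exceeds the host."""
    guest_overhead_gb = allocated_gb * overhead_per_guest_gb
    demand_gb = allocated_gb + guest_overhead_gb + host_overhead_gb
    return {
        "demand_gb": demand_gb,
        # Amount that cannot fit in physical RAM at all.
        "over_physical_gb": max(0.0, demand_gb - physical_gb),
        # Amount above the level ESX tries to stay under.
        "over_target_gb": max(0.0, demand_gb - target_gb),
    }


budget = memory_budget(ALLOCATED_GB)
for name, value in budget.items():
    print(f"{name}: {value:.2f}")
```

With 7.5 GB allocated, the estimated demand comes out at roughly 9.4 GB against 8 GB of physical RAM, so the difference has to be reclaimed through ballooning and swapping.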

Please award points to any useful answer.

arensa · ‎02-19-2010

I will give it a try and see if this alleviates the performance problems.

Sorry to keep asking dumb questions: if your scenario is right, I should see a high load on CPU and storage on the physical machine, right? The problem is, I don't see that load...

J1mbo · ‎02-19-2010

When swapping, the CPU load will likely be minimal, as processes are waiting for paging.

Disk load should show a reasonable rate in IOPS terms (esxtop, "d", in CMDS/s). The guest OS disk counters may well show minimal activity, though, since the guest OS is not aware of vSwapping (although it will be aware of its own paging mechanisms, which the balloon driver exercises).

Also - no such thing as a dumb question, in my book anyway.
