VMware Cloud Community
BenBrazil
Enthusiast

High CPU Ready and Wait Values

All,

I am troubleshooting an environment where we are seeing very high CPU Ready and CPU Wait values. I am looking for advice: pointers on which metrics to look at, and suggestions as to what the underlying issue may be. Also, if my assumptions so far seem incorrect, please let me know.

The environment is as follows:

ESXi 3.5 Update 5

8 hosts in the cluster

All hosts are HP ProLiant BL495c G5 blade servers in a c7000 enclosure.

Each host has 2 quad-core processors and 64 GB of RAM.

Virtual Connect provides network and SAN connectivity; storage is hosted on an EMC SAN.

The Scratch Config location is currently set to SAN storage. This will be moved to local SSD on the ESX hosts. Could this be an issue? We do see that the Scratch Config location generates heavy SAN usage.

This cluster is used for the VDI environment. There are currently 224 VDI desktops in the cluster, distributed across the hosts. Each desktop is configured in exactly the same way:

1 CPU

1 GB of RAM

Windows XP
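
For reference, the figures above work out to a rough per-host consolidation ratio. The following is a minimal sketch of that arithmetic, assuming the 224 desktops are spread evenly over the 8 hosts (the post says they are distributed across the hosts, but not how evenly). The function name `per_host_load` is purely illustrative.

```python
def per_host_load(total_vms, hosts, cores_per_host, vcpus_per_vm, ram_gb_per_vm):
    """Return (VMs per host, vCPUs per physical core, configured guest RAM in GB per host).

    Assumes the VMs are spread evenly across the hosts.
    """
    vms_per_host = total_vms / hosts
    vcpus_per_core = vms_per_host * vcpus_per_vm / cores_per_host
    guest_ram_gb = vms_per_host * ram_gb_per_vm
    return vms_per_host, vcpus_per_core, guest_ram_gb


# Figures from the post: 224 desktops, 8 hosts, 2 x quad-core per host,
# 1 vCPU and 1 GB of RAM per desktop.
vms, ratio, ram = per_host_load(224, 8, 2 * 4, 1, 1)
print(f"{vms:.0f} VMs per host, {ratio:.1f} vCPUs per core, {ram:.0f} GB of 64 GB configured")
```

Under that even-spread assumption, each host carries 28 desktops, which is 3.5 vCPUs per physical core and 28 GB of configured guest RAM out of 64 GB.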

The issue that we are seeing is that performance is very poor. On investigation it can be seen that CPU Ready and Wait times are very high.

I have attached a spreadsheet of the CPU Ready and Wait times.

Physical CPU utilization on the hosts does not go above roughly 80%.

So, from these readings it is clear that the VM Guests are suffering from poor performance because of the high CPU Ready values. This suggests to me that there is a lack of CPU resources available to the cluster.

However, I also believe that the high CPU Wait values are contributing to the high CPU Ready values, as the VM guest will have locked the CPU cycles for the duration of the CPU Wait, preventing those resources from being available to the rest of the cluster.

I don't believe that we have a memory issue on our VM guests. We have 64 GB of RAM on each host and each VDI desktop has 1 GB of RAM. I don't see any ballooning on the desktops. This therefore leaves disk I/O and network I/O.

I am now looking at Disk I/O. What should I be looking at for Disk I/O? What values would show me that there is an issue?

Can you also advise what an acceptable value should be for CPU Wait? It is easy to find general rules about CPU Ready, but not so much for CPU Wait.
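
On the CPU Ready side, the VI Client / vCenter performance charts report Ready as a summation in milliseconds per sample interval, so a raw number only means something relative to the interval length. Below is a minimal sketch of the usual conversion to a percentage, assuming the values come from the real-time chart with its 20-second sample interval rather than from esxtop. The sample value is made up for illustration and is not taken from the attached spreadsheet.

```python
def ready_percent(ready_ms, interval_s=20.0):
    """Convert a CPU Ready summation (ms) into a percentage of the sample interval.

    interval_s defaults to 20 seconds, the real-time chart interval;
    other chart intervals would need a different value.
    """
    return 100.0 * ready_ms / (interval_s * 1000.0)


# Hypothetical sample: 2000 ms of Ready accumulated in one 20-second sample.
print(ready_percent(2000))
```

A sample of 2000 ms in a 20-second interval therefore corresponds to the vCPU spending 10% of that interval ready to run but not scheduled.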

Many thanks,

Ben

7 Replies
BenBrazil
Enthusiast

Hi,

I have also found that the CPU Wait value decreases dramatically during CPU-intensive activity in the guest. For instance, when Word, Excel, PowerPoint etc. are all opened at the same time, CPU Ready increases, which is understandable, but CPU Wait drops dramatically. So in an idle state the VM guest has a very high CPU Wait figure, yet when the CPU has to perform intensive work the Wait value decreases. I don't really understand why this is happening. CPU Wait occurs when the CPU is waiting for memory, disk I/O or network I/O. If there were bottlenecks on these resources, I would have expected CPU Wait to increase when the CPU is asked to perform a task, not decrease.

Any help would be most appreciated.

Thanks,

Ben

AWo
Immortal

Are VMware Tools installed?

AWo

vExpert 2009/10/11 [:o]===[o:] [: ]o=o[ :] = Save forests! rent firewood! =
BenBrazil
Enthusiast

Hi,

Yes. VMware Tools are installed on all guests.

AWo
Immortal

Have you gone through this: http://communities.vmware.com/docs/DOC-9279

What does %IDLE show when the wait time increases? %WAIT includes %IDLE.

Is Hyperthreading available/enabled?

AWo

BenBrazil
Enthusiast

Hi, thanks for your reply.

I've been reading through that document.

When Wait times increase so do Idle times. Here is an example on a particular VM:

Wait    Idle
440     46
446     52
450     59
455     61
458     64
463     67
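
Since %WAIT includes %IDLE, as pointed out earlier in the thread, one way to read these samples is to subtract Idle from Wait and look at what remains. This is a minimal sketch using the six samples listed above. The units are not stated in the post, so the result is only meaningful relative to the inputs, and the helper name `non_idle_wait` is illustrative.

```python
# (wait, idle) pairs copied from the samples above; units as reported in the post.
samples = [(440, 46), (446, 52), (450, 59), (455, 61), (458, 64), (463, 67)]


def non_idle_wait(wait, idle):
    """Portion of the Wait figure that is not accounted for by Idle."""
    return wait - idle


remaining = [non_idle_wait(w, i) for w, i in samples]
print(remaining)  # [394, 394, 391, 394, 394, 396]
```

For these samples the remainder stays between 391 and 396 while both Wait and Idle rise, so the increase in Wait here tracks the increase in Idle.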

Hyperthreading is not enabled on these servers.

Would you agree that Wait times are extremely high here?

FredPeterson
Expert

Wait times, evaluated alone, are meaningless, just like any other raw performance metric.

When nothing is happening on a system, your wait times are going to be pegged at their highest and that is basically a "good" thing.

High WAIT values are a problem when disk usage or network usage is also 'high'. In that case WAIT is a measure of I/O bottlenecks: the CPU is WAITing on another resource.

So in other words, as long as WAIT times decrease proportionally when activity goes up, all is well.

Disk latency and network latency (or throughput) also need to be measured to determine which I/O resource could be causing a high WAIT value.

CPU Ready Time values are entirely a factor of CPU resource scheduling. Trust me, get *off* ESX 3.5. The CPU scheduler performance improvements even in 4.0 (and especially 4.1) are enormous, and if v5 has improved even more (given its support for 32 vCPUs), I can't imagine the boost it gives.

VeyronMick
Enthusiast

A lot of HP blades ship with power management enabled, which can have an impact on performance.

If you haven't checked it already, make sure that power management isn't set to Dynamic in the BIOS; it should be set to maximum performance.

I'm sure you've already done this but it can be a really quick fix if you've missed it.
