You are only seeing part of the picture; that figure includes all worlds linked to the VM group. If you expand one of the groups, look at the actual vCPU worlds, and then subtract %IDLE, you get the time the vCPU is actually sitting in the queue waiting for I/O to finish.
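To make that concrete, here is a minimal sketch of that subtraction (the world names and percentages are made-up sample values, not taken from the screenshot):

```python
# Sketch: %WAIT in esxtop includes %IDLE, so for a vCPU world the time
# genuinely blocked in the VMkernel (e.g. waiting for I/O) is roughly
# %WAIT - %IDLE. Sample numbers below are invented for illustration.

vcpu_worlds = [
    # (world name, %WAIT, %IDLE) as shown after expanding the VM group
    ("vmx-vcpu-0", 95.0, 70.0),
    ("vmx-vcpu-1", 99.0, 98.5),
]

for name, pct_wait, pct_idle in vcpu_worlds:
    blocked = pct_wait - pct_idle  # time truly waiting, not just idle
    print(f"{name}: %WAIT={pct_wait:.1f} %IDLE={pct_idle:.1f} "
          f"-> blocked on VMkernel ~{blocked:.1f}%")
```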
Check this out:
Do you actually encounter a performance problem here? The %RDY is quite low, which doesn't indicate much of a performance problem.
CPU would max out at 100%. When it does, %USED is at 100%+. Why would %USED be over 100%?
Yes, the server is running slow during that time. While the CPU is not pegged at 100%, it does fluctuate between 80% and 100%.
Should I try adding 2 more vCPUs to it?
In my opinion the vSphere host's PCPUs are not heavily utilized; they average around 20%. You can try adding more vCPUs, but be aware that it may or may not make the situation better, as CPU co-scheduling introduces its own lag.
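Before adding vCPUs it may be worth checking %CSTP (co-stop) for the VM first. A quick sketch of that sanity check; note the ~3% threshold is a common rule of thumb I am assuming here, not an official limit:

```python
# Sketch: sanity check before adding vCPUs. %CSTP is the co-stop time
# esxtop reports for the VM group; a sustained high value means the
# scheduler is already delaying vCPUs to keep them in sync.
# The 3% threshold is an assumed rule of thumb, not an official limit.

def more_vcpus_advisable(pct_cstp: float, avg_pcpu_util: float) -> bool:
    """True if adding vCPUs looks safe: the host has headroom
    (PCPUs ~20% in this thread) and co-scheduling is not already a problem."""
    return pct_cstp < 3.0 and avg_pcpu_util < 80.0

print(more_vcpus_advisable(pct_cstp=0.5, avg_pcpu_util=20.0))  # True
print(more_vcpus_advisable(pct_cstp=6.2, avg_pcpu_util=20.0))  # False
```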
Duncan will have a better suggestion for you.
If a VM does heavy I/O, %USED can be greater than 100%, because system time spent on the VM's behalf is charged to it on top of its run time.
Take a look at Interpreting esxtop 4.1 Statistics.
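As a rough illustration of the accounting (my reading of the esxtop doc above; treat the formula as approximate), %USED is close to %RUN + %SYS - %OVRLP, so heavy system time for I/O processing can push it past 100% of one core:

```python
# Approximate accounting from "Interpreting esxtop Statistics":
# %USED ~= %RUN + %SYS - %OVRLP. With heavy I/O, %SYS (system services
# working on the VM's behalf) grows, so %USED can exceed 100%.
pct_run = 85.0    # vCPU actually executing
pct_sys = 25.0    # VMkernel work (e.g. I/O processing) charged to the VM
pct_ovrlp = 3.0   # time stolen from %RUN to service other worlds

pct_used = pct_run + pct_sys - pct_ovrlp
print(f"%USED ~ {pct_used:.1f}%")  # ~107% even though %RUN < 100%
```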
Michael.
As mentioned by vastro:
http://kb.vmware.com/kb/1017926
Looking at your screenshot, the %WAIT for all VMs is too high.
Wait (%WAIT):
This value represents the percentage of time the virtual machine was waiting for some VMkernel activity to complete (such as I/O) before it can continue.
If the virtual machine is unresponsive and the %WAIT value is proportionally higher than %RUN, %RDY, and %CSTP, it could indicate that the world is waiting for a VMkernel operation to complete. You may also observe that %SYS is proportionally higher than %RUN; %SYS represents the percentage of time spent by system services on behalf of the virtual machine.
A high %WAIT value can be a result of a poorly performing storage device where the virtual machine is residing. If you are experiencing storage latency and timeouts, it may trigger these types of symptoms across multiple virtual machines residing on the same LUN, volume, or array, depending on the scale of the storage performance issue.
A high %WAIT value can also be triggered by latency to any device in the virtual machine configuration. This can include, but is not limited to, serial pass-through, parallel pass-through, and USB devices. If the device suddenly stops functioning or responding, it could result in these symptoms. A common cause of a high %WAIT value is ISO files that were accidentally left mounted in the virtual machine and have since been deleted or moved to an alternate location. For more information, see Deleting a datastore from the Datastore inventory results in the error: device or resource busy (101....
If there does not appear to be any backing storage or networking infrastructure issue, it may be pertinent to crash the virtual machine to collect additional diagnostic information.
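If it helps, the KB's diagnosis rule can be written down as a small check (a sketch only; the dict layout and the "proportionally higher" factor of 2x are my assumptions, not from the KB):

```python
# Sketch of the KB's rule: an unresponsive VM whose %WAIT dwarfs
# %RUN, %RDY, and %CSTP is probably blocked on a VMkernel operation
# (storage, pass-through device, stale ISO mount, ...).
# The 2x "proportionally higher" factor is an assumption.

def likely_vmkernel_wait(stats: dict, factor: float = 2.0) -> bool:
    others = stats["%RUN"] + stats["%RDY"] + stats["%CSTP"]
    return stats["%WAIT"] > factor * others

vm = {"%WAIT": 92.0, "%RUN": 4.0, "%RDY": 1.5, "%CSTP": 0.5}
print(likely_vmkernel_wait(vm))  # True -> look at storage and devices
```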
Also check the storage performance...
From vCenter you can get latency reports, IOPS reports, write rate, etc. Refer to the section below, which I took from the vSphere Datacenter Administration Guide for vSphere 4.1, page 120.
Disk I/O Performance
Use the vSphere Client disk performance charts to monitor disk I/O usage for clusters, hosts, and virtual machines. Use the guidelines below to identify and correct problems with disk I/O performance.
The virtual machine disk usage (%) and I/O data counters provide information about average disk usage on a virtual machine. Use these counters to monitor trends in disk usage.
The best way to determine whether your vSphere environment is experiencing disk problems is to monitor the disk latency data counters. Use the Advanced performance charts to view these statistics.
- The kernelLatency data counter measures the average amount of time, in milliseconds, that the VMkernel spends processing each SCSI command. For best performance, the value should be 0-1 milliseconds. If the value is greater than 4 ms, the virtual machines on the ESX/ESXi host are trying to send more throughput to the storage system than the configuration supports. Check the CPU usage, and increase the queue depth.
- The deviceLatency data counter measures the average amount of time, in milliseconds, to complete a SCSI command from the physical device. Depending on your hardware, a number greater than 15 ms indicates there are probably problems with the storage array. Move the active VMDK to a volume with more spindles or add disks to the LUN.
- The queueLatency data counter measures the average amount of time taken per SCSI command in the VMkernel queue. This value must always be zero. If not, the workload is too high and the array cannot process the data fast enough.
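Those three thresholds are easy to turn into a quick check against the Advanced chart values (a sketch; the function and its parameters are mine, the millisecond thresholds are the ones quoted from the guide above):

```python
# Sketch: apply the guide's thresholds to latency counters (ms) read
# from the vSphere Client Advanced performance charts.

def diagnose_disk_latency(kernel_ms: float, device_ms: float,
                          queue_ms: float) -> list[str]:
    findings = []
    if kernel_ms > 4:
        findings.append("kernelLatency > 4 ms: host pushing more I/O than "
                        "the config supports; check CPU, raise queue depth")
    if device_ms > 15:
        findings.append("deviceLatency > 15 ms: likely array problem; move "
                        "the VMDK to more spindles or add disks to the LUN")
    if queue_ms > 0:
        findings.append("queueLatency > 0 ms: workload too high for the array")
    return findings or ["latencies look within the guide's limits"]

print(diagnose_disk_latency(kernel_ms=6.2, device_ms=18.0, queue_ms=1.1))
```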
A high %WAIT value can be a result of a poorly performing storage device where the virtual machine is residing. If you are experiencing storage latency and timeouts, it may trigger these types of symptoms across multiple virtual machines residing on the same LUN, volume, or array, depending on the scale of the storage performance issue.
I do not understand why it says a high %WAIT can be a result of poor storage performance. I am running esxtop on different hosts connected to EMC and now NetApp storage with almost nothing on the aggregate, and the %WAIT times on the EMC and the NetApp are the same.