9 Replies Latest reply on Jul 18, 2012 2:52 PM by tdubb123

    high %WAIT times

    tdubb123 Master

      Any idea why there would be such a high %WAIT time on all the VMs here?

        • 1. Re: high %WAIT times
          depping Champion
          User Moderators, VMware Employees

          You are only seeing part of the picture. The figure you see includes all processes linked to the world. If you expand one of the VMs, look at the actual vCPU worlds, and then subtract %IDLE from %WAIT, you get the time a vCPU is actually sitting in the queue waiting for I/O to finish.
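As a rough illustration of the subtraction described above, here is a minimal Python sketch. The function name `blocked_wait` and the sample readings are made up for illustration and are not esxtop output from this thread. The only thing assumed about esxtop is that %WAIT includes %IDLE, as stated above.

```python
def blocked_wait(pct_wait: float, pct_idle: float) -> float:
    """Return the share of time a vCPU world spent blocked, excluding idle time.

    %WAIT includes %IDLE, so the difference approximates the time spent
    waiting on something such as I/O. The result is clamped at zero because
    both counters are sampled values and can differ slightly by rounding.
    """
    return max(0.0, pct_wait - pct_idle)


# Hypothetical readings for two vCPU worlds (not taken from the screenshot).
samples = {
    "vcpu-0": {"wait": 95.0, "idle": 93.5},
    "vcpu-1": {"wait": 60.0, "idle": 20.0},
}

for name, s in samples.items():
    print(f"{name}: blocked {blocked_wait(s['wait'], s['idle']):.1f}%")
```

With these made-up numbers, the first world has a high %WAIT but is almost entirely idle, while the second is the one that is really waiting on something.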

          • 3. Re: high %WAIT times
            tdubb123 Master

            Is there a bottleneck here? The 2 vCPUs are frequently maxed out.

            • 4. Re: high %WAIT times
              kooltechies Expert

              Do you actually encounter a performance problem here? The %RDY is quite low, which doesn't indicate much of a performance problem.

              • 5. Re: high %WAIT times
                tdubb123 Master

                The CPU would max out at 100%. When it does, %USED is at 100%+. Why would %USED be over 100%?

                 

                Yes, the server is running slow during that time. While the CPU is not pegged at 100%, it does fluctuate between 80-100%.

                 

                Should I try adding 2 more CPUs to it?

                • 6. Re: high %WAIT times
                  kooltechies Expert

                  In my opinion the vSphere host's PCPUs are not heavily utilized; they average around 20%. You can try adding more CPUs, but be aware that it may or may not make the situation better, as CPU co-scheduling will add its own lag.

                   

                  Duncan will have a better suggestion for you.

                  • 7. Re: high %WAIT times
                    mal_michael Master
                    vExpert

                    If a VM does heavy I/O, %USED can be greater than 100%.

                    Take a look at Interpreting esxtop 4.1 Statistics.

                     

                     

                    Michael.

                    • 8. Re: high %WAIT times
                      Gopinath Keerthyrajan Expert

                      as mentioned by the vastro

                       

                      http://kb.vmware.com/kb/1017926

                       

                      Judging by your screenshot, the %WAIT for all VMs is too high.


                      Wait, %WAIT:

                      • This value represents the percentage of time the virtual machine was waiting for some VMkernel activity to complete (such as I/O) before it can continue.

                      • If the virtual machine is unresponsive and the %WAIT value is proportionally higher than %RUN, %RDY, and %CSTP, then it could indicate that the world is waiting for a VMkernel operation to complete.

                      • You may observe that the %SYS is proportionally higher than %RUN. %SYS represents the percentage of time spent by system services on behalf of the virtual machine.

                      • A high %WAIT value can be a result of a poorly performing storage device where the virtual machine is residing. If you are experiencing storage latency and timeouts, it may trigger these types of symptoms across multiple virtual machines residing in the same LUN, volume, or array depending on the scale of the storage performance issue.

                      • A high %WAIT value can also be triggered by latency to any device in the virtual machine configuration. This can include but is not limited to serial pass-through devices, parallel pass-through devices, and USB devices. If the device suddenly stops functioning or responding, it could result in these symptoms. A common cause for a high %WAIT value is ISO files that have been left mounted in the virtual machine accidentally and have since been deleted or moved to an alternate location. For more information, see Deleting a datastore from the Datastore inventory results in the error: device or resource busy (1015791).

                      • If there does not appear to be any backing storage or networking infrastructure issue, it may be pertinent to crash the virtual machine to collect additional diagnostic information.

                       

                       

                      Also check the storage performance...

                       

                      From vCenter you can get latency reports, IOPS reports, write rates, etc. See the section below, which I took from the vSphere Datacenter Administration Guide for vSphere 4.1, page 120.

                       

                       

                       

                      Disk I/O Performance

                      Use the vSphere Client disk performance charts to monitor disk I/O usage for clusters, hosts, and virtual machines. Use the guidelines below to identify and correct problems with disk I/O performance.

                      The virtual machine disk usage (%) and I/O data counters provide information about average disk usage on a virtual machine. Use these counters to monitor trends in disk usage.

                      The best ways to determine if your vSphere environment is experiencing disk problems is to monitor the disk latency data counters. You use the Advanced performance charts to view these statistics.

                      • The kernelLatency data counter measures the average amount of time, in milliseconds, that the VMkernel spends processing each SCSI command. For best performance, the value should be 0-1 milliseconds. If the value is greater than 4ms, the virtual machines on the ESX/ESXi host are trying to send more throughput to the storage system than the configuration supports. Check the CPU usage, and increase the queue depth.

                      • The deviceLatency data counter measures the average amount of time, in milliseconds, to complete a SCSI command from the physical device. Depending on your hardware, a number greater than 15ms indicates there are probably problems with the storage array. Move the active VMDK to a volume with more spindles or add disks to the LUN.

                      • The queueLatency data counter measures the average amount of time taken per SCSI command in the VMkernel queue. This value must always be zero. If not, the workload is too high and the array cannot process the data fast enough.
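The three thresholds quoted above could be checked mechanically along these lines. This is only a sketch: the function name, the constant names, and the sample values are hypothetical, and the readings would have to be copied by hand from the Advanced performance charts. No vSphere API is used here.

```python
# Thresholds in milliseconds, as quoted above from the vSphere 4.1 guide.
KERNEL_LATENCY_MAX_MS = 4.0
DEVICE_LATENCY_MAX_MS = 15.0


def check_disk_latency(kernel_ms: float, device_ms: float, queue_ms: float) -> list[str]:
    """Return a list of findings for one set of latency readings."""
    findings = []
    if kernel_ms > KERNEL_LATENCY_MAX_MS:
        findings.append(
            "kernelLatency above 4 ms: check CPU usage and consider a larger queue depth"
        )
    if device_ms > DEVICE_LATENCY_MAX_MS:
        findings.append(
            "deviceLatency above 15 ms: the storage array is probably the problem"
        )
    if queue_ms > 0:
        findings.append(
            "queueLatency is not zero: the array cannot keep up with the workload"
        )
    return findings


# Hypothetical readings, not measurements from this thread.
for finding in check_disk_latency(kernel_ms=5.2, device_ms=9.0, queue_ms=0.3):
    print(finding)
```

An empty list means none of the three guideline thresholds was crossed for that sample.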

                      • 9. Re: high %WAIT times
                        tdubb123 Master

                        A high %WAIT value can be a result of a poorly performing storage device where the virtual machine is residing. If you are experiencing storage latency and timeouts, it may trigger these types of symptoms across multiple virtual machines residing in the same LUN, volume, or array depending on the scale of the storage performance issue.

                         

                        I do not understand why it says a high %WAIT time can be a result of poor storage performance. I am running esxtop on different hosts connected to EMC and now NetApp storage with almost nothing on the aggregate. The wait times on the EMC and the NetApp are the same.