20 Replies. Latest reply on Feb 26, 2014 8:18 AM by NickSousa

    CPU Ready Time

    xadamz23 Enthusiast

      I'm trying to understand why I am seeing high CPU ready time.  I have a host with 32 physical cores (no HT) and 8 VMs with 4 vCPUs each, so 32 vCPUs in total.  From time to time I see very high ready time on 2 or 3 VMs; I've seen it reach 40% in esxtop.  I am not overprovisioning vCPUs on this host at all.  I understand that the hypervisor itself needs some CPU resources, but I wouldn't expect it to need enough to cause ready time that high.

       

      Also I am not putting any limits on CPU either.

       

      Any ideas as to why I am seeing such a high ready time even though I am not oversubscribing CPUs?

        • 1. Re: CPU Ready Time
          mcowger Champion

          Things other than CPU overlap are included in RDY time.

           

          Poor storage performance can cause this....

          • 2. Re: CPU Ready Time
            lakey81 Enthusiast

            If you aren't on at least 5.0 Update 1, there was a bug with AMD processors (I'm assuming you have AMD procs since you said no HT) that did not properly balance VMs and could introduce ready time even though you have at least as many pCPUs as vCPUs.  Also keep in mind that in esxtop, unless you expand the VM you are looking at, that 40% ready time is cumulative across your 4 vCPUs, so it's probably really about 10% per vCPU.

            • 3. Re: CPU Ready Time
              RParker Guru

              lakey81 wrote:

               

              If you aren't on at least 5.0 Update 1, there was a bug with AMD processors (I'm assuming you have AMD procs since you said no HT) that did not properly balance VMs

               

               

              I have AMD and I have HT; Windows will see logical processors as "hyper-threaded" regardless of manufacturer, so I am curious why you would assume AMD.  AMD has the same capability as Intel.  Not a valid assumption.

               

              Maybe they simply "disabled" HT in the BIOS.

              • 4. Re: CPU Ready Time
                Gopinath Keerthyrajan Expert

                As mentioned by Matt, a high ready time value is not only due to CPU contention; it will also appear if there is high disk latency.

                 

                So check the storage section in esxtop, and refer to http://communities.vmware.com/docs/DOC-9279 for more details.

                 

                Disk latency can cause all kinds of mess in vSphere.

                 

                • "%WAIT"

                The percentage of time the world spent in wait state.

                 

                This %WAIT is the total wait time, i.e., the time the world spends waiting for some VMkernel resource. It includes I/O wait time, idle time, and time spent waiting on other resources. Idle time is presented as %IDLE.

                 

                Q: How do I know the VCPU world is waiting for I/O events?

                A: %WAIT - %IDLE can give you an estimate of how much CPU time is spent waiting for I/O events. This is an estimate only, because the world may be waiting for resources other than I/O. Note that we should only do this for VMM worlds, not the other kinds of worlds, because VMM worlds represent the guest behavior best. For disk I/O, another alternative is to read the disk latency stats, which we will explain in the disk section.

                 

                Q: How do I know the VM group is waiting for I/O events?

                A: For a VM, there are other worlds besides the VCPUs, such as an mks world and a VMX world. Most of the time, the other worlds are waiting for events, so you will see ~100% %WAIT for those worlds. If you want to know whether the guest is waiting for I/O events, you'd better expand the group and analyze the VCPU worlds as stated above.

                 

                Since %IDLE makes no sense for worlds other than VCPUs, we may use the group stats to estimate the guest I/O wait by "%WAIT - %IDLE - 100% * (NWLD - NVCPU)". Here, NWLD is the number of worlds in the group and NVCPU is the number of VCPUs. This is a very rough estimate, for two reasons: (1) the world may be waiting for resources other than I/O; (2) we assume the other assisting worlds are not active, which may not be true.
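The two estimates quoted above can be written out as a small Python sketch. The function names and the sample numbers are made up for illustration (they are not esxtop output from this thread), and the result is only as rough as the quoted text says it is.

```python
def vcpu_io_wait_estimate(wait_pct: float, idle_pct: float) -> float:
    """Rough I/O wait for a single vCPU (VMM) world: %WAIT - %IDLE."""
    return wait_pct - idle_pct


def group_io_wait_estimate(wait_pct: float, idle_pct: float,
                           nwld: int, nvcpu: int) -> float:
    """Rough guest I/O wait from group stats:
    %WAIT - %IDLE - 100% * (NWLD - NVCPU).

    Assumes every non-vCPU world in the group sits at ~100% %WAIT.
    """
    return wait_pct - idle_pct - 100.0 * (nwld - nvcpu)


# Hypothetical readings: one expanded vCPU world, then a group with
# 7 worlds of which 4 are vCPUs.
print(vcpu_io_wait_estimate(95.0, 90.0))            # 5.0
print(group_io_wait_estimate(650.0, 330.0, 7, 4))   # 20.0
```

A result near zero suggests the vCPUs are mostly idle rather than blocked; a large positive value is a hint to look at the disk latency stats, not proof of an I/O problem.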

                • 5. Re: CPU Ready Time
                  xadamz23 Enthusiast

                  Well, I can say we don't have the best SAN in the world, but the good news is we are currently migrating to one that performs much better.  I have never read anything that said CPU RDY was anything but a VM waiting to be scheduled on a CPU, though.

                   

                  I am on 5.0 U1 on AMD processors.

                   

                  My guess based on what you all have said is that it has to do with disk latency.  We do experience high disk latency at certain times throughout the day.  I learned something new today about RDY time so thanks for that. 

                  • 6. Re: CPU Ready Time
                    rickardnobel Virtuoso

                    Adam wrote:

                     

                    Any ideas as to why I am seeing such a high ready time even though I am not oversubscribing CPUs?

                     

                    Could you take screenshots of the CPU and Memory views in ESXTOP while you have the problem and post them here?

                    • 7. Re: CPU Ready Time
                      mcowger Champion

                      You can actually get a little more granular and see what makes up RDY.

                       

                      If you look at the %CSTP value, that's the clearest picture of CPU scheduling penalties; it's part of %RDY (along with other stuff like %WAIT).

                      • 8. Re: CPU Ready Time
                        lakey81 Enthusiast

                        Actually, my bad: it's the September 2012 patch "ESXi500-201209001" that fixed that issue.

                        • 9. Re: CPU Ready Time
                          xadamz23 Enthusiast

                          So after digging into it more, I don't think it has to do with storage latency.  I'm attaching the esxtop output.  I've expanded one of the VMs.  Based on the output, any ideas?

                          • 10. Re: CPU Ready Time
                            mcowger Champion

                            Are these EPCCTX0X systems 8 vCPU systems?  They look like it.  Above you said they are 4 vCPU.

                             

                            Either way, these RDY times aren't bad.  You have to remember that the %RDY shown for a VM is the aggregate across all of its vCPUs.  So if you see 20% RDY on a 4 vCPU VM, each vCPU is only at about 5%, which is a reasonably healthy number.  If these are 8 vCPU VMs, you are at 2.5%, which is perfectly healthy.
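The arithmetic in this reply can be sketched in Python as below. The helper name is hypothetical, and the result is an average: an individual vCPU can sit above it, which is why expanding the VM group in esxtop is still worth doing.

```python
def ready_per_vcpu(group_rdy_pct: float, num_vcpus: int) -> float:
    """Average %RDY per vCPU, given the aggregate %RDY shown for the VM group."""
    if num_vcpus <= 0:
        raise ValueError("num_vcpus must be a positive integer")
    return group_rdy_pct / num_vcpus


# Figures mentioned in the thread.
print(ready_per_vcpu(20.0, 4))  # 5.0
print(ready_per_vcpu(20.0, 8))  # 2.5
print(ready_per_vcpu(40.0, 4))  # 10.0
```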

                            • 11. Re: CPU Ready Time
                              xadamz23 Enthusiast

                              No, there are 8 VMs, each with 4 vCPUs.  The physical server has 32 cores and no HT.  I understand that 5% per vCPU isn't terrible, but I would expect it to be lower since I am not oversubscribing CPUs at all.  There are 32 vCPUs total and 32 total physical cores.  Again, I realize the hypervisor itself is using processor 0, but not heavily, so I still wouldn't expect to see RDY values this high.

                              • 12. Re: CPU Ready Time
                                xadamz23 Enthusiast

                                I'm starting to wonder if lakey81 is right.  I do in fact have AMD processors and I probably don't have the patch that he mentioned.  I'll try to get the hosts patched next week and see if the problem goes away.

                                • 13. Re: CPU Ready Time
                                  lakey81 Enthusiast

                                  One way you can roughly check in esxtop is to switch to the memory view and enable the NUMA stats.  I believe the field is NHN, the NUMA home node, which tells you which NUMA node/socket the VM is running on.  Normally VMs should be spread fairly evenly, based on load, over all your physical processors, but with this bug the scheduler would not move VMs around and would favor 1 or 2 nodes.  In my case, with 4-CPU blades, it would load everything onto nodes 0 and 1 and rarely use nodes 2 and 3, which caused major issues with ready time.
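Once the NHN values have been read off the esxtop memory view, the spread check described above can be sketched in Python. The VM names, NHN values, and node count below are invented for illustration (they are not the poster's data), and the helper names are hypothetical.

```python
from collections import Counter


def vms_per_home_node(home_nodes: dict[str, int], num_nodes: int) -> dict[int, int]:
    """Count VMs per NUMA home node, including nodes that host no VMs."""
    counts = Counter(home_nodes.values())
    return {node: counts.get(node, 0) for node in range(num_nodes)}


def unused_nodes(per_node: dict[int, int]) -> list[int]:
    """Return the NUMA nodes that are home to no VM at all."""
    return [node for node, count in per_node.items() if count == 0]


# Hypothetical NHN readings for 8 VMs on a 4-node host.
sample = {"vm1": 0, "vm2": 0, "vm3": 1, "vm4": 1,
          "vm5": 0, "vm6": 1, "vm7": 0, "vm8": 1}
per_node = vms_per_home_node(sample, num_nodes=4)
print(per_node)                # {0: 4, 1: 4, 2: 0, 3: 0}
print(unused_nodes(per_node))  # [2, 3]
```

A lopsided result like this one matches the symptom described in the post; an even spread would point away from the NUMA balancing bug.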

                                  • 14. Re: CPU Ready Time
                                    xadamz23 Enthusiast

                                    So I applied the patch that lakey81 mentioned, but it didn't fix the problem.  After I applied the patch I had 8 VMs running on a host and RDY was fine.  I vMotioned 4 VMs off of the host and then back onto it, and RDY time is now terrible for 3 of the VMs.  I've attached the RDY and NUMA stats.

                                     

                                    Any other ideas?  Maybe I should apply all available ESXi patches?
