VMware Cloud Community
Blicks
Contributor

CPU Latency Counter on ESXi 5.0

I am having trouble understanding the results from the latency counter in vCenter for our guest VMs.

My understanding of the latency counter is that it increases when there is contention for physical CPU resources, and that it measures the percentage of time a guest has to wait to get access to a CPU. However, all our hosts have more physical CPU cores than the total number of vCPUs of our guests. Therefore, if my understanding is correct, there should be minimal latency and no contention for physical CPU resources. We are seeing a number of guest VMs with a latency above 10%, which I have read indicates significant degradation of performance.

However, our CPU Ready counters are well below 500 milliseconds. Can anyone shed any light on why this might be the case?
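For anyone comparing the two counters, it can help to put CPU Ready on the same percentage scale as latency. Below is a minimal sketch, assuming the value comes from the vCenter real-time chart, where CPU Ready is a summation in milliseconds over a 20-second sample. The function name is illustrative and not part of any VMware API. For a VM-level value on a multi-vCPU guest you would probably also want to divide by the number of vCPUs.

```python
def ready_ms_to_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert a CPU Ready summation (ms) into a percentage of the sample interval.

    interval_s defaults to 20 s, the real-time chart interval (assumption:
    historical charts use longer intervals and need a different value).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return ready_ms / (interval_s * 1000.0) * 100.0


if __name__ == "__main__":
    # 500 ms of ready time within one 20 s real-time sample
    print(f"{ready_ms_to_percent(500):.1f}%")
```

On that scale, 500 ms per real-time sample is 2.5% ready, so a latency figure of 10% or more cannot be explained by ready time alone.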

27 Replies
rickardnobel
Champion

Which exact counters are you looking at? And are you using ESXTOP or the graphical performance charts in the vSphere Client?

My VMware blog: www.rickardnobel.se
0 Kudos
joshodgers
Enthusiast

In case you're still having issues, I have a few blog posts which may assist:

Common Mistake - Using Reservations to Solve CPU Ready

How much CPU ready is OK?

High CPU Ready with Low Utilization

Josh Odgers | VCDX #90 | Blog: www.joshodgers.com | Twitter @josh_odgers
0 Kudos
Blicks
Contributor

Hi, thanks for coming back to me. It's not the CPU Ready counter that we are having an issue with, it's the CPU latency. There are tonnes of posts on CPU Ready but almost nothing on latency. Our CPU Ready in ESXTOP is 0.05 but our latency is 25% in vCenter. No one, including VMware support, has been able to explain the difference between the two. Given that we have more physical cores than vCPUs and no reservations, I am struggling to understand why the latency counter is so high.

0 Kudos
jamesl73
Contributor

I'm troubleshooting the same thing.  Do you have any more information to share?

0 Kudos
Blicks
Contributor

Hi,

What I didn't say in the original post was that it was the NWORKS (Veeam) SCOM that was raising the alert. I spent a month going back and forth with Nworks and VMware. VMware were certain there was no issue with our environment; indeed, I was seeing no issues, and we had more physical cores than vCPUs, so I couldn't see why there would be an issue. Nworks maintain that this is a valid counter but couldn't give me any documentation, so I decided to override the counter.

0 Kudos
jamesl73
Contributor

Same here, I'm using Veeam as well.  The Veeam alert is accurate, the data matches what I can see in vCenter performance.  It's just a matter of what does that metric mean and what's a valid threshold for it.  I'm getting about 3,000 of them per day in a 1200 VM environment.  For now I'll try bumping the threshold from 20% to 50%.  Thanks for the follow up!

thelights
Contributor

I'll join the crowd too - I'm using Veeam's MP here as well and I get the same alert. Here is what VMware told me:

  • CPU Latency rises when the VM cannot run on the best core (the one it was running on), while Ready rises when none of the cores on the entire motherboard is available. So Latency will go up before Ready goes up. But the two counters do not seem to move in tandem.

What I don't have yet is the point at which the percentage becomes too high, e.g. is 20% an issue, or is it 50%? Or how you fix it!

0 Kudos
Blicks
Contributor

That would make sense. However, we have more actual cores than we have vCPUs, so I would expect it to always be able to use the best core ... but I still get silly high values of 500ms.

0 Kudos
thelights
Contributor

500ms? Latency is measured as a %?

But same here - more pCPUs than vCPUs and we have the issue. Do you have Intel's hyper-threading turned on? We do.

We also see the issue on VMs with 1 vCPU.

0 Kudos
frdnguyen
Contributor

As far as I know, CPU latency is an index (let's call it that) based on CPU ready, swap wait and power regulation (C-state). So if your CPU ready is OK and swap wait is 0, then it is your power regulation.

You can change the setting in the BIOS and let the OS (i.e. ESX) control the power; otherwise you can configure it to something like high performance.

0 Kudos
Blicks
Contributor

This is interesting, where did you get this information from?

0 Kudos
frdnguyen
Contributor

Like I said, it is not official, but I found this in an old post: "ready, cstp, ht busy time and effects of dynamic voltage frequency scaling" (https://communities.vmware.com/message/2162570). So my memory got it partly wrong. There is also a Veeam forum thread: "Virtual Machine Compute Latency Analysis, how does it work? | view topic..."

Still, the information is absolutely not documented by VMware:

VMware API on counters: VMware vSphere 5.1

I had the problem and checked all the statistics. All of them were OK; only power regulation was unclear. You can see it in esxtop: hit p for power management.

thelights
Contributor

I've got the power control set to OS Control on my HP servers. I changed the power management from balanced to high performance, and I don't really see a significant change in latency. For example, I have a file server with a 20% average latency, and it still seems to be around 20% (although its latency stats have a lot of jitter).

0 Kudos
thelights
Contributor

Ok, so we have it fixed. Speaking with another VMware tech, I was told the latency was a measure of 3 things:

  1. CPU ready.
  2. CPU swap wait.
  3. Power settings.

For us, ready was not an issue per the charts and we weren't swapping (this host was not overcommitting CPU or memory). To fix it, we had to go into the BIOS of our servers and change the power setting to "OS Control" (this is for an HP server, YMMV for other brands). Even leaving the power mode at "balanced" in ESXi itself was fine. Moving VMs back to the host with this setting effectively reduced latency to 0; it also decreased ready time, and our utilization went up too. So a good result!
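The elimination logic described here (rule out ready, rule out swap wait, then suspect power management) can be written down as a small triage helper. This is only a sketch of the reasoning in this thread and not a documented VMware formula. The function name, the 5% cut-off and the order of the checks are all assumptions chosen for illustration.

```python
def likely_latency_cause(
    latency_pct: float,
    ready_pct: float,
    swap_wait_pct: float,
    threshold_pct: float = 5.0,  # assumed cut-off, not a VMware-documented value
) -> str:
    """Guess which of the three reported components to investigate first."""
    if latency_pct < threshold_pct:
        return "latency within threshold"
    if ready_pct >= threshold_pct:
        return "cpu ready"
    if swap_wait_pct >= threshold_pct:
        return "swap wait"
    # Ready and swap wait are both low, so power settings are what remains.
    return "power management"


if __name__ == "__main__":
    # Figures quoted earlier in the thread: 25% latency, 0.05 ready, no swapping.
    print(likely_latency_cause(latency_pct=25.0, ready_pct=0.05, swap_wait_pct=0.0))
```

With those numbers the helper points at power management, which matches what fixed it for us. Treat the result as a hint about where to look and not as a diagnosis.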

esx05.PNG

admin
Immortal

Use the esxtop command to determine whether the ESX/ESXi server is being overloaded.

Examine the %READY field for the percentage of time that the virtual machine was ready but could not be scheduled to run on a physical CPU.

Under normal operating conditions, this value should remain under 5%.

See my EE article:

HOW TO:  Performance Monitor vSphere 4.x or 5.0

Also see the best practice guide:
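If you capture esxtop in batch mode (for example `esxtop -b -n 1 > sample.csv`), the 5% check can be scripted. The sketch below assumes the batch output is a perfmon-style CSV whose CPU group columns are named like `\\host\Group Cpu(id:name)\% Ready`. Verify the exact column names against your own capture, as the naming here is an assumption. The function name is illustrative. Note also that a group value covers all of a VM's worlds, so a multi-vCPU VM can legitimately show more than 5% here.

```python
import csv
import io


def groups_over_ready_threshold(csv_text: str, threshold_pct: float = 5.0) -> dict[str, float]:
    """Return the peak '% Ready' per CPU group column that exceeds threshold_pct."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return {}
    flagged: dict[str, float] = {}
    for row in reader:
        for name, value in zip(header, row):
            # Assumed column naming: ...\Group Cpu(<id>:<name>)\% Ready
            if "Group Cpu(" not in name or not name.endswith(r"\% Ready"):
                continue
            try:
                pct = float(value)
            except ValueError:
                continue  # skip blank or malformed cells
            if pct > threshold_pct:
                flagged[name] = max(pct, flagged.get(name, 0.0))
    return flagged


if __name__ == "__main__":
    # Made-up sample in the assumed format; host and VM names are hypothetical.
    sample = (
        r'"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Group Cpu(1001:fileserver)\% Ready",'
        r'"\\esx01\Group Cpu(1002:web01)\% Ready"' "\n"
        '"01/01/2014 10:00:00","7.50","0.05"\n'
    )
    for column, pct in groups_over_ready_threshold(sample).items():
        print(column, pct)
```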

http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf

    

0 Kudos
thelights
Contributor

Look at my post #14. Latency is a combination of:

  1. CPU Ready
  2. CPU Swap
  3. CPU power state

So either you're short on memory, or your power settings are wrong in the server BIOS.

0 Kudos
Shocko
Enthusiast

I'd have to say I'm not sure I agree with the last post. I have the following:

  • 16 logical core host (4 dual-core CPUs with Hyper-Threading enabled)
  • 4 dual vCPU VMs running on that host with full memory reservations

I still see Latency/Ready/CoStop times against all VMs, but swap wait is 0 as expected due to the reservations. Since the host is undercommitted, I don't understand why I'm seeing latency/ready/CoStop when there should always be a pCPU ready. Could this be due to the core/HT setup or some overhead from the CPU scheduler?
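One thing worth checking in a setup like this is whether the host is undercommitted against physical cores or only against logical (Hyper-Threaded) CPUs. A minimal sketch of that arithmetic follows. The function name is illustrative, and it assumes two hardware threads per core when Hyper-Threading is on.

```python
def cpu_commit_ratios(
    sockets: int,
    cores_per_socket: int,
    hyperthreading: bool,
    vcpus_per_vm: list[int],
) -> dict[str, float]:
    """Compare total vCPUs with physical cores and with logical CPUs."""
    physical_cores = sockets * cores_per_socket
    logical_cpus = physical_cores * (2 if hyperthreading else 1)
    total_vcpus = sum(vcpus_per_vm)
    return {
        "physical_cores": physical_cores,
        "logical_cpus": logical_cpus,
        "total_vcpus": total_vcpus,
        "vcpus_per_physical_core": total_vcpus / physical_cores,
        "vcpus_per_logical_cpu": total_vcpus / logical_cpus,
    }


if __name__ == "__main__":
    # The host described above: 4 dual-core CPUs with HT, running 4 VMs of 2 vCPUs each.
    for key, value in cpu_commit_ratios(4, 2, True, [2, 2, 2, 2]).items():
        print(f"{key}: {value}")
```

For the host above this gives 8 vCPUs on 8 physical cores (16 logical CPUs), i.e. 1:1 against physical cores and 0.5:1 against logical CPUs. That leaves room for the "HT busy time" effect mentioned in the older post linked earlier, although this is only a possible contributor and not a confirmed cause.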

0 Kudos
frdnguyen
Contributor

Hello,

Overhead from the CPU scheduler looks very unlikely. You should at least see some CPU ready time.

If everything looks OK except CPU latency, check your BIOS power management (C-state) setting (on the ESX host, not the VM). You should have several options, from "Vendor driven" to something like "OS controlled" (in this case ESX). Switch it to OS controlled. Try this and see what happens.

0 Kudos
Shocko
Enthusiast

Hmm, I checked all my iLO2 cards (using HP BL460c servers for my ESXi 5.1 hosts) and they are all set to 'Static High Performance'. In fact, they don't appear to be licensed for dynamic power capping etc., so I don't think the VMware power management could be putting them into low power states.

0 Kudos