VMware Cloud Community
CoffeeBlack
Contributor

Thoughts: high vCPU usage and a higher level of CPU latency

Environment: 6.5

CPU: 2.2 GHz

Let's say we have a server X set up with 2 x 10 cores.

You only have 2 VMs on the host to begin with.

VM 1 asks for 14 cores.

     - no reservation, normal shares

VM 2 asks for 10 cores.

     - this VM has a 10 GHz reservation and a higher share value

During normal operation in daytime hours, VM 1 uses 90% of its CPU allowance on average.

VM 2 uses 60% on average.

CPU latency runs between 5% and 25%. What is the most likely problem? And what are a couple of potential solutions?

(Note: these VMs were migrated from another 6.5 host, which ran 2 x 14 cores @ 2.4 GHz with little to no CPU latency.) Please feel free to answer as bluntly as possible. *cough* :)
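To put rough numbers on the setup, here is a minimal Python sketch that compares the two hosts using only the figures from the post. The helper name `host_summary` is made up for illustration, and applying the same utilization percentages to the old host is an assumption, since they were observed on the new one.

```python
# Back-of-the-envelope comparison of the new and old hosts.
# All figures come from the post; host_summary is a hypothetical helper.

def host_summary(sockets, cores_per_socket, ghz, vms):
    """vms is a list of (vcpu_count, average_utilization) tuples."""
    cores = sockets * cores_per_socket
    vcpus = sum(count for count, _ in vms)
    demand_cores = sum(count * util for count, util in vms)
    return {
        "physical_cores": cores,
        "capacity_ghz": cores * ghz,
        "vcpus": vcpus,
        "vcpu_per_core": vcpus / cores,
        "avg_demand_cores": demand_cores,
    }

# VM 1: 14 vCPUs at 90% average; VM 2: 10 vCPUs at 60% average.
vms = [(14, 0.90), (10, 0.60)]

new_host = host_summary(2, 10, 2.2, vms)  # 2 x 10 cores @ 2.2 GHz
old_host = host_summary(2, 14, 2.4, vms)  # 2 x 14 cores @ 2.4 GHz

for name, summary in (("new", new_host), ("old", old_host)):
    print(name, summary)
```

On the new host, 24 vCPUs share 20 physical cores (1.2 vCPUs per core), while on the old host the same 24 vCPUs had 28 cores, so every vCPU could have had a core to itself.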

3 Replies
Alex_Romeo
Leadership

Hi,

I suggest reading this document (from page 103):

Alessandro Romeo

Blog: https://www.aleadmin.it/
CoffeeBlack
Contributor

My answer would have been more along the lines of:

Well, if you are seeing CPU latency while actively reserving 10 GHz for VM 2 (making it unavailable to any other VM) and also raising VM 2's share value, then when VM 1 requests CPU resources there is a higher chance that the request gets denied, and you end up with CPU latency as a result. How much depends on the load from VM 1 and VM 2; it gets worse when both are above 50% vCPU usage. So basically you should have purchased a server with more cores. Or you need to drop the reservation and equalize the share values (or potentially just equalize the share values). Or simply reduce the number of cores assigned to the VMs so that they can split the physical CPUs evenly rather than thrashing over some of them.

I would think dropping VM 1 from 14 to 10 cores would fix it as well, but that's cutting it close if VM 1 needs a lot of processing power. It kind of depends on how much of the high-looking CPU usage is actually caused by the latency, right?
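The reservation argument above can be checked with some quick arithmetic. This Python sketch takes the reply's model at face value, treating the full 10 GHz as withheld from other VMs. That is the reply's assumption, not confirmed scheduler behavior, and the variable names are made up for illustration.

```python
# Reservation arithmetic using the figures from the thread.
# Assumption (from the reply): the reserved 10 GHz is unavailable to VM 1.

CORE_GHZ = 2.2          # clock speed per core
HOST_CORES = 20         # 2 sockets x 10 cores
RESERVATION_GHZ = 10.0  # VM 2's reservation

capacity_ghz = HOST_CORES * CORE_GHZ
reserved_core_equivalent = RESERVATION_GHZ / CORE_GHZ
unreserved_ghz = capacity_ghz - RESERVATION_GHZ

# Average demand: vCPUs x average utilization x clock speed.
vm1_avg_demand_ghz = 14 * 0.90 * CORE_GHZ
vm2_avg_demand_ghz = 10 * 0.60 * CORE_GHZ
total_avg_demand_ghz = vm1_avg_demand_ghz + vm2_avg_demand_ghz
host_utilization = total_avg_demand_ghz / capacity_ghz

print(f"capacity:        {capacity_ghz:.2f} GHz")
print(f"reserved:        {RESERVATION_GHZ:.2f} GHz "
      f"(about {reserved_core_equivalent:.2f} cores)")
print(f"unreserved:      {unreserved_ghz:.2f} GHz")
print(f"VM 1 avg demand: {vm1_avg_demand_ghz:.2f} GHz")
print(f"VM 2 avg demand: {vm2_avg_demand_ghz:.2f} GHz")
print(f"host avg load:   {host_utilization:.0%}")
```

Under these assumptions, VM 1's average demand still fits in the unreserved capacity, but the two VMs together keep the host above 90% on average, which leaves very little headroom for peaks.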

Would any of you have further thoughts given my answer(s)?

CoffeeBlack
Contributor

I'd add to that, regarding how CPU requests are made in 6.5+.

If I'm requesting 14 cores (and I haven't changed the advanced options), then because that is greater than or equal to 10 cores, it is going to be split into 2 NUMA nodes of 7 and 7. So what does that mean when you have another request for 10 cores, with the changes to reservation and shares? I assume it still gets split into 2 NUMA nodes of 5 and 5. One of those 5s gets almost all of the vCPU it wants all of the time (due to the 10 GHz reservation, against 2.2 GHz * 5 = 11 GHz for one NUMA node), and the other gets whatever it wants most of the time due to the increased share value.

So taking a step back...

10 cores * 2 sockets

NUMA nodes: 4 in total for the two VMs: nn 1 at 5 cores, nn 2 at 5 cores, nn 3 at 7 cores, and nn 4 at 7 cores.

Let's say nn 1 goes to CPU socket 1 and nn 2 goes to CPU socket 2...

Socket 1 gets:

nn 1

nn 3

Socket 2 gets:

nn 2

nn 4

nn 3 requests CPU but can't get any, because the 10 GHz on socket 1 is already taken and can't be given out, so it then requests resources from socket 2. On socket 2 it is now fighting against nn 2, which has a higher share value, and nn 4, which is already requesting a high percentage of resources.

It may be that the scheduler is smart enough to realize that putting nn 1 and nn 2 on the same socket is the best option overall, but even then it still runs poorly, right? You would then have 2 NUMA nodes asking for 7 cores each on a 10-core socket.
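The two placements discussed above can be tallied with a short Python sketch. It assumes the node sizes from this post (5, 5, 7, 7), which are themselves the poster's guess at how the VMs get split, and the names `nodes` and `socket_load` are made up for illustration.

```python
# Tally vCPUs per socket for the two placements discussed in the post.
# Node sizes are the poster's assumption about how the VMs are split.

CORES_PER_SOCKET = 10

# nn1/nn2 belong to the 10-vCPU VM, nn3/nn4 to the 14-vCPU VM.
nodes = {"nn1": 5, "nn2": 5, "nn3": 7, "nn4": 7}

def socket_load(placement):
    """Return the number of vCPUs placed on each socket."""
    return {
        socket: sum(nodes[name] for name in names)
        for socket, names in placement.items()
    }

# Each VM spread across both sockets.
split = {"socket1": ["nn1", "nn3"], "socket2": ["nn2", "nn4"]}
# Each VM kept on its own socket.
grouped = {"socket1": ["nn1", "nn2"], "socket2": ["nn3", "nn4"]}

for label, placement in (("split", split), ("grouped", grouped)):
    for socket, vcpus in socket_load(placement).items():
        over = max(0, vcpus - CORES_PER_SOCKET)
        print(f"{label:8s} {socket}: {vcpus} vCPUs "
              f"on {CORES_PER_SOCKET} cores, {over} over")
```

Either way, 24 vCPUs land on 20 cores: the split placement oversubscribes both sockets by 2, while the grouped placement leaves socket 1 exactly full and oversubscribes socket 2 by 4.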
