NSITPS
Contributor

CPU ready times - ESX 4.0

Hi

Would you say a CPU ready time of 300 ms is high enough to cause sluggish behaviour in a SQL 2000 VM with 2 vCPUs?

I have not checked the CPU ready time via esxtop, as I believe it cannot give you historical data - is that right?

Thanks

25 Replies
cchesley
Enthusiast

The short answer is yes, that can cause a noticeable performance hit on your SQL server. esxtop does not keep historical data. You will need to configure vCenter to retain performance data longer, or use a solution such as VKernel's Capacity Analyzer to get this historical analysis.
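To put the 300 ms figure in context, here is a rough sketch of the usual conversion from vCenter's "CPU Ready" summation counter (milliseconds per sample) into a percentage of the sample interval. The 20 s interval is an assumption based on vCenter's real-time chart sampling; adjust it for historical rollups.

```python
# Hedged sketch: convert a vCenter "CPU Ready" summation value (ms) into a
# percentage of the sample interval.  Assumes the real-time chart's 20 s
# sample length; historical rollups use longer intervals.

def ready_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """CPU ready as a percentage of one sample interval.

    ready_ms / (interval_s * 1000) * 100 simplifies to ready_ms / (interval_s * 10).
    """
    return ready_ms / (interval_s * 10.0)

# The 300 ms figure from this thread, against a 20 s real-time sample:
print(ready_percent(300))   # 1.5 (% of the interval)
```

By the usual rule of thumb, per-vCPU ready under roughly 5% is generally considered healthy, so 1.5% would not be alarming on its own.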

Chris

http://www.vkernel.com
proden20
Hot Shot

You should trend your CPU Ready times with vCenter. Get a baseline CPU ready time when the server is performing acceptably and compare it against times of unacceptable performance.
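A minimal sketch of that baselining idea, with made-up sample values: collect ready percentages during a known-good window and during the sluggish window, then compare the averages.

```python
# Hedged sketch of the baselining approach: the sample values here are
# illustrative, not real measurements from this environment.

def average(samples):
    return sum(samples) / len(samples)

baseline = [1.2, 1.5, 1.4, 1.3]      # ready % when performance is acceptable
incident = [1.4, 1.6, 1.3, 1.5]      # ready % during the complaints

delta = average(incident) - average(baseline)
# A small delta suggests CPU ready is NOT what changed between the two windows.
print(round(delta, 2))   # 0.1
```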

Also, try to isolate the VM from running on the same host as other SMP VMs.

NSITPS
Contributor

The scenario:

1 x Dell R710, dual quad-core 2.96 GHz, 32 GB RAM.

Storage: Dell EqualLogic PS6000 running RAID 10 (I/O and throughput not coming close to the upper limit), with enterprise-class Nortel switches in between (flow control etc. all configured).

This is a temp environment (next two weeks), soon to be 3 x R710 and 2 x EqualLogic arrays.

1 x SQL VM - 2 vCPU

1 x Exch 2k7 VM - 2 vCPU

1 x F&P VM - 1 vCPU

1 x SAN MGMT VM - 1 vCPU (doing very little)

Although users have not complained about the performance of the Exchange VM, its CPU ready time is similar to that of the SQL VM (250 ms).

Keeping the SMP VMs apart is not an option until the migration to the new full-blown environment (next two weeks).

1. How can I reduce the CPU ready time on the SQL VM to prevent performance issues?

2. Seeing as the ESX server is not overcommitted (vCPU vs core count), why am I experiencing 300+ ms of CPU ready time?

proden20
Hot Shot

I'm not sure CPU ready is your problem, and I don't think a value in that range is uncommon. Try setting a CPU reservation for the full value of two cores and see if that improves your performance. If it does not, then CPU ready is not your bottleneck; you may need more cores to spread the load across, or you have another issue.

NSITPS
Contributor

Proden - I take your comments on board and will test ASAP.

Can I ask this:

When looking at CPU ready times there is an object for the VM itself and an object for each vCPU. The VM object is exactly double that of the two vCPU objects. Does this mean I should be concentrating on the per-vCPU results rather than the VM results? If so, the vCPU results are half those of the VM, which would lead me to think we don't have an issue with CPU ready?
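For what it's worth, a quick sketch of the relationship described above: the VM-level summation is simply the sum of the per-vCPU summations, so on an evenly loaded 2-vCPU VM the VM object reads twice each vCPU's value. The numbers here are illustrative, matching the ~300 ms figure from earlier in the thread.

```python
# Hedged sketch: VM-level "CPU Ready" summation vs. per-vCPU values.
# Sample numbers are assumptions for illustration.

per_vcpu_ready_ms = [150, 150]          # assumed per-vCPU samples (ms)
vm_ready_ms = sum(per_vcpu_ready_ms)    # what the VM object would show

# Per-vCPU is the figure to judge against thresholds:
# 150 ms in a 20 s sample is only 0.75 % ready per vCPU.
per_vcpu_percent = per_vcpu_ready_ms[0] / 20000 * 100
print(vm_ready_ms, per_vcpu_percent)    # 300 0.75
```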

proden20
Hot Shot

Well, I don't think monitoring individual vCPUs will do much for you; the whole VM must wait for cores to become available. When a VM has multiple vCPUs, the VMkernel has to co-schedule all of them, even if it only needs to execute on one. What I'm thinking is that reserving the full value of 2 cores will guarantee that CPU time on the host. I'm curious whether that will knock your CPU ready time down or even improve your SQL performance.
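In case it helps, the arithmetic behind "the full value of 2 cores" is just per-core clock times vCPU count. The 2.96 GHz figure is taken from the host spec posted earlier in the thread.

```python
# Hedged sketch: sizing a CPU reservation equal to the full value of two
# cores.  Core clock comes from the Dell R710 spec mentioned above.

core_mhz = 2960          # per-core clock (2.96 GHz)
vcpus = 2
reservation_mhz = core_mhz * vcpus
print(reservation_mhz)   # 5920 MHz to enter as the VM's CPU reservation
```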

Was this machine ever physical?

Was this machine faster on VMware before you added the other VMs?

Like I said earlier, I've seen apps (especially Oracle) use 50-60% of the 2 vCPUs given to them while users complained. I would bump it up to 4 cores and voila, the thing is flying, using only 20% of each core. However, the co-scheduling drawback I mentioned above is important too, and will boost CPU ready times for VMs on the same host. Use SMP only when it actually boosts performance.

NSITPS
Contributor

Today I have been using esxtop to watch real-time stats during the poor-performance periods on this customer site. According to esxtop, CPU ready is not an issue at all (0.4% max).

The VM was physical and was P2V'd over the weekend along with the other VMs (yes, I know it is best to build from scratch following best practice, but this was not possible). This being the case, I never had the opportunity to test the system with just the SQL VM alone.

What I have noticed via esxtop is that the %USED and %RUN values for this VM can be quite high (45 on average, sometimes as high as 130).
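A side note on that 130 figure: esxtop's %USED for a VM group is summed across its vCPU worlds, so an SMP VM can legitimately read above 100. A rough sketch, with made-up per-vCPU numbers:

```python
# Hedged sketch: why %USED can exceed 100 for an SMP VM.  esxtop sums the
# per-vCPU worlds, so a 2-vCPU VM tops out near 200, not 100.
# Per-vCPU values below are illustrative.

per_vcpu_used = [70, 60]            # assumed %USED per vCPU world
vm_used = sum(per_vcpu_used)        # 130 - high, but not impossible
print(vm_used)
```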

I have arranged to increase the vCPU within this VM from 2 to 4 this evening (by the way when this was physical it had 2 x dual core CPUs).

I am well aware I will not get the performance that I did on a physical server, however I must reduce this sluggish behaviour considerably.

Could I have your thoughts please?

Thanks

proden20
Hot Shot

I am by no means an esxtop expert. Perhaps someone else can chime in. I recommend browsing the knowledge base for articles like this (esxtop):

http://kb.vmware.com/kb/1005362

I'm approaching this strictly from an observational troubleshooting standpoint. You can reserve the CPU allocation or up the count for this VM now for troubleshooting purposes, just to determine whether CPU is the problem. Doing this should give you performance very near a bare-metal installation. Large reservations could be detrimental to things like HA admission control slots when you cluster but you are not there just yet. If I were in your shoes, and I was convinced it was a CPU bottleneck, I would try to determine if in fact this VM needs more CPU reservation / core count than you've given it, then decide whether the server is a good virtualization candidate at all. Historically, I've found that databases require higher core counts or reservations in order to virtualize well, but you'll want to determine that for each particular VM before overallocating.

If you had 4 physical cores before I wouldn't be surprised if SQL performed much faster with 4 cores and some reservation, but I haven't seen your other metrics (memory, disk, network).

NSITPS
Contributor

The increase to 4 vCPUs did not help. I also reserved the full GHz of all cores for the VM, which again did not help. All other metrics are fine. Watching the CPU today I note that it rarely crosses 30%, so I am going to reduce the vCPU count down to one as a last attempt (I remember a guy mentioning he runs SQL 2000 VMs with a single vCPU as this gave the best performance). If this does not help, I will accept that this is not a good candidate and look to move back to physical.

Will update tomorrow

BenConrad
Expert

Are you also running Windows 2000? Is CPU utilization high even when nothing is happening in the VM? I'm assuming you have multiple vCPUs and are running the multi-processor HAL in the guest.

If so, check out http://support.microsoft.com/kb/919521, and look at monitor.idleLoopSpinUS (VMware KB 1730) as the fix.

Ben

NSITPS
Contributor

Hi

No, the OS is W2K3 R2 - also running the correct HAL.

Thanks

proden20
Hot Shot

I posted a KB article which should help you use esxtop to assess whether SMP is slowing you down - though direct observation is the obvious check.

NSITPS
Contributor

Can I have the link to the KB pls?

proden20
Hot Shot

http://kb.vmware.com/kb/1005362

Did 1vcpu work out for you?

NSITPS
Contributor

1 vCPU has certainly helped - no user complaints today. However, a few users, when asked, did point out that some DB queries were a little slow, but not unacceptably so (not sure if this is just users being users, as they know we had a problem). This is a huge improvement on 2 and 4 vCPU, which caused regular egg timers, "not responding" messages, etc.

We are going to let this run until mid next week and review again before making a decision (all being well).

Thanks for your help - will update next week

kopper27
Hot Shot

I forgot to post this esxtop metrics guide - pretty helpful:

http://www.yellow-bricks.com/esxtop/

It gives a good idea about metrics we don't usually look at.

NSITPS
Contributor

Update - 1 vCPU does not seem to have helped after all.

However, I have been doing some further investigation around NUMA and have noted the following:

- Out of the 4 VMs on this host, the 3 production VMs (SQL, Exchange, F&P) are all running on NUMA node 0

- NUMA node 0 has no free memory

- The problematic SQL VM is accessing 96% (!) of its memory remotely from node 1

- There have been no migrations between nodes

1st: Doesn't the fact that 96% of its memory is remote indicate a major bottleneck?

2nd: According to all the documentation, the ESX scheduler should migrate a VM to the other NUMA node when a certain amount of its memory is located remotely - and I would think 96% warrants this! Can anyone explain why this VM has not been migrated?

I am intending on setting CPU/NUMA affinity this evening to see if this helps with my SQL VM issue...
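For reference, the locality arithmetic behind that 96% figure is just remote over total. The page counts below are made-up numbers chosen to match what I'm seeing:

```python
# Hedged sketch of the NUMA-locality calculation: esxtop's memory view shows
# local vs. remote memory per VM; percent-remote is remote / (local + remote).
# The MB figures are illustrative, matching the ~96 % remote observation.

local_mb = 160
remote_mb = 3840
pct_remote = remote_mb / (local_mb + remote_mb) * 100
print(round(pct_remote))   # 96
```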
