Hi
Would you say a CPU ready time of 300 ms is high enough to cause sluggish behavior in a SQL 2000 VM with 2 vCPUs?
I have not checked the CPU ready time via esxtop, as I believe it cannot give you historical data?
Thanks
The short answer is yes, that can cause noticeable performance degradation on your SQL server. esxtop does not keep historical data. You will need to configure vCenter to retain performance data longer, or use a solution such as VKernel's Capacity Analyzer to get this historical analysis.
Chris
You should trend your CPU Ready times with vCenter. Get a baseline CPU ready time when the server is performing acceptably and compare it against times of unacceptable performance.
Also, try to isolate the VM from running on the same host as other SMP VMs.
The scenario:
1 x Dell R710, dual quad-core 2.96 GHz, 32 GB RAM
Storage: Dell EqualLogic PS6000 running RAID 10 (I/O and throughput not coming close to the upper limit), with enterprise-class Nortel switches in between (flow control etc. all configured)
This is a temporary environment (next two weeks), soon to be 3 x R710 and 2 x EqualLogic arrays.
1 x SQL VM – 2vCPU
1 x Exch 2k7 VM – 2vCPU
1 x F&P VM – 1vCPU
1 x SAN MGMT VM – 1vCPU (doing very little)
Although users have not complained about the performance of the Exchange VM, its CPU ready time is similar to that of the SQL VM (250 ms).
Keeping the SMP VMs apart is not an option until the migration to the new full-blown environment (next two weeks).
1. How can I reduce the CPU ready time on the SQL VM to prevent performance issues?
2. Seeing as the ESX server is not overcommitted (vCPU vs. core count), why am I experiencing 300+ ms of CPU ready time?
I'm not sure CPU ready is your problem, and I don't think a value in that range is uncommon. Try setting a CPU reservation for the full value of two cores and see if that improves your performance. If it does not, then CPU ready is not your bottleneck. You may need more cores to cascade across, or have another issue.
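For anyone wondering how a millisecond figure like that maps to a percentage: vCenter's real-time charts report CPU ready as a summation over a 20-second sample interval, so the arithmetic is a one-liner. A minimal sketch (the ~5% figure often quoted in the community is a rule of thumb, not a hard limit):

```python
# Convert a CPU ready summation value (ms) from a vCenter real-time
# chart into a percentage. Assumes the real-time sample interval of
# 20 seconds (20,000 ms); other chart intervals need a different divisor.

def ready_percent(ready_ms: float, interval_ms: float = 20_000) -> float:
    """CPU ready as a percentage of one sample interval."""
    return ready_ms / interval_ms * 100

# The 300 ms figure from this thread:
print(ready_percent(300))   # 1.5
```

At 1.5% per sample interval, 300 ms is well below the level usually associated with serious scheduling contention, which supports the point above.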
Proden - I take your comments on board and will test ASAP.
Can I ask this:
When looking at CPU ready times there is an object for the VM itself and an object for each vCPU. The VM object is exactly double that of the two vCPU objects. Does this mean I should be concentrating on the vCPU results rather than the VM results? If so, the vCPU results are half those of the VM, which would lead me to think we don't have a CPU ready issue.
Well, I don't think monitoring individual vCPUs will do much for you; the whole VM must wait for cores to become available. When a VM has multiple vCPUs, the VMkernel has to co-schedule all of them, even if it only needs to execute on one. What I'm thinking is that reserving the full value of two cores will guarantee that CPU time on the host. I'm curious whether that will knock your CPU ready time down, or even improve your SQL performance.
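The "VM object is exactly double" observation above is consistent with the VM-level counter simply being the sum of the per-vCPU counters. A minimal sketch (the numbers are illustrative, not measured):

```python
# Minimal sketch: the VM-level CPU ready counter behaves like the sum of
# the per-vCPU counters, which is why a 2-vCPU VM shows roughly double
# the per-vCPU value. The values below are illustrative only.

def vm_ready_ms(per_vcpu_ready_ms):
    """VM-level ready time as the sum across all vCPUs."""
    return sum(per_vcpu_ready_ms)

per_vcpu = [150, 150]            # two vCPUs, 150 ms ready each
print(vm_ready_ms(per_vcpu))     # 300
```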
Was this machine ever physical?
Was this machine faster on VMware before you added the other VMs?
Like I said earlier, I've seen apps (especially Oracle) use 50-60% of the 2 vCPUs given to them while users complained. I would bump it up to 4 cores and voila, the thing is flying, using only 20% of each core. However, the co-scheduling drawback I mentioned above is important too, and will boost CPU ready times for VMs on the same host. Use SMP only when it actually boosts performance.
Today I have been using esxtop to watch real-time stats during the poor-performance periods on this customer site. Using esxtop, CPU ready is not an issue at all (0.4% max).
The VM was physical and was P2V'd over the weekend along with the other VMs (yes, I know it is best to build from scratch following best practice, but this was not possible). This being the case, I never had the opportunity to test the system with just the SQL VM alone.
What I have noticed via esxtop is that the %USED and %RUN values for this VM can be quite high (averaging around 45%, sometimes as high as 130%).
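On %USED exceeding 100%: at the VM (group) level, esxtop sums usage across the VM's worlds, so a 2-vCPU VM can legitimately show values approaching 200%. A hedged sketch of normalizing that back to a per-vCPU average:

```python
# Sketch: esxtop's group-level %USED aggregates across a VM's worlds,
# so values above 100% are expected on SMP VMs. Dividing by the vCPU
# count gives the average load per vCPU.

def per_vcpu_used(used_percent: float, num_vcpus: int) -> float:
    """Average per-vCPU utilization from a group-level %USED reading."""
    return used_percent / num_vcpus

# The 130% peak from this post, on a 2-vCPU VM:
print(per_vcpu_used(130, 2))   # 65.0
```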
I have arranged to increase the vCPU count in this VM from 2 to 4 this evening (by the way, when this was physical it had 2 x dual-core CPUs).
I am well aware I will not get the performance that I did on a physical server, however I must reduce this sluggish behaviour considerably.
Could I have your thoughts please?
Thanks
I am by no means an esxtop expert. Perhaps someone else can chime in. I recommend browsing the knowledge base for articles like this (esxtop):
http://kb.vmware.com/kb/1005362
I'm approaching this strictly from an observational troubleshooting standpoint. You can reserve the CPU allocation or up the count for this VM now for troubleshooting purposes, just to determine whether CPU is the problem. Doing this should give you performance very near a bare-metal installation. Large reservations could be detrimental to things like HA admission control slots when you cluster but you are not there just yet. If I were in your shoes, and I was convinced it was a CPU bottleneck, I would try to determine if in fact this VM needs more CPU reservation / core count than you've given it, then decide whether the server is a good virtualization candidate at all. Historically, I've found that databases require higher core counts or reservations in order to virtualize well, but you'll want to determine that for each particular VM before overallocating.
If you had 4 physical cores before I wouldn't be surprised if SQL performed much faster with 4 cores and some reservation, but I haven't seen your other metrics (memory, disk, network).
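For clarity on sizing the reservation suggested above: vSphere CPU reservations are specified in MHz, so "the full value of two cores" on these 2.96 GHz hosts works out as below. This is a sketch; the exact clock speed reported by the host may differ slightly from the nominal figure.

```python
# Sketch: reservation sizing for "the full value of N cores" on the
# 2.96 GHz R710 from this thread. Reservations are entered in MHz.

CORE_MHZ = 2960          # one nominal 2.96 GHz core

def full_reservation_mhz(num_vcpus: int, core_mhz: int = CORE_MHZ) -> int:
    """MHz reservation equal to num_vcpus full cores."""
    return num_vcpus * core_mhz

print(full_reservation_mhz(2))   # 5920
print(full_reservation_mhz(4))   # 11840
```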
The increase to 4 vCPUs did not help. I also reserved the full GHz of all cores for the VM, which again did not help. All other metrics are fine. Watching the CPU today, I note that it rarely crosses 30%, so I am going to reduce the vCPU count down to one as a last attempt (I remember a guy mentioning he runs SQL 2000 VMs with a single vCPU because it gave the best performance). If this does not help, I will accept that this is not a good candidate and look to move back to physical.
Will update tomorrow
Are you also running Windows 2000? Is the CPU utilization high even when nothing is happening in the VM? I'm assuming you have multiple vCPUs and are running the multi-processor HAL in the guest.
If so, check out http://support.microsoft.com/kb/919521 and check out monitor.idleLoopSpinUS (VMware KB 1730) as the fix.
Ben
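If that KB turns out to apply, the monitor.idleLoopSpinUS workaround is a per-VM .vmx entry. A sketch of what the line looks like; the value shown is only a placeholder, so check VMware KB 1730 for the figure appropriate to your case:

```
monitor.idleLoopSpinUS = "2000"
```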
Hi
No, the OS is W2K3 R2, also running the correct HAL.
Thanks
I posted a KB article which should help you use esxtop to assess whether SMP is slowing you down....observation is obvious though.
Can I have the link to the KB pls?
http://kb.vmware.com/kb/1005362
Did 1vcpu work out for you?
1 vCPU has certainly helped. No user complaints today, though a few users, when asked, did point out that some DB queries were a little slow but not unacceptable (not sure if this is just users being users, as they know we had a problem). This is a huge improvement over 2 and 4 vCPUs, which caused regular egg timers, "not responding" windows, etc.
We are going to let this run until mid next week and review again before making a decision (all being well).
Thanks for your help - Will update next week
I forgot to post this esxtop metrics guide, pretty helpful:
http://www.yellow-bricks.com/esxtop/
It will give an idea about the usual metrics we may not be familiar with.
As this is a P2V'd server, I think this KB article is helpful.
Update - 1 vCPU does not seem to have helped this cause after all.
However, I have been doing some further investigation around NUMA and have noted the following:
- Out of the 4 VMs on this host the 3 production (sql,exch,f&p) are all running on NUMA node 0
- NUMA node 0 has no free memory
- The problematic SQL VM is accessing 96% (!) of its memory remotely from node 1
- There have been no migrations between nodes
1st: The fact that 96% of memory is remote indicates a major bottleneck, correct?
2nd: According to all the documentation, the ESX scheduler should migrate a VM to the other NUMA node once a certain amount of its memory is located remotely, and I would think that 96% warrants this! Can anyone explain why this VM has not been migrated?
I intend to set CPU/NUMA affinity this evening to see if this helps with my SQL VM issue...
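A small sketch of the locality arithmetic behind that 96% figure. In esxtop's memory screen the relevant column is N%L (the percentage of the VM's memory that is local to its home node), so 96% remote corresponds to an N%L of only about 4. All numbers below are illustrative, not taken from this host:

```python
# Sketch: NUMA locality check. esxtop reports N%L (percent of a VM's
# memory that is node-local); the remote fraction is the complement.
# The page counts here are made up for illustration.

def remote_percent(local_mib: int, total_mib: int) -> float:
    """Percentage of the VM's memory served from a remote NUMA node."""
    return 100 * (1 - local_mib / total_mib)

total = 8 * 1024               # hypothetical 8 GiB VM, in MiB
local = int(total * 0.04)      # only ~4% of it node-local
print(round(remote_percent(local, total), 1))   # 96.0
```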