VMware Cloud Community
TurboIT
Contributor

Poor VM performance. %RDY issues

I just upgraded an environment to ESXi 4.1, and within a week I started getting complaints that some VMs were slow. In esxtop, the %RDY time is generally great, as in single digits. Then there are times where it spikes up to 80% or hangs around 50% for 30 seconds or so. I noticed not all VMs have VMware Tools up to date. Could this possibly cause CPU scheduling issues?

I never monitored this environment in the past, so I don't know whether they were having any CPU contention prior to this upgrade. I'm strictly going by user feedback.

FredPeterson
Expert

No, the back-dated VMware Tools would not interfere. In fact, you don't even need VMware Tools installed to have a normally functioning VM. You just miss out on advanced features and performance gains unrelated to the CPU. The CPU is the CPU; there is no VMware-specific driver for it. Ready values are pretty specific: the VM has something to do CPU-wise and it's waiting for CPU access.
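If it helps to compare what esxtop shows with the CPU Ready numbers in the vCenter performance charts, here is a rough sketch of the usual conversion. It assumes the chart reports ready time as a summation in milliseconds per sample and that the real-time chart samples every 20 seconds; the function name `ready_percent` is made up for illustration, and for multi-vCPU VMs the figure is typically a sum across vCPUs, which this sketch ignores.

```python
def ready_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert a CPU Ready summation (ms) into a percentage of the sample interval.

    interval_s defaults to 20 s, the assumed real-time chart interval.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return ready_ms * 100.0 / (interval_s * 1000.0)


# 16,000 ms of ready time in a 20 s sample: the VM spent most of the
# interval waiting for a core.
print(f"{ready_percent(16000):.1f}%")  # 80.0%

# 1,000 ms in the same interval is in the "single digits" range.
print(f"{ready_percent(1000):.1f}%")   # 5.0%
```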

How many physical cores are in the host and what are they (Intel 5500, 7300, etc.)? How many virtual machines? How many virtual CPUs are assigned in total across the VMs on the host? How many of the VMs are multi-CPU? What is the most vCPUs on any one VM on the host?

TurboIT
Contributor

8 physical cores (4 x AMD Opteron 8218s). Approximately 50 VMs per host. All VMs are single vCPU.

FredPeterson
Expert

In other words, you're pushing just over 6:1. That's a great consolidation ratio, but it sounds like it might be too much for the host. Is this a VDI implementation or just a standard virtualization setup? Also, the 8212 is "only" 2 GHz and 4 years old. There have been BIG changes in CPUs from Intel and AMD since then. Maybe look at a CPU upgrade if possible. I know AMD tends to be more customer friendly about sockets when it comes to CPU upgrades.
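For reference, the 6:1 figure is just total vCPUs divided by physical cores. A minimal sketch using the numbers from this thread (50 single-vCPU VMs, 8 cores); the helper name `vcpu_to_core_ratio` is only illustrative:

```python
def vcpu_to_core_ratio(total_vcpus: int, physical_cores: int) -> float:
    """Return how many vCPUs share each physical core on average."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return total_vcpus / physical_cores


vms = 50           # approximate VM count per host, from the thread
vcpus_per_vm = 1   # all VMs are single vCPU
cores = 8          # physical cores in the host

ratio = vcpu_to_core_ratio(vms * vcpus_per_vm, cores)
print(f"{ratio:.2f}:1")  # 6.25:1
```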

I think your next step is to analyze CPU usage statistics - but I think you may just end up needing to get another host (or CPU upgrade) regardless of what stats gathering says about CPU usage.

You only need to have like 10-15% of the VMs on the host running 50-100% CPU for the host to get bogged down.
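To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes 50 single-vCPU VMs on 8 cores, as in this thread, treats "50-100% CPU" as 0.5 to 1.0 of one core per busy VM, and ignores hypervisor overhead and the load from the remaining VMs. The function name is hypothetical.

```python
def busy_vm_demand_cores(total_vms: int, busy_fraction: float, per_vm_load: float) -> float:
    """Cores' worth of CPU demanded by the busy subset of VMs.

    busy_fraction: share of VMs that are busy (0.10 means 10%).
    per_vm_load:   fraction of one core each busy VM wants (0.5 means 50%).
    """
    return total_vms * busy_fraction * per_vm_load


cores = 8
low = busy_vm_demand_cores(50, 0.10, 0.50)   # 10% of VMs at 50% CPU
high = busy_vm_demand_cores(50, 0.15, 1.00)  # 15% of VMs at 100% CPU

print(f"low:  {low:.1f} cores ({low / cores:.0%} of the host)")
print(f"high: {high:.1f} cores ({high / cores:.0%} of the host)")
```

At the high end, a handful of busy desktops already want nearly the whole host, and the other 40-odd VMs still have to be scheduled around them.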

Also, I can't believe the upgrade caused this; if anything, it should have improved things.

J1mbo
Virtuoso

I agree entirely with what has been posted already. Perhaps CPU resource limits might help in the interim.




TurboIT
Contributor

That is kinda what I was afraid of. Trying to explain to others that resources may be tapped, even when under the summary tab the host shows 30-50% used, is hard to do. There is a heck of a learning curve to troubleshooting performance issues. At so many levels things look great, only to find that one reading somewhere else completely counters everything else you've seen.

Thanks for the input guys.

Also, this cluster is for VDI, not general virtualization. I have users screaming at me :)

FredPeterson
Expert

That's the tricky thing about CPU utilization. It looks fine on the surface, as if there's room to grow!

The CPU might not be that utilized, but if you have 50 VMs all vying for 2% CPU at the same time, hey, that's great, 100% utilization! Nope! That 2% is going to go by so quickly for each VM that the actual utilization will be lower, but the CPU contention, where ready values come into play, has been pushed up big time.
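Here is a toy model of that effect, for anyone who wants to see the numbers. It is only a sketch under loud assumptions: all 50 VMs wake at the same instant, each needs 2 ms of CPU out of a 100 ms window (the "2%"), and the 8 cores serve them first come, first served with no preemption. The real ESXi scheduler is far more sophisticated, so treat this as an illustration of the idea, not of how ESXi behaves.

```python
def synchronized_burst(vms: int, cores: int, burst_ms: int, window_ms: int):
    """Return (utilization %, average ready %, worst ready %) for one window.

    Assumes every VM becomes runnable at t=0, needs burst_ms of CPU once,
    and all the work fits inside the window.
    """
    core_free_at = [0] * cores  # time (ms) at which each core is next idle
    waits = []                  # ms each VM spent runnable but not running
    for _ in range(vms):
        core = min(range(cores), key=core_free_at.__getitem__)
        waits.append(core_free_at[core])
        core_free_at[core] += burst_ms

    utilization = 100 * vms * burst_ms / (cores * window_ms)
    avg_ready = 100 * sum(waits) / (vms * window_ms)
    worst_ready = 100 * max(waits) / window_ms
    return utilization, avg_ready, worst_ready


u, avg, worst = synchronized_burst(vms=50, cores=8, burst_ms=2, window_ms=100)
print(f"host utilization: {u:.1f}%")      # 12.5%
print(f"average ready:    {avg:.2f}%")    # 5.28%
print(f"worst-case ready: {worst:.1f}%")  # 12.0%
```

In this model the host looks mostly idle, yet the unluckiest VM spends six times longer waiting than it spends running.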

A good comparison for management is a turnstile: you know, the thing that counts people as they walk through and flips over, letting one person through at a time. When you have 8 of them, 8 people can go through immediately, 16 people with a little wait, 32 people with some wait. 50 people, constantly in line? Considerable wait times, even if one lane speeds up, people move lanes, and everyone moves efficiently and shares nicely, because as one person exits, another replaces him. It's a FIFO-BIL: First In First Out, Back In Line :) The turnstile itself has to flip around, so you can't piggyback :) That minor timeslice for the turnstile to accept another person could be considered the context switch event of the virtual world: it exists, it's minor, but it's part of the big picture.
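If management wants the turnstile picture as numbers, a tiny sketch: assuming 8 lanes, everyone arriving at once, and each pass taking one "flip", this counts how many flips the last person in line stands waiting. The function name is just for illustration.

```python
def flips_waited_by_last_person(people: int, lanes: int = 8) -> int:
    """Number of full turnstile flips the last arrival waits before entering."""
    if people <= 0 or lanes <= 0:
        raise ValueError("people and lanes must be positive")
    return (people - 1) // lanes


for crowd in (8, 16, 32, 50):
    print(crowd, "people ->", flips_waited_by_last_person(crowd), "flips of waiting")
# 8 people -> 0, 16 -> 1, 32 -> 3, 50 -> 6
```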

You may also want to consider VDI guest OS optimization if you haven't already. http://www.brianmadden.com/blogs/ronoglesby/archive/2010/09/22/does-os-quot-tuning-quot-help-vdi-per... The article uses Windows 7 as the guest, but many of the ideas, and the services involved, are pretty much the same for other guests.

Good luck.