We have loaded quad-socket, dual-core IBM x3650 hosts with 30 to 36 VMs each. Two thirds of the VMs are SLES 9; the rest are W2K3 SP2. The majority of the VMs are idle, with only 2-3 per host doing anything regularly, and the hosts average 30%-40% utilization. Typing in an SSH session on the Linux VMs is horrible, and when I put load on them, such as starting DB2, it takes 10 times longer than on an empty host and ready times start to climb. The Windows VMs, however, seem to deliver a much better experience. The CPU utilization suggests I have headroom, but the experience says otherwise. I'm starting to think we are hitting a wall due to the Linux timer interrupts. I've read all the performance, ready-time, and timekeeping docs as well as the VMworld ESX CPU scheduling doc and audio; they mention that the excessive Linux interrupts can cause scalability issues, but I can't find any hard numbers. Can any of you provide some guidance?
You're raising a very valid issue here. The Windows system timer runs at 100Hz, while the Linux timer in some of the newer distros (including SLES) is now 1000Hz. Changing the timer interrupt to 100Hz and using UP kernels instead of SMP seems to have quite a dramatic effect on the consolidation ratio. Please also have a look at the following bug report I filed regarding this same issue in CentOS, and what you can gain by changing the kernel:
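A back-of-the-envelope sketch of why this matters at consolidation scale. The 1000Hz/100Hz figures are from the discussion above; the VM counts use the roughly 2/3 : 1/3 split from the original post, and the uniprocessor-guest assumption is mine (SMP guests add further per-vCPU timer interrupts on top of this):

```python
# Rough estimate of the total timer interrupts per second the hypervisor
# must deliver to its guests. Assumes every VM is a uniprocessor guest;
# SMP guests would add extra local timer interrupts per additional vCPU.

def host_timer_interrupts(num_vms: int, hz_per_vm: int) -> int:
    """Timer interrupts per second the host must emulate for these guests."""
    return num_vms * hz_per_vm

# 24 SLES 9 guests at 1000Hz plus 12 Windows guests at 100Hz:
sles_1000hz = host_timer_interrupts(24, 1000)   # 24,000 interrupts/s
win_100hz = host_timer_interrupts(12, 100)      #  1,200 interrupts/s
print(sles_1000hz + win_100hz)                  # 25200

# The same Linux guests rebuilt with a 100Hz kernel:
sles_100hz = host_timer_interrupts(24, 100)     # 2,400 interrupts/s
print(sles_100hz + win_100hz)                   # 3600
```

Even though each individual interrupt is cheap, every one forces a world switch into the idle guest, which is work the scheduler has to fit in alongside the "real" CPU load.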
Thanks Lars. You have definitely shown that the lower timer interrupt lowers CPU and ready times for the VM. I don't think recompiling the kernel is an option for us, but we'll look into it.
Do the clock interrupts affect scaling differently than "normal" CPU loads? Does the ESX scheduler handle these clock interrupts differently from a normal CPU load? For example, an idle Linux VM consumes approximately 6%. If all of these VMs were lowered to a 100Hz clock rate and then given an additional normal CPU load to bring utilization back up to 6%, would the scheduler treat them the same?
It just seems like my 8 core host running at 40% utilization should have more headroom than I am experiencing.
If you can't customize your kernel, you can also see from that report that adding the kernel parameters "nosmp noapic nolapic" cut the CPU and ready times to roughly half of what they were originally.
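For reference, on SLES 9 those parameters go on the kernel line in the GRUB config. The entry below is purely illustrative (your root device, paths, and existing parameters will differ); only the three appended parameters come from the report:

```
# /boot/grub/menu.lst (SLES 9) -- example entry only; keep your existing
# kernel parameters and append nosmp noapic nolapic at the end:
title Linux
    kernel (hd0,0)/vmlinuz root=/dev/sda2 splash=silent nosmp noapic nolapic
    initrd (hd0,0)/initrd
```

Note that nosmp restricts the guest to one CPU even if the VM has more vCPUs configured, so this only makes sense for uniprocessor VMs.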
The metric you should be watching most carefully is the %READY time for your VMs. %READY tells you how often a VM wants to be scheduled but doesn't get a physical CPU. High load on the ESX server can also drive up %READY, but the threshold on ESX 3 is much higher (around 80% average load) than it used to be on ESX 2 (around 60%) before %READY starts to rise dramatically.
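To make the metric concrete, %READY is just the time a vCPU spent in the ready queue (wanting to run but not scheduled) as a percentage of the sample interval. A minimal sketch, with made-up numbers for illustration:

```python
# %READY: fraction of the sample interval a vCPU spent ready to run
# but not scheduled on a physical CPU. The values below are invented
# purely to illustrate the arithmetic.

def pct_ready(ready_ms: float, interval_ms: float) -> float:
    """Percentage of the sample interval spent ready-but-unscheduled."""
    return 100.0 * ready_ms / interval_ms

# A VM that accumulated 1,000 ms of ready time over a 5-second
# sample interval:
print(round(pct_ready(1000, 5000), 1))  # 20.0
```

A %READY of 20 means the VM spent a fifth of its life waiting in line for a CPU, which is exactly the sluggish-despite-headroom behavior described in this thread.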
%READY is a very important metric. Unfortunately, it's very hard to monitor with VirtualCenter; esxtop from the command line on a host will show real-time statistics.
I've had many times (especially in the 2.5.x days) when my host was only showing 50% CPU utilization but performance was terrible across multiple VMs. Checking %READY always showed the poor-performing VMs with values in the teens and twenties, waiting for a free CPU cycle.
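One workaround for the monitoring gap: esxtop can write its counters to a file in batch mode (something like `esxtop -b -d 5 -n 60 > stats.csv` on the host) for offline analysis. The sketch below filters that CSV for ready-time columns; the assumption that the counter names contain "% Ready" reflects esxtop's perfmon-style headers, but adjust the match string to whatever your version actually emits:

```python
# Sketch: pull %READY series out of an esxtop batch-mode CSV and flag
# VMs whose average exceeds a threshold. The "% Ready" substring match
# on column headers is an assumption about the CSV layout -- check a
# real capture and adjust if your counter names differ.
import csv

def ready_columns(csv_path: str) -> dict[str, list[float]]:
    """Map each '% Ready' counter column to its sampled values."""
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    header, samples = rows[0], rows[1:]
    cols = {i: name for i, name in enumerate(header) if "% Ready" in name}
    return {name: [float(row[i]) for row in samples if row[i]]
            for i, name in cols.items()}

def worst_offenders(series: dict[str, list[float]], threshold: float = 10.0):
    """Counters whose average %READY exceeds the threshold, sorted ascending."""
    return sorted((sum(v) / len(v), name)
                  for name, v in series.items()
                  if v and sum(v) / len(v) > threshold)
```

The 10% default threshold is my own rule of thumb based on the "teens and twenties" observation above, not a VMware-documented limit.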
It really bugs me that we cannot monitor this metric very well with the tools VMware provides and that DRS does not appear to take it into account at all.