I have a situation where a program I run spawns a whole bunch of processes, and then the system grinds to a halt as the processes peg the CPU at 100% wait time. On a physical machine the problem is not so bad and startup takes several seconds. The problem is also not present on VMware Server 2. Under ESXi, however, the program takes around six minutes to start because the CPU is stuck in the wait state. When I check the CPU in the VMware client, it is reported as running at around 70MHz for the whole six minutes of startup.
So, my question: is there a way to mitigate the issue of the CPU being stuck in the wait state? And why do I not see this issue in VMware Server?
What are the specs of 1) your ESXi host and 2) the VM running this program? 3) How many VMs do you have running on your ESXi host, and how many of them are multi-processor VMs?
Kyle
Are you using a 2- or 4-way SMP virtual machine? Did you create the VM from scratch, or did you run a conversion? If you are running an SMP virtual machine, slide it down to a single-CPU VM and see if the boot time improves.
-KjB
If they are in the WAIT state, you need to figure out what they are waiting on. Is it disk? A mutex lock? The network?
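One quick way to answer that question from inside the guest is to look at /proc, which works on CentOS 5 and later. This is just a sketch; `$$` (the current shell) stands in for the PID of the real slow process.

```shell
# Stand-in PID; replace with the PID of the process that is stuck.
pid=$$

# State shows R (running), S (sleeping), or D (uninterruptible I/O wait).
# Lots of processes in D state usually means disk.
grep '^State:' /proc/$pid/status

# wchan names the kernel function the process is sleeping in; disk waits,
# lock waits, and network waits show up as different symbols here.
cat /proc/$pid/wchan; echo
```

Running this in a loop while the application starts up will show whether the processes are genuinely blocked in the kernel or just starved for CPU.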
--Matt
Let me see if I can answer all the questions... The VMware host is a Dell 1950 with two quad-core CPUs running at 3.1GHz and 32GB RAM. The 1950 has four hard drives in a RAID 5 configuration. The individual VMs are all single-core with 1.5GB RAM, and I have 25 of them per host. Each is running CentOS 5.2 with VMware Tools installed. I created the initial VM from scratch but did not install an OS; I then exported that "template" and imported it as many times as I needed, and in each of those new VMs an OS was installed from scratch. Once the processes are up and running they are not very processor- or memory-intensive; it is just the initial startup. I agree that this is a programming issue, and I am trying to get the developers to take a look, but the developers on this project are not known to be very friendly. Would it help speed up the process if the VM CPU were running faster than 70MHz or so during this wait period?
How busy is your ESX host? You're basically taking your 8 cores and slicing them up 25 ways. You're oversubscribing the cores, which is usually fine, provided you have the capacity you need. You say the VM CPU runs at 70MHz, but what does the ESX host look like during that bootup phase? Are there networking issues involved that would cause the CPU to wait while the app comes up? Does it time out on a connection before it will fully come up?
-KjB
Looking at the ESX host performance, I actually don't see any issues. There is a slight CPU and network spike right after I hit Enter to start the program, then both settle back down to their nominal baseline. The ESX host shows all eight cores hovering around 7% average usage. Peak network usage is 67KBps, well below the 1Gbps link speed. Disk throughput is nominally 280KBps with a brief spike to 623KBps.
Again, I agree that this is probably due to bad programming, and I suspect there are a lot of resource locks pending during this time. However, I think ESX could handle CPU wait time better and throttle up the CPU speed to help accelerate the process that is holding the resource captive.
Have you run some traces to see where the process is hanging? Something like strace would help on that front. Other than that, ESX isn't limiting you to 70MHz on boot. I have several servers that will spike to 3GHz on boot and then drop back down once they're up.
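A rough sketch of the strace approach, assuming strace is installed in the guest. Try the per-syscall time summary on a trivial command first to see the report format, then attach to the real process the same way (the PID below is a placeholder).

```shell
# -c produces a per-syscall count/time summary; -o writes it to a file.
# Using "true" here just demonstrates the output format.
strace -c -o /tmp/strace-summary.txt true 2>/dev/null \
  || echo "strace not installed (or ptrace not permitted)"
cat /tmp/strace-summary.txt 2>/dev/null || true

# Against the real application (1234 is a placeholder PID):
#   strace -f -tt -T -p 1234 -o /tmp/app.trace
# -f follows forked children, -tt timestamps each call, and -T prints the
# time spent inside each syscall, so long stalls (futex, connect, read)
# stand out clearly in the trace.
```

If the trace shows the time going into futex calls, that points at lock contention in the application rather than anything ESX can fix.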
-KjB
OK. It's a Red Hat core and there are no real resource constraints.
Is the application started with init.d or a user script?
If it's init.d-based, did you make sure it's the last to load at runlevel 5?
And do you have any waits on it in /etc/inittab, e.g. l5:5:wait:/etc/rc.d/rc 5?
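To check those two things on a CentOS 5 guest, something like the following sketch works ("myapp" is a placeholder service name, not from this thread):

```shell
# SysV init runs the S## scripts in /etc/rc.d/rc5.d in numeric order,
# so the highest S number starts last at runlevel 5.
if [ -d /etc/rc.d/rc5.d ]; then
    ls /etc/rc.d/rc5.d | grep '^S' | sort | tail -3
else
    echo "no SysV rc5.d directory on this system"
fi

# Which runlevels the (placeholder) service is enabled at:
#   chkconfig --list myapp
# To push it to the end of boot, raise the start priority in the init
# script's header (e.g. "# chkconfig: 5 99 01"), then re-register it:
#   chkconfig --del myapp && chkconfig --add myapp

# And the inittab wait entries for the runlevel:
grep ':wait:' /etc/inittab 2>/dev/null || true
```

The wait entry in inittab means init blocks until the whole rc script pass finishes, so a slow-starting service there delays everything behind it.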