We use Nagios to graph the CPU consumed by all of our VMs. It's very apparent that over time, with no activity on the VMs, the amount of CPU consumed per 5 minutes increases.
This is on VMs running both Linux and Windows.
Any ideas on this? Has anybody else noticed it?
I see exactly the same behavior: with the same load, my VMs' CPU usage on the host increases all the time. After two months it is now 5x the normal amount. Shutting down the VMs and restarting them brings CPU usage back to "normal". Restarting the guest OS is not enough; the VMware guest process needs to be stopped and a new process created.
There is definitely something very wrong here.
RHEL4 host with 32-bit Linux and Windows Server guests (1 CPU per guest).
Hi,
I'm not sure I understand how you monitor your VMs. Does Nagios look at the CPU time consumed by the VM's thread on the host, or at the overall CPU consumed inside the VM?
I'm running long-lived VMs and have just begun migrating from VMware Server 1 to VMware Server 2, so this worries me a bit :)
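For the host-side half of that question, one way to see what each VM costs the host, independent of what the guest reports, is to sample the CPU time of each VM process from /proc. This is a minimal sketch, not how Nagios itself works. It assumes a Linux host where each running VM appears as a process named vmware-vmx (an assumption; check with ps on your own host), and it reads utime and stime (fields 14 and 15) from /proc/<pid>/stat. The helper names are hypothetical.

```python
import os
import time

# ASSUMPTION: each VMware Server VM runs as a host process named "vmware-vmx".
VMX_NAME = "vmware-vmx"


def parse_stat(stat_line):
    """Return (comm, utime + stime in clock ticks) from a /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' rather than on whitespace.
    head, _, rest = stat_line.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = rest.split()
    # fields[0] is field 3 (state), so utime (14) and stime (15) are at 11 and 12.
    utime, stime = int(fields[11]), int(fields[12])
    return comm, utime + stime


def cpu_percent(ticks_before, ticks_after, interval_s, clk_tck):
    """CPU used over an interval, as a percentage of one CPU."""
    return 100.0 * (ticks_after - ticks_before) / (interval_s * clk_tck)


def vmx_ticks():
    """Map pid -> total CPU ticks for every process named VMX_NAME."""
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                comm, ticks = parse_stat(f.read())
        except (OSError, IndexError, ValueError):
            continue  # process exited or its stat file was unreadable
        if comm == VMX_NAME:
            result[int(entry)] = ticks
    return result


def report(interval_s=300):
    """Print each VM process's CPU usage over one interval (default 5 minutes)."""
    clk_tck = os.sysconf("SC_CLK_TCK")
    before = vmx_ticks()
    time.sleep(interval_s)
    after = vmx_ticks()
    for pid, ticks in sorted(after.items()):
        if pid in before:  # skip VMs that started during the interval
            print(pid, round(cpu_percent(before[pid], ticks, interval_s, clk_tck), 1))
```

Calling report() on the host prints one line per VM process. If those numbers climb week over week while the guests' own CPU graphs stay flat, the extra CPU is being spent by the host-side process rather than inside the guest.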
I have exactly the same problem. I graph my host's CPU usage with Cacti, and here are my results:
Has anyone found a solution, or even a workaround, to this problem? I've just bought a replacement CPU for the host server (faster clock speed and Intel VT support), so I am planning to migrate all VMs to 64-bit to see if that has any effect.
Just wondering: has VMware picked up on this one yet? This particular bug is becoming unworkable here.
I'm having this problem as well, as seen in this thread:
http://communities.vmware.com/thread/217345?tstart=0
This problem is becoming chronic, as I have to reboot all of my vmware servers about once a month to keep acceptable performance. Anyone have any ideas?
Yep. Same problem here. When I was on 1.x with the same number of VMs and the same server, my host stayed up for months (almost a year!) with no issues. Now with Server 2.x it looks like I may have to hard reboot (power off/on) the VMs every 2-4 weeks, as my CPU usage starts climbing dramatically. This is the only fix that works. The funny (or frustrating) thing is that the VMs themselves report 0-5% CPU usage or so, but the vmx process on the server reports 25%. This seems to indicate a problem with VMware Server itself. It sounds like a big bug, and it reduces my confidence in Server. And I used to LOVE VMware Server in the pre-2.x days.
BTW, my setup is 64-bit RHEL 5.x on the host, 16 GB RAM, 8 VMs with a mix of 32/64-bit Windows and Linux. Both my production and development servers show the same symptoms, and they are on different hardware.
One of my graphs (using Zabbix - this is user %CPU over time):
Hmmm, this is getting really frustrating now.
I'm just about to move my last VM over to 64-bit, in the hope that that will cure the problem. I'm not entirely hopeful that it will, though.
Rob
Please let us know your results. It would be interesting if the problem turned out to be caused by mixing 32/64-bit guests. That's not a whole lot of help for those of us who have to run 32-bit guests, but at least we'd have a possible culprit.
To VMware: is there any official way of reporting bugs besides hoping you see this in the forums? I'll gladly provide a detailed report if it helps get this fixed.
I think you need a support contract to have a chance of them fixing your bug in a reasonable timeframe.
I think our VMware Server support contract was something cheap, like 300 bucks.
It doesn't seem like this problem is isolated enough to warrant having to pay to fix it. It seems like everyone's problem. Are you not having this problem, LucasAlbers?
I have seen a similar problem, but in my case it's memory that runs out.
I have a system with 32 GB of memory and only 5 active clients.
After a while (1 week) all the memory is used, but when I check the memory use of the clients, it doesn't add up.
Could the CPU usage you are all seeing be caused by swapping? Perhaps we all have the same bug, and it's a memory leak somewhere?
HelgeAlg, I don't think that issue is related (or at least we do not see it here). Memory usage remains consistent for us at 99% almost all the time.
CPU on the other hand continues to grow at a constant rate.
You can see that at the start of week 41 we stopped one VM, another at the end of week 41, and yet another at the start of week 42. And yet growth continues at a rate of around 10% a week.
This is with no load change on the individual VMs at all!
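To put a rate like that into numbers, here's a small sketch that estimates an average weekly growth rate from weekly CPU samples and projects how long usage takes to reach a given multiple. It assumes the growth compounds week over week (an assumption; if a fixed amount is added each week, a linear fit would be the better model), and the sample values in the example are illustrative only, not taken from anyone's graphs.

```python
import math


def weekly_growth(samples):
    """Average compound growth per week from weekly CPU samples (oldest first)."""
    if len(samples) < 2 or samples[0] <= 0:
        raise ValueError("need at least two samples, starting from a positive value")
    weeks = len(samples) - 1
    return (samples[-1] / samples[0]) ** (1.0 / weeks) - 1.0


def weeks_to_multiple(rate, multiple):
    """Weeks until usage reaches `multiple` times its start, at a constant weekly rate."""
    return math.log(multiple) / math.log(1.0 + rate)


# Illustrative samples only: 100 -> 110 -> 121 is 10% growth per week.
rate = weekly_growth([100, 110, 121])
print(round(rate, 3))                           # 0.1
print(round(weeks_to_multiple(rate, 2), 1))     # 7.3
```

Under that compounding assumption, 10% a week means host CPU for an idle VM roughly doubles in a little over 7 weeks, which is in the same range as the 2-4 week and monthly reboot cycles people describe above.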
On my 4 VMware Server 1.0 hosts and 4 GSX 3.2 hosts, I have not encountered this problem. They get rebooted once every 1 to 2 years, when we have catastrophic power failures caused by a hardhat digging through the electrical grid.
My two VMware Server 2.0 hosts run client VMs that get started and stopped on a weekly basis, so they don't run long enough to encounter this issue.
Can't you restart the VMware Server service without restarting the VMs? I remember kicking the service when hostd (or whatever the VMware host service is called) was using a lot of CPU, without restarting the VMs. Does the server's CPU utilization drop at that point?
This is a tricky issue to resolve. The only method that jumps to mind is to compare the timings from an strace run against the host process.
But even then the information is too coarse to be of much use.
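One way to make that comparison a little less coarse is to capture an strace -c syscall summary of a freshly started VM process and of a long-running one over the same length of time (for example `strace -c -f -o fresh.txt -p <pid>`, stopped after a fixed period), then diff the per-syscall totals. This is a sketch under assumptions: it expects the usual `% time / seconds / usecs/call / calls / errors / syscall` table, whose layout can differ between strace versions, and the function names are hypothetical.

```python
def parse_strace_summary(text):
    """Return {syscall: (seconds, calls)} from `strace -c` summary output."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows start with a numeric "% time" value; skip headers and dashes.
        if len(parts) < 5 or not parts[0].replace(".", "", 1).isdigit():
            continue
        syscall = parts[-1]
        if syscall == "total":
            continue
        # Columns: % time, seconds, usecs/call, calls, [errors], syscall.
        result[syscall] = (float(parts[1]), int(parts[3]))
    return result


def grown_syscalls(fresh, long_running):
    """(syscall, extra seconds, extra calls), largest time increase first."""
    deltas = []
    for name, (secs, calls) in long_running.items():
        old_secs, old_calls = fresh.get(name, (0.0, 0))
        deltas.append((secs - old_secs, calls - old_calls, name))
    deltas.sort(reverse=True)
    return [(name, d_secs, d_calls) for d_secs, d_calls, name in deltas]
```

If one syscall (say a polling call) accounts for most of the extra time in the long-running capture, that at least narrows down where the vmx process is spending it. Both captures need to cover the same duration for the comparison to mean anything.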