I've been running ESXi 3.5.0 since February/March with 5+ Debian guests on an HP ProLiant G3 (1 CPU - Xeon 2.7 GHz, 2.5 GB RAM, 140 GB RAID 5). The guests were cloned from a template prepared with VMware Tools and a custom 2.6.29 kernel. I've been having some problems, but I'm not sure where to look for clues. The host averages about 50% CPU and 1500 MB of RAM in use.
Two of my VMs crash at random, and the only way to recover them is a shutdown/start/reset from the Infrastructure Client. I can open the console for the unresponsive VM, but nothing responds inside it. As far as I know, there is no pattern to when they stop working.
VM1: Nagios with 100 hosts and a total of 100 service checks.
VM2: MySQL 5.0, hosting the databases for Nagios, Cacti and the wiki (each of which runs in its own VM).
I suspected that the database traffic over TCP might be causing the problem, but I can't find anything in the logs to support that. I tried disabling the Nagios database support and running it entirely locally, but it still crashed without warning after 1-10 days of uptime. The VMs are under no load and are not running out of memory.
The MySQL VM crashed again yesterday, and the only thing that stands out is this entry from the VM's vmware.log (after the reboot):
May 22 06:09:56.077: vcpu-0| TOOLS unified loop capability requested by 'toolbox'; now sending options via TCLO
May 22 06:10:13.461: vcpu-1| GuestRpc: Channel 2, conflict: guest application toolbox tried to register, but it is still registered
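To look for a pattern across crashes, it may help to pull every Tools/GuestRpc entry out of each vmware.log (including rotated copies) and line the timestamps up against the crash times. Below is a minimal sketch, assuming every log line has the shape of the two entries quoted above (`<Mon DD HH:MM:SS.mmm>: <thread>| <message>`); the keyword list and the `flag_lines` helper are hypothetical illustrations, not part of any VMware tool.

```python
import re
from typing import Iterable, List, Tuple

# Assumed line shape, inferred from the two vmware.log entries quoted above:
# "May 22 06:10:13.461: vcpu-1| GuestRpc: Channel 2, conflict: ..."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}): (?P<thread>[^|]+)\| (?P<msg>.*)$"
)

# Hypothetical substrings worth flagging; extend this tuple as needed.
KEYWORDS = ("GuestRpc", "TOOLS", "conflict")

SAMPLE = [
    "May 22 06:09:56.077: vcpu-0| TOOLS unified loop capability requested by 'toolbox'; now sending options via TCLO",
    "May 22 06:10:13.461: vcpu-1| GuestRpc: Channel 2, conflict: guest application toolbox tried to register, but it is still registered",
]


def flag_lines(lines: Iterable[str], keywords=KEYWORDS) -> List[Tuple[str, str, str]]:
    """Return (timestamp, thread, message) for parsed lines whose message contains a keyword."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and any(k in m.group("msg") for k in keywords):
            hits.append((m.group("ts"), m.group("thread"), m.group("msg")))
    return hits


if __name__ == "__main__":
    # In practice, pass open("vmware.log") (and vmware-1.log, etc.) instead of SAMPLE.
    for ts, thread, msg in flag_lines(SAMPLE):
        print(ts, thread, msg)
```

Lines that don't match the assumed format are silently skipped, so if the log layout differs on your host, adjust `LINE_RE` first.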
I'm not sure what my next step should be at this point, so hopefully some of you have some pointers.
Thank you in advance!
Are you using VMI paravirtualization?
Are VMware Tools installed?
Note also that your physical system is very "small" (1 CPU, only 2.5 GB RAM), so you may run into critical performance issues.
**if you found this or any other answer useful please consider allocating points for helpful or correct answers
Thank you for your reply, Andre.
I did not have VMI enabled at the time of the last crash, but after making my post here I enabled it for all my running VMs at the same time, along with the debug logging function.
I installed VMware Tools on the template image, and it seems to load correctly on all of the clones. So yes, it's running.
esxtop doesn't show any clear CPU or memory problems, but of course I could be reading it wrong. Since my ESXi license only supports 1 CPU, there isn't much I can do about the CPU situation at the moment. As for RAM, as I said, nothing clearly points to a memory problem: about 1500 MB is in use on average, and the VMs that crash do not seem to use their full reserved amount (512 MB). The remaining VMs have 256 MB reserved and are idle 99% of the time.
I'm eager to check back tomorrow and hopefully find everything up and running. If so, VMI might be the thing to thank for that. If not, the log might tell me more this time around.
I would appreciate more thoughts if you have any.
Random VM crashes can also be a sign of hardware problems. Is it possible to run hardware diagnostics? memtest can be run to verify that all of the memory is good. Granted, you're not "seeing" any memory issues, but that doesn't mean you don't have any.
Post moved to the ESXi forum
Tom Howarth VCP / vExpert
VMware Communities User Moderator
Contributing author for the upcoming book "VMware vSphere and Virtual Infrastructure Security: Securing ESX and the Virtual Environment" (http://my.safaribooksonline.com/9780136083214). Currently available on Rough Cuts.