I would recommend creating a simple Bash script that, when run, appends the output of "ps -ef" and perhaps "pstree -pG" to some text files. Then set this up as a cron job to run every so often (depending on how often the crashes occur).
Then, when the system locks up, you can go back to these files and see what was running shortly before the crash.
Here is a VERY rough idea of what I mean:
ps -ef >> /var/log/process_crash.log
pstree -pG >> /var/log/pstree_crash.log
Note, I have not tested the above, as I am not near a shell to do so.
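A slightly fuller sketch of the same idea, in case it is useful. It adds a timestamp header to each snapshot so entries can be matched against other logs later. The function name, the optional directory argument, the script path in the cron line, and the 5-minute interval are all assumptions made for illustration. It uses "pstree -pA" instead of "-pG" only because plain ASCII is easier to read in a log file.

```shell
#!/bin/bash
# Append a timestamped process snapshot to two log files.
# Hypothetical helper: the name and the directory argument are illustrative.
snapshot_processes() {
    local log_dir="${1:-/var/log}"
    local stamp
    stamp="$(date '+%Y-%m-%d %H:%M:%S')"

    # Full process list, preceded by a header line with the snapshot time.
    {
        echo "=== $stamp ==="
        ps -ef
    } >> "$log_dir/process_crash.log"

    # pstree ships in the psmisc package and may not be installed everywhere.
    if command -v pstree >/dev/null 2>&1; then
        {
            echo "=== $stamp ==="
            pstree -pA    # -p: show PIDs, -A: draw the tree with ASCII characters
        } >> "$log_dir/pstree_crash.log"
    fi

    # Flush buffered writes so the latest snapshot is more likely to reach disk
    # before a hard lockup.
    sync
}

# Call the function at the end of the script, for example:
#   snapshot_processes /var/log
#
# Example crontab entry (every 5 minutes; the script path is hypothetical):
#   */5 * * * * /usr/local/bin/process_snapshot.sh
```

The logs grow without limit, so on a host that stays up for weeks it is probably worth rotating or truncating them from time to time.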
1. Log into the host OS and disable as many services as possible. Do the same on the guest OSes.
2. Run memtest86+ on the host machine: http://www.memtest.org/ . I had an older box running VMware Server with one VM, and it kept crashing. I ran memtest on it and found that some of the RAM was bad.
Also, is the host OS running anything other than VMware? If so, get rid of it. The host OS should be as minimal as possible.
I've already disabled all services that are not necessary or not being used. The hosting company switched hardware yesterday, so that should rule out memory problems.
Should I unload kernel modules such as the sound card driver, since they are not being used anyhow?
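For reference, here is a rough way to see which sound modules are currently loaded before deciding. It assumes the ALSA naming convention (module names starting with "snd", plus "soundcore"); the function name and the file argument are only illustrative.

```shell
#!/bin/bash
# List loaded kernel modules that look like sound drivers.
# Assumption: ALSA modules are named snd* and the core module is "soundcore".
list_sound_modules() {
    local modules_file="${1:-/proc/modules}"
    # The first field of each line in /proc/modules is the module name.
    awk '$1 ~ /^snd/ || $1 == "soundcore" { print $1 }' "$modules_file"
}

# A listed module can then be unloaded as root with:
#   modprobe -r <module_name>
```

Whether unloading them makes any difference to the crashes is uncertain; this only shows what is loaded.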
Just to clarify: there are a number of VM guests with small amounts of RAM running Windows, and they stay up continuously for about a week and then crash.
I appreciate that the physical host is crashing; however, it may simply be that a number of VM guests crashing in rapid succession is having an effect on VMware Server and therefore on the host. The only way to prove that is to check the Windows event logs on each VM guest, see what Windows thinks is going on, and check whether those events line up with the times in the Linux logs on the host.
I am not sure this has been helpful, but the logs might explain more about why the crashes are occurring.
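As a rough sketch of the host side of that comparison: once you have an event time from a Windows guest, you can pull the host log lines from the same minute or ten-minute window. This assumes the traditional syslog timestamp prefix ("Mon DD HH:MM:SS") and that the guest and host clocks agree. The function name and the default log path are assumptions; the file may be /var/log/syslog or something else on your distribution.

```shell
#!/bin/bash
# Print host log lines whose timestamp begins with the given prefix.
# Assumption: lines start with a classic syslog timestamp such as "Jun  3 14:21:07".
host_log_at() {
    local prefix="$1"
    local log_file="${2:-/var/log/messages}"
    # index(...) == 1 is a literal "starts with" test, so no regex escaping is needed.
    awk -v p="$prefix" 'index($0, p) == 1' "$log_file"
}

# Example: everything logged between 14:20:00 and 14:29:59 on June 3.
# Note that syslog pads single-digit days with an extra space.
#   host_log_at "Jun  3 14:2" /var/log/messages
```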
Hi, I am having the same issue on my Debian Host.
It is a complete crash. I have physical access to the host, and not even the numlock is responding.
I have set up vmware-server on the same physical hardware with different Linux distributions for the host, but the issue remains the same. The host OSes I have tried include Fedora 7, openSUSE 10.3, Red Hat 9 and Arch Linux.
I am able to run as many Linux guest machines as I want and all is well, except maybe for speed. But as soon as I run a Windows 2003 guest OS, even on its own, my HOST will crash for no apparent reason...
I have attached the logs for your interest:
The host works fine before it "just dies"!
I have attached a Linux guest log just for fun; it doesn't tell me anything.
I have also attached the Windows 2003 log, for what it is worth.
Any additional feedback or suggestions that anybody might have will be greatly appreciated.
Logs.zip 1.9 K
Well, it seems that there was nothing I could do to solve the issue except install a different OS.
All the issues we were having were with CentOS 5.1 installed. I installed Ubuntu 8.04 and the host has been up with no issues for about 2 months now.
Not sure what the cause was, possibly some issues with hardware/drivers that CentOS didn't like. All is well now though, so unless you are completely against it, I would recommend trying Ubuntu 8.04 LTS to see if your issues are resolved.