I have an ESX Server with 4 VMs on it that keeps rebooting itself intermittently. This only started a few weeks ago. I was wondering if there are any log files that I could look at to see why the reboot is happening. I have VMotioned my VMs onto the other ESX box in the cluster while I investigate the issue.
We had a similar situation with a DL585 G2 doing the same thing (rebooting for no reason).
I uninstalled Insight Manager ver 7.6 and installed 7.7.0-115, and we have had no issues so far. It's been about 2 weeks.
ESX 3.01, 32039
Thanks
Since all the VMs are off the box, why don't you try reinstalling ESX? Should be a snap.
Also, post your configuration. I've read some posts here about QLogic HBAs causing random reboots.
ESX is on an HP BL25 blade with QLogic HBAs. I was thinking of restoring ESX on this box. Are there any gotchas in doing that? I have heard horror stories about shared areas on the SAN being reformatted during the restore, or something like that.
You don't need to worry about the SAN LUNs during a server rebuild, but to give yourself peace of mind just ensure the fibre card is disconnected during the install.
When you've finished, reconnect the fabric and you'll see the LUNs straight away.
Personally I'd just reinstall ESX, not restore it. Much quicker, guaranteed "clean". That's what we do at least when we have an issue.
Sorry, that should have been reinstall, not restore.
timw18,
Check into updated HP firmware. We have seen similar "unplanned" ASRs, and it turned out to be USB resource conflicts on our DL380s. The latest firmware addressed this problem. (The latest firmware for the BL25s also mentions updates to USB.)
J
Will try updating to the latest firmware 7.70-0 and see how we go.
We've had 3 ASRs in the last 4-6 weeks or so on BL25s running 2.53. Each one was related to bad DIMMs (Kingston). Replacing them resolved the problem. Check in SIM or the homepage for any hardware problems. For us it was easy to determine, as VC showed less memory available on the host after the ASR.
Cheers Rob, no Kingston memory modules in our blade and no loss of available memory. I can't seem to get onto Insight Manager on the ESX Server. It was working fine before, which is strange!
Most likely it is a memory issue, even though you are not using Kingston RAM and have not lost any memory. Try replacing the RAM first if possible.
I will have to do the work on this on Saturday, as I have had to move a prod server back onto the blade for now and this can only be done out of hours. I will try the firmware upgrade first and then monitor the server. If it is still bouncing I will look at changing the memory. Thank you all for your help on this issue.
All of the CPU steppings the same on the blade?
Have a look at the HP SIM agent version (if any).
In some cases we solved the problem by upgrading from 7.60 to 7.70.
Yes, hardware issues, most likely memory.
Could you post the logs?
LOGS! Did you say LOGS?
How?
Where?
I did install HP SIM manager 7.60 on there and it was all working fine, but since this problem started it hasn't been accessible. Will try reinstalling 7.70 as you suggested. You said you upgraded. Can I install 7.70 over the top?
I haven't changed the CPU settings at all since the server was first configured, and this issue only started a few weeks ago.
You should be able to look in the System Management Homepage if this is an HP server with the agents installed. Open your browser to https://serverip:2381. Log in to the homepage using the root account and you should be able to determine if there are any hardware failures or if a hardware problem caused the reboot. If not, then see what you can find in the ESX logs under /var/log: vmkernel, vmkwarning, messages, dmesg, etc.
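If it helps, here is a rough sketch of how you might sift through copies of those logs once you have them. It assumes Python 3 on whatever machine holds the copies (not necessarily the ESX service console), and the keyword list is only my guess at what a crash or ASR might leave behind, not an official list of ESX messages. The function names are made up for this example.

```python
import os
import re

# Assumed keywords that might appear around an unexpected reboot.
# This list is a guess for illustration; adjust it to what your logs contain.
PATTERN = re.compile(r"\b(panic|asr|nmi|machine check|reboot|restart)\b",
                     re.IGNORECASE)


def scan_log(path, pattern=PATTERN):
    """Return (line_number, text) pairs for lines in path matching pattern."""
    hits = []
    with open(path, "r", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if pattern.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits


def scan_logs(log_dir="/var/log",
              names=("vmkernel", "vmkwarning", "messages")):
    """Scan the named log files in log_dir, skipping any that are missing.

    A file that exists but cannot be read is recorded as None.
    """
    results = {}
    for name in names:
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path):
            continue
        try:
            results[name] = scan_log(path)
        except PermissionError:
            results[name] = None
    return results


if __name__ == "__main__":
    for name, hits in scan_logs().items():
        if hits is None:
            print(name + ": permission denied")
            continue
        print(name + ": " + str(len(hits)) + " matching line(s)")
        for number, text in hits:
            print("  " + str(number) + ": " + text)
```

Point `log_dir` at wherever you copied the files. The lines just before each reboot timestamp are usually the interesting ones, so treat the matches as a starting point rather than a diagnosis.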
This brings up a whole new problem. I am using WinSCP to access the files on the ESX server. I know that root has SSH disabled, so I added a new user on the ESX server, created it the same as root, and put it in the sshd group. Using that user I can see the log files, but when I try to view them I get permission denied. HP SIM manager was working but is now not accessible.