timw18
Enthusiast

ESX Server intermittently rebooting

I have an ESX Server with 4 VMs on it that keeps rebooting itself intermittently. This only started a few weeks ago. I was wondering if there are any log files that I could have a look at to see why the reboot is happening. I have VMotioned my VMs onto the other ESX box in the cluster while I investigate the issue.

1 Solution

Accepted Solutions
vmware_lic
Enthusiast

We had a similar situation with a DL585 G2 doing the same thing (rebooting for no reason).

I uninstalled Insight Manager ver 7.6 and installed 7.7.0-115, and we have had no issues so far. It's been about 2 weeks.

ESX 3.0.1, build 32039

Thanks

29 Replies
ORRuss
Enthusiast

Since all the VMs are off the box, why don't you try reinstalling ESX? Should be a snap.

Also, post your configuration. I've read some posts here about QLogic HBAs causing random reboots.

timw18
Enthusiast

ESX is on an HP BL25 blade with QLogic HBAs. I was thinking of restoring ESX on this box. Are there any gotchas in doing that? I have heard horror stories about shared areas on the SAN being reformatted during the restore, or something like that.

MR-T
Immortal

You don't need to worry about the SAN LUNs during a server rebuild, but to give yourself peace of mind, just ensure the fibre card is disconnected during the install.

When you've finished, reconnect the fabric and you'll see the LUNs straight away.

Paul_B1
Hot Shot

Personally I'd just reinstall ESX, not restore it. Much quicker, guaranteed "clean". That's what we do at least when we have an issue.

timw18
Enthusiast

Sorry, that should have been reinstall, not restore.

hicksj
Virtuoso

timw18,

Check into updated HP firmware. We have seen similar "unplanned" ASRs (Automatic Server Recovery reboots), and it turned out to be USB resource conflicts on our DL380s. The latest firmware addressed this problem. (The latest firmware for the BL25s also mentions updates to USB.)

J

timw18
Enthusiast

Will try updating to the latest firmware 7.70-0 and see how we go.

Rob_Bohmann1
Expert

We've had 3 ASRs in the last 4-6 weeks or so on BL25s running 2.5.3. Each one was related to bad DIMMs (Kingston). Replacing them resolved the problem. Check in SIM or the homepage for any hardware problems. For us it was easy to determine, as VC showed less memory available on the host after the ASR.

timw18
Enthusiast

Cheers Rob, no Kingston memory modules in our blade and no loss of available memory. I can't seem to get onto Insight Manager on the ESX Server. It was working fine before, which is strange!

ko
Enthusiast

Most likely it is a memory issue, even though you are not using Kingston RAM and have not lost any memory. Try replacing the RAM first if possible.

timw18
Enthusiast

I will have to do this work on Saturday, as I have had to move a prod server back onto the blade for now and the work can only be done out of hours. I will try the firmware upgrade first and then monitor the server. If it is still bouncing, I will look at changing the memory. Thank you all for your help on this issue.

kix1979
Immortal

Are all of the CPU steppings the same on the blade?

Thomas H. Bryant III
fsecchia
Contributor

Have a look at the HP SIM agent version (if any).

In some cases we solved the problem by upgrading from 7.60 to 7.70.

GlenMarquis2
Enthusiast

Yes, hardware issues, most likely memory.

Could you post the logs?

timw18
Enthusiast

LOGS! Did you say LOGS?

How?

Where?

timw18
Enthusiast

I did install HP SIM manager 7.60 on there and it was all working fine, but since this problem started it hasn't been accessible. I will try installing 7.70 as you suggested. You said you upgraded. Can I install 7.70 over the top?
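
One way to narrow down "not accessible" is to test whether anything is listening on the management port at all, before blaming the browser or the login. A rough sketch in Python 3, run from a workstation, assuming the HP System Management Homepage is on its usual HTTPS port 2381. `port_is_open` is a made-up helper name, not part of any HP or VMware tool.

```python
import socket


def port_is_open(host, port=2381, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    The default port 2381 is an assumption (the usual HTTPS port for the
    HP System Management Homepage); pass a different one if yours differs.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling `port_is_open("your-esx-host")` with the server's real address (the name here is a placeholder) returns `False` if the agent is stopped or a firewall is in the way, and `True` if something is listening, in which case the problem is more likely in the web login itself.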

timw18
Enthusiast

I haven't changed the CPU settings at all since the server was first configured, and this issue only started a few weeks ago.

Rob_Bohmann1
Expert

You should be able to look in the System Management Homepage if this is an HP server with the agents installed. Open your browser to https://serverip:2381. Log in to the homepage using the root account and you should be able to determine whether there are any hardware failures, or whether a hardware problem caused the reboot. If not, then see what you can find in the ESX logs in /var/log ... vmkernel, vmkwarning, messages, dmesg, etc.
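
For anyone wanting to triage those files quickly, here is a minimal sketch in Python 3. It assumes the logs have been copied off the host to a workstation first; it is not meant to run on the service console itself. The keyword list and the names `scan_log` and `scan_directory` are illustrative assumptions, not an official list of ESX message strings.

```python
import re
from pathlib import Path

# Illustrative guesses at words a crash or hardware fault might leave behind.
# Adjust to taste; this is not an official list of ESX log messages.
PATTERN = re.compile(
    r"\b(panic|machine check|nmi|asr|watchdog|ecc|uncorrectable)\b",
    re.IGNORECASE,
)


def scan_log(path, pattern=PATTERN):
    """Yield (line_number, text) for each line in path matching pattern."""
    with open(path, "r", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if pattern.search(line):
                yield number, line.rstrip("\n")


def scan_directory(directory, names=("vmkernel", "vmkwarning", "messages")):
    """Scan the named log files under directory, skipping missing ones."""
    hits = {}
    for name in names:
        path = Path(directory) / name
        if path.is_file():
            hits[name] = list(scan_log(path))
    return hits
```

Pointing `scan_directory` at the folder holding the copied logs returns a dict keyed by file name, so the lines logged just before each reboot can be compared against the reboot times.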

timw18
Enthusiast

This brings up a whole new problem. I am using WinSCP to access the files on the ESX server. I know that root has SSH disabled, so I added a new user on the ESX server, created it the same as root, and put it in the sshd group. Using that user I can see the log files, but when I try to view them I get permission denied. HP SIM manager was working but is now not accessible.
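
A "permission denied" like this usually comes down to the file's owner, group, and mode bits rather than to SSH itself. As a sketch of that rule only, the Python 3 function below takes one line of `ls -l` output and reports whether a given user could read the file. `can_read` is a hypothetical helper, it ignores ACLs, and it treats the name "root" as a stand-in for UID 0.

```python
def can_read(ls_line, user, groups):
    """Apply the classic Unix owner/group/other read rule to one `ls -l` line.

    ls_line: e.g. "-rw------- 1 root root 1024 Jan  1 00:00 somefile"
    user:    login name of the account trying to read the file
    groups:  set of group names that account belongs to
    """
    fields = ls_line.split()
    mode, owner, group = fields[0], fields[2], fields[3]
    if user == "root":
        # Simplification: the real check is UID 0, not the name "root".
        return True
    if user == owner:
        return mode[1] == "r"   # owner read bit
    if group in groups:
        return mode[4] == "r"   # group read bit
    return mode[7] == "r"       # other read bit
```

For a file listed as `-rw------- 1 root root ...`, any account other than root comes back `False`, even one created with the same settings as root, because the kernel checks the numeric user ID and the mode bits, not how the account was set up.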
