VMware Cloud Community
ckunkle
Contributor

ESX 4 reboots randomly

I have a fairly new ESX environment, and one of the ESX servers keeps rebooting for no apparent reason. It reboots about once a day, sometimes once every two days, at random times of day. This has been happening for two weeks now.

I have two Dell PowerEdge R710s running ESX 4.0.0, build 164009. One works fine; the other reboots. I thought I had a problem with a UPS, so I moved the ESX server to another UPS, and I still have the problem. I was running ESXi 4 when I first discovered the server rebooting, so I flattened both servers and installed ESX, and the problem followed me. I read some posts about VMs crashing the server due to bad RAM, so I ran memtest86 on the server for 10+ hours and the 12GB of RAM came back clean. I SSHed in to look for PSoD logs, but could not find any. The hardware status in Virtual Center looks clean, and nothing stands out under Tasks & Events. I currently have only two VMs running on this host, both running Windows Server 2008 x64. They are clean installs and are just being used as test machines so I can monitor when they go down in my NOC.
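Since the host itself leaves no PSoD or log entry, the reboot times have to come from outside monitoring like the NOC checks mentioned above. A minimal sketch of turning periodic up/down checks into outage windows follows; the sample format and the name `find_outages` are assumptions made for illustration, not part of any ESX, Dell, or NOC tool.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical sample format: (time of check, did the host answer?)
Sample = Tuple[datetime, bool]


def find_outages(samples: List[Sample]) -> List[Tuple[datetime, datetime]]:
    """Return (last_seen_up, first_seen_up_again) for each completed outage.

    An outage that is still in progress at the last sample, or one that
    starts before the first successful check, is not reported.
    """
    outages = []
    last_up = None   # time of the most recent successful check
    down = False     # True while we are inside a run of failed checks
    for when, is_up in sorted(samples):
        if is_up:
            if down and last_up is not None:
                outages.append((last_up, when))
            down = False
            last_up = when
        else:
            down = True
    return outages
```

Each window brackets one reboot, so the list can be compared against UPS, OpenManage, or building power events from the same period.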

Does anyone have any suggestions on what steps I should take next?


Accepted Solutions
Troy_Clavell
Immortal

My guess would be that there is a hardware problem. ESX does not just reboot itself. If there were an ESX problem you would see dumps; since in your case there are none, I would start looking at the hardware.

Get some hardware diagnostics CDs from your hardware vendor and start running them.

ckunkle
Contributor

I installed Dell OpenManage 6.1.0 on the ESX servers. Dell Tech Support did not find any hardware problems in the logs. I also sent them a DSET report, and they did not find anything there either. They said to monitor the server for the next couple of days to see whether anything shows up in the OpenManage logs. I will let you know what I find.

NTurnbull
Expert

I would have thought this would be down to hardware, as already mentioned, but if you want to rule out the guests' effect on the host, why not just swap the VMs running on the two hosts?

Thanks,

Neil

ckunkle
Contributor

I did that last night and now have only one VM running on the problematic ESX server. I came in this morning and, as I was walking down the hall, I heard the fans start up. I jumped onto my KVM and saw no PSoD, just ESX starting back up. I waited a couple of minutes and logged into Dell OpenManage, and there was nothing: no hardware events and no interesting logs under the alerts tab.

I still kind of think it is related to my power supply (I only have one in each server; I was trying to stay green, perhaps too green). We have the second ESX server precisely in case the first one has problems, so it has worked out in that regard. For testing purposes, I have swapped the power supplies between the two ESX servers to see whether the problem moves to the other server.

I will keep everyone posted on what happens next.

ckunkle
Contributor

The other ESX server just rebooted, so the problem moved with the swapped power supply. This confirms it is a power supply issue. Thank you, everyone, for your input.
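The swap test used in this thread boils down to a simple rule: exchange one part between a bad host and a good host, then see where the fault shows up next. A small sketch of that reasoning, assuming hypothetical host labels and a made-up function name (`interpret_psu_swap`); it illustrates the logic only and is not a diagnostic tool.

```python
from typing import Optional


def interpret_psu_swap(faulty_before: str, faulty_after: Optional[str]) -> str:
    """Interpret a power supply swap between two otherwise identical hosts.

    faulty_before: host that was rebooting before the swap.
    faulty_after:  host seen rebooting after the swap, or None if no
                   reboot has been observed yet.
    """
    if faulty_after is None:
        # No reboot yet: keep monitoring before drawing a conclusion.
        return "inconclusive"
    if faulty_after == faulty_before:
        # Fault stayed with the chassis, so the swapped part is not implicated.
        return "not the PSU"
    # Fault followed the swapped part to the other host.
    return "PSU"
```

In this thread the fault moved to the previously healthy server, which corresponds to the last branch. A single observed reboot is a strong hint; a second one on the same host after the swap would make the conclusion firmer.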
