_jmorgan_
Contributor

ESX Hosts Crashing

I am currently running ESX 3.0.2, and about once a month my host machines crash. See the print screen! The logs don't say much, but I can post them if needed. My hosts were fine for the first 6 months or so after they were installed; the only thing that has changed since then is that I upgraded VC to 2.5. I was planning on upgrading these hosts to 3.5 eventually, but not yet. Any ideas?

8 Replies
lamw
Community Manager

A PSOD is usually due to hardware failure, typically memory or CPU. You can see that a core dump was produced, and from what I can tell, it looks like a possibly bad CPU. You'll want to check your hardware for any faults reported by the vendor's tools (HP, Dell, or other); hopefully you have hardware monitoring such as SIM. Once the part is replaced, you'll be fine.

weinstein5
Immortal

Is this a single ESX host or multiple hosts exhibiting this behavior? If it is multiple, I would look for common points of failure, such as an FC switch. I do not think the VC upgrade is causing this; too many customers are running VC 2.5 with 3.0 hosts.

_jmorgan_
Contributor

It is multiple hosts: there are currently 3 hosts in my farm, all doing the same thing, luckily not at the same time. All hosts are Dell 6850s. I should be able to install Dell OpenManage on these and check for hardware failures.

mcowger
Immortal

If it's all 3 hosts, I would check with Dell for an available BIOS upgrade.

Either that or an environmental issue (power fluctuations, cooling problems).

--Matt
_jmorgan_
Contributor

I think you might be on the right track with the environmental issue. We have had some heat problems in our data center this summer, which were only remediated this week. I just installed the Dell Server Administrator and don't see any hardware failures or anything of interest in the logs. Thanks for everyone's ideas!

IB_IT
Expert

How many PERC controllers do you have installed per host? An interesting doc from Dell...

It indicates a limitation in ESX 3.0.x with multiple PERCs, but it describes an issue on the 6950s, not the 6850s. Still worth a closer look.

_jmorgan_
Contributor

Well, I was hoping to blame this on the cooling problems we have been having, but our data center has been at 68 degrees for the past two weeks and I just had another host PSOD on me tonight.

Yes, that article is very interesting; thank you for posting. I only have one PERC, though, and it's a 5/i, not the 5/e mentioned in the article, and I have 6850s. But who knows, maybe it's still possible. An upgrade to 3.5 might be in my future sooner than I was planning.

Does anyone have any ideas? Support would be nice, and I would call if it hadn't run out 6 months ago. Also, the Dell Server Administrator shows no hardware failures in the logs.

Thanks

Jarad

Atko
Enthusiast

What type of shared storage are your hosts connected to? We had a similar problem with hosts connected via iSCSI to a NetApp SAN. We did not get a PSOD, but the hosts would "crash" consistently every month.

It got to the stage where we scheduled a reboot every three weeks, which "controlled" the problem. Eventually, after involving NetApp and VMware, we upgraded the version of NetApp ONTAP and the problem was resolved. As with your implementation, our environment ran fine for 4-5 months before the issue appeared; I think it had something to do with the rise in usage.
