I am currently running ESX 3.0.2 and my host machines are crashing about once a month. See print screen! The logs don't say much, but I can post them if needed. My hosts were fine for the first 6 months or so after they went in. The only thing that has changed since is that I upgraded VC to 2.5. I was planning on upgrading these hosts to 3.5 eventually, but not yet. Any ideas?
A PSOD is usually due to hardware failure, most often memory or CPU. Your screenshot shows that a coredump was produced, and from what I can tell it looks like a possibly bad CPU. Check your hardware for any faults listed by the vendor tools (HP, Dell, or other); hopefully you have hardware monitoring like SIM or something similar. Once the part is replaced, you'll be fine.
Is this a single ESX host or multiple hosts exhibiting this behavior? If it is multiple, I would look for common points of failure, such as an FC switch. I do not think the VC upgrade is causing this; too many customers are running VC 2.5 with 3.0 hosts.
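One rough way to look for common points of failure is to list what each host depends on and intersect those lists for the hosts that crashed. This is only a sketch: the host names, component names, and the `common_components` helper are all made-up placeholders, not details from this thread.

```python
# Hypothetical inventory: host name -> shared components it depends on.
# Every name here is a placeholder for illustration only.
inventory = {
    "esx01": {"fc-switch-a", "san-1", "pdu-1", "bios-A05"},
    "esx02": {"fc-switch-a", "san-1", "pdu-2", "bios-A05"},
    "esx03": {"fc-switch-a", "san-1", "pdu-1", "bios-A05"},
}


def common_components(inventory, crashed_hosts):
    """Return the components shared by every host in crashed_hosts."""
    sets = [inventory[host] for host in crashed_hosts]
    # With no crashed hosts there is nothing to intersect.
    return set.intersection(*sets) if sets else set()


shared = common_components(inventory, ["esx01", "esx02", "esx03"])
print(sorted(shared))
```

Anything left in the result is a candidate worth checking first; components that differ between the crashed hosts (the PDUs in this made-up data) drop out.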
It is multiple hosts. I currently have 3 hosts in my farm, all doing the same thing, luckily not at the same time. All hosts are Dell 6850s. I should be able to install Dell OpenManage on these and check for hardware failures.
If it's all 3 hosts, I would check with Dell for an available BIOS upgrade.
Either that or an environmental issue (power fluctuations, cooling problems).
I think you might be on the right track with the environmental issue. We have had some heat problems in our data center this summer, which just got remediated this week. I just installed the Dell Server Administrator and don't see any hardware failures or anything of interest in the logs. Thanks for everyone's ideas!!
Well, I was hoping to blame this on the cooling problems we have been having, but our data center has been at 68 degrees for the past two weeks and I just had another host PSOD on me tonight.
Yes, that article is very interesting, thank you for posting. But I only have one PERC, and it's a 5/i, not the 5/e mentioned in the article. Also, I have 6850s, but who knows, maybe it's still possible. An upgrade to 3.5 might be in my future sooner than I was planning.
Anyone have any ideas? Support would be nice, and I would call if it hadn't run out 6 months ago. Also, the Dell Server Administrator shows no hardware failures in the logs.
What type of shared storage are your hosts connected to? We had a similar problem with hosts connected via iSCSI to a NetApp SAN. We did not get a PSOD, but the hosts would "crash" consistently every month.
It got to the stage where we scheduled a reboot every three weeks, which "controlled" the problem. Eventually, after involving NetApp and VMware, we upgraded the version of NetApp ONTAP and the problem was resolved. As with your implementation, our environment was running fine for 4-5 months before we got the issue. I think it had something to do with the rise in usage.
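If you want to check whether the crashes really are periodic (which would point toward something that builds up over uptime rather than random hardware faults), a quick sketch like this can help. The dates below are made up for illustration, and `crash_intervals` is just a hypothetical helper name; substitute the crash dates from your own logs.

```python
from datetime import date


def crash_intervals(crash_dates):
    """Return the number of days between consecutive crashes, oldest first."""
    ordered = sorted(crash_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


# Made-up crash dates for one host, not taken from this thread.
crashes = [date(2008, 6, 2), date(2008, 7, 1), date(2008, 7, 31)]
print(crash_intervals(crashes))  # [29, 30]
```

Intervals that cluster tightly around the same number of days suggest an uptime-related cause; widely scattered intervals fit hardware or environmental trouble better.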