MCEs (machine check exceptions) are hardware issues. The first thing I would check is your DIMMs, then maybe the CPU(s). Ensure all firmware is current, then take the vmkernel zdump and send it to your hardware vendor. A vm-support dump would also help.
...and finally see the article below
The KB will guide you through how to decipher the PSOD message. I hope you find it useful.
Can you attach the screenshot? Typically PSODs are hardware related.
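If you want a rough first look before working through the KB, here is a minimal sketch of splitting an MCE status value into its flags. It assumes the status value on your purple screen follows the Intel IA32_MCi_STATUS register layout; the function name `decode_mci_status` and the sample value are made up for illustration and are not from your screenshot.

```python
# Bit positions of the architectural flags in IA32_MCi_STATUS
# (assumption: the PSOD status value uses this Intel layout).
FLAGS = {
    "VAL": 63,    # register holds a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting of this error was enabled
    "MISCV": 59,  # the MISC register holds extra information
    "ADDRV": 58,  # the ADDR register holds the failing address
    "PCC": 57,    # processor context may be corrupt
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit status value into flags and error codes (hypothetical helper)."""
    decoded = {name: bool((status >> bit) & 1) for name, bit in FLAGS.items()}
    decoded["mca_error_code"] = status & 0xFFFF              # bits 15:0
    decoded["model_specific_code"] = (status >> 16) & 0xFFFF  # bits 31:16
    return decoded


if __name__ == "__main__":
    # Made-up sample value, not taken from any real purple screen.
    sample = 0xB200000000000135
    for field, value in decode_mci_status(sample).items():
        print(f"{field}: {value}")
```

An uncorrected error (UC set) with PCC set is the kind that halts the host. The meaning of the low 16-bit error code still has to be looked up in the KB or the vendor documentation.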
Hi Troy,
Your answer makes a ton of sense, and I've been suspecting this. This is a dedicated web server which has never been as solid as the other ESXi web server, but I thought that might have been because Nehalems were new when we bought it, and it was the first new architecture from Intel in many, many years. I believe the only safe way for me to handle this is to order another server, migrate to it, and then discontinue the old one.
Now for the rest of the story...
After my last post, I thought: this is not my problem. I sent the screenshot to the DC. They said they would flash the BIOS. I replied that while they had it down, I wanted the IPMI updated and S.M.A.R.T. and memory tests run as well. It was taking so long that I collapsed from exhaustion, having been up for two days on 3 1/2 hours of sleep. I woke up during business hours, terrified to find the sites were not up. ESXi was up. I checked the ticket. They had also decided to update the firmware on the motherboard, and that's where they found the problem. That turned into an emergency chassis swap, so the VMs were all waiting with the question of what happened. The server runs better now than it ever has, and I got to sleep last night without the monitor texting me.
THANK YOU TROY!