VMware Cloud Community
IT_Architect
Enthusiast

What problem does the attached PSOD indicate?

I've been having more and more problems with VMware 4 locking up. Today I went into the KVM and found the attached PSOD. I don't understand the PSOD message well enough to identify the source of the problem.


Accepted Solutions
Troy_Clavell
Immortal

MCEs (machine check exceptions) are hardware issues. The first things I would check are your DIMMs, then maybe the CPU(s). Ensure all firmware is current, then take the vmkernel zdump and send it to your hardware vendor. A vm-support dump would also help.

...and finally see the article below

http://kb.vmware.com/kb/1005184
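
For anyone who wants a first look at an MCE status value before involving the vendor, here is a minimal sketch in Python. It assumes the value being decoded is a 64-bit status word laid out like the architectural IA32_MCi_STATUS register described in Intel's documentation; whether a given PSOD prints that value, and in what form, is something to confirm against the KB article. The function name and the sample value are hypothetical and are not taken from the attached screenshot.

```python
# Architectural flag bits of IA32_MCi_STATUS (bit position -> short name).
# Assumption: the value passed in follows this layout.
MCI_STATUS_FLAGS = {
    63: "VAL",    # register contains valid error information
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # error reporting was enabled for this bank
    59: "MISCV",  # the MISC register holds additional information
    58: "ADDRV",  # the ADDR register holds a valid address
    57: "PCC",    # processor context may be corrupted
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit machine-check status word into flags and error codes."""
    flags = [
        name
        for bit, name in sorted(MCI_STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,              # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


if __name__ == "__main__":
    # Hypothetical sample value, for illustration only.
    sample = 0xB200000000000135
    result = decode_mci_status(sample)
    print(result["flags"], hex(result["mca_error_code"]))
```

An uncorrected error (UC) with PCC set points toward hardware, but the error codes themselves still need the CPU vendor's tables or the hardware vendor's help to interpret.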

6 Replies
idle-jam
Immortal

The KB article below will guide you through deciphering the PSOD message. I hope you find it useful.

http://kb.vmware.com/kb/1005184

Troy_Clavell
Immortal

Can you attach the screenshot? PSODs are typically hardware related.

IT_Architect
Enthusiast

When I first posted the message, I saw what might have been a helpful link, hit the back button, and everything was gone. I put the text back in but forgot to re-attach the screenshot. Here it is.

IT_Architect
Enthusiast

Hi Troy,

Your answer makes a ton of sense, and I've been suspecting this. This is a dedicated web server that has never been as solid as our other ESXi web server, but I thought that might have been because Nehalems were new when we bought it, and it was the first new architecture from Intel in many, many years. I believe the only safe way for me to handle this is to order another server, migrate to it, and then discontinue the old one.

IT_Architect
Enthusiast

Now for the rest of the story...

After my last post, I thought: this is not my problem. I sent the screenshot to the DC. They said they would flash the BIOS. I replied that while they had it down, I also wanted the IPMI updated and S.M.A.R.T. and memory tests run. It was taking so long that I collapsed from exhaustion, having been up for two days on 3 1/2 hours of sleep. I woke up during business hours, terrified to find the sites were not up. ESXi was up. I checked the ticket. They had also decided to update the firmware on the motherboard, and that's where they found the problem. That turned into an emergency chassis swap, so the VMs were all waiting with the question of what happened. The server runs better now than it ever has, and I got to sleep last night without the monitor texting me.

THANK YOU TROY!
