Came in to work this morning to be greeted by my first ever PSOD. I checked out KB 1004250, but I can't seem to find much info on this one. Has anyone seen this before, or have any guesses about it?
Thanks,
Brian
MCEs are always hardware related. I would check the CPU first, but it could also be a memory issue. VMware Support won't help with MCEs, as they are specific to the hardware vendor.
What were you doing on the host before it crashed? Were you doing storage scanning?
Thanks Troy. The Dell OpenManage agents are reporting no problems, the hardware status tab looks clean for this host, and I can't find anything of any real interest in the logs either. Guess I'll roll a few unimportant VMs over on this host and continue to monitor it for a while.
Also, when the ESX host comes back up, you should have a dump file off /
I usually open them with WinSCP. Sometimes the error is obvious, sometimes not, but look for the keyword MCE.
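If you'd rather not scroll through the dump by hand, here's a rough sketch that does roughly what `strings dumpfile | grep MCE` would. It assumes nothing about the dump's internal format: it just treats the file as raw bytes, pulls out runs of printable ASCII text, and keeps the ones containing the keyword. The script name `find_mce.py` and the function name are made up for illustration, and the match is case-sensitive.

```python
import re
import sys

# Runs of 4+ printable ASCII characters, similar to what the Unix `strings` tool extracts.
_PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def find_keyword_lines(data: bytes, keyword: str = "MCE") -> list[str]:
    """Return printable text runs from a (possibly binary) dump that contain keyword."""
    needle = keyword.encode("ascii")
    return [
        m.group().decode("ascii")
        for m in _PRINTABLE_RUN.finditer(data)
        if needle in m.group()  # case-sensitive substring match
    ]


if __name__ == "__main__":
    # Hypothetical usage: python find_mce.py <path-to-dump-file>
    with open(sys.argv[1], "rb") as f:
        for line in find_keyword_lines(f.read()):
            print(line)
```

It won't decode anything for you, but it narrows things down to the lines worth reading in WinSCP.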
From what I can tell, nothing was going on on the host. This happened before work hours began, so no admins were doing anything manually. In the hour leading up to the HA event, every VM on the host raised CPU alarms as its CPU usage went from green to yellow, then red.
I had the same problem a couple of weeks ago, but our storage guy was performing storage scanning when the ESX host crashed. I rebooted it, it came back, and it has been running fine since then. I sent the logs to VMware support and they said it could be caused by an old HBA firmware issue.
Used the dump info along with some info I found in KB 1005184. It appears to be something with the CPU data cache on CPU 2. Moving some "less important" VMs back onto the host now and will keep watching it. Good times on a Friday going into a holiday weekend!!
Nothing like rolling into the holiday weekend smoothly.