Today I saw my first official PSOD crash with VMware ESX (not bad for 3 years of using the product). We have a customer with a Dell PowerEdge 2900 III / PERC 6i controller with SAS disks. I am not really sure where to begin with troubleshooting this. I restarted the server and it came back up without any issues. The firmware on the storage controller is out of date, and the latest firmware is marked Critical. I am not sure if what I experienced is a known issue. This server has been in production for over 6 months, so I was surprised to see an issue crop up. I checked Dell OpenManage and it reports all system components with a green check. I have attached the output from the PSOD. Any ideas would be much appreciated. Before I start upgrading firmware etc., I wanted to get some advice.
Can you run something like memtest86? Even though the Dell utils have come back with a clean bill of health for the hardware, they don't really stress test it. That's where I'd start.
Thanks,
Neil
PSODs are caused by either hardware (soak test it, particularly RAM) or agents (particularly hardware monitoring agents).
Soak test, then remove any additional agents, and monitor.