Hi community,
We are experiencing random PSODs (every 3-6 weeks) on our systems and are at a loss as to what is causing them.
The hardware:
Two IBM System x3650 M3 (two 6Gb SAS HBAs each) connected to a DS3524 storage (dual controller). The servers are running ESXi 5. All systems and ESXi are up to date. According to IBM, the hardware is working fine.
The PSODs happen randomly about every 3 to 6 weeks on both machines, but not simultaneously. Here is one of the PSODs:
The important lines seem to be:
Failed at vmkdrivers/src_9/vmklinux_9/vmware/linux_scsi.c:2221 -- NOT REACHED
cpu10:4106)LinScsi: SCSILinuxCmdDone:2220:Attempted double completion
Maybe someone has some insight into this. Thanks in advance for any hints!
Is all firmware at the latest level?
Not a solution, but maybe try making it single path?
It seems like it might be connecting over both paths during load. Have you been monitoring the I/O on the HBAs?
Have you looked at this IBM KB?
http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=migr-5086606
and also the VMware one
It applies to the HBA but, in my personal experience (IBM BladeCenter HS22 and HS22V), also to all the PCI devices.
Thanks for the replies so far.
@sparrowangels:
Yes, hardware and software are completely up to date. We actually had the same thought about the dual controller setup. If no other solution comes up, we might try making it single path just to see if that cures the problem.
@riker82:
We've also come across those articles but have not applied the fixes, since we are not seeing the errors they mention in our logs.
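For anyone who wants to repeat that log check, here is a minimal sketch of how it could be scripted, assuming the log files have been copied off the host as plain text. The function names are made up for illustration, and the only signature included is the one from the PSOD above; the error strings from the KB articles are not quoted in this thread, so they would have to be added by hand.

```python
from pathlib import Path

# Substrings to search for. Only the first entry comes from this thread (the
# PSOD text above); the KB error strings are NOT included and must be added.
SIGNATURES = ["Attempted double completion"]


def find_signatures(lines, signatures=SIGNATURES):
    """Return (line_number, signature, line) for each line containing a signature."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for signature in signatures:
            if signature in line:
                hits.append((number, signature, line.rstrip("\n")))
    return hits


def scan_file(path, signatures=SIGNATURES):
    """Scan one text log file; undecodable bytes are replaced, not fatal."""
    with Path(path).open(errors="replace") as handle:
        return find_signatures(handle, signatures)
```

Calling `scan_file("vmkernel.log")` (the file name is only an example) returns an empty list if none of the signatures appear, which is the situation described above.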