brandontuch
Contributor

LSI SAS error?

Hello all. I've been running an evaluation version of ESXi 5.1 at my company, as we're trying to move towards a more professional virtualization solution. However, I have an R515 that doesn't seem to work with VMware. I can add it as a host, but whenever I try to create a new VM on it or migrate a VM to it, the task just sits there for over an hour and then the host shows as Disconnected. After some poking around on the forums here, I found out how to enable the shell and SSH, then logged in to look at the dmesg output. I found the following (apologies for the wall of text):

2013-03-13T08:28:39.108Z cpu3:13162)WARNING: LinScsi: SCSILinuxAbortCommand:1949:The driver failed to call done from itsabort handler and yet it returned SUCCESS
2013-03-13T08:28:39.108Z cpu3:13162)WARNING: LinScsi: SCSILinuxAbortCommands:1816:Failed, Driver LSI Logic SAS based MegaRAID driver, for vmhba2
2013-03-13T08:28:39.305Z cpu3:13162)megasas: ABORT sn 7723822 cmd=0x2a retries=0 tmo=0
2013-03-13T08:28:39.305Z cpu3:13162)<5>0 :: megasas: RESET -7723822 cmd=2a retries=0
2013-03-13T08:28:39.305Z cpu3:13162)<3>megasas: cannot recover from previous reset failures
2013-03-13T08:28:39.305Z cpu3:13162)WARNING: LinScsi: SCSILinuxAbortCommand:1949:The driver failed to call done from itsabort handler and yet it returned SUCCESS
2013-03-13T08:28:39.305Z cpu3:13162)WARNING: LinScsi: SCSILinuxAbortCommands:1816:Failed, Driver LSI Logic SAS based MegaRAID driver, for vmhba2
2013-03-13T08:28:39.630Z cpu3:13162)megasas: ABORT sn 7723825 cmd=0x2a retries=0 tmo=0
2013-03-13T08:28:39.630Z cpu3:13162)<5>0 :: megasas: RESET -7723825 cmd=2a retries=0
2013-03-13T08:28:39.630Z cpu3:13162)<3>megasas: cannot recover from previous reset failures

There are pages and pages of that. It's a good server: before I put ESXi on it, its uptime was 16 months. I would like to use this server for VMs, but I don't know whether this can be fixed. Is this just an incompatibility, or is the SAS card failing because of ESXi? It's a Dell H700, so I imagine it's a fairly popular (and supported) card.
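When a log runs to pages of these messages, it can help to tally them rather than scroll. The sketch below is a minimal, hypothetical helper, assuming the dmesg output has been saved to a plain text file and copied off the host; the regular expressions are modelled only on the lines quoted above, and other driver or ESXi builds may word their messages differently. It also decodes the aborted SCSI opcode: cmd=0x2a is the standard SCSI WRITE(10) opcode, so the aborted commands shown above are writes.

```python
import re
from collections import Counter

# Patterns modelled on the vmkernel lines quoted in this post (assumption:
# other megasas driver versions may phrase these messages differently).
ABORT_RE = re.compile(r"megasas: ABORT sn \d+ cmd=0x([0-9a-fA-F]+)")
RESET_RE = re.compile(r"megasas: RESET -\d+ cmd=([0-9a-fA-F]+)")
FATAL_RE = re.compile(r"megasas: cannot recover from previous reset failures")
ADAPTER_RE = re.compile(r"Failed, Driver .+?, for (vmhba\d+)")

# Standard SCSI opcodes: 0x28 is READ(10), 0x2A is WRITE(10).
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def summarize(lines):
    """Tally megasas aborts/resets, aborted SCSI opcodes and failing HBAs."""
    events = Counter()
    aborted_ops = Counter()
    adapters = Counter()
    for line in lines:
        m = ABORT_RE.search(line)
        if m:
            events["abort"] += 1
            op = int(m.group(1), 16)
            # Fall back to the raw hex value for opcodes not in the table.
            aborted_ops[OPCODES.get(op, hex(op))] += 1
        elif RESET_RE.search(line):
            events["reset"] += 1
        elif FATAL_RE.search(line):
            events["unrecoverable"] += 1
        m = ADAPTER_RE.search(line)
        if m:
            adapters[m.group(1)] += 1
    return events, aborted_ops, adapters
```

For example, with open("dmesg.txt") as f: print(summarize(f)) (the file name is only an illustration). A high count of unrecoverable resets on a single vmhba points at that controller rather than at the VMs themselves.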

anicdjw
Contributor

We are seeing this same error on an M610. Were you able to get this resolved?

abcam
Contributor

Hello,

We just had exactly the same problem on a Dell PowerEdge R610 equipped with a Dell PERC H700, after upgrading from ESXi 4.1 to ESXi 5.1u1. The problem was caused by version 12.10.3-0001 of the RAID controller firmware, which reports false alarms about multibit ECC errors. We corrected the issue by upgrading the RAID controller firmware to the latest version, 12.10.4-0001 (A10). You can download the firmware upgrade from Dell's support site (Driver Details | Dell US). As the release notes state, this version "Fixes an issue where seemingly random multibit ECC errors are seen".
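If you have several hosts to check, deciding whether a controller is still below 12.10.4-0001 is a version comparison, and it should be done numerically rather than as a string. The sketch below is a hypothetical helper, assuming you have already read the installed firmware version string from each controller by some other means (not covered here) and that versions follow the dotted/dashed pattern seen in this thread. The FIXED constant is simply the version this reply reports as resolving the issue.

```python
import re

# Version this reply reports as fixing the false multibit ECC alarms.
FIXED = "12.10.4-0001"


def parse_version(text):
    """Turn '12.10.3-0001' (optionally followed by ' (A10)') into (12, 10, 3, 1)."""
    m = re.match(r"\s*(\d+(?:[.-]\d+)*)", text)
    if not m:
        raise ValueError(f"unrecognised firmware version: {text!r}")
    return tuple(int(part) for part in re.split(r"[.-]", m.group(1)))


def needs_update(installed, fixed=FIXED):
    """True if the installed firmware is older than the fixed version."""
    return parse_version(installed) < parse_version(fixed)
```

Comparing tuples of integers avoids the string-comparison trap where "12.9..." would sort after "12.10...".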
