VMware Cloud Community
fernandomm2
Enthusiast

LSI MegaRAID: I/O stops completely for a few seconds

I'm experiencing an issue in which I/O stops completely for a few seconds and then resumes on its own. This causes a number of problems for the VMs.

Everything suggests that this is a driver/software issue. Here are the details:

- I'm using ESXi 6 with an LSI MegaRAID 9361-8i card.

- The card itself reports no issues. All disks are OK with no SMART errors, and the RAID array status is OK.

- I can't reproduce this issue with CentOS 7 on bare metal (which is why I think this may be a driver/software issue). I ran a series of tests with CentOS 7 and I/O didn't stop at any point.

- In vmkernel.log, I see the following messages:

2015-11-13T14:48:22.461Z cpu33:33804)ScsiDeviceIO: 2645: Cmd(0x43a65bc99300) 0x1a, CmdSN 0x53e1 from world 0 to dev "naa.600605b00a88a2701d8c5e4a1ef7817e" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2015-11-13T14:48:22.540Z cpu33:33804)ScsiDeviceIO: 2645: Cmd(0x43a6597f9740) 0x1a, CmdSN 0x53e6 from world 0 to dev "naa.600605b00a88a2701d8c5e4a1ef7817e" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2015-11-13T14:48:22.705Z cpu55:33359)ScsiDeviceIO: 2645: Cmd(0x43a6590fdac0) 0x1a, CmdSN 0x53eb from world 0 to dev "naa.600605b00a88a2701d8c5e4a1ef7817e" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2015-11-13T14:49:19.087Z cpu19:33506)NMP: nmp_ResetDeviceLogThrottling:3345: last error status from device naa.600605b00a88a2701d8c5e4a1ef7817e repeated 3 times
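To read the ScsiDeviceIO lines above: in ESXi's notation, H: is the host status, D: the device status and P: the plugin (NMP) status, and "Valid sense data" is the sense key / ASC / ASCQ triple. Opcode 0x1a is MODE SENSE(6), D:0x2 is SCSI CHECK CONDITION, sense key 0x5 is ILLEGAL REQUEST, and ASC/ASCQ 0x24/0x00 is INVALID FIELD IN CDB. The Python sketch below decodes such a line. It is a hypothetical helper (the names `decode`, `LINE_RE` and the lookup tables are illustrative, not part of any VMware tool), it assumes the exact line format shown in this log, and its tables only cover the codes that appear here. Decoding only shows that the device rejected a MODE SENSE request; it does not by itself link these messages to the I/O stall.

```python
import re

# Lookup tables covering only the codes that appear in the log above
# (values from the standard SCSI opcode, status and sense tables).
OPCODES = {0x1A: "MODE SENSE(6)"}
DEVICE_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION"}
SENSE_KEYS = {0x0: "NO SENSE", 0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {(0x24, 0x00): "INVALID FIELD IN CDB"}

# Assumes the ScsiDeviceIO "failed H:.. D:.. P:.. Valid sense data: .." format above.
LINE_RE = re.compile(
    r'Cmd\(0x[0-9a-f]+\) (?P<op>0x[0-9a-f]+), CmdSN 0x[0-9a-f]+'
    r'.*?dev "(?P<dev>[^"]+)" failed '
    r'H:(?P<h>0x[0-9a-f]+) D:(?P<d>0x[0-9a-f]+) P:(?P<p>0x[0-9a-f]+) '
    r'Valid sense data: (?P<key>0x[0-9a-f]+) (?P<asc>0x[0-9a-f]+) (?P<ascq>0x[0-9a-f]+)',
    re.IGNORECASE,
)


def decode(line: str) -> dict:
    """Decode one ScsiDeviceIO failure line into human-readable fields."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not a ScsiDeviceIO failure line with sense data")

    def hexval(name: str) -> int:
        # Every captured numeric field is a hex literal such as "0x24".
        return int(m.group(name), 16)

    op = hexval("op")
    status = hexval("d")
    key = hexval("key")
    asc, ascq = hexval("asc"), hexval("ascq")
    return {
        "device": m.group("dev"),
        "opcode": OPCODES.get(op, f"unknown opcode 0x{op:02x}"),
        "host_status": hexval("h"),    # H: field (0x0 means no host-side error)
        "device_status": DEVICE_STATUS.get(status, f"unknown status 0x{status:02x}"),
        "plugin_status": hexval("p"),  # P: field (0x0 means no NMP plugin error)
        "sense_key": SENSE_KEYS.get(key, f"unknown sense key 0x{key:x}"),
        "asc_ascq": ASC_ASCQ.get((asc, ascq), f"unknown ASC/ASCQ 0x{asc:02x}/0x{ascq:02x}"),
    }


if __name__ == "__main__":
    sample = (
        '2015-11-13T14:48:22.461Z cpu33:33804)ScsiDeviceIO: 2645: '
        'Cmd(0x43a65bc99300) 0x1a, CmdSN 0x53e1 from world 0 to dev '
        '"naa.600605b00a88a2701d8c5e4a1ef7817e" failed H:0x0 D:0x2 P:0x0 '
        'Valid sense data: 0x5 0x24 0x0.'
    )
    for field, value in decode(sample).items():
        print(f"{field}: {value}")
```

Run against the first log line, it reports MODE SENSE(6), CHECK CONDITION, ILLEGAL REQUEST and INVALID FIELD IN CDB for device naa.600605b00a88a2701d8c5e4a1ef7817e, with host and plugin status both 0. The NMP throttling line is a different message format and is rejected with a ValueError.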

Here is a screenshot of a Performance chart that shows the problem:

[Image: "Captura de Tela 2015-11-12 às 14.23.56.png", a performance chart screenshot taken 2015-11-12 at 14:23:56]

Any ideas about what can cause this problem?

Thanks!

Nick_Andreev
Expert

I would first check whether you're running a controller firmware revision that is on VMware's compatibility list for vSphere 6. You can find the list of supported versions for the LSI 9361-8i here: VMware Compatibility Guide: I/O Device Search

---
If you found my answers helpful please consider marking them as helpful or correct.
VCIX-DCV, VCIX-NV, VCAP-CMA | vExpert '16, '17, '18
Blog: http://niktips.wordpress.com | Twitter: @nick_andreev_au
fernandomm2
Enthusiast

Thanks for your reply. When I contacted LSI support, they suggested installing "scsi-megaraid-sas version 6.609.07.00-1OEM" and disabling lsi_mr3 (even though lsi_mr3 is on the compatibility list).

It was disabled with this command: esxcfg-module -d lsi_mr3

I have been testing it for 2 days and so far the "I/O freeze" hasn't happened again. I still need to wait a few more days before I'm 100% sure the issue is fixed.

Nick_Andreev
Expert

Glad you got it fixed.
