I'm experiencing an issue in which I/O will stop completely for a few seconds and resume automatically. This causes a series of issues with the VMs.
Everything points to a driver/software issue. Here are the details:
- I'm using ESXi 6 with an LSI MegaRAID 9361-8i card.
- No issues are reported by the card itself. All HDs are ok without SMART errors and the RAID array status is ok.
- I can't reproduce this issue with CentOS 7 on bare metal (that's why I think this may be a driver/software issue). I ran a series of tests with CentOS 7 and I/O didn't stop at any moment.
- In vmkernel.log, I see the following messages:
2015-11-13T14:48:22.461Z cpu33:33804)ScsiDeviceIO: 2645: Cmd(0x43a65bc99300) 0x1a, CmdSN 0x53e1 from world 0 to dev "naa.600605b00a88a2701d8c5e4a1ef7817e" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
2015-11-13T14:48:22.540Z cpu33:33804)ScsiDeviceIO: 2645: Cmd(0x43a6597f9740) 0x1a, CmdSN 0x53e6 from world 0 to dev "naa.600605b00a88a2701d8c5e4a1ef7817e" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
2015-11-13T14:48:22.705Z cpu55:33359)ScsiDeviceIO: 2645: Cmd(0x43a6590fdac0) 0x1a, CmdSN 0x53eb from world 0 to dev "naa.600605b00a88a2701d8c5e4a1ef7817e" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
2015-11-13T14:49:19.087Z cpu19:33506)NMP: nmp_ResetDeviceLogThrottling:3345: last error status from device naa.600605b00a88a2701d8c5e4a1ef7817e repeated 3 times
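For anyone reading these lines, the hex fields can be decoded with a short script. This is only a minimal sketch in Python: the `decode` function and the lookup tables are my own illustration (not a VMware tool), and the tables cover just the codes seen above plus a few common ones from the SCSI standard. For the lines above it reports a MODE SENSE(6) command (0x1a) rejected with CHECK CONDITION (D:0x2), sense key ILLEGAL REQUEST (0x5), and additional sense INVALID FIELD IN CDB (0x24/0x0).

```python
import re

# Partial lookup tables (standard SCSI values); extend as needed.
SCSI_OPCODES = {0x12: "INQUIRY", 0x1A: "MODE SENSE(6)",
                0x28: "READ(10)", 0x2A: "WRITE(10)"}
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION", 0x8: "BUSY"}
SENSE_KEYS = {0x0: "NO SENSE", 0x2: "NOT READY", 0x3: "MEDIUM ERROR",
              0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
              0x6: "UNIT ATTENTION", 0xB: "ABORTED COMMAND"}
ASC_ASCQ = {(0x20, 0x00): "INVALID COMMAND OPERATION CODE",
            (0x24, 0x00): "INVALID FIELD IN CDB"}

# Matches the ScsiDeviceIO failure lines in the format shown above.
LINE_RE = re.compile(
    r'Cmd\(0x[0-9a-f]+\) (?P<op>0x[0-9a-f]+), .*?'
    r'to dev "(?P<dev>[^"]+)" failed '
    r'H:(?P<h>0x[0-9a-f]+) D:(?P<d>0x[0-9a-f]+) P:(?P<p>0x[0-9a-f]+) '
    r'Valid sense data: (?P<key>0x[0-9a-f]+) (?P<asc>0x[0-9a-f]+) '
    r'(?P<ascq>0x[0-9a-f]+)',
    re.IGNORECASE,
)


def decode(line):
    """Return a dict of decoded fields, or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    op, d, key, asc, ascq = (
        int(m.group(name), 16) for name in ("op", "d", "key", "asc", "ascq")
    )
    return {
        "device": m.group("dev"),
        "command": SCSI_OPCODES.get(op, hex(op)),
        "device_status": DEVICE_STATUS.get(d, hex(d)),
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "additional_sense": ASC_ASCQ.get((asc, ascq), f"{asc:#x}/{ascq:#x}"),
    }


sample = ('2015-11-13T14:48:22.461Z cpu33:33804)ScsiDeviceIO: 2645: '
          'Cmd(0x43a65bc99300) 0x1a, CmdSN 0x53e1 from world 0 to dev '
          '"naa.600605b00a88a2701d8c5e4a1ef7817e" failed H:0x0 D:0x2 P:0x0 '
          'Valid sense data: 0x5 0x24 0x0.')
print(decode(sample))
```

Unknown codes fall back to their hex value, so the script can be run over a whole vmkernel.log without failing on lines it does not recognize.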
Here is an image from a Performance chart that shows the problem:
Any ideas about what can cause this problem?
Thanks!
I would recommend first checking whether you're running a controller firmware revision that is on VMware's compatibility list for vSphere 6. You can find the list of supported versions for the LSI 9361-8i here: VMware Compatibility Guide: I/O Device Search
Thanks for your reply. I talked with LSI support and they suggested installing "scsi-megaraid-sas version 6.609.07.00-1OEM" and disabling lsi_mr3 (although it appears on the compatibility list).
It was disabled with this command: esxcfg-module -d lsi_mr3
I have been testing it for 2 days and so far the "I/O freeze" hasn't happened again. I still have to wait a few more days until I'm 100% sure that the issue is fixed.
Glad you've had it fixed.