Hello Gurus - All my ESX servers in the cluster are throwing SCSI errors in the vmkernel logs. I am running ESX 3.5 with QLogic HBAs installed.
I am not sure whether it's a driver issue or SCSI reservations. Does anyone know what these messages mean, or has anyone encountered similar issues?
Jul 16 11:14:59 esxhost1 vmkernel: 161:22:18:43.453 cpu0:1024)<6>Debug scsi underrun
Jul 16 11:14:59 esxhost1 last message repeated 9 times
Jul 16 11:14:59 esxhost1 vmkernel: 161:22:18:43.454 cpu0:1024)<6>Debug scsi underrun
Jul 16 11:14:59 esxhost1 last message repeated 21 times
Jul 16 11:16:18 esxhost1 vmkernel: .649 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba1:4:1 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0
Jul 16 11:16:18 esxhost1 vmkernel: .649 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba1:5:1 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0
Jul 16 11:16:18 esxhost1 vmkernel: .650 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba1:6:1 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0
Jul 16 11:16:18 esxhost1 vmkernel: .650 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba1:7:1 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0
Jul 16 11:16:18 esxhost1 vmkernel: .650 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba2:4:1 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0
Jul 16 11:16:18 esxhost1 vmkernel: .650 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba2:5:1 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0
Jul 16 11:16:18 esxhost1 vmkernel: .650 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba2:6:1 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0
Jul 16 11:16:18 esxhost1 vmkernel: .650 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba2:7:1 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0
Jul 16 11:16:18 esxhost1 vmkernel: .651 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba1:4:2 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0
Jul 16 11:16:18 esxhost1 vmkernel: .651 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba1:5:2 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0
Jul 16 11:16:18 esxhost1 vmkernel: .651 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba1:6:2 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0
Jul 16 11:16:18 esxhost1 vmkernel: .651 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba1:7:2 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0
Jul 16 11:16:18 esxhost1 vmkernel: .651 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba2:4:2 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0
Jul 16 11:16:18 esxhost1 vmkernel: .651 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba2:5:2 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0
Jul 16 11:16:18 esxhost1 vmkernel: .651 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba2:6:2 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0
Jul 16 11:16:18 esxhost1 vmkernel: .651 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba2:7:2 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0
Thanks in advance.
Evening,
We have seen these messages when a host is experiencing path thrashing. Check your SAN to ensure that you are not seeing trespassed LUNs.
See the following for more helpful hints: http://communities.vmware.com/thread/213489
Trust this helps in some small way.
Kind regards,
Glen
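For anyone who wants to tally the warnings before chasing the SAN side, here is a minimal Python sketch that counts the CheckUnitReady lines per adapter, target and LUN, and names the sense key. It assumes the log lines look exactly like the excerpt in the first post; the names `summarize`, `WARNING_RE` and `SENSE_KEYS` are made up for this example. In the SCSI standard, sense key 0x2 is NOT READY, and asc 0x0 / ascq 0x0 means no additional sense information was returned.

```python
import re
from collections import Counter

# A few sense keys from the SCSI Primary Commands standard (not the full table).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}

# Matches the CheckUnitReady warnings as they appear in the excerpt above.
# Assumption: the path is written as vmhbaA:T:L (adapter:target:LUN).
WARNING_RE = re.compile(
    r"CheckUnitReady on (?P<hba>vmhba\d+):(?P<target>\d+):(?P<lun>\d+) "
    r"returned I/O error \S+ "
    r"sk (?P<sk>0x[0-9a-fA-F]+) asc (?P<asc>0x[0-9a-fA-F]+) "
    r"ascq (?P<ascq>0x[0-9a-fA-F]+)"
)


def summarize(lines):
    """Count CheckUnitReady warnings per (hba, target, lun, sense key name)."""
    counts = Counter()
    for line in lines:
        m = WARNING_RE.search(line)
        if m is None:
            continue  # not a CheckUnitReady warning, e.g. "Debug scsi underrun"
        sk = int(m.group("sk"), 16)
        key = (
            m.group("hba"),
            int(m.group("target")),
            int(m.group("lun")),
            SENSE_KEYS.get(sk, hex(sk)),
        )
        counts[key] += 1
    return counts


# Three lines copied from the first post.
sample = [
    "Jul 16 11:16:18 esxhost1 vmkernel: .649 cpu2:1047)WARNING: SCSI: 2896: "
    "CheckUnitReady on vmhba1:4:1 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0",
    "Jul 16 11:16:18 esxhost1 vmkernel: .650 cpu2:1047)WARNING: SCSI: 2896: "
    "CheckUnitReady on vmhba2:4:1 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0",
    "Jul 16 11:14:59 esxhost1 vmkernel: 161:22:18:43.453 cpu0:1024)<6>Debug scsi underrun",
]

for key, n in sorted(summarize(sample).items()):
    print(key, n)
```

In the excerpt posted above, the same warning shows up on both HBAs for targets 4 through 7 and LUNs 1 and 2, so a per-path tally like this mainly helps confirm whether any path is unaffected.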
Are you sure it's QLogic? Let me guess - Emulex FC adapter?
It is a known issue. Read the release notes for ESX 3.5 Update 4, search for "CheckUnitReady", and update your machine to 3.5 U4 ASAP!
best,
Joerg
Thanks for your replies... We are using QLogic HBAs; however, the HBA model is reported inconsistently. Below is the output from 'lspci' and the QLogic file:
-
0c:00.0 Fibre Channel: QLogic Corp QLA2432 (rev 02)
0c:00.1 Fibre Channel: QLogic Corp QLA2432 (rev 02)
-
QLogic PCI to Fibre Channel Host Adapter for QMH2462:
Firmware version: 4.00.29, Driver version 7.08-vm32
-
QLogic PCI to Fibre Channel Host Adapter for QMH2462:
Firmware version: 4.00.29, Driver version 7.08-vm32
-
I am not sure if this would make any difference. Moreover, I do not have any dead paths; esxcfg-mpath -l -v shows that all the paths are active.
Could the I/O errors be due to SCSI reservations?
Thanks!!
Evening,
Have you confirmed that none of your LUNs are trespassed? This will happen on an active/passive array if different ESX servers access the same LUN via different SPs.
Thanks and kind regards,
Glen
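To make the trespass check concrete, here is a rough Python sketch of the condition described above: a LUN on an active/passive array that different hosts reach through different SPs. The host name `esxhost2`, the LUN and SP labels, and the function `find_split_luns` are all made up for illustration; the real mapping would have to come from your array's management tools and from each host's path listing.

```python
from collections import defaultdict


def find_split_luns(access):
    """Return the LUNs that are reached through more than one storage processor.

    `access` is an iterable of (host, lun, sp) tuples (hypothetical data).
    """
    sps_by_lun = defaultdict(set)
    for host, lun, sp in access:
        sps_by_lun[lun].add(sp)
    # Keep only the LUNs that hosts are accessing via different SPs.
    return {lun: sorted(sps) for lun, sps in sps_by_lun.items() if len(sps) > 1}


# Hypothetical example: LUN1 is accessed via SPA by one host and SPB by another.
access = [
    ("esxhost1", "LUN1", "SPA"),
    ("esxhost2", "LUN1", "SPB"),
    ("esxhost1", "LUN2", "SPA"),
    ("esxhost2", "LUN2", "SPA"),
]

print(find_split_luns(access))  # {'LUN1': ['SPA', 'SPB']}
```

Any LUN that shows up in the result is a candidate for trespassing back and forth between SPs, which is the thrashing pattern worth raising with the storage team.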
Thanks for your insight, Glen. I am working with the Storage team on the LUN trespass issue, as I do not see any dead paths on the ESX host.
Will update soon.