vmkernel filled with I/O errors.

vGuy · ‎07-16-2009

Hello Gurus - All my ESX servers in the cluster are throwing SCSI errors in the vmkernel logs. I am running ESX 3.5 with QLogic HBAs installed.

I am not sure if it's a driver issue or SCSI reservations. Does anyone know what these messages mean, or has anyone encountered similar issues?

Jul 16 11:14:59 esxhost1 vmkernel: 161:22:18:43.453 cpu0:1024)<6>Debug scsi underrun

Jul 16 11:14:59 esxhost1 last message repeated 9 times

Jul 16 11:14:59 esxhost1 vmkernel: 161:22:18:43.454 cpu0:1024)<6>Debug scsi underrun

Jul 16 11:14:59 esxhost1 last message repeated 21 times

Jul 16 11:16:18 esxhost1 vmkernel: .649 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba1:4:1 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0

Jul 16 11:16:18 esxhost1 vmkernel: .649 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba1:5:1 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0

Jul 16 11:16:18 esxhost1 vmkernel: .650 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba1:6:1 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0

Jul 16 11:16:18 esxhost1 vmkernel: .650 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba1:7:1 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0

Jul 16 11:16:18 esxhost1 vmkernel: .650 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba2:4:1 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0

Jul 16 11:16:18 esxhost1 vmkernel: .650 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba2:5:1 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0

Jul 16 11:16:18 esxhost1 vmkernel: .650 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba2:6:1 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0

Jul 16 11:16:18 esxhost1 vmkernel: .650 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba2:7:1 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0

Jul 16 11:16:18 esxhost1 vmkernel: .651 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba1:4:2 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0

Jul 16 11:16:18 esxhost1 vmkernel: .651 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba1:5:2 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0

Jul 16 11:16:18 esxhost1 vmkernel: .651 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba1:6:2 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0

Jul 16 11:16:18 esxhost1 vmkernel: .651 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba1:7:2 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0

Jul 16 11:16:18 esxhost1 vmkernel: .651 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba2:4:2 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0

Jul 16 11:16:18 esxhost1 vmkernel: .651 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba2:5:2 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0

Jul 16 11:16:18 esxhost1 vmkernel: .651 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba2:6:2 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0

Jul 16 11:16:18 esxhost1 vmkernel: .651 cpu2:1047)WARNING: SCSI: 2896: CheckUnitReady on vmhba2:7:2 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0

Thanks in advance.

ThompsG · ‎07-16-2009

Evening,

We have seen these messages when a host is experiencing path thrashing. Check your SAN to ensure that you are not seeing trespassed LUNs.

See the following for more helpful hints: http://communities.vmware.com/thread/213489
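
To see how widespread the warnings are, something like the rough Python sketch below may help. It counts the CheckUnitReady warnings per adapter and target and names the SCSI sense key (sk 0x2 is NOT READY in the SCSI spec). The regular expression assumes the exact line format in the excerpt above, the function name summarize is made up for illustration, and the first pair of status values (0x0/0x2) is deliberately left undecoded.

```python
import re
from collections import Counter

# SCSI sense key names as defined by the SCSI standard (subset only).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}

# Assumption: lines look like the warnings quoted in this thread, with the
# path written as adapter:target:LUN (for example vmhba1:4:1).
PATTERN = re.compile(
    r"CheckUnitReady on (vmhba\d+):(\d+):(\d+) returned I/O error "
    r"\S+ sk (0x[0-9a-fA-F]+) asc (0x[0-9a-fA-F]+) ascq (0x[0-9a-fA-F]+)"
)


def summarize(lines):
    """Count CheckUnitReady warnings per (adapter, target, sense key name)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # not a CheckUnitReady warning
        adapter, target, _lun, sk, _asc, _ascq = match.groups()
        key_name = SENSE_KEYS.get(int(sk, 16), "UNKNOWN")
        counts[(adapter, int(target), key_name)] += 1
    return counts


if __name__ == "__main__":
    # Two warnings copied from the excerpt above.
    sample = [
        "WARNING: SCSI: 2896: CheckUnitReady on vmhba1:4:1 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0",
        "WARNING: SCSI: 2896: CheckUnitReady on vmhba2:4:1 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0",
    ]
    for key, count in sorted(summarize(sample).items()):
        print(key, count)
```

Feeding it the lines of the vmkernel log from each host should show whether every target on both HBAs reports NOT READY, as in the excerpt, or only some of them.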

Trust this helps in some small way.

Kind regards,

Glen

joergriether · ‎07-16-2009

You're sure it's QLogic? Let me guess - Emulex FC adapter?

It is a known issue. Read the release notes for ESX 3.5 U4, search for "checkunitready", and update your machine to 3.5 U4 ASAP!

best,

Joerg

vGuy · ‎07-16-2009

Thanks for your replies. We are using QLogic HBAs; however, there is a discrepancy in how the QLogic HBAs are reported. Below is the output from 'lspci' and the QLogic file:

-

0c:00.0 Fibre Channel: QLogic Corp QLA2432 (rev 02)

0c:00.1 Fibre Channel: QLogic Corp QLA2432 (rev 02)

-

# more qla2300/1

QLogic PCI to Fibre Channel Host Adapter for QMH2462:

Firmware version: 4.00.29, Driver version 7.08-vm32

-

QLogic PCI to Fibre Channel Host Adapter for QMH2462:

Firmware version: 4.00.29, Driver version 7.08-vm32

-

I am not sure if this makes any difference. Moreover, I do not have any dead paths; esxcfg-mpath -l -v shows all the paths are active.

Could the I/O errors be due to SCSI reservations?

Thanks!!

ThompsG · ‎07-18-2009

Evening,

Have you confirmed that none of your LUNs are trespassed? This will happen on an active/passive array if different ESX servers access the same LUN via different SPs.
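
As a rough illustration of the check, the Python sketch below flags LUNs that different hosts reach through different SPs. Everything in it is assumed for the example: the function name find_split_luns, the input layout, the second host name, and the SP labels. Fill the mapping from whatever your array or hosts actually report.

```python
from collections import defaultdict


def find_split_luns(active_sp_by_host):
    """Return LUNs that different hosts reach through different SPs.

    active_sp_by_host maps host name -> {LUN id -> SP name}. This layout is
    hypothetical; it is not the output format of any ESX or array tool.
    """
    sps_per_lun = defaultdict(set)
    for luns in active_sp_by_host.values():
        for lun, sp in luns.items():
            sps_per_lun[lun].add(sp)
    # A LUN seen through more than one SP is a trespass candidate.
    return {lun: sorted(sps) for lun, sps in sps_per_lun.items() if len(sps) > 1}


if __name__ == "__main__":
    # Made-up example data: both hosts agree on LUN 2 but not on LUN 1.
    example = {
        "esxhost1": {1: "SPA", 2: "SPB"},
        "esxhost2": {1: "SPB", 2: "SPB"},
    }
    print(find_split_luns(example))  # {1: ['SPA', 'SPB']}
```

Any LUN it returns is one the hosts are pulling back and forth between SPs.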

Thanks and kind regards,

Glen

vGuy · ‎07-24-2009

Thanks for your insight, Glen. I am working with the storage team on the LUN trespass issue, as I do not see dead paths on the ESX host.

Will update soon.

joergriether · ‎07-24-2009

just removed my out of office message...