ESXi 5.1 U2 I/O problem with BladeCenter S storage...

Marcellus · ‎02-03-2015

Hi all!

I'm looking for help, or for other people interested in this problem. I still see these messages repeated in /var/log/vmkernel.log on all 3 blades of a BladeCenter S chassis with internal storage (only 3 LUNs created):

2015-02-03T08:58:37.354Z cpu5:9717)WARNING: LinScsi: SCSILinuxQueueCommand:1193:queuecommand failed with status = 0x1055 Host Busy vmhba0:0:0:2 (driver name: MPT SAS Host) - Message repeated 11 times

2015-02-03T08:58:37.354Z cpu6:4102)ScsiDeviceIO: 2320: Cmd(0x4124403ca740) 0x2a, CmdSN 0x1 from world 9717 to dev "naa.6005076b077e1dff502cf9b100000006" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.

2015-02-03T08:58:37.355Z cpu0:4104)ScsiDeviceIO: 2320: Cmd(0x4124003f1240) 0x28, CmdSN 0x73a4d from world 4474 to dev "naa.6005076b077e1dff4f72fb8b00000004" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.

2015-02-03T08:59:26.071Z cpu3:10296)ScsiDeviceIO: 2320: Cmd(0x4124003b8100) 0x2a, CmdSN 0x80000025 from world 10294 to dev "naa.6005076b077e1dff4f72fb8b00000004" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.

2015-02-03T08:59:26.084Z cpu1:10298)ScsiSched: 2612: Reduced the queue depth for device naa.6005076b077e1dff4f72fb8b00000004 to 1, due to queue full/busy conditions. The queue depth could be reduced further if the condition persists.

2015-02-03T08:59:34.183Z cpu0:10297)ScsiSched: 2535: Queue depth for device naa.6005076b077e1dff4f72fb8b00000004 is restored to normal
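For anyone reading these lines: D:0x8 is the SCSI device status BUSY, H:0x0 means the host adapter itself reported no error, and opcodes 0x2a and 0x28 are WRITE(10) and READ(10). A minimal sketch for summarizing such failures per device follows. The function names are my own (hypothetical), and it assumes the log lines have exactly the ScsiDeviceIO format shown above.

```shell
#!/bin/sh
# Map a SCSI device status byte (the D: field) to its standard name.
scsi_status_name() {
    case "$1" in
        0x0)  echo "GOOD" ;;
        0x2)  echo "CHECK CONDITION" ;;
        0x8)  echo "BUSY" ;;
        0x18) echo "RESERVATION CONFLICT" ;;
        0x28) echo "TASK SET FULL" ;;
        *)    echo "UNKNOWN" ;;
    esac
}

# Map the SCSI opcode printed after "Cmd(...)" to a command name.
# Only the two opcodes seen in the log above are covered.
scsi_opcode_name() {
    case "$1" in
        0x28) echo "READ(10)" ;;
        0x2a) echo "WRITE(10)" ;;
        *)    echo "OTHER" ;;
    esac
}

# Read vmkernel.log lines on stdin and print "device command status"
# for every failed command; lines in any other format are ignored.
summarize_failures() {
    sed -n 's/.*Cmd(0x[0-9a-f]*) \(0x[0-9a-f]*\),.*to dev "\([^"]*\)" failed H:0x[0-9a-f]* D:\(0x[0-9a-f]*\) P:.*/\2 \1 \3/p' |
    while read -r dev op status; do
        echo "$dev $(scsi_opcode_name "$op") $(scsi_status_name "$status")"
    done
}
```

Something like `summarize_failures < /var/log/vmkernel.log | sort | uniq -c` should then show how many BUSY responses each LUN returned.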

What I have tried:

- upgraded the BladeCenter S SAS RAID modules to the latest firmware level (1.3.3.006)

- upgraded all three HS22V blades to the latest firmware levels (UEFI 1.23, IMM 1.42, MPTBIOS 6.30.02.00)

- installed ESXi 5.1.0 (Build 2323236) with the latest Lenovo Custom Image Patch 1.2 (scsi-mptsas version is 4.23.01.00-6vmw.510.2.44.2191751)

- disabled Interrupt Remapping (on all blades) as described in VMware KB: vHBAs and other PCI devices may stop responding in ESXi 5.x and ESXi/ESX 4.1 when using I...

- disabled CIM vmw_lsiprovider (on all blades)

- changed the multipath Path Selection policy to Fixed (VMware)

Finally, I followed the VMware section of the manual 00ak719.pdf from http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=migr-5078491. These steps reduced the number of errors but did not fully resolve the problem.

The steps in detail:

- esxcli system module parameters set -p mpt_sdev_queue_depth=8 -m mptsas

- Disk/SchedNumReqOutstanding 8

- Disk/QFullThreshold 8

- Disk/QFullSampleSize 64

- Disk/DelayOnBusy 2000

- Disk/MaxLUN 128

- Linux guests, in /etc/udev/rules.d/99-vmware-scsi-udev.rules:

ACTION=="add", SUBSYSTEMS=="scsi", ATTRS{vendor}=="VMware ", ATTRS{model}=="Virtual disk ", RUN+="/bin/sh -c 'echo 60 >/sys$DEVPATH/timeout'"

ACTION=="add", SUBSYSTEMS=="scsi", ATTRS{vendor}=="VMware ", ATTRS{model}=="Virtual disk ", RUN+="/bin/sh -c 'echo 4 >/sys$DEVPATH/queue_depth'"

- Windows guests:

Added MaximumTargetQueueDepth=4 under Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LSI_SAS\Parameters\Device

- changed the timings on the CIOv SAS card on all blades
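For reference, the host-side part of this list can be written as a small script. This is only a sketch: the helper names and the APPLY variable are hypothetical, and it assumes each Disk option maps to an integer advanced setting under /Disk/ that `esxcli system settings advanced set` accepts on this ESXi build. By default it only prints the commands; it runs them when APPLY=1.

```shell
#!/bin/sh
# Print each command (default) or execute it when APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

apply_host_settings() {
    # Per-device queue depth for the mptsas driver module
    # (module parameters normally need a host reboot to take effect).
    run esxcli system module parameters set -p mpt_sdev_queue_depth=8 -m mptsas

    # Advanced Disk options as "name value" pairs, values as listed above.
    for pair in "SchedNumReqOutstanding 8" "QFullThreshold 8" \
                "QFullSampleSize 64" "DelayOnBusy 2000" "MaxLUN 128"; do
        set -- $pair   # split the pair into $1 (name) and $2 (value)
        run esxcli system settings advanced set -o "/Disk/$1" -i "$2"
    done
}
```

Calling `apply_host_settings` prints the six esxcli commands for review; `APPLY=1 apply_host_settings` would execute them on the host. The guest-side udev and registry changes are not covered.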

I have not been able to solve this problem for years, and IBM has not helped me. Lowering queue_depth only reduces the number of errors; it does not eliminate them.

Any suggestions, or experience with similar IBM hardware running VMware?

Thank you for any tips.


ESXi 5.1 U2 I/O problem with BladeCenter S storage on HS22V 7871 with CIOv (LSISAS1064E)