VMware Cloud Community
Randy_Evans
Contributor

WARNING: SCSI: 5422

For a few days before we lost access to all SAN datastores, we were getting hundreds of occurrences of this sequence of lines in the vmkernel log. The sequence always begins with the WARNING: SCSI: 5422 line.

Apr 9 19:13:23 jagger vmkernel: 0:14:11:38.274 cpu5:1062)WARNING: SCSI: 5422: READ of handleID 0x27cc

Apr 9 19:13:23 jagger vmkernel: 0:14:11:38.274 cpu5:1038)SCSI: 8021: vmhba1:0:3:0 status = 0/3 0x0 0x0 0x0

Apr 9 19:13:23 jagger vmkernel: 0:14:11:38.274 cpu5:1038)SCSI: 8109: vmhba1:0:3:0 Retry (abort after timeout)

Apr 9 19:13:23 jagger vmkernel: 0:14:11:38.275 cpu5:1038)SCSI: 3169: vmhba1:0:3:0 Abort cmd due to timeout, s/n=2, attempt 1

Apr 9 19:13:23 jagger vmkernel: 0:14:11:38.275 cpu5:1038)LinSCSI: 3596: Aborting cmds with world 1024, originHandle 0x6a02878, originSN 2 from vmhba1:0:3

Apr 9 19:13:23 jagger vmkernel: 0:14:11:38.275 cpu5:1038)<6>qla24xx_abort_command(0): handle to abort=183

Apr 9 19:13:23 jagger vmkernel: 0:14:11:38.275 cpu5:1038)LinSCSI: 2604: Forcing host status from 2 to SCSI_HOST_OK

Apr 9 19:13:23 jagger vmkernel: 0:14:11:38.275 cpu5:1038)LinSCSI: 2606: Forcing device status from SDSTAT_GOOD to SDSTAT_BUSY

Apr 9 19:13:23 jagger vmkernel: 0:14:11:38.275 cpu5:1038)SCSI: 3182: vmhba1:0:3:0 Abort cmd on timeout succeeded, s/n=2, attempt 1

Apr 9 19:13:23 jagger vmkernel: 0:14:11:38.275 cpu5:1038)SCSI: 8021: vmhba1:0:3:0 status = 8/0 0x0 0x0 0x0

Apr 9 19:13:23 jagger vmkernel: 0:14:11:38.275 cpu5:1038)SCSI: 8040: vmhba1:0:3:0 Retry (busy)
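With hundreds of these sequences in the log, it can help to tally the timeout aborts per path to see whether a single LUN or HBA is involved. The following is only a minimal sketch: the function name `count_timeout_aborts` is hypothetical, and the pattern assumes every abort line has the same "Abort cmd due to timeout" wording as the excerpt above.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... SCSI: 3169: vmhba1:0:3:0 Abort cmd due to timeout, s/n=2, attempt 1
# The wording is taken from the excerpt above; other ESX builds may differ.
ABORT_RE = re.compile(r"(vmhba\d+:\d+:\d+:\d+) Abort cmd due to timeout")


def count_timeout_aborts(lines):
    """Return a Counter mapping each vmhba path to its number of timeout aborts."""
    counts = Counter()
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Apr 9 19:13:23 jagger vmkernel: 0:14:11:38.275 cpu5:1038)SCSI: 3169: "
        "vmhba1:0:3:0 Abort cmd due to timeout, s/n=2, attempt 1",
        "Apr 9 19:13:23 jagger vmkernel: 0:14:11:38.275 cpu5:1038)SCSI: 3182: "
        "vmhba1:0:3:0 Abort cmd on timeout succeeded, s/n=2, attempt 1",
    ]
    for path, count in count_timeout_aborts(sample).items():
        print(path, count)
```

Passing an open log file instead of the sample list works the same way, since the function only iterates over lines.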

This is VMware ESX Server 3.0.1 with some, but not all, patches applied.

What is WARNING: SCSI: 5422?

Any suggestions for troubleshooting?

Thanks.

3 Replies
admin
Immortal

Those are timeout/device busy errors. You can put the SCSI status codes into the form at the bottom of this page:

http://www.vmprofessional.com/index.php?content=resources

vmhba1:0:3:0 status = 0/3 0x0 0x0 0x0

gives: Host timeout: Timed out for unspecified reason
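The same lookup can be sketched offline in Python. This is a rough illustration, not a replacement for the form: it assumes ESX 3.x prints the pair as device status/host status (which fits 0/3 decoding to a host timeout and the later 8/0 line being followed by "Retry (busy)"), and it assumes the host codes follow the Linux DID_* numbering. The name `decode_status` and both tables are illustrative and deliberately incomplete.

```python
import re

# Assumed layout: "<path> status = <device>/<host> <sense key> <ASC> <ASCQ>"
STATUS_RE = re.compile(r"(vmhba\d+:\d+:\d+:\d+) status = (\d+)/(\d+)")

# Standard SCSI device status bytes, in decimal as they appear in the log.
DEVICE_STATUS = {
    0: "GOOD",
    2: "CHECK CONDITION",
    8: "BUSY",
    24: "RESERVATION CONFLICT",
    40: "TASK SET FULL",
}

# Linux DID_* host codes; assumed (not confirmed) to match ESX 3.x numbering.
HOST_STATUS = {
    0: "OK",
    1: "NO_CONNECT",
    2: "BUS_BUSY",
    3: "TIME_OUT",
}


def decode_status(line):
    """Return (path, device status, host status) or None if no status is found."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    path = match.group(1)
    device = int(match.group(2))
    host = int(match.group(3))
    return (
        path,
        DEVICE_STATUS.get(device, f"unknown ({device})"),
        HOST_STATUS.get(host, f"unknown ({host})"),
    )


if __name__ == "__main__":
    print(decode_status("vmhba1:0:3:0 status = 0/3 0x0 0x0 0x0"))
    print(decode_status("vmhba1:0:3:0 status = 8/0 0x0 0x0 0x0"))
```

Codes missing from the tables are reported as unknown rather than guessed.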

How is the SAN connected? Fibre, iSCSI, NFS?

The chap in this thread traced his problems to a NIC that wasn't working properly. If your SAN is iSCSI/NFS, maybe you have a similar issue.

http://www.vmware.com/community/thread.jspa?threadID=62474

Or it could be related to the HP Insight Agent, if you have that installed.

http://www.vmware.com/community/thread.jspa?messageID=537564

Randy_Evans
Contributor

Our SAN is Fibre Channel, so the host timeout error hints at a failure to respond from the HBA, switch, or storage controller.

There are three things we'll try:

1. Update the HP/QLogic firmware (and the DL360 G5 firmware while we are at it).

2. Update the HP Management Agents software from 7.6.0 to 7.7.0.

3. Update to the latest VMware ESX 3.0.1 patches.

The problem and the warning messages have not happened again since everything was power cycled, so we may wait until we start seeing the warning messages again before taking any action.

If the updates don't fix the problem, we'll try swapping the Fibre Channel cable and switch port with another server that is not having this problem.

Bwhite
Enthusiast

We had a similar problem to the one you are describing. We applied all the firmware and BIOS updates to the SAN and have not had the problem since.

Brian