VMware Cloud Community
sun244
Contributor

Async IO error on SAN disks, complex problem need help!!!

Folks,

We have IBM storage (2105-F20, or Shark) connected to HP servers (with QLogic HBAs) via Brocade 16-port switches. We are seeing the following messages only on the ESX servers, and while they are being logged, applications freeze and time out.

According to VMware support, the problem is in the switch or the SAN. Nothing is logged in the switch, and the storage has not logged any errors either. The bizarre part is that only the ESX servers log these errors: AIX and Windows servers in the same SAN zone have not recorded any errors or problems. Has anyone run into this kind of problem before, or does anyone have tips or suggestions? We are almost at a dead end, with no help from anyone.

The messages are given below:

8<----


vmkernel: 0:00:46:17.496 cpu1:1089)World: vm 1089: 3867: Killing self with status=0x0:Success

vmkernel: 0:01:16:59.878 cpu6:1086)<6>scsi(1): RSCN database changed -0xf,0x1900.
vmkernel: 0:01:16:59.878 cpu2:1056)scsi(1): Waiting for LIP to complete...
vmkernel: 0:01:16:59.902 cpu6:1086)<6>scsi(0): RSCN database changed -0x10,0x1900.

vmkernel: 0:01:16:59.902 cpu7:1055)scsi(0): Waiting for LIP to complete...

vmkernel: 0:01:17:05.450 cpu6:1032)SCSI: 3731: AsyncIO timeout (5000); aborting cmd w/ sn 2238, handle bd9/0x7a05278

vmkernel: 0:01:17:05.450 cpu6:1032)LinSCSI: 3596: Aborting cmds with world 1024, originHandle 0x7a05278, originSN 2238 from vmhba0:0:2

vmkernel: 0:01:17:05.450 cpu6:1032)SCSI: 3731: AsyncIO timeout (5000); aborting cmd w/ sn 6471, handle cc6/0x7a052a8

vmkernel: 0:01:17:05.450 cpu6:1032)LinSCSI: 3596: Aborting cmds with world 1024, originHandle 0x7a052a8, originSN 6471 from vmhba0:0:4

vmkernel: 0:01:17:11.454 cpu6:1032)SCSI: 3731: AsyncIO timeout (5000); aborting cmd w/ sn 6473, handle cc6/0x7a052a8

vmkernel: 0:01:17:11.454 cpu6:1032)LinSCSI: 3596: Aborting cmds with world 1024, originHandle 0x7a052a8, originSN 6473 from vmhba0:0:4

vmkernel: 0:01:17:11.454 cpu6:1032)SCSI: 3731: AsyncIO timeout (5000); aborting cmd w/ sn 2239, handle bd9/0x7a05278

vmkernel: 0:01:17:11.454 cpu6:1032)LinSCSI: 3596: Aborting cmds with world 1024, originHandle 0x7a05278, originSN 2239 from vmhba0:0:2

vmkernel: 0:01:17:17.458 cpu6:1032)SCSI: 3731: AsyncIO timeout (5000); aborting cmd w/ sn 2240, handle bd9/0x7a05278

vmkernel: 0:01:17:17.458 cpu6:1032)LinSCSI: 3596: Aborting cmds with world 1024, originHandle 0x7a05278, originSN 2240 from vmhba0:0:2

vmkernel: 0:01:17:17.458 cpu6:1032)SCSI: 3731: AsyncIO timeout (5000); aborting cmd w/ sn 6474, handle cc6/0x7a052a8

vmkernel: 0:01:17:17.458 cpu6:1032)LinSCSI: 3596: Aborting cmds with world 1024, originHandle 0x7a052a8, originSN 6474 from vmhba0:0:4

====================8<============================
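One way to check whether the timeouts follow the fabric events is to measure the gap between each RSCN line and the AsyncIO timeouts logged after it. Below is a minimal Python sketch. It assumes the leading stamp is the vmkernel uptime in days:hours:minutes:seconds.milliseconds, and the function names are made up for illustration:

```python
import re

# Assumed stamp layout: days:hours:minutes:seconds.millis (vmkernel uptime).
STAMP = re.compile(r"vmkernel: (\d+):(\d+):(\d+):(\d+)\.(\d+)")


def uptime_seconds(line):
    """Return the uptime stamp of a log line in seconds, or None if absent."""
    m = STAMP.search(line)
    if m is None:
        return None
    days, hours, minutes, seconds, frac = m.groups()
    whole = int(days) * 86400 + int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return whole + float("0." + frac)


def gaps_after_rscn(lines):
    """For each AsyncIO timeout, return seconds elapsed since the latest RSCN line."""
    last_rscn = None
    gaps = []
    for line in lines:
        t = uptime_seconds(line)
        if t is None:
            continue
        if "RSCN database changed" in line:
            last_rscn = t
        elif "AsyncIO timeout" in line and last_rscn is not None:
            gaps.append(round(t - last_rscn, 3))
    return gaps
```

Running `gaps_after_rscn(open(path))` over the saved log (the path is whatever file the messages were copied to) would show whether the first timeouts land about one timeout interval after an RSCN, which would point at the fabric event as the trigger rather than at the disks.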

Thank you

Sun244

waynegrow
Expert

Not sure if this will help, but see this post:

http://www.vmware.com/community/thread.jspa?threadID=47494

pcomo
Enthusiast

Hi,

Not sure, but from this line the problem seems to be on scsi0 (the RAID controller of the HP server):

vmkernel: 0:01:17:17.458 cpu6:1032)LinSCSI: 3596: Aborting cmds with world 1024, originHandle 0x7a052a8, originSN 6474 from vmhba0:0:4

You have I/O errors on different partitions of this SCSI device.
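To see how the aborts are spread across devices, the "Aborting cmds" lines can be tallied by the path at the end of each line. This is only a rough Python sketch: the regular expression is based on the lines quoted in this thread, and `aborts_per_path` is a made-up helper name. The three numbers in `vmhba0:0:4` are read here as adapter, target, and LUN, which is an assumption about the naming.

```python
import re
from collections import Counter

# Matches the trailing path of lines such as
# "... LinSCSI: 3596: Aborting cmds with world 1024, ... from vmhba0:0:4"
ABORT = re.compile(r"Aborting cmds .* from (vmhba\d+:\d+:\d+)")


def aborts_per_path(lines):
    """Count 'Aborting cmds' log lines per vmhba path."""
    counts = Counter()
    for line in lines:
        m = ABORT.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

If every path behind the same adapter shows up with similar counts, the problem is more likely on the adapter or the link than on one particular device.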
