teddyboy
Contributor

Storage Slowdown or Storage Going Offline

Having issues with an HP SAN.

The /var/log/vmkernel contents below show the errors.

The SAN is a fibre-attached HP 6400, connected via external switches and then to the internal HP Blade switches where the hosts live.

40h (Task aborted): the target returns this status code when a task is aborted by another I_T nexus and the TAS bit in the Control mode page is set to one.
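
For reference, here is a minimal decoding sketch I put together (my own, not a VMware tool) for those StorageMonitor status fields. It assumes the ESX 3.x layout of device status/host status followed by sense key, ASC and ASCQ; the name tables come from the SCSI spec and the Linux-style DID_* host codes, and anything not listed just falls back to hex.

# Minimal sketch, not from VMware: decode ESX 3.x StorageMonitor lines.
# Assumed field layout: SCSI device status / host status, sense key, ASC, ASCQ.
import re

DEVICE_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY",
                 0x18: "RESERVATION CONFLICT", 0x28: "TASK SET FULL",
                 0x40: "TASK ABORTED"}          # the 40h code quoted above
HOST_STATUS = {0x0: "OK", 0x1: "NO_CONNECT", 0x2: "BUS_BUSY", 0x3: "TIMEOUT"}
SENSE_KEY = {0x0: "NO SENSE", 0x2: "NOT READY", 0x5: "ILLEGAL REQUEST",
             0x6: "UNIT ATTENTION"}
ASC = {0x20: "INVALID COMMAND OPERATION CODE", 0x24: "INVALID FIELD IN CDB"}

# Matches both "status = 2/0 0x5 0x24 0x0" and "status = D:0x0/H:0x2 ..."
LINE = re.compile(r"(vmhba\d+:\d+:\d+:\d+) status = "
                  r"(?:D:)?(?:0x)?([0-9a-fA-F]+)/(?:H:)?(?:0x)?([0-9a-fA-F]+)"
                  r" 0x([0-9a-fA-F]+) 0x([0-9a-fA-F]+) 0x([0-9a-fA-F]+)")

def decode(line):
    m = LINE.search(line)
    if not m:
        return None
    dev, host, key, asc, ascq = (int(g, 16) for g in m.groups()[1:])
    return "%s: device=%s host=%s sense=%s asc=%s ascq=0x%x" % (
        m.group(1), DEVICE_STATUS.get(dev, hex(dev)),
        HOST_STATUS.get(host, hex(host)), SENSE_KEY.get(key, hex(key)),
        ASC.get(asc, hex(asc)), ascq)

print(decode("vmhba1:0:28:0 status = 2/0 0x5 0x24 0x0"))
print(decode("vmhba1:0:28:0 status = 40/0 0x0 0x0 0x0"))
print(decode("vmhba3:0:34:0 status = D:0x0/H:0x2 0x0 0x0 0x0"))

On the lines below, that reads the "2/0 0x5 0x24 0x0" entries as CHECK CONDITION / ILLEGAL REQUEST / INVALID FIELD IN CDB, and the "40/0" entries as the task-aborted code described above.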

[root@ivsvr5 root]# tail /var/log/vmkernel

Mar  4 09:38:31 ivsvr5 vmkernel: 127:12:01:38.021 cpu4:1041)StorageMonitor: 196: vmhba0:0:0:0 status = 2/0 0x5 0x20 0x0

Mar  4 09:38:31 ivsvr5 vmkernel: 127:12:01:38.022 cpu0:1024)StorageMonitor: 196: vmhba1:0:28:0 status = 2/0 0x5 0x24 0x0

Mar  4 09:38:31 ivsvr5 vmkernel: 127:12:01:38.022 cpu6:2340)StorageMonitor: 196: vmhba1:0:30:0 status = 2/0 0x5 0x24 0x0

Mar  4 09:38:31 ivsvr5 vmkernel: 127:12:01:38.023 cpu0:1024)StorageMonitor: 196: vmhba1:2:47:0 status = 2/0 0x5 0x24 0x0

Mar  4 09:38:31 ivsvr5 vmkernel: 127:12:01:38.023 cpu6:2340)StorageMonitor: 196: vmhba1:2:31:0 status = 2/0 0x5 0x24 0x0

Mar  4 09:38:31 ivsvr5 vmkernel: 127:12:01:38.024 cpu0:1024)StorageMonitor: 196: vmhba1:2:43:0 status = 2/0 0x5 0x24 0x0

Mar  4 09:38:31 ivsvr5 vmkernel: 127:12:01:38.025 cpu0:2355)StorageMonitor: 196: vmhba1:1:36:0 status = 2/0 0x5 0x24 0x0

Mar  4 09:38:31 ivsvr5 vmkernel: 127:12:01:38.025 cpu6:1030)StorageMonitor: 196: vmhba1:0:34:0 status = 2/0 0x5 0x24 0x0

Mar  4 09:39:42 ivsvr5 vmkernel: 127:12:02:49.557 cpu5:1039)Config: 416: "HostLocalSwapDirEnabled" = 0, Old Value: 0, (Status: 0x0)

Mar  4 09:39:51 ivsvr5 vmkernel: 127:12:02:58.093 cpu5:1039)Config: 416: "VMOverheadGrowthLimit" = 0, Old Value: -1, (Status: 0x0)

[root@ivsvr6 root]# tail /var/log/vmkernel

Mar  4 00:16:17 ivsvr6 vmkernel: 127:01:30:07.189 cpu1:1847)VSCSI: 4060: Creating Virtual Device for world 1848 vscsi0:5 (handle 11751)

Mar  4 00:16:17 ivsvr6 vmkernel: 127:01:30:07.189 cpu1:1847)VSCSI: 4060: Creating Virtual Device for world 1848 vscsi0:6 (handle 11752)

Mar  4 00:16:17 ivsvr6 vmkernel: 127:01:30:07.189 cpu1:1847)VSCSI: 4060: Creating Virtual Device for world 1848 vscsi0:8 (handle 11753)

Mar  4 00:16:17 ivsvr6 vmkernel: 127:01:30:07.190 cpu1:1847)VSCSI: 4060: Creating Virtual Device for world 1848 vscsi0:9 (handle 11754)

Mar  4 00:16:17 ivsvr6 vmkernel: 127:01:30:07.190 cpu1:1847)VSCSI: 4060: Creating Virtual Device for world 1848 vscsi0:10 (handle 11755)

Mar  4 02:09:08 ivsvr6 vmkernel: 127:03:22:58.677 cpu1:1860)StorageMonitor: 196: vmhba1:2:31:0 status = 0/2 0x0 0x0 0x0

Mar  4 15:49:36 ivsvr6 vmkernel: 127:17:03:27.235 cpu0:1024)StorageMonitor: 196: vmhba1:0:28:0 status = 40/0 0x0 0x0 0x0

Mar  4 15:49:36 ivsvr6 last message repeated 3 times

Mar  4 15:49:36 ivsvr6 vmkernel: 127:17:03:27.236 cpu0:1024)StorageMonitor: 196: vmhba1:0:28:0 status = 40/0 0x0 0x0 0x0

Mar  4 15:49:36 ivsvr6 vmkernel: 127:17:03:27.237 cpu0:1024)StorageMonitor: 196: vmhba1:0:28:0 status = 40/0 0x0 0x0 0x0

[root@ivsvr7 root]# tail /var/log/vmkernel

Mar  4 08:25:41 ivsvr7 vmkernel: 93:16:59:47.366 cpu2:1826)Init: 1057: Received INIT from world 1826

Mar  4 08:25:41 ivsvr7 vmkernel: 93:16:59:47.366 cpu2:1772)World: vm 1832: 901: Starting world vmware-vmx with flags 44

Mar  4 08:25:41 ivsvr7 vmkernel: 93:16:59:47.367 cpu3:1827)Init: 1057: Received INIT from world 1827

Mar  4 13:24:10 ivsvr7 vmkernel: 93:21:58:16.413 cpu7:1751)Net: 4259: unicastAddr 00:50:56:b0:00:0b;

Mar  4 15:50:03 ivsvr7 vmkernel: 94:00:24:09.626 cpu5:1707)StorageMonitor: 196: vmhba5:0:34:0 status = 40/0 0x0 0x0 0x0

Mar  4 15:50:03 ivsvr7 vmkernel: 94:00:24:09.626 cpu5:1707)StorageMonitor: 196: vmhba5:0:34:0 status = 40/0 0x0 0x0 0x0

Mar  4 15:50:03 ivsvr7 vmkernel: 94:00:24:09.733 cpu5:1751)StorageMonitor: 196: vmhba5:0:34:0 status = 40/0 0x0 0x0 0x0

Mar  4 15:50:03 ivsvr7 vmkernel: 94:00:24:09.733 cpu5:1751)StorageMonitor: 196: vmhba5:0:34:0 status = 40/0 0x0 0x0 0x0

Mar  4 15:50:03 ivsvr7 vmkernel: 94:00:24:09.735 cpu5:1786)StorageMonitor: 196: vmhba5:0:34:0 status = 40/0 0x0 0x0 0x0

Mar  4 15:50:10 ivsvr7 vmkernel: 94:00:24:16.739 cpu5:1782)StorageMonitor: 196: vmhba5:0:34:0 status = 40/0 0x0 0x0 0x0

[root@ivsvr9 root]# tail /var/log/vmkernel

Mar  4 05:01:06 ivsvr9 vmkernel: 124:14:52:30.722 cpu0:1103)World: vm 2526: 901: Starting world vmware-vmx with flags 44

Mar  4 05:01:06 ivsvr9 vmkernel: 124:14:52:31.019 cpu1:1099)DevFS: 2307: Unable to find device: 1c407a-iradius-000002-delta.vmdk

Mar  4 05:01:07 ivsvr9 vmkernel: 124:14:52:31.145 cpu1:1099)DevFS: 2307: Unable to find device: 167209d-iradius-000002-delta.vmdk

Mar  4 05:01:07 ivsvr9 vmkernel: 124:14:52:31.399 cpu0:1099)DevFS: 2307: Unable to find device: 9dd0be-iradius-000002-delta.vmdk

Mar  4 05:01:07 ivsvr9 vmkernel: 124:14:52:31.475 cpu0:1099)VSCSI: 4060: Creating Virtual Device for world 1100 vscsi0:0 (handle 14996)

Mar  4 09:39:22 ivsvr9 vmkernel: 124:19:30:45.831 cpu10:3319)StorageMonitor: 196: vmhba3:0:34:0 status = D:0x0/H:0x2 0x0 0x0 0x0

Mar  4 10:22:30 ivsvr9 vmkernel: 124:20:13:54.430 cpu3:3482)Alloc: 9877: Skipping pshare: alloc failed on node 0

Mar  4 12:47:02 ivsvr9 vmkernel: 124:22:38:25.547 cpu5:3424)Alloc: 9877: Skipping pshare: alloc failed on node 1

Mar  4 13:31:18 ivsvr9 vmkernel: 124:23:22:42.093 cpu15:3267)StorageMonitor: 196: vmhba3:0:28:0 status = D:0x0/H:0x2 0x0 0x0 0x0

Mar  4 14:37:17 ivsvr9 vmkernel: 125:00:28:41.335 cpu4:3535)Alloc: 9877: Skipping pshare: alloc failed on node 0
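
To see which paths are actually generating most of these before logging the call, a quick tally sketch over the same log format (my own; it assumes the /var/log/vmkernel path shown above and would be run on each host):

# Minimal sketch: count StorageMonitor status codes per device path so the
# noisy LUN paths stand out. Log path and line format assumed from the
# output pasted above.
import re
from collections import Counter

PAT = re.compile(r"StorageMonitor: \d+: (vmhba\d+:\d+:\d+:\d+) status = (.+)")

counts = Counter()
with open("/var/log/vmkernel") as log:
    for line in log:
        m = PAT.search(line)
        if m:
            counts[(m.group(1), m.group(2).strip())] += 1

for (path, status), n in counts.most_common():
    print("%6d  %-16s  %s" % (n, path, status))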

3 Replies
idle-jam
Immortal

Does it only affect your ESX host? I would suggest logging a call with VMware Support too, along with the diagnostic file. It may be due to old firmware/drivers or even a bad configuration on the SAN.

teddyboy
Contributor

Hi, and thanks for the reply. When you say only the ESX host, do you mean the VMs as well?

SAN firmware is currently being looked at.

HBA firmware as well.

Additional interconnects on the blades too.

Will log a call, but I need to do some diagnostics first, as I am assisting with the solution remotely.

Thanks

teddyboy
Contributor

Additional information

The issue only affects 1 LUN on 1 host (although in those logs it seems to affect 2 different LUNs on different hosts). They have ~28 VMs on one of those LUNs and ~20 on the other, and the VMs running on the affected LUNs, but on other hosts, did not trigger any log messages.

The suggestion is to reduce the overall number of VMs to 10-14 per LUN.
