Re: Lost connectivity to Storage Device

sutcliff · ‎09-05-2013

I have an IBM DS3524 Storage Subsystem connected through one Fibre Channel adapter to an IBM server running ESXi 5.0.

I am seeing some large latency numbers when I run esxtop. My DAVG/cmd spikes to 90 or 168 every minute or so. Most of the time it hovers around 14.75.

I am seeing in the logs:

cpu20:4116)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x28 (0x412440385100) to dev "naa.60080e50002364aa000002bc4eb181de" on path "vmhba5:C0:T0:L6" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x94 0x1.Act:FAILOVER

2013-09-05T13:49:11.778Z cpu6:4102)WARNING: NMP: nmp_DeviceRetryCommand:133:Device "naa.60080e50002364aa000002b44eb14998": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.

2013-09-05T13:49:12.526Z cpu8:4282)WARNING: vmw_psp_fixed: psp_fixedSelectPathToActivateInt:464:Selected current STANDBY path vmhba5:C0:T0:L6 for device naa.60080e50002364aa000002bc4eb181de to activate. This may lead to path thrashing.

In Events I see the host losing its connection to the storage device, immediately followed by a reconnect.

I logged onto my storage device and noticed that the write cache is off. Could the write cache being off cause this level of IO issue?

Thanks,

Brian

vipinvk · ‎09-09-2013

I am not familiar with IBM storage specifically, but in general a disabled write cache should not cause a host to lose connectivity (unless it drives disk latency very high). Please check the fabric side: the physical connectivity and the path policies assigned to the disks. Also make sure you are following the vendor's best practices.

jdptechnc · ‎09-09-2013

Is this DS3524 a multiple controller unit or just a single controller? Directly connected to the host, or connected through a switch?

Please consider marking as "helpful", if you find this post useful. Thanks!... IT Guy since 12/2000... Virtual since 10/2006... VCAP-DCA #2222

sutcliff · ‎09-09-2013

It has multiple controllers but we are just using one of them.

It is a direct connection from the card to the controller. No switch.

jdptechnc · ‎09-09-2013

I wonder if your DS3524 may be trespassing the LUN to the other controller. What is the OS type that you are using on the DS3524 side?

Also, check this article, it matches the sense codes you are getting:

VMware KB: Failover fails when the LUN is configured on a LSI based storage array

VMware KB: Troubleshooting LUN connectivity issues due to IBM DS3500 using an incorrect multipathing... (this one is for 4.1, but may still apply)
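
If it helps when searching the KBs, here is a minimal Python sketch that pulls the status fields out of an NMP line like the one posted above. The regular expression and the `decode_nmp_line` helper are my own illustration, not a VMware tool, and the lookup tables list only a few standard SCSI values. The sample is the line from the first post. ASC values of 0x80 and above are vendor-specific, so the script only flags that case; the actual meaning has to come from the array vendor's documentation or the KB.

```python
import re

# Standard SCSI names; only a few common values are listed here.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)", 0x88: "READ(16)", 0x8A: "WRITE(16)"}
DEVICE_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY",
                 0x18: "RESERVATION CONFLICT", 0x28: "TASK SET FULL"}
SENSE_KEYS = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
              0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
              0x6: "UNIT ATTENTION", 0xB: "ABORTED COMMAND"}

HEX = r"0x[0-9a-fA-F]+"
# Hypothetical pattern written against the log line quoted in this thread.
NMP_LINE = re.compile(
    rf'Cmd (?P<cmd>{HEX}).*?dev "(?P<dev>[^"]+)" on path "(?P<path>[^"]+)".*?'
    rf'H:(?P<h>{HEX}) D:(?P<d>{HEX}) P:(?P<p>{HEX}).*?'
    rf'sense data: (?P<key>{HEX}) (?P<asc>{HEX}) (?P<ascq>{HEX})'
)


def decode_nmp_line(line):
    """Return a dict of decoded fields, or None if the line does not match."""
    m = NMP_LINE.search(line)
    if m is None:
        return None
    num = {name: int(m.group(name), 16)
           for name in ("cmd", "h", "d", "p", "key", "asc", "ascq")}
    return {
        "device": m.group("dev"),
        "path": m.group("path"),
        "command": OPCODES.get(num["cmd"], hex(num["cmd"])),
        "host_status": num["h"],      # H: 0x0 means the host side reported no error
        "device_status": DEVICE_STATUS.get(num["d"], hex(num["d"])),
        "plugin_status": num["p"],
        "sense_key": SENSE_KEYS.get(num["key"], hex(num["key"])),
        "asc": num["asc"],
        "ascq": num["ascq"],
        # ASC 0x80-0xFF is reserved for vendor-specific codes.
        "vendor_specific_asc": num["asc"] >= 0x80,
    }


SAMPLE = ('cpu20:4116)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x28 (0x412440385100) '
          'to dev "naa.60080e50002364aa000002bc4eb181de" on path "vmhba5:C0:T0:L6" '
          'Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x94 0x1.Act:FAILOVER')

if __name__ == "__main__":
    for field, value in decode_nmp_line(SAMPLE).items():
        print(f"{field}: {value}")
```

For the posted line this decodes to a READ(10) that came back from the array with CHECK CONDITION and sense key ILLEGAL REQUEST, with a vendor-specific ASC/ASCQ of 0x94/0x01. In other words, the array itself rejected the command on that path; the host and the plugin reported no error of their own.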

