VMware Cloud Community
TheVMinator
Expert

Finding root cause of loss of storage connectivity

I'm looking for the root cause of a loss of storage connectivity.  Is there a free tool that I can use to analyze vmkernel logs (ESXi 5.0)?  Specifically, I want as much information as possible about why connectivity was lost during the outage.  Connectivity to these FC LUNs has since been restored, but I'd still like to find the root cause if possible.  Can anyone recommend a software tool or tools for this purpose?

Thanks! 

5 Replies
vGuy
Expert

I found this VMworld session to be quite informative on Storage Log Analysis and general troubleshooting methodology:

http://www.vmworld.com/docs/DOC-3793

It may not be the exact answer you were looking for, but I thought it might be helpful. :)

TheEsp
Enthusiast

Hi TheVMinator

What array and switches do you have in your environment? Have you seen anything in the array or switch logs?

Cheers

David

TheVMinator
Expert

I don't have switch logs, but the array is a NetApp FAS3270, connected via FC SAN.  vmkernel.log shows this:

2012-03-24T15:56:05.104Z cpu8:4104)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "naa. " state in doubt; requested fast path state update...

2012-03-24T15:56:05.104Z cpu8:4104)ScsiDeviceIO: 2305: Cmd(0x4124403c6080) 0x2a, CmdSN 0x545cf5 to dev "naa. " failed H:0x2 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.

2012-03-24T15:56:15.092Z cpu17:4803)<3> rport-1:0-3: blocked FC remote port time out: saving binding

2012-03-24T15:56:15.092Z cpu23:4853)<3>lpfc820 0000:04:00.2: 0:(0):0203 Devloss timeout on WWPN NPort x0f0191 Data: x0 x7 x0

2012-03-24T15:56:20.343Z cpu5:287013)HBX: 2313: Waiting for timed out [HB state abcdef02 offset 4121088 gen 17 stampUS 2669966207677 uuid jrnl <FB 1699602> drv 14.54] on vol ‘lunname’

2012-03-24T18:38:25.554Z cpu31:4127)WARNING: NMP: nmp_DeviceRetryCommand:133:Device "naa. ": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.

2012-03-24T18:38:25.556Z cpu0:4136)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x28 (0x412441f81ac0) to dev "naa " on path "vmhba1:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x2a 0x6.Act:FAILOVER

2012-03-24T18:38:25.607Z cpu2:4834)WARNING: NMP: nmpDeviceAttemptFailover:599:Retry world failover device "naa. " - issuing command 0x412482235000
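For anyone who wants to pull the H:/D:/P: status codes out of lines like the ones above, here is a rough Python sketch. It is not a VMware tool. The function names (`decode_status_line`, `decode_lines`) are made up for illustration, the lookup tables are deliberately partial, and the mapping of H: values to names assumes ESXi uses the usual SCSI host-status numbering, so treat the labels as hints and check them against the VMware KB on SCSI sense codes.

```python
import re

# Partial lookup tables. Assumption: the H: (host) values follow the usual
# SCSI host-status numbering and D: is the standard SCSI status byte.
HOST_STATUS = {
    0x0: "OK", 0x1: "NO_CONNECT", 0x2: "BUS_BUSY", 0x3: "TIMEOUT",
    0x5: "ABORT", 0x7: "ERROR", 0x8: "RESET",
}
DEVICE_STATUS = {
    0x0: "GOOD", 0x2: "CHECK CONDITION", 0x8: "BUSY",
    0x18: "RESERVATION CONFLICT", 0x28: "TASK SET FULL",
}

HEX = r"0x[0-9a-fA-F]+"
# Timestamp is the first whitespace-delimited token; the sense triple is optional.
STATUS_RE = re.compile(
    rf"^(?P<ts>\S+)\s.*?H:(?P<h>{HEX})\s+D:(?P<d>{HEX})\s+P:(?P<p>{HEX})"
    rf"(?:\s+(?:Valid|Possible) sense data:\s*(?P<sense>{HEX} {HEX} {HEX}))?"
)


def decode_status_line(line):
    """Return decoded status fields, or None if the line has no H:/D:/P: triple."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    host, device, plugin = (int(m.group(k), 16) for k in ("h", "d", "p"))
    sense = m.group("sense")
    return {
        "timestamp": m.group("ts"),
        # Unknown codes are reported as raw hex rather than guessed at.
        "host": HOST_STATUS.get(host, hex(host)),
        "device": DEVICE_STATUS.get(device, hex(device)),
        "plugin": plugin,
        # (sense key, ASC, ASCQ) as integers, if the line carried them.
        "sense": tuple(int(x, 16) for x in sense.split()) if sense else None,
    }


def decode_lines(lines):
    """Yield a decoded record for every line that carries status codes."""
    for line in lines:
        record = decode_status_line(line)
        if record is not None:
            yield record
```

To run it over a copy of the log, something like `for rec in decode_lines(open("vmkernel.log", errors="replace")): print(rec)` should do. On the excerpt above, the 15:56 line has H:0x2 with D:0x8, and the 18:38 line has a check condition with sense 0x6 0x2a 0x6. That is a unit attention whose ASC/ASCQ (2Ah/06h) is listed in the SCSI spec as "asymmetric access state changed", so the array side may be worth a look as well.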

TheEsp
Enthusiast

> Don't have switch logs - but the array is a NetApp FAS3270.  Connected via FC SAN. 

Can you please clarify:

Are your ESX hosts directly connected to the FAS3270?

Why don't you have switch logs? All Brocade and Cisco FC switches keep a syslog of all activity (hence the first question).

cheers

David

TheVMinator
Expert

(I don't have switch logs because a different administrative team handles the switches.)

Thanks
