I'm looking for the root cause of a loss of storage connectivity. Is there a free tool that I can use to analyze vmkernel logs (ESXi 5.0)? Specifically, I want as much information as possible to determine why connectivity was lost during an outage. Connectivity to these FC LUNs has been restored, but I'd still like to find the root cause if possible. Can anyone recommend a software tool or tools for this purpose?
Thanks!
I found this VMworld session to be quite informative on Storage Log Analysis and general troubleshooting methodology:
http://www.vmworld.com/docs/DOC-3793
It may not be the exact answer you were looking for, but I thought it might be helpful. :)
Hi TheVMinator
What array and switches do you have in your environment? Have you seen anything in the array/switch logs?
Cheers
David
Don't have switch logs - but the array is a NetApp FAS3270. Connected via FC SAN. vmkernel.log shows this:
2012-03-24T15:56:05.104Z cpu8:4104)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "naa. " state in doubt; requested fast path state update...
2012-03-24T15:56:05.104Z cpu8:4104)ScsiDeviceIO: 2305: Cmd(0x4124403c6080) 0x2a, CmdSN 0x545cf5 to dev "naa. " failed H:0x2 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
2012-03-24T15:56:15.092Z cpu17:4803)<3> rport-1:0-3: blocked FC remote port time out: saving binding
2012-03-24T15:56:15.092Z cpu23:4853)<3>lpfc820 0000:04:00.2: 0:(0):0203 Devloss timeout on WWPN NPort x0f0191 Data: x0 x7 x0
2012-03-24T15:56:20.343Z cpu5:287013)HBX: 2313: Waiting for timed out [HB state abcdef02 offset 4121088 gen 17 stampUS 2669966207677 uuid jrnl <FB 1699602> drv 14.54] on vol ‘lunname’
2012-03-24T18:38:25.554Z cpu31:4127)WARNING: NMP: nmp_DeviceRetryCommand:133:Device "naa. ": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
2012-03-24T18:38:25.556Z cpu0:4136)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x28 (0x412441f81ac0) to dev "naa " on path "vmhba1:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x2a 0x6.Act:FAILOVER
2012-03-24T18:38:25.607Z cpu2:4834)WARNING: NMP: nmpDeviceAttemptFailover:599:Retry world failover device "naa. " - issuing command 0x412482235000
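For anyone reading lines like the ones above, here is a minimal Python sketch that pulls the H:/D:/P: status triple and any valid sense data out of a vmkernel.log line and turns them into names. It is not a VMware tool. The function name `decode_line` and the table names are made up for illustration, the tables cover only a handful of codes, and the host status names are as I recall them from VMware's KB article on interpreting SCSI sense codes, so verify them before relying on the output. It also assumes sense bytes are only worth decoding when the line says "Valid sense data".

```python
import re

# Host (H:) status names as I recall them from VMware's KB (assumption, subset only).
HOST_STATUS = {0x0: "OK", 0x1: "NO_CONNECT", 0x2: "BUS_BUSY", 0x3: "TIMEOUT",
               0x5: "ABORT", 0x7: "ERROR", 0x8: "RESET"}

# Standard SCSI device (D:) status codes (subset).
DEVICE_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY",
                 0x18: "RESERVATION CONFLICT", 0x28: "TASK SET FULL"}

# Standard SCSI sense keys (subset).
SENSE_KEY = {0x2: "NOT READY", 0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR",
             0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION", 0xB: "ABORTED COMMAND"}

# A few ASC/ASCQ pairs that tend to matter on ALUA arrays (subset).
ASC_ASCQ = {
    (0x2A, 0x06): "ASYMMETRIC ACCESS STATE CHANGED",
    (0x29, 0x00): "POWER ON, RESET, OR BUS DEVICE RESET OCCURRED",
    (0x04, 0x0A): "LOGICAL UNIT NOT ACCESSIBLE, ASYMMETRIC ACCESS STATE TRANSITION",
}

HEX = r"0x[0-9a-fA-F]+"
STATUS_RE = re.compile(
    rf"^(?P<ts>\S+)\s.*?H:(?P<h>{HEX}) D:(?P<d>{HEX}) P:(?P<p>{HEX})"
    rf"(?: (?P<kind>Valid|Possible) sense data: "
    rf"(?P<key>{HEX}) (?P<asc>{HEX}) (?P<ascq>{HEX}))?"
)


def decode_line(line):
    """Return a dict describing the SCSI status in one log line, or None."""
    m = STATUS_RE.search(line)
    if m is None:
        return None  # line carries no H:/D:/P: triple
    host = int(m.group("h"), 16)
    dev = int(m.group("d"), 16)
    result = {
        "timestamp": m.group("ts"),
        "host": HOST_STATUS.get(host, hex(host)),
        "device": DEVICE_STATUS.get(dev, hex(dev)),
        "sense": None,
    }
    # Assumption: only decode sense bytes the kernel marked as valid.
    if m.group("kind") == "Valid":
        key = int(m.group("key"), 16)
        asc = int(m.group("asc"), 16)
        ascq = int(m.group("ascq"), 16)
        result["sense"] = (
            SENSE_KEY.get(key, hex(key)),
            ASC_ASCQ.get((asc, ascq), f"{asc:#04x}/{ascq:#04x}"),
        )
    return result


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python decode_scsi.py vmkernel.log
    with open(sys.argv[1], errors="replace") as fh:
        for raw in fh:
            info = decode_line(raw)
            if info:
                print(info["timestamp"], info["host"], info["device"], info["sense"])
```

Run against the excerpt above, the 15:56 ScsiDeviceIO line should decode as host BUS_BUSY with device BUSY, and the 18:38 nmp_ThrottleLogForDevice line as CHECK CONDITION with UNIT ATTENTION and an asymmetric access state change. That only labels the codes; it does not by itself establish root cause.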
> Don't have switch logs - but the array is a NetApp FAS3270. Connected via FC SAN.
Can you please clarify:
Are your ESX hosts directly connected to the FAS3270?
Why don't you have switch logs? All Brocade and Cisco FC switches keep a syslog of all activity (hence the first question).
cheers
David
(I don't have switch logs because a different administrative team handles the switches.)
Thanks