VMware Cloud Community
wdroush1
Hot Shot

BL480G1 iSCSI Errors -- DL360G5 Fine

I have two hosts, a BL480G1 and a DL360G5, using a NexentaStor iSCSI LUN.

The BL480G1 has always had storage issues, ranging from high latency to problems as critical as this one.

When removing snapshots, the removal takes HOURS (we're talking 8+). Even if I take a snapshot and immediately remove it, it takes that long, which makes backups nearly impossible. During this time the VM stops responding for up to 15 minutes at a time.

My vmkernel.log:

2012-12-24T05:18:12.416Z cpu4:2177)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x89 (0x41240077bb40, 10724) to dev "naa.600144f0bd9c020000004ff1a959000c" on path "vmhba32:C1:T0:L1" Failed: H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL
2012-12-24T05:18:12.416Z cpu4:2177)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "naa.600144f0bd9c020000004ff1a959000c" state in doubt; requested fast path state update...
2012-12-24T05:18:12.416Z cpu4:2177)ScsiDeviceIO: 2322: Cmd(0x41240077bb40) 0x89, CmdSN 0xa432 from world 10724 to dev "naa.600144f0bd9c020000004ff1a959000c" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2012-12-24T05:18:15.506Z cpu1:2177)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "naa.600144f0bd9c020000004ff1a959000c" state in doubt; requested fast path state update...
2012-12-24T05:18:15.506Z cpu1:2177)ScsiDeviceIO: 2322: Cmd(0x41240073a380) 0x89, CmdSN 0xa439 from world 10722 to dev "naa.600144f0bd9c020000004ff1a959000c" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2012-12-24T05:18:18.898Z cpu1:2177)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "naa.600144f0bd9c020000004ff1a959000c" state in doubt; requested fast path state update...
2012-12-24T05:18:18.898Z cpu1:2177)ScsiDeviceIO: 2322: Cmd(0x412400782280) 0x89, CmdSN 0xa43e from world 10689 to dev "naa.600144f0bd9c020000004ff1a959000c" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0.
2012-12-24T05:18:20.001Z cpu1:2177)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "naa.600144f0bd9c020000004ff1a959000c" state in doubt; requested fast path state update...
2012-12-24T05:18:20.001Z cpu1:2177)ScsiDeviceIO: 2322: Cmd(0x4124007ae3c0) 0x89, CmdSN 0xa42c from world 3720 to dev "naa.600144f0bd9c020000004ff1a959000c" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2012-12-24T05:18:22.029Z cpu1:2177)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x89 (0x412400781580, 10727) to dev "naa.600144f0bd9c020000004ff1a959000c" on path "vmhba32:C4:T0:L1" Failed: H:0x5 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0. Act:EVAL
2012-12-24T05:18:22.029Z cpu1:2177)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "naa.600144f0bd9c020000004ff1a959000c" state in doubt; requested fast path state update...
2012-12-24T05:18:22.029Z cpu1:2177)ScsiDeviceIO: 2322: Cmd(0x412400781580) 0x89, CmdSN 0xa444 from world 10727 to dev "naa.600144f0bd9c020000004ff1a959000c" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0.
2012-12-24T05:18:25.082Z cpu1:2177)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "naa.600144f0bd9c020000004ff1a959000c" state in doubt; requested fast path state update...
2012-12-24T05:18:25.082Z cpu1:2177)ScsiDeviceIO: 2322: Cmd(0x412400791040) 0x89, CmdSN 0xa45c from world 10724 to dev "naa.600144f0bd9c020000004ff1a959000c" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2012-12-24T05:18:28.608Z cpu1:2177)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "naa.600144f0bd9c020000004ff1a959000c" state in doubt; requested fast path state update...
2012-12-24T05:18:28.608Z cpu1:2177)ScsiDeviceIO: 2322: Cmd(0x41240072d680) 0x89, CmdSN 0xa462 from world 10726 to dev "naa.600144f0bd9c020000004ff1a959000c" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2012-12-24T05:18:31.705Z cpu1:2177)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "naa.600144f0bd9c020000004ff1a959000c" state in doubt; requested fast path state update...
2012-12-24T05:18:31.705Z cpu1:2177)ScsiDeviceIO: 2322: Cmd(0x4124007b5340) 0x89, CmdSN 0xa466 from world 10724 to dev "naa.600144f0bd9c020000004ff1a959000c" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0.
2012-12-24T05:18:33.080Z cpu1:2177)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "naa.600144f0bd9c020000004ff1a959000c" state in doubt; requested fast path state update...
2012-12-24T05:18:33.080Z cpu1:2177)ScsiDeviceIO: 2322: Cmd(0x4124007abfc0) 0x89, CmdSN 0xa453 from world 3720 to dev "naa.600144f0bd9c020000004ff1a959000c" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2012-12-24T05:18:34.736Z cpu5:2177)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "naa.600144f0bd9c020000004ff1a959000c" state in doubt; requested fast path state update...
2012-12-24T05:18:34.736Z cpu5:2177)ScsiDeviceIO: 2322: Cmd(0x41240072ae80) 0x89, CmdSN 0xa46a from world 10722 to dev "naa.600144f0bd9c020000004ff1a959000c" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0.
2012-12-24T05:18:37.778Z cpu4:2177)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "naa.600144f0bd9c020000004ff1a959000c" state in doubt; requested fast path state update...
2012-12-24T05:18:37.778Z cpu4:2177)ScsiDeviceIO: 2322: Cmd(0x41240072b280) 0x89, CmdSN 0xa47d from world 10689 to dev "naa.600144f0bd9c020000004ff1a959000c" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2012-12-24T05:18:40.868Z cpu4:2177)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "naa.600144f0bd9c020000004ff1a959000c" state in doubt; requested fast path state update...
2012-12-24T05:18:40.868Z cpu4:2177)ScsiDeviceIO: 2322: Cmd(0x41240072ed80) 0x89, CmdSN 0xa482 from world 10727 to dev "naa.600144f0bd9c020000004ff1a959000c" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2012-12-24T05:18:43.882Z cpu4:2177)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "naa.600144f0bd9c020000004ff1a959000c" state in doubt; requested fast path state update...
2012-12-24T05:18:43.882Z cpu4:2177)ScsiDeviceIO: 2322: Cmd(0x412400763980) 0x89, CmdSN 0xa487 from world 10726 to dev "naa.600144f0bd9c020000004ff1a959000c" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0.
2012-12-24T05:18:46.118Z cpu2:2177)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "naa.600144f0bd9c020000004ff1a959000c" state in doubt; requested fast path state update...
2012-12-24T05:18:46.118Z cpu2:2177)ScsiDeviceIO: 2322: Cmd(0x41240073bf80) 0x89, CmdSN 0xa479 from world 3720 to dev "naa.600144f0bd9c020000004ff1a959000c" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2012-12-24T05:18:46.985Z cpu2:2177)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x89 (0x4124007bb6c0, 2110) to dev "naa.600144f0bd9c020000004ff1a959000c" on path "vmhba32:C2:T0:L1" Failed: H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL
2012-12-24T05:18:46.985Z cpu2:2177)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "naa.600144f0bd9c020000004ff1a959000c" state in doubt; requested fast path state update...
2012-12-24T05:18:46.985Z cpu2:2177)ScsiDeviceIO: 2322: Cmd(0x4124007bb6c0) 0x89, CmdSN 0xa48b from world 2110 to dev "naa.600144f0bd9c020000004ff1a959000c" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2012-12-24T05:18:50.104Z cpu2:2177)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "naa.600144f0bd9c020000004ff1a959000c" state in doubt; requested fast path state update...
2012-12-24T05:18:50.104Z cpu2:2177)ScsiDeviceIO: 2322: Cmd(0x412400734a00) 0x89, CmdSN 0xa4a1 from world 10722 to dev "naa.600144f0bd9c020000004ff1a959000c" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2012-12-24T05:18:53.120Z cpu2:2177)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x89 (0x412400783680, 10724) to dev "naa.600144f0bd9c020000004ff1a959000c" on path "vmhba32:C2:T0:L1" Failed: H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL
2012-12-24T05:18:53.120Z cpu2:2177)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "naa.600144f0bd9c020000004ff1a959000c" state in doubt; requested fast path state update...
2012-12-24T05:18:53.120Z cpu2:2177)ScsiDeviceIO: 2322: Cmd(0x412400783680) 0x89, CmdSN 0xa4a8 from world 10724 to dev "naa.600144f0bd9c020000004ff1a959000c" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
----
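For reference, the failing lines above can be decoded field by field. Below is a minimal Python sketch, assuming the `ScsiDeviceIO` line layout shown in this log; the lookup tables are deliberately partial and cover only values that appear here. By the standard SCSI tables, opcode 0x89 is COMPARE AND WRITE (the command VAAI uses for its ATS locking primitive), sense key 0x5 is ILLEGAL REQUEST, and ASC/ASCQ 0x20/0x00 is INVALID COMMAND OPERATION CODE; H:0x5 is the ESXi host status for an aborted command. Read together, that would be consistent with the target rejecting or not completing COMPARE AND WRITE, though the log alone does not prove it. The names `decode`, `summarize` and `sample` are illustrative, not part of any VMware tool.

```python
import re
from collections import Counter

# Partial lookup tables: only values that appear in this log are listed.
OPCODES = {0x89: "COMPARE AND WRITE"}  # SCSI opcode used by VAAI ATS
HOST_STATUS = {0x0: "OK", 0x5: "ABORT"}  # ESXi "H:" host status codes
SENSE_KEYS = {0x0: "NO SENSE", 0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {
    (0x00, 0x00): "NO ADDITIONAL SENSE INFORMATION",
    (0x20, 0x00): "INVALID COMMAND OPERATION CODE",
}

# Matches the ScsiDeviceIO failure lines in vmkernel.log (D: and P: are skipped).
LINE_RE = re.compile(
    r'ScsiDeviceIO: \d+: Cmd\(0x[0-9a-f]+\) (0x[0-9a-f]+), CmdSN 0x[0-9a-f]+ '
    r'from world (\d+) to dev "([^"]+)" failed '
    r'H:(0x[0-9a-f]+) D:0x[0-9a-f]+ P:0x[0-9a-f]+ '
    r'Possible sense data: (0x[0-9a-f]+) (0x[0-9a-f]+) (0x[0-9a-f]+)'
)


def decode(line):
    """Return the decoded fields of one ScsiDeviceIO failure line, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    op, world, dev, host, key, asc, ascq = m.groups()
    op, host, key, asc, ascq = (int(x, 16) for x in (op, host, key, asc, ascq))
    return {
        "opcode": OPCODES.get(op, hex(op)),
        "world": int(world),
        "device": dev,
        "host": HOST_STATUS.get(host, hex(host)),
        "sense": SENSE_KEYS.get(key, hex(key)),
        "asc": ASC_ASCQ.get((asc, ascq), f"{asc:#x}/{ascq:#x}"),
    }


def summarize(lines):
    """Count each (opcode, host status, sense key, ASC/ASCQ) combination."""
    counts = Counter()
    for line in lines:
        fields = decode(line)
        if fields is not None:
            counts[(fields["opcode"], fields["host"],
                    fields["sense"], fields["asc"])] += 1
    return counts


# Two lines copied from the vmkernel.log excerpt in this post.
sample = [
    '2012-12-24T05:18:12.416Z cpu4:2177)ScsiDeviceIO: 2322: Cmd(0x41240077bb40) '
    '0x89, CmdSN 0xa432 from world 10724 to dev '
    '"naa.600144f0bd9c020000004ff1a959000c" failed H:0x5 D:0x0 P:0x0 '
    'Possible sense data: 0x0 0x0 0x0.',
    '2012-12-24T05:18:18.898Z cpu1:2177)ScsiDeviceIO: 2322: Cmd(0x412400782280) '
    '0x89, CmdSN 0xa43e from world 10689 to dev '
    '"naa.600144f0bd9c020000004ff1a959000c" failed H:0x5 D:0x0 P:0x0 '
    'Possible sense data: 0x5 0x20 0x0.',
]

if __name__ == "__main__":
    for combo, n in summarize(sample).items():
        print(n, combo)
```

Fed the whole vmkernel.log, `summarize` would show how the failures split between aborted commands with no sense data and ones carrying the ILLEGAL REQUEST sense.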
I've tried to replicate this on the DL360G5, and snapshots add and remove fine there (it takes less than a minute; I may lose one or two ping packets during it, since the SAN is under a bit of resource contention). The BL480G1 is using a GB2EC managed switch. Everything has jumbo frames enabled (except the GB2EC, which reportedly supports jumbo frames "automatically"), and I've tried both with and without multipath.
ANY ideas?
6 Replies
TomHowarth
Leadership

What version of ESXi are you talking about? And what is the exact model of the server?

Tom Howarth VCP / VCAP / vExpert
VMware Communities User Moderator
Blog: http://www.planetvm.net
Contributing author on VMware vSphere and Virtual Infrastructure Security: Securing ESX and the Virtual Environment
Contributing author on VCP VMware Certified Professional on vSphere 4 Study Guide: Exam VCP-410
a_p_
Leadership

In addition to the ESXi version, please elaborate on your system (especially network) configuration. Which enclosure (C3000/C7000) do you use? Do you have dedicated physical switches for the storage traffic in your enclosure? Please post a screenshot of the ESXi host's virtual network configuration (vSwitch/port groups). What does the physical network (switches, ...) look like?

André

wdroush1
Hot Shot

We're running 5.0.0 build 623860 (pretty sure that's U1). The exact model is a ProLiant BL480c G1.

Which enclosure (C3000/C7000) do you use?

C7000

Do you have dedicated physical switches for the storage traffic in your enclosure?

Yes, two stacked Dell PowerConnects. They also serve iSCSI for 3 other hosts in another cluster backed by an EqualLogic, with no issues on that end either.

Please post a screenshot of the ESXi host's virtual network configuration (vSwitch/port groups).

http://img.photobucket.com/albums/v177/Strange_will/Computer/vSwitch-Path1_zps6cd6acc9.png

http://img.photobucket.com/albums/v177/Strange_will/Computer/vSwitch1_zps711b57e5.png

http://img.photobucket.com/albums/v177/Strange_will/Computer/iscsi-init_zps0270f309.png

http://img.photobucket.com/albums/v177/Strange_will/Computer/iSCSI_zps498c3ccc.png

What does the physical network (switches, ...) look like?

VMNIC1 is connected to a port on the GB2EC, and an additional port on the GB2EC is connected to the PowerConnects. All storage has a redundant configuration on the PowerConnects (until I can reconfigure the other GB2EC, which has lost management capabilities, I'm down to only two ports on this blade). These 2 ports are tagged in a VLAN. Pretty straightforward.

Additionally, I can probably flip the ports (the other adapter is a Broadcom BCM5715S, while the current one is a Broadcom BCM5708) to rule that out and narrow it down to the GB2EC switch.

a_p_
Leadership

Some thoughts:

  • You have two iSCSI VMkernel ports but only one uplink!? Remove the second iSCSI port group until you have a second uplink available.
  • Can you make sure the iSCSI network is not routed, so that traffic takes the expected path to and from the storage system?
  • Test Jumbo frame functionality by running vmkping -s 8972 -d <storage-IP-address> from the host
    (see http://rickardnobel.se/troubleshoot-jumbo-frames-with-vmkping/)

André

wdroush1
Hot Shot

André Pett wrote:

Some thoughts:

  • You have two iSCSI VMkernel ports but only one uplink!? Remove the second iSCSI port group until you have a second uplink available.

Sure thing, I already tried this. I figured it wouldn't cause a problem with ESXi, and the configuration will be pretty much ready to go once the second switch is reconfigured, but to make sure this isn't a factor I'll turn one off.

Problem still persists.

  • Can you make sure the iSCSI network is not routed, so that traffic takes the expected path to and from the storage system?

Is there a way to check this from the host, given that I don't have tracert? In any case, I don't see any way it could be routing.
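One way to reason about this from the host side: if the storage IP falls inside the subnet of an iSCSI VMkernel port, the host can reach it over that port without a gateway, so that traffic should not be routed. A minimal Python sketch of the subnet check follows. The VMkernel address `10.100.8.10/24` is a hypothetical placeholder (the real IP and netmask would come from the host, for example from `esxcfg-vmknic -l`); only the target `10.100.8.200` is taken from this thread.

```python
import ipaddress


def is_directly_connected(vmk_interface, target_ip):
    """Return True if target_ip lies inside the VMkernel port's own subnet.

    vmk_interface is "address/prefix" or "address/netmask", e.g. the
    hypothetical "10.100.8.10/24". A target inside that subnet can be
    reached without going through a gateway.
    """
    iface = ipaddress.ip_interface(vmk_interface)
    return ipaddress.ip_address(target_ip) in iface.network


if __name__ == "__main__":
    # 10.100.8.200 is the storage IP used in the vmkping test in this thread;
    # the VMkernel address is a made-up example.
    print(is_directly_connected("10.100.8.10/24", "10.100.8.200"))
```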

/var/log # vmkping -s 8972 -d 10.100.8.200
PING 10.100.8.200 (10.100.8.200): 8972 data bytes
8980 bytes from 10.100.8.200: icmp_seq=0 ttl=255 time=0.968 ms
8980 bytes from 10.100.8.200: icmp_seq=1 ttl=255 time=0.964 ms
8980 bytes from 10.100.8.200: icmp_seq=2 ttl=255 time=1.039 ms
--- 10.100.8.200 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.964/0.990/1.039 ms
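The vmkping numbers can be checked by hand. With `-d` (don't fragment), the packet has to cross every hop at full size, and `-s 8972` is the largest ICMP payload for a 9000-byte MTU: 9000 minus a 20-byte IPv4 header minus an 8-byte ICMP header. Each reply line reports payload plus ICMP header, hence 8980. A small Python sketch of that arithmetic, assuming a 9000-byte MTU and IPv4 headers without options; the helper names are illustrative.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_icmp_payload(mtu):
    """Largest ping payload (-s) that fits one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def reply_size(payload):
    """Byte count ping prints per reply: ICMP header plus payload."""
    return payload + ICMP_HEADER


def jumbo_ok(ping_output, mtu=9000):
    """True if every reply line shows the full-size payload for this MTU."""
    expected = reply_size(max_icmp_payload(mtu))
    sizes = [int(line.split()[0])
             for line in ping_output.splitlines() if " bytes from " in line]
    return bool(sizes) and all(size == expected for size in sizes)


# Output of the vmkping -s 8972 -d test posted in this thread.
PING_OUTPUT = """\
PING 10.100.8.200 (10.100.8.200): 8972 data bytes
8980 bytes from 10.100.8.200: icmp_seq=0 ttl=255 time=0.968 ms
8980 bytes from 10.100.8.200: icmp_seq=1 ttl=255 time=0.964 ms
8980 bytes from 10.100.8.200: icmp_seq=2 ttl=255 time=1.039 ms
--- 10.100.8.200 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
"""

if __name__ == "__main__":
    print(max_icmp_payload(9000), reply_size(max_icmp_payload(9000)))
    print(jumbo_ok(PING_OUTPUT))
```

If the same `-d` test only succeeded with `-s 1472` (1500 minus 28), some hop in the path would not be passing jumbo frames.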
wdroush1
Hot Shot

Hold on with this, may be VAAI related, checking with NAS vendor...
