VMware Cloud Community
grp
Contributor

ESXi 5 iSCSI bug still here in 5.1?

Hi everybody,

I have deployed ESXi 5.1 with an HP P2000 G3 iSCSI array (see my post on this at http://communities.vmware.com/thread/402472?start=0&tstart=0). So far I have had no issues, but recently I noticed something:

There are iSCSI-related errors being logged. I first saw them when I upgraded the array's firmware and had to restart ESXi, and I can reproduce them by rescanning the storage adapters. Here are the errors:

Login to iSCSI target iqn.1986-03.com.hp:storage.p2000g3.111812a1a4 on vmhba37 @ vmk1 failed. The iSCSI initiator could not establish a network connection to the target.

Login to iSCSI target iqn.1986-03.com.hp:storage.p2000g3.111812a1a4 on vmhba37 @ vmk2 failed. The iSCSI initiator could not establish a network connection to the target.
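
(For reference, I trigger the rescan either from the vSphere Client or from the shell; if I remember the syntax correctly, either of the following works, where vmhba37 is my software iSCSI adapter:)

~ # esxcfg-rescan vmhba37
~ # esxcli storage core adapter rescan --adapter vmhba37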

Now, after some research, I found out this is a known issue,

http://vmtoday.com/2012/02/vsphere-5-networking-bug-affects-software-iscsi/

which VMware addresses in a KB article (and claims it was fixed before 5.1 was released):

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=200814...

Well, I've studied the above documents thoroughly. I should also point out that my configuration is similar: I have two VMkernel ports, each with one vmnic active and the other unused, alternating between the two. I have tried only the first workaround (disabling Failback), with no luck; the error is still there. However, I have not experienced any actual problems with my iSCSI links so far; they seem to work fine, at least as far as I can tell. Therefore, my questions are the following:

1. Should I be worried about these errors?

2. What would be a quick and simple way to verify the iSCSI links are working OK? Is seeing the datastore mounted and the datapaths 'Active' enough?
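
At the moment, the only checks I do from the shell are roughly the following (just a sketch of the commands I use; I am not sure they are sufficient):

~ # esxcli iscsi session list
~ # esxcli storage nmp path list
~ # esxcli storage core path list | grep -i "State:"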

I am attaching some screenshots to illustrate. I would appreciate any response. Thank you.

4 Replies
Josh26
Virtuoso

This is the third post I've seen in the last few days about a P2000 G3 presenting errors following a firmware update yet appearing to work fine.

This is quite different from the huge bug fixed in 5.0, which pretty much broke iSCSI altogether.

I would suggest logging a case with HP (who will invariably recommend you start by updating your firmware).

grp
Contributor

Thank you for your reply. However, it seems to me this is not a problem with the array; I downgraded the firmware with no luck. I have noticed something since your post, though.

The two iSCSI interfaces of the array are: 10.199.1.10 and 10.199.1.20.

The VMkernels are: 10.199.1.4 and 10.199.1.5.

You can see in my initial post's screenshots that both paths are in an active state, so they are supposed to be working fine.

But have a look at this (from an SSH session on the host):

~ # vmkping 10.199.1.10
PING 10.199.1.10 (10.199.1.10): 56 data bytes
64 bytes from 10.199.1.10: icmp_seq=0 ttl=64 time=0.281 ms
64 bytes from 10.199.1.10: icmp_seq=1 ttl=64 time=0.270 ms
64 bytes from 10.199.1.10: icmp_seq=2 ttl=64 time=0.446 ms

--- 10.199.1.10 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.270/0.332/0.446 ms
~ #
~ # vmkping 10.199.1.20
PING 10.199.1.20 (10.199.1.20): 56 data bytes

--- 10.199.1.20 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
~ #

This behavior changes over time: at another point, 10.199.1.20 replied and 10.199.1.10 did not.
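
If it helps, I can also repeat the test while sourcing the ping from each bound VMkernel port explicitly (as far as I know, vmkping on ESXi 5.1 accepts -I for this), e.g.:

~ # vmkping -I vmk1 10.199.1.10
~ # vmkping -I vmk1 10.199.1.20
~ # vmkping -I vmk2 10.199.1.10
~ # vmkping -I vmk2 10.199.1.20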

To sum up the initial setup process, this is what I did when I first connected the array to the host:

1. Set the two iSCSI IP addresses on the array.

2. Created a new vSwitch with two VMkernel ports, one for each link. My network adapter is an HP NC522SFP.

3. Changed teaming/failover settings.

4. Added Software iSCSI adapter.

5. Connected the array.

I did not use CHAP, nor did I change any of the automatically generated iSCSI IQNs.

Should I create two vSwitches instead, with one VMkernel port on each?
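
For reference, this is how I have been checking the current port binding and failover settings from the shell (a sketch; 'iSCSI-1' and 'iSCSI-2' are placeholders for my actual port group names):

~ # esxcli iscsi networkportal list --adapter vmhba37
~ # esxcli network vswitch standard portgroup policy failover get -p iSCSI-1
~ # esxcli network vswitch standard portgroup policy failover get -p iSCSI-2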

Btw, could you please provide links to the other two posts you mentioned?

PS: Here is the log output produced when rescanning the adapters:

2013-03-15T08:26:31Z iscsid: discovery_sendtargets::Running discovery on IFACE default(iscsi_vmk) (drec.transport=iscsi_vmk)
2013-03-15T08:26:32Z iscsid: discovery_sendtargets::Running discovery on IFACE iscsi_vmk@vmk1(iscsi_vmk) (drec.transport=iscsi_vmk)
2013-03-15T08:26:32Z iscsid: discovery_sendtargets::Running discovery on IFACE iscsi_vmk@vmk2(iscsi_vmk) (drec.transport=iscsi_vmk)
2013-03-15T08:26:37Z iscsid: connection failed for discovery (err = Interrupted system call)!
2013-03-15T08:26:37Z iscsid: connection to discovery address 10.199.1.10 failed
2013-03-15T08:26:37Z iscsid: connection login retries (reopen_max) 5 exceeded
2013-03-15T08:26:37Z iscsid: discovery_sendtargets::Running discovery on IFACE default(iscsi_vmk) (drec.transport=iscsi_vmk)
2013-03-15T08:26:42Z iscsid: connection failed for discovery (err = Interrupted system call)!
2013-03-15T08:26:42Z iscsid: connection to discovery address 10.199.1.20 failed
2013-03-15T08:26:42Z iscsid: connection login retries (reopen_max) 5 exceeded
2013-03-15T08:26:42Z iscsid: discovery_sendtargets::Running discovery on IFACE iscsi_vmk@vmk1(iscsi_vmk) (drec.transport=iscsi_vmk)
2013-03-15T08:26:47Z iscsid: connection failed for discovery (err = Interrupted system call)!
2013-03-15T08:26:47Z iscsid: connection to discovery address 10.199.1.20 failed
2013-03-15T08:26:47Z iscsid: connection login retries (reopen_max) 5 exceeded
2013-03-15T08:26:47Z iscsid: discovery_sendtargets::Running discovery on IFACE iscsi_vmk@vmk2(iscsi_vmk) (drec.transport=iscsi_vmk)
2013-03-15T08:26:47Z iscsid: Login Target Skipped: iqn.1986-03.com.hp:storage.p2000g3.111812a1a4 if=iscsi_vmk@vmk1 addr=10.199.1.10:3260 (TPGT:1 ISID:0x5) (Already Running)
2013-03-15T08:26:47Z iscsid: Login Target: iqn.1986-03.com.hp:storage.p2000g3.111812a1a4 if=iscsi_vmk@vmk2 addr=10.199.1.10:3260 (TPGT:1 ISID:0x6)
2013-03-15T08:26:47Z iscsid: Notice: Assigned (H37 T0 C0 session=1d, target=1/3)
2013-03-15T08:26:47Z iscsid: Login Target: iqn.1986-03.com.hp:storage.p2000g3.111812a1a4 if=iscsi_vmk@vmk1 addr=10.199.1.20:3260 (TPGT:3 ISID:0x7)
2013-03-15T08:26:47Z iscsid: Notice: Assigned (H37 T0 C2 session=1e, target=1/4)
2013-03-15T08:26:47Z iscsid: Login Target Skipped: iqn.1986-03.com.hp:storage.p2000g3.111812a1a4 if=iscsi_vmk@vmk2 addr=10.199.1.20:3260 (TPGT:3 ISID:0x8) (Already Running)
2013-03-15T08:26:47Z iscsid: DISCOVERY: transport_name=iscsi_vmk Pending=2 Failed=0
2013-03-15T08:26:48Z iscsid: DISCOVERY: transport_name=iscsi_vmk Pending=2 Failed=0
2013-03-15T08:26:49Z iscsid: DISCOVERY: transport_name=iscsi_vmk Pending=2 Failed=0
2013-03-15T08:26:50Z iscsid: DISCOVERY: transport_name=iscsi_vmk Pending=2 Failed=0
2013-03-15T08:26:51Z iscsid: DISCOVERY: transport_name=iscsi_vmk Pending=2 Failed=0
2013-03-15T08:26:52Z iscsid: DISCOVERY: transport_name=iscsi_vmk Pending=2 Failed=0
2013-03-15T08:26:53Z iscsid: DISCOVERY: transport_name=iscsi_vmk Pending=2 Failed=0
2013-03-15T08:26:54Z iscsid: DISCOVERY: transport_name=iscsi_vmk Pending=2 Failed=0
2013-03-15T08:26:55Z iscsid: Login Failed: iqn.1986-03.com.hp:storage.p2000g3.111812a1a4 if=iscsi_vmk@vmk2 addr=10.199.1.10:3260 (TPGT:1 ISID:0x6) Reason: 00080000 (Initiator Connection Failure)
2013-03-15T08:26:55Z iscsid: Notice: Reclaimed Channel (H37 T0 C0 oid=1)
2013-03-15T08:26:55Z iscsid: Login Failed: iqn.1986-03.com.hp:storage.p2000g3.111812a1a4 if=iscsi_vmk@vmk1 addr=10.199.1.20:3260 (TPGT:3 ISID:0x7) Reason: 00080000 (Initiator Connection Failure)
2013-03-15T08:26:55Z iscsid: Notice: Reclaimed Channel (H37 T0 C2 oid=1)
2013-03-15T08:26:55Z iscsid: DISCOVERY: transport_name=iscsi_vmk Pending=0 Failed=2

grp
Contributor

I am still having this issue even with the latest ESXi patch (Apr 25). If anyone can help, I would appreciate it.

MKguy
Virtuoso

Can you ping both vmknics from another ESXi host?

Are you using Jumbo Frames?
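
If so, one quick way to verify that jumbo frames actually pass end to end is a non-fragmenting large ping (assuming a 9000 byte MTU everywhere; 8972 = 9000 minus IP/ICMP headers), e.g.:

# vmkping -d -s 8972 10.199.1.10
# vmkping -d -s 8972 10.199.1.20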

Check if ARP resolution works for both target IPs at the same time, e.g.:

# esxcli network ip neighbor list
Neighbor     Mac Address        Vmknic    Expiry  State
-----------  -----------------  ------  --------  -----
10.1.1.1   00:15:3b:ff:81:89  vmk0    1041 sec
[...]

Sniff traffic on a vmknic that is currently not working to see if there is any traffic at all:

# tcpdump-uw -i vmk1 -nn
tcpdump-uw: verbose output suppressed, use -v or -vv for full protocol decode
listening on vmk1, link-type EN10MB (Ethernet), capture size 96 bytes
[...]

Check which driver your NICs are running with, e.g.:

# ethtool -i vmnic5
driver: bnx2
version: 2.2.1l.v50.1
firmware-version: bc 1.9.6
bus-info: 0000:03:00.0

For your NC522 NIC, it should be the nx_nic driver. Check which driver version you're running; the latest one is 5.0.626:

https://my.vmware.com/group/vmware/details?downloadGroup=DT-ESXI5X-QLOGIC-NX_NIC-50626&productId=327

Also make sure the NIC firmware is up to date.
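
You can also check the installed driver VIB and the loaded module version directly, e.g. (the VIB should be named net-nx-nic for this driver, if I recall correctly):

# esxcli software vib list | grep -i nx
# vmkload_mod -s nx_nic | grep -i version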

-- http://alpacapowered.wordpress.com