VMware Cloud Community
cris74b
Contributor

CNA issue Dell Blade

I have a Dell FX2 chassis with three FC630 blades, connected via BCM57840 CNAs (4 interfaces, Broadcom Corporation QLogic 57840 10 Gigabit) to a Brocade 300 switch. The storage is an EMC VNX5100.

VMware ESXi 6 U2 (Dell installation media) is installed on all three blades.

The FC driver in ESXi is version 1.713.10.v60.4, which is listed in the VMware HCL.

I have an issue when copying large amounts of data, both from within VMs and at the host level: the copy starts at full speed, then stops completely for 30-60 seconds or more before resuming at full speed.

When this happens, I get continuous events in VMware of "Lost access to volume XXXXx" and "Successfully restored access to volume XXXX"; I also see entries in vmkernel.log like:

pu3:32903)<3>bnx2fc:vmhba46:0000:01:00.1: bnx2fc_eh_abort:1250 eh_abort: xid = 0x49a

2016-08-22T14:34:15.009Z cpu3:32903)<3>bnx2fc:vmhba46:0000:01:00.1: bnx2fc_eh_abort:1358 abort succeeded

2016-08-22T14:34:15.009Z cpu3:32903)<3>bnx2fc:vmhba46:0000:01:00.1: bnx2fc_eh_abort:1250 eh_abort: xid = 0x468

2016-08-22T14:34:15.009Z cpu3:32903)<3>bnx2fc:vmhba46:0000:01:00.1: bnx2fc_eh_abort:1358 abort succeeded

2016-08-22T14:34:15.009Z cpu3:32903)<3>bnx2fc:vmhba46:0000:01:00.1: bnx2fc_eh_abort:1250 eh_abort: xid = 0x47d

2016-08-22T14:34:15.009Z cpu3:32903)<3>bnx2fc:vmhba46:0000:01:00.1: bnx2fc_eh_abort:1358 abort succeeded
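
To see how often these aborts occur per adapter, the log lines above can be tallied with a short script. This is only a rough sketch: the regular expression is written against the six lines quoted in this post, and the names `ABORT_RE`, `SAMPLE_LINES` and `summarize_aborts` are made up for illustration. Other bnx2fc message formats may not match.

```python
import re
from collections import Counter

# Matches the two bnx2fc_eh_abort message shapes quoted in this post.
# Assumption: other driver versions may log these messages differently.
ABORT_RE = re.compile(
    r"bnx2fc:(?P<hba>vmhba\d+):(?P<pci>[0-9a-fA-F:.]+): "
    r"bnx2fc_eh_abort:\d+ "
    r"(?:eh_abort: xid = (?P<xid>0x[0-9a-fA-F]+)|(?P<ok>abort succeeded))"
)

# The six lines from the post (the first one is truncated there as well).
SAMPLE_LINES = [
    "pu3:32903)<3>bnx2fc:vmhba46:0000:01:00.1: bnx2fc_eh_abort:1250 eh_abort: xid = 0x49a",
    "2016-08-22T14:34:15.009Z cpu3:32903)<3>bnx2fc:vmhba46:0000:01:00.1: bnx2fc_eh_abort:1358 abort succeeded",
    "2016-08-22T14:34:15.009Z cpu3:32903)<3>bnx2fc:vmhba46:0000:01:00.1: bnx2fc_eh_abort:1250 eh_abort: xid = 0x468",
    "2016-08-22T14:34:15.009Z cpu3:32903)<3>bnx2fc:vmhba46:0000:01:00.1: bnx2fc_eh_abort:1358 abort succeeded",
    "2016-08-22T14:34:15.009Z cpu3:32903)<3>bnx2fc:vmhba46:0000:01:00.1: bnx2fc_eh_abort:1250 eh_abort: xid = 0x47d",
    "2016-08-22T14:34:15.009Z cpu3:32903)<3>bnx2fc:vmhba46:0000:01:00.1: bnx2fc_eh_abort:1358 abort succeeded",
]


def summarize_aborts(lines):
    """Count abort requests and successful aborts per vmhba, and collect the xids."""
    requested = Counter()
    succeeded = Counter()
    xids = []
    for line in lines:
        match = ABORT_RE.search(line)
        if match is None:
            continue  # not a bnx2fc_eh_abort line
        hba = match.group("hba")
        if match.group("xid"):
            requested[hba] += 1
            xids.append(match.group("xid"))
        else:
            succeeded[hba] += 1
    return requested, succeeded, xids


if __name__ == "__main__":
    requested, succeeded, xids = summarize_aborts(SAMPLE_LINES)
    for hba in sorted(requested):
        print(hba, "aborts requested:", requested[hba], "succeeded:", succeeded[hba])
    print("exchange ids:", ", ".join(xids))
```

To run it against a real log, pass an open file object instead of `SAMPLE_LINES`; the function only needs an iterable of lines.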

Anyone seen this behavior?

3 Replies
DavoudTeimouri
Virtuoso

Hi,

Check this: Understanding lost access to volume messages in ESXi 5.5/6.x (2136081) | VMware KB

-------------------------------------------------------------------------------------
Davoud Teimouri - https://www.teimouri.net - Twitter: @davoud_teimouri Facebook: https://www.facebook.com/teimouri.net/
GustavoAyala
Enthusiast

I'm having a similar problem: intermittent disconnections from the datastores (ESXi 5.5).

Same symptoms as described in that article: lost heartbeat to the datastore. Some kind of blocking or HBA timeout during the night window (a couple of backups run during that time, but nothing too stressful, I guess).

Lost access to volume xxx ( *** ) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.

and a couple of seconds later

Successfully restored access to volume xxx ( *** ) following connectivity issues.
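
To measure how long each outage actually lasts, the "Lost access" and "Successfully restored access" events can be paired per volume. The sketch below assumes each event line starts with an ISO-style UTC timestamp like the ones in vmkernel.log, followed by the message text; that line layout, the sample lines, and the names `EVENT_RE`, `SAMPLE_EVENTS` and `outage_durations` are assumptions for illustration, not an actual log export.

```python
import re
from datetime import datetime

# Assumed line layout: "<timestamp>Z <anything> <event text> <volume> ...".
EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)\s.*?"
    r"(?P<kind>Lost access to volume|Successfully restored access to volume)\s+"
    r"(?P<vol>\S+)"
)

# Hypothetical sample events; volume and datastore names are invented.
SAMPLE_EVENTS = [
    "2016-08-22T14:34:15.009Z Lost access to volume vol-a ( ds1 ) due to connectivity issues.",
    "2016-08-22T14:34:20.000Z Lost access to volume vol-b ( ds2 ) due to connectivity issues.",
    "2016-08-22T14:34:52.509Z Successfully restored access to volume vol-a ( ds1 ) following connectivity issues.",
]


def outage_durations(lines):
    """Return (volume, seconds) for every lost/restored pair, in restore order."""
    pending = {}  # volume -> time access was lost
    outages = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue  # not a lost/restored event
        when = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%fZ")
        volume = match.group("vol")
        if match.group("kind").startswith("Lost"):
            # Keep the earliest loss if the event repeats before a restore.
            pending.setdefault(volume, when)
        elif volume in pending:
            lost_at = pending.pop(volume)
            outages.append((volume, (when - lost_at).total_seconds()))
    return outages


if __name__ == "__main__":
    for volume, seconds in outage_durations(SAMPLE_EVENTS):
        print(volume, "was unavailable for", seconds, "seconds")
```

Volumes that are lost but never restored within the input (like `vol-b` in the sample) are simply not reported, so a long-running outage at the end of a log would need separate handling.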

I'm using HP blades, but the same CNA (QLogic 57810 10 Gigabit), the same FC switch (Brocade/EMC BR300), and a VNX 5400.

Latest drivers on the blade side (SPP 2016.04)

Fabric OS:  v7.3.0b

VC 4.31

ESXi 5.5 - 3568722

I still have to try the infamous Fillword solution (change from the default 0 to 3), but that is supposed to be fixed in VC 3.70 and I'm using VC 4.31, so fillword should not be a problem.

Does anyone have any ideas?

bridrod
Enthusiast

Hi,

Even though this thread is a bit old, I wanted to share what we found in our environment regarding CNA and FCoE issues, which had been going on for months. In our case the adapters are Broadcom/QLogic (Broadcom QLogic 57810) in Dell M1000e blade chassis. We found that the "lost access to volume" error messages were related to saturation on the LUNs assigned to the hosts (this was after applying all recommended patches, firmware, and tuning settings, from the hardware up to the ESXi OS). We had about 135 LUNs assigned and were getting a lot of these messages, at some points affecting VMs (slowness, disconnections, etc.). Our LUNs came in different sizes: 500GB, 1TB, 2TB. Going with the saturation theory, we requested new LUNs (the same total capacity, provisioned as 4TB chunks) so that we could migrate all VMs from the smaller LUNs to the 4TB LUNs. In the end we had about 45 LUNs (instead of the original 135), and "lost access" went from thousands of alerts down to none at all. We are still considering 8TB LUNs to bring the total number even lower in case we need to add more LUNs.

Apparently some adapters, such as CNAs, have less cache and less processing power to handle a large number of LUNs, since the work is offloaded to the software/driver rather than handled at the adapter/hardware level.

Hopefully that will help someone out there with similar issues.
