VMware Cloud Community
cab3
Contributor

Emulex lpfc floods vmkernel.log with errors constantly

I'm running ESXi 6.5U2 with a Promise E630F SAN-attached storage array. I have primarily two hosts, both with Emulex LPe12000 8Gb FC adapters going through a Brocade 6510 FC switch. I've been seeing error messages on a recurring basis, along with I/O within guests coming to a screeching halt for many seconds. I've never had data loss, and the systems seem to recover, but when I look at vmkernel.log during the I/O issues, it's flooded over and over with messages like the ones shown below.

I've updated the HBA firmware and am running the vendor (Emulex) drivers rather than the native lpfc drivers, but nothing seems to actually help. My searching has turned up very little about these errors or how to fix them, so I'm very much hoping someone here will have some ideas.
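In case it helps, this is roughly how I've been checking the installed driver and the adapter state (standard esxcli commands, nothing specific to my setup):

# esxcli software vib list | grep -i lpfc    # show which lpfc driver VIB is installed
# esxcli storage core adapter list           # list the FC HBAs with their driver and link state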

Here is a snippet of the log errors I'm seeing:

2018-10-08T22:02:17.133Z cpu38:67897)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x99d  None: Data(x18:x0:x0:x0)

2018-10-08T22:02:17.133Z cpu38:67897)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x961  None: Data(x18:x0:x0:x0)

2018-10-08T22:02:17.135Z cpu38:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x967  None: Data(x18:x0:x0:x0)

2018-10-08T22:02:17.155Z cpu38:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x98e  None: Data(x18:x0:x0:x0)

2018-10-08T22:02:17.155Z cpu38:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x998  None: Data(x18:x0:x0:x0)

2018-10-08T22:02:17.155Z cpu38:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x98d  None: Data(x18:x0:x0:x0)

2018-10-08T22:02:17.158Z cpu38:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x964  None: Data(x18:x0:x0:x0)

2018-10-08T22:06:53.383Z cpu38:69589)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x28 failed <0/0> sid x010300, did x011200, oxid xffff iotag x8e2  None: Data(x18:x0:x0:x0)

2018-10-08T22:10:35.153Z cpu40:69589)lpfc: lpfc_handle_status:5159: 1:(0):3271: FCP cmd x28 failed <0/0> sid x010700, did x011200, oxid xffff iotag x8f9  None: Data(x18:x0:x0:x0)

2018-10-08T22:10:35.183Z cpu40:69589)lpfc: lpfc_handle_status:5159: 1:(0):3271: FCP cmd x28 failed <0/0> sid x010700, did x011200, oxid xffff iotag x915  None: Data(x18:x0:x0:x0)

2018-10-08T22:10:41.469Z cpu40:66398)lpfc: lpfc_handle_status:5159: 1:(0):3271: FCP cmd x88 failed <0/0> sid x010700, did x011200, oxid xffff iotag x91d  None: Data(x18:x0:x0:x0)

2018-10-08T22:10:41.491Z cpu40:69359)lpfc: lpfc_handle_status:5159: 1:(0):3271: FCP cmd x88 failed <0/0> sid x010700, did x011200, oxid xffff iotag x8f2  None: Data(x18:x0:x0:x0)

2018-10-08T22:12:39.565Z cpu11:65686)ScsiDeviceIO: 2954: Cmd(0x439500856dc0) 0x1a, CmdSN 0x149d from world 0 to dev "naa.6d4ae5207caec7001b84ffcb127794d1" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2018-10-08T22:14:12.653Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x958  None: Data(x18:x0:x0:x0)

2018-10-08T22:14:12.869Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x8a failed <0/0> sid x010300, did x011200, oxid xffff iotag x914  None: Data(x18:x0:x0:x0)

2018-10-08T22:14:28.300Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x8a failed <0/0> sid x010300, did x011200, oxid xffff iotag x991  None: Data(x18:x0:x0:x0)

2018-10-08T22:14:28.477Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x28 failed <0/0> sid x010300, did x011200, oxid xffff iotag x8dd  None: Data(x18:x0:x0:x0)

2018-10-08T22:14:28.499Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x28 failed <0/0> sid x010300, did x011200, oxid xffff iotag x97d  None: Data(x18:x0:x0:x0)

2018-10-08T22:14:28.549Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x8a failed <0/0> sid x010300, did x011200, oxid xffff iotag x94c  None: Data(x18:x0:x0:x0)

2018-10-08T22:14:29.015Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x993  None: Data(x18:x0:x0:x0)

2018-10-08T22:14:29.036Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x8de  None: Data(x18:x0:x0:x0)

2018-10-08T22:14:29.058Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x8cc  None: Data(x18:x0:x0:x0)

2018-10-08T22:14:29.079Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x9a1  None: Data(x18:x0:x0:x0)

2018-10-08T22:14:29.100Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x962  None: Data(x18:x0:x0:x0)

2018-10-08T22:14:29.125Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x986  None: Data(x18:x0:x0:x0)

2018-10-08T22:14:29.145Z cpu32:66397)lpfc0:(0) log compression on target 0 starting.
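If anyone wants to gauge how bad the flooding is, a rough count of these entries is just a grep away on the ESXi shell:

# grep -c lpfc_handle_status /var/log/vmkernel.log    # count the FCP failure messages in the current vmkernel log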

4 Replies
rajen450m
Hot Shot

Hi,

These are common errors on Emulex HBAs and can be due to:

  • A faulty HBA
  • FCP cmd x2a - a possible firmware/driver combination conflict, if no other storage errors are observed. Refer to the VMware Knowledge Base.
  • A multipathing issue - check that the correct paths are configured according to ALUA considerations. I hope you are not configured with Round Robin multipathing while using a vendor PSP? Find the number of paths associated with the device and check whether the errors are path-related.
  • I used the SCSI sense code decoder tool (from virten.net) on your dev "naa.6d4ae5207caec7001b84ffcb127794d1" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0, which decodes as follows:
Type                  | Code  | Name                 | Description
Host Status           | 0x0   | OK                   | This status is returned when there is no error on the host side. This is when you will see whether there is a status for a Device or Plugin. It is also when you will see Valid sense data instead of Possible sense data.
Device Status         | 0x2   | CHECK_CONDITION      | This status is returned when a command fails for a specific reason. When a CHECK CONDITION is received, the ESX storage stack sends out SCSI command 0x3 (REQUEST SENSE) in order to get the SCSI sense data (Sense Key, Additional Sense Code, ASC Qualifier, and other bits). The sense data is listed after "Valid sense data" in the order of Sense Key, Additional Sense Code, and ASC Qualifier.
Plugin Status         | 0x0   | GOOD                 | No error. (ESXi 5.x / 6.x only)
Sense Key             | 0x5   | ILLEGAL REQUEST      |
Additional Sense Data | 24/00 | INVALID FIELD IN CDB |

Check the number of paths from the HBA to device naa.6d4ae5207caec7001b84ffcb127794d1 and verify them accordingly.
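Something like the commands below should show how many paths the device has and which SATP/PSP is claiming it (generic esxcli commands; substitute whichever device ID you are interested in):

# esxcli storage core path list -d naa.6d4ae5207caec7001b84ffcb127794d1   # list every path to the device and its state
# esxcli storage nmp device list -d naa.6d4ae5207caec7001b84ffcb127794d1  # show the SATP/PSP in use and the current working path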

Regards,

Raj M. Please mark helpful or correct if my answer resolved your issue. Visit www.hypervmwarecloud.com for my blog posts, step-by-step procedures, etc.
cab3
Contributor

Thanks for the reply, Raj.

I'm seeing the same errors on two different systems, with identical model HBAs, into the same switch and storage array. I may end up swapping in some QLogic HBAs to see if the problem goes away, but I'd already gone through the VMware KB article you referenced and made sure that the HBA firmware and drivers match the latest recommended versions and are identical on both systems. I also swapped out the native driver for the Emulex driver, with no significant change in behavior (VMW_bootbank_lpfc_11.4.33.1-6vmw.650.2.50.8294253 replaced with EMU_bootbank_lpfc_12.0.257.5-1OEM.650.0.0.4598673).
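To double-check which lpfc module actually ends up loaded after a swap like that, something along these lines should do it (standard ESXi commands, nothing specific to this driver version):

# esxcli system module get -m lpfc    # version and details of the loaded lpfc module
# vmkload_mod -s lpfc                 # module information, including its available parameters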

# vmkmgmt_keyval -a   # Abbreviated output

Emulex LightPulse FC SCSI 12.0.257.5

Emulex LPe12002-M8 8Gb 2-port PCIe Fibre Channel Adapter on PCI bus 0000:0e device 00 fn 1 port 1 Link Speed: 8 Gb

BoardNum:       1

FW Version:     2.02A4

HW Version:     31004549

ROM Version:    12.00a4

SerialNum:      VM22140846

PCI ID:         10df f100 10df f100

I have validated the multipathing settings against the VMware Compatibility Guide and am using VMW_SATP_DEFAULT_AA & VMW_PSP_FIXED, as recommended for the Promise E630F array. The only difference I can see in the matrix is that the Promise array was tested with the QLogic device driver, not the Emulex one. I do have a third host, my backup server, attached to the same array using a 4Gb QLogic HBA through the same switch, and it exhibits no errors at all against the same array storage.
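Something along these lines should confirm the claim rule and the FIXED preferred-path setting (the device ID below is just a placeholder; substitute an actual LUN from the array):

# esxcli storage nmp satp rule list | grep -i promise            # check for any vendor-specific SATP claim rules for the array
# esxcli storage nmp psp fixed deviceconfig get -d <device id>   # show the configured/preferred path for a FIXED-policy LUN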

As for the SCSI error on the local datastore, it's a single-path device behind a hardware RAID controller. It's the least of my concerns, as I don't really use it for anything, but I'll see if I can determine what's going on with it.

My primary focus, however, is figuring out why I keep getting logs full of the same error messages from the Emulex HBAs on both systems. I'm trying to get my hands on a couple of QLogic HBAs to see if the problem is related to the HBAs themselves, but for now the Emulex adapters are what I have to work with at 8Gb.

rajen450m
Hot Shot

Hi,

It's a good decision, if your business allows changing to QLogic HBAs.

Good Luck.

Regards,

Raj M. Please mark helpful or correct if my answer resolved your issue. Visit www.hypervmwarecloud.com for my blog posts, step-by-step procedures, etc.
cab3
Contributor

I did manage to replace the two Emulex adapters that were flooding my logs with the errors shown above. After putting in QLogic 8Gb HBAs, the logs have been very, very quiet.
