Solved: Emulex lpfc floods vmkernel.log with errors consta...

cab3 · ‎10-08-2018

I'm running ESXi 6.5U2 with a Promise E630F SAN attached storage array. I have primarily two hosts, both with Emulex LPe12000 8G FC adapters going thru a Brocade 6510 FC switch. I've observed error messages on a recurring basis, along with I/O within guests coming to a screeching halt for many seconds. I've never had data loss, and the systems seem to recover, but in looking at the messages in vmkernel.log, I'm assaulted by the likes of what I've shown below, over and over, during the I/O issues.

I've updated the HBA firmware and am running the vendor drivers instead of the native lpfc drivers, but nothing seems to actually help. In searching, I've found little relating to the errors I'm getting, or how to possibly fix them, so I'm very much hoping that someone here will have some ideas.

Here is a snippet of the log errors I'm seeing:

2018-10-08T22:02:17.133Z cpu38:67897)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x99d None: Data(x18:x0:x0:x0)

2018-10-08T22:02:17.133Z cpu38:67897)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x961 None: Data(x18:x0:x0:x0)

2018-10-08T22:02:17.135Z cpu38:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x967 None: Data(x18:x0:x0:x0)

2018-10-08T22:02:17.155Z cpu38:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x98e None: Data(x18:x0:x0:x0)

2018-10-08T22:02:17.155Z cpu38:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x998 None: Data(x18:x0:x0:x0)

2018-10-08T22:02:17.155Z cpu38:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x98d None: Data(x18:x0:x0:x0)

2018-10-08T22:02:17.158Z cpu38:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x964 None: Data(x18:x0:x0:x0)

2018-10-08T22:06:53.383Z cpu38:69589)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x28 failed <0/0> sid x010300, did x011200, oxid xffff iotag x8e2 None: Data(x18:x0:x0:x0)

2018-10-08T22:10:35.153Z cpu40:69589)lpfc: lpfc_handle_status:5159: 1:(0):3271: FCP cmd x28 failed <0/0> sid x010700, did x011200, oxid xffff iotag x8f9 None: Data(x18:x0:x0:x0)

2018-10-08T22:10:35.183Z cpu40:69589)lpfc: lpfc_handle_status:5159: 1:(0):3271: FCP cmd x28 failed <0/0> sid x010700, did x011200, oxid xffff iotag x915 None: Data(x18:x0:x0:x0)

2018-10-08T22:10:41.469Z cpu40:66398)lpfc: lpfc_handle_status:5159: 1:(0):3271: FCP cmd x88 failed <0/0> sid x010700, did x011200, oxid xffff iotag x91d None: Data(x18:x0:x0:x0)

2018-10-08T22:10:41.491Z cpu40:69359)lpfc: lpfc_handle_status:5159: 1:(0):3271: FCP cmd x88 failed <0/0> sid x010700, did x011200, oxid xffff iotag x8f2 None: Data(x18:x0:x0:x0)

2018-10-08T22:12:39.565Z cpu11:65686)ScsiDeviceIO: 2954: Cmd(0x439500856dc0) 0x1a, CmdSN 0x149d from world 0 to dev "naa.6d4ae5207caec7001b84ffcb127794d1" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2018-10-08T22:14:12.653Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x958 None: Data(x18:x0:x0:x0)

2018-10-08T22:14:12.869Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x8a failed <0/0> sid x010300, did x011200, oxid xffff iotag x914 None: Data(x18:x0:x0:x0)

2018-10-08T22:14:28.300Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x8a failed <0/0> sid x010300, did x011200, oxid xffff iotag x991 None: Data(x18:x0:x0:x0)

2018-10-08T22:14:28.477Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x28 failed <0/0> sid x010300, did x011200, oxid xffff iotag x8dd None: Data(x18:x0:x0:x0)

2018-10-08T22:14:28.499Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x28 failed <0/0> sid x010300, did x011200, oxid xffff iotag x97d None: Data(x18:x0:x0:x0)

2018-10-08T22:14:28.549Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x8a failed <0/0> sid x010300, did x011200, oxid xffff iotag x94c None: Data(x18:x0:x0:x0)

2018-10-08T22:14:29.015Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x993 None: Data(x18:x0:x0:x0)

2018-10-08T22:14:29.036Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x8de None: Data(x18:x0:x0:x0)

2018-10-08T22:14:29.058Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x8cc None: Data(x18:x0:x0:x0)

2018-10-08T22:14:29.079Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x9a1 None: Data(x18:x0:x0:x0)

2018-10-08T22:14:29.100Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x962 None: Data(x18:x0:x0:x0)

2018-10-08T22:14:29.125Z cpu32:66397)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x986 None: Data(x18:x0:x0:x0)

2018-10-08T22:14:29.145Z cpu32:66397)lpfc0:(0) log compression on target 0 starting.
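To see at a glance which adapter and which kinds of I/O are affected, the lpfc lines above can be tallied with a short script. This is only a sketch: the regular expression is fitted to the exact line layout shown in this post, the opcode names are the standard SCSI command codes (x28/x88 are reads, x2a/x8a are writes), and treating the number before `:(0):` as the adapter instance is an assumption. The `summarize` helper and the `SAMPLE` list are names made up for this example; the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Standard SCSI opcodes that appear in the log snippet above.
OPCODES = {
    0x28: "READ(10)",
    0x2A: "WRITE(10)",
    0x88: "READ(16)",
    0x8A: "WRITE(16)",
}

# Fitted to the lpfc_handle_status lines in this post; other driver
# versions may format the message differently.
LPFC_RE = re.compile(
    r"^(?P<ts>\S+) cpu\d+:\d+\)lpfc: lpfc_handle_status:\d+: "
    r"(?P<inst>\d+):\(\d+\):\d+: FCP cmd x(?P<op>[0-9a-f]+) failed "
    r"<\d+/\d+> sid x(?P<sid>[0-9a-f]+), did x(?P<did>[0-9a-f]+)"
)


def summarize(lines):
    """Count failed FCP commands per (instance, sid, did, opcode name)."""
    counts = Counter()
    for line in lines:
        m = LPFC_RE.match(line.strip())
        if not m:
            continue  # not an lpfc_handle_status line
        op = int(m.group("op"), 16)
        # Assumption: the leading number is the adapter instance (lpfc0, lpfc1).
        key = (int(m.group("inst")), m.group("sid"), m.group("did"),
               OPCODES.get(op, hex(op)))
        counts[key] += 1
    return counts


# Lines copied from the log above (three lpfc lines, one unrelated line).
SAMPLE = [
    '2018-10-08T22:02:17.133Z cpu38:67897)lpfc: lpfc_handle_status:5159: 0:(0):3271: FCP cmd x2a failed <0/0> sid x010300, did x011200, oxid xffff iotag x99d None: Data(x18:x0:x0:x0)',
    '2018-10-08T22:10:35.153Z cpu40:69589)lpfc: lpfc_handle_status:5159: 1:(0):3271: FCP cmd x28 failed <0/0> sid x010700, did x011200, oxid xffff iotag x8f9 None: Data(x18:x0:x0:x0)',
    '2018-10-08T22:10:41.469Z cpu40:66398)lpfc: lpfc_handle_status:5159: 1:(0):3271: FCP cmd x88 failed <0/0> sid x010700, did x011200, oxid xffff iotag x91d None: Data(x18:x0:x0:x0)',
    '2018-10-08T22:12:39.565Z cpu11:65686)ScsiDeviceIO: 2954: Cmd(0x439500856dc0) 0x1a, CmdSN 0x149d from world 0 to dev "naa.6d4ae5207caec7001b84ffcb127794d1" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.',
]

if __name__ == "__main__":
    for key, n in sorted(summarize(SAMPLE).items()):
        print(key, n)
```

Run against the full vmkernel.log (for example `summarize(open(path))`), this would show whether the failures hit both adapters and both reads and writes toward the same destination ID, as the snippet suggests.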

rajen450m · ‎10-09-2018

Hi,

These are common errors on Emulex HBAs. They can be due to:

A faulty HBA.
FCP cmd x2a: a possible firmware/driver combination conflict, if no other storage errors are observed. Refer to the VMware Knowledge Base.
Multipathing: try to add the correct paths as per ALUA considerations. I hope you are not configured with Round Robin multipathing while using a vendor PSP? Please try to find the number of paths associated and figure out whether it is due to the...

I used the SCSI sense code decoder tool (from virten.net) on the error for your dev "naa.6d4ae5207caec7001b84ffcb127794d1" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0:

Type	Code	Name	Description
Host Status	[0x0]	OK	This status is returned when there is no error on the host side. This is when you will see if there is a status for a Device or Plugin. It is also when you will see Valid sense data instead of Possible sense Data.
Device Status	[0x2]	CHECK_CONDITION	This status is returned when a command fails for a specific reason. When a CHECK CONDITION is received, the ESX storage stack will send out a SCSI command 0x3 (REQUEST SENSE) in order to get the SCSI sense data (Sense Key, Additional Sense Code, ASC Qualifier, and other bits). The sense data is listed after Valid sense data in the order of Sense Key, Additional Sense Code, and ASC Qualifier.
Plugin Status	[0x0]	GOOD	No error. (ESXi 5.x / 6.x only)
Sense Key	[0x5]	ILLEGAL REQUEST
Additional Sense Data	24/00	INVALID FIELD IN CDB
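The same decoding can be done by hand in a few lines of Python, which may help when there are many such lines to go through. This is a minimal sketch and not the virten.net tool: the lookup tables hold only the values from the table above plus opcode 0x1a (MODE SENSE(6) in the SCSI command set), and `decode_scsi_line` is a name made up for this example. Anything not in the tables is reported as a raw hex value.

```python
import re

# Only the codes that appear in the line above; extend as needed.
HOST_STATUS = {0x0: "OK"}
DEVICE_STATUS = {0x2: "CHECK_CONDITION"}
PLUGIN_STATUS = {0x0: "GOOD"}
SENSE_KEY = {0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {(0x24, 0x00): "INVALID FIELD IN CDB"}
OPCODES = {0x1A: "MODE SENSE(6)"}

STATUS_RE = re.compile(
    r"H:(0x[0-9a-f]+) D:(0x[0-9a-f]+) P:(0x[0-9a-f]+) "
    r"Valid sense data: (0x[0-9a-f]+) (0x[0-9a-f]+) (0x[0-9a-f]+)"
)
CMD_RE = re.compile(r"Cmd\(0x[0-9a-f]+\) (0x[0-9a-f]+),")


def decode_scsi_line(line):
    """Decode the H/D/P status and sense data of a ScsiDeviceIO log line."""
    status = STATUS_RE.search(line)
    if not status:
        return None
    h, d, p, key, asc, ascq = (int(x, 16) for x in status.groups())
    cmd = CMD_RE.search(line)
    op = int(cmd.group(1), 16) if cmd else None
    return {
        "opcode": OPCODES.get(op, hex(op)) if op is not None else None,
        "host": HOST_STATUS.get(h, hex(h)),
        "device": DEVICE_STATUS.get(d, hex(d)),
        "plugin": PLUGIN_STATUS.get(p, hex(p)),
        "sense_key": SENSE_KEY.get(key, hex(key)),
        "asc_ascq": ASC_ASCQ.get((asc, ascq), f"{asc:02x}/{ascq:02x}"),
    }


# The ScsiDeviceIO line from the original post.
LINE = ('2018-10-08T22:12:39.565Z cpu11:65686)ScsiDeviceIO: 2954: '
        'Cmd(0x439500856dc0) 0x1a, CmdSN 0x149d from world 0 to dev '
        '"naa.6d4ae5207caec7001b84ffcb127794d1" failed H:0x0 D:0x2 P:0x0 '
        'Valid sense data: 0x5 0x24 0x0.')

if __name__ == "__main__":
    for field, value in decode_scsi_line(LINE).items():
        print(f"{field}: {value}")
```

Note that this line comes from a ScsiDeviceIO message about one naa device, not from the lpfc driver, so it is decoded separately from the FCP cmd failures.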

Try to check the number of paths to the HBA for device naa.6d4ae5207caec7001b84ffcb127794d1 and verify accordingly.

Regards,

Raj M Please mark helpful or correct if my answer resolved your issue. Visit www.hypervmwarecloud.com for my blog posts, step-by-step procedures etc.,

cab3 · ‎10-09-2018

Thanks for the reply Raj.

I'm seeing the same error on two different systems, with identical model HBAs, into the same switch and storage array. I may end up swapping out to some QLogic HBAs to see if the problem goes away, but I'd already gone through the VMware KB article you referred to and made sure that the HBA firmware and drivers match the latest recommended versions and are identical on both systems. I also swapped out the native driver for the Emulex driver, with no significant change in behavior (VMW_bootbank_lpfc_11.4.33.1-6vmw.650.2.50.8294253 replaced with EMU_bootbank_lpfc_12.0.257.5-1OEM.650.0.0.4598673).

# vmkmgmt_keyval -a # Abbreviated output

Emulex LightPulse FC SCSI 12.0.257.5

Emulex LPe12002-M8 8Gb 2-port PCIe Fibre Channel Adapter on PCI bus 0000:0e device 00 fn 1 port 1 Link Speed: 8 Gb

BoardNum: 1

FW Version: 2.02A4

HW Version: 31004549

ROM Version: 12.00a4

SerialNum: VM22140846

PCI ID: 10df f100 10df f100

I have validated the multipathing settings against the VMware compatibility list: the hosts use VMW_SATP_DEFAULT_AA and VMW_PSP_FIXED, as recommended for the Promise E630F array. The only difference I can see in the matrix is that the Promise array was tested with the QLogic device driver, and not Emulex. I do have a third host, which is my backup server, attached to the same array using a 4G QLogic HBA thru the same switch, and it exhibits no errors at all against the same array storage.

As for the SCSI error on the local datastore, it's only a single path device, behind a hardware RAID controller. It's the least of my concern, as I don't really use it for anything, but I'll see if I can determine what's going on with it.

My primary focus, however, is figuring out why I continue to get logs full of the same error messages from the Emulex HBAs on both systems. I'm trying to get my hands on a couple of QLogic HBAs to see if it's related to the HBA itself, but for now the Emulex adapters are what I have to work with at 8G.

rajen450m · ‎10-09-2018

Hi,

It's a good decision, if your business allows a change to QLogic HBAs.

Good Luck.

Regards,

Raj M Please mark helpful or correct if my answer resolved your issue. Visit www.hypervmwarecloud.com for my blog posts, step-by-step procedures etc.,

cab3 · ‎10-23-2018

I did manage to replace the two Emulex adapters that were flooding my logs with the errors shown. After putting in QLogic 8G HBAs, the logs have been very, very quiet.
