VMware Cloud Community
FreddyFredFred
Hot Shot

vmkernel log filled with fcoe errors

I'm building an ESXi 6 host. It has a Dell-branded Broadcom (now QLogic) 57800. I'm running the latest ESXi and firmware for the card.

My vmkernel.log is filled with these messages every 2 seconds:

2015-06-24T17:08:47.788Z cpu0:33368)<6>host11: fip: host11: FIP VLAN ID unavail. Retry VLAN discovery.
2015-06-24T17:08:47.788Z cpu0:33368)<6>host11: fip: fcoe_ctlr_vlan_request() is done
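
To see how often each FCoE host is retrying, the messages can be counted per host. This is only a minimal sketch: the function name is made up for illustration, and it assumes the log lines look like the two samples quoted above.

```shell
#!/bin/sh
# Count "FIP VLAN ID unavail" retries per FCoE host in a vmkernel log.
# count_fip_retries is an illustrative name, not an ESXi tool.
count_fip_retries() {
    # $1: path to a vmkernel log file
    grep 'FIP VLAN ID unavail' "$1" \
        | sed 's/.*fip: \(host[0-9]*\): FIP VLAN.*/\1/' \
        | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Calling it as `count_fip_retries /var/log/vmkernel.log` prints one line per host, with the host name followed by the number of retries found.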

I found this article:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=212052...

which might explain what's going on but I don't know how to fix it.

I don't use FCoE. I'm using iSCSI storage and network on the same 10gig link (the other 10gig link in the card is not plugged in). I checked the BIOS on the Dell server but cannot find anything to disable just that one port.

Is this a bug? Is there any way around the problem of this FCoE garbage filling up the log? Looking at another host with the same hardware (but older firmware and ESXi 5.5), I don't see any FCoE adapter listed.

FreddyFredFred
Hot Shot

I went back to a previous version of the network card firmware but the problem is still there, so I'm thinking maybe it's the driver. I tried removing the bnx2fc driver it was using, but it looks like VMware just used some generic FCoE driver after reboot. I don't want to start removing all kinds of drivers as they will come back when I update (I assume). Is there any way to tell ESXi to just ignore all the FCoE stuff or just not load it?

cykVM
Expert

Did you perhaps activate the software FCoE adapter/initiator accidentally?

Check "Setting up Software FCoE on vSphere 5" on Scott's Weblog.

Alistar
Expert

We're having the same issue, wonder how to get rid of this pestering message as well.

Stop by my blog if you'd like 🙂 I dabble in vSphere troubleshooting, PowerCLI scripting and NetApp storage - and I share my journeys at http://vmxp.wordpress.com/

FreddyFredFred
Hot Shot

I've never added any FCoE adapter, although the link you posted makes it seem like I did. In the vSphere client, I clicked on the two items, selected remove and then rebooted. Now I can click the add button and see the option to add FCoE (which I cancelled), but the log is still filling with errors.

When I run the command esxcli fcoe nic list, it shows both NICs as active. Is there any way to set them to inactive?

vmnic0
   User Priority: 3
   Source MAC: <omitted>
   Active: true
   Priority Settable: false
   Source MAC Settable: false
   VLAN Range Settable: true
   VN2VN Mode Enabled: false

vmnic1
   User Priority: 3
   Source MAC: <omitted>
   Active: true
   Priority Settable: false
   Source MAC Settable: false
   VLAN Range Settable: true
   VN2VN Mode Enabled: false
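
On a host with many NICs, the active ones can be picked out of that listing with a small filter. This is a rough sketch: the function name is hypothetical, and it assumes the output keeps the layout shown above (NIC name on its own line, followed by "Key: Value" lines).

```shell
#!/bin/sh
# Print the NICs that `esxcli fcoe nic list` reports as "Active: true".
# active_fcoe_nics is an illustrative helper name, not part of esxcli.
active_fcoe_nics() {
    # Reads the listing on stdin.
    awk '
        NF == 1 && $1 !~ /:$/ { nic = $1; next }   # a lone word without a colon is a NIC name
        $1 == "Active:" && $2 == "true" { print nic }
    '
}
```

Usage would be `esxcli fcoe nic list | active_fcoe_nics`, which prints one NIC name per line.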

Or, going back to the link in my original post, can I somehow fake something to have a matching VLAN to avoid that error in the logs?

FreddyFredFred
Hot Shot

Did you ever open a ticket with Dell and/or vmware?

I'm checking a couple of hosts with the same card running 5.5U1 with firmware 7.10.18 and under storage adapters, they only list iSCSI, not FCoE. Since I tried that same firmware on this host, I'm thinking it's something with ESXi 6 but I'm not sure what to check/try next.

cykVM
Expert

At least it seems to be a known issue: see the VMware KB article "FCoE storage connections fail when LUNs are presented to the host on different VLANs". Unfortunately there is no solution yet.

FreddyFredFred
Hot Shot

I saw that article, but it says it applies to 5.5 as well, and I'm not having that issue with the same hardware on 5.5. Since I'm not using FCoE, can I somehow fake this VLAN stuff just to make it happy and stop filling the logs with garbage?

FreddyFredFred
Hot Shot

I managed to get the messages in the vmkernel.log to stop. I don't know what the implications are of what I did, but it stops the messages :) My host isn't really running any VMs yet, so only time will tell whether or not it's a problem.

I have 2 ports on my card but only one was plugged in. The errors were coming from the active port.

This is the command I ran:

esxcli fcoe nic set -n=vmnic0 --enable-vn2vn y

(I also ran esxcli fcoe nic set -n=vmnic0 -v=999, where 999 is a dummy vlan but then I changed it back to 0 which was the default and the messages still stayed away)

The command requires a reboot.

Since then no errors in the logs. Just wish I knew what I did :)

cykVM
Expert

Hopefully you did not enable something else this way. Given your settings above, it might work even better to set "VLAN Range Settable: true" to false (disabled) and, of course, to put VN2VN mode back to disabled.

See for example: VMware KB: FCoE Configuration and Basic Troubleshooting for Broadcom NetXtreme II FCoE Offload Capab...

Looks to me like a bug in the driver for that Broadcom device...

FreddyFredFred
Hot Shot

There is no option to disable vlan, only set a vlan. Without turning on vn2vn, you can't set the vlan. I tried updating the drivers (and turning off vn2vn) but that didn't help. Just made the error a little more descriptive:

  

2015-06-26T11:26:13.471Z cpu32:33350)<6>host12: fip: fcoe_ctlr_vlan_request() is done
2015-06-26T11:26:16.824Z cpu26:33370)<6>host12: fip: host12: FIP VLAN ID unavail. Retry VLAN discovery.
2015-06-26T11:26:16.824Z cpu26:33370)<6>host12: fip: fcoe_ctlr_vlan_request() is done
2015-06-26T11:26:18.826Z cpu35:33354)<6>host12: fip: host12: FIP VLAN ID unavail. Retry VLAN discovery.
2015-06-26T11:26:18.826Z cpu35:33354)<6>host12: fip: fcoe_ctlr_vlan_request() is done
2015-06-26T11:26:20.810Z cpu2:33366)<3>bnx2fc:vmhba42:0000:01:00.0: bnx2fc_vlan_disc_timeout:191 VLAN Discovery Failed. Trying default VLAN 1002
2015-06-26T11:26:20.810Z cpu2:33366)<6>host12: fip: link down.
2015-06-26T11:26:20.810Z cpu2:33366)<6>host12: libfc: Link down on port (     0)
2015-06-26T11:26:20.810Z cpu2:33366)<3>bnx2fc:vmhba42:0000:01:00.0: bnx2fc_vlan_disc_cmpl:264 vmnic0: vlan_disc_cmpl: hba is on vlan_id 1002
2015-06-26T11:26:20.810Z cpu2:33366)<3>bnx2fc:vmhba42:0000:01:00.0: bnx2fc_start_disc:3133 Entered bnx2fc_start_disc
2015-06-26T11:26:20.810Z cpu2:33366)<6>host12: libfc: Link up on port (     0)
2015-06-26T11:26:40.812Z cpu8:33353)<6>host12: fip: link down.
2015-06-26T11:26:40.812Z cpu8:33353)<6>host12: libfc: Link down on port (     0)
2015-06-26T11:26:40.812Z cpu8:33353)<3>bnx2fc:vmhba42:0000:01:00.0: bnx2fc_vlan_disc_timeout:216 VLAN 1002 failed. Trying VLAN Discovery.
2015-06-26T11:26:40.812Z cpu8:33353)<3>bnx2fc:vmhba42:0000:01:00.0: bnx2fc_start_disc:3133 Entered bnx2fc_start_disc
2015-06-26T11:26:40.812Z cpu8:33353)fcoe_ctlr_link_up: setting fabric mode for vmnic0
2015-06-26T11:26:40.812Z cpu8:33353)<6>host12: libfc: Link up on port (     0)
2015-06-26T11:26:40.812Z cpu8:33353)<6>host12: fip: fcoe_ctlr_vlan_request() is done

Doing those two steps also made the host refuse to connect to vCenter after reboot. I get an error about vpxa not starting. Checking the vpxa.log on the host, I see errors about fcoe config.

I reinstalled the old driver and turned vn2vn back on, but vpxa still refuses to start. Looks like I'm going to reinstall ESXi now :)

cykVM
Expert

According to the vSphere 6.0 Documentation Center, you may try:

esxcli fcoe nic disable --nic-name vmnic0

and the same for vmnic1.

This should completely disable the fcoe thing on those adapters.

cykVM
Expert

I just found this which sounds pretty interesting: Zenfolio | Michael Davis | Broadcom BCM57810 FCoE and ESXi

FreddyFredFred
Hot Shot

Interesting timing. I just got off the phone with Dell and we came up with exactly the same solution just before you posted that link. I had all the individual steps myself, but didn't have them in exactly the right order (and was a little afraid to delete that .sh startup file even though I suspected it was part of the problem).

Just for reference in case that link goes away, here are the commands I used:

esxcli software vib remove -n scsi-bnx2fc

cd /etc/rc.local.d/

rm 99bnx2fc.sh

esxcli fcoe nic disable -n=vmnic0

esxcli fcoe nic disable -n=vmnic1
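
For repeating this on several hosts, the same steps could be wrapped in a small script that only prints the commands unless told otherwise. This is a sketch, not something Dell or VMware supplied: the `DRY_RUN` variable and the `run` and `disable_bnx2fc` names are invented for illustration, and the commands themselves are the ones listed above.

```shell
#!/bin/sh
# Dry-run wrapper around the removal steps listed above.
# DRY_RUN, run and disable_bnx2fc are illustrative names (assumptions).
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

disable_bnx2fc() {
    # "$@": the vmnic names to disable FCoE on, e.g. vmnic0 vmnic1
    run esxcli software vib remove -n scsi-bnx2fc || return 1
    run rm /etc/rc.local.d/99bnx2fc.sh || return 1
    for nic in "$@"; do
        run esxcli fcoe nic disable -n="$nic" || return 1
    done
}
```

Calling `disable_bnx2fc vmnic0 vmnic1` with the default settings just echoes the four commands so they can be reviewed first. Setting `DRY_RUN=0` beforehand actually runs them, in the same order as above.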

cykVM
Expert

Glad it helped in some way ;)

But just to mention, the link above also suggests:

"Probably a good idea to also modify VMware's Non-Critical Patch baseline to exclude the bnx2fc VIB to prevent its accidental reinstallation."

So on any upgrade/patch installation this might start all over again...
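
One way to notice if that happens is to check the installed VIB list after each patch run. A minimal sketch, assuming the VIB name still contains "bnx2fc" and appears in the first column of `esxcli software vib list`; the function name is made up for illustration.

```shell
#!/bin/sh
# Succeeds (exit 0) if a bnx2fc VIB appears in `esxcli software vib list` output.
# bnx2fc_vib_present is an illustrative helper name, not an ESXi tool.
bnx2fc_vib_present() {
    # Reads the VIB listing on stdin; the first column is the VIB name.
    awk '$1 ~ /bnx2fc/ { found = 1 } END { exit (found ? 0 : 1) }'
}
```

Usage would be something like `esxcli software vib list | bnx2fc_vib_present && echo "bnx2fc is back"`.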

EdZ314
Enthusiast

This may apply, even though it is on ESXi 5.5 and on an HP server. I've got some DL580 Gen8 servers that had the same issue when configured with ESXi 5.5 U2 using the current HP image. After much experimentation, I found that I could prevent the FCoE capabilities from being enabled by changing the NIC personality on the HP FlexFabric 534FLR-SFP+ CNA cards. They show up as "Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet."

Before you make changes, if you run this command on your vmnics, it will report that FCoE is available. For example, this command:

esxcli fcoe nic discover -n vmnic0

Will produce this output:

Discovery enabled on device 'vmnic0'

Here's what I did to turn it off. Apparently there are some settings on the CNA that can only be modified from the Legacy BIOS boot mode. The parameter below does not appear to be editable from the normal System Configuration screen on boot. If there's a CTRL+S option on the Broadcom CNA configuration on boot on the Dell server I imagine the same procedure would work.

> Boot and choose F11 for the Boot Menu

> Select Legacy BIOS One-Time Boot Menu

> Type CTRL+S within 5 secs

> Select one CNA interface

> Open Device Hardware Configuration

> Under Storage Personality change from FCoE (the default) to iSCSI

> Exit back to main screen saving changes

> Repeat on 2nd CNA interface

> Exit saving changes, continue exiting and saving until prompted to reboot

Here's the output of the same command after the changes:

esxcli fcoe nic discover -n vmnic0

PNIC "vmnic 0" is not FCoE-capable.
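
If several hosts need checking, the two messages quoted in this post can be told apart in a script. This is only a sketch: the function name is hypothetical, and it matches nothing beyond the two strings shown here. Note also that the "Discovery enabled" wording suggests the discover command itself turns discovery on, so it may not be a purely read-only check.

```shell
#!/bin/sh
# Classify the text printed by `esxcli fcoe nic discover -n <nic>`.
# fcoe_capable is an illustrative name; only the two messages quoted
# in this post are recognised.
fcoe_capable() {
    # $1: captured output of the discover command
    case "$1" in
        *"not FCoE-capable"*) return 1 ;;   # personality change took effect
        *"Discovery enabled"*) return 0 ;;  # NIC still offers FCoE
        *) return 2 ;;                      # unrecognised output
    esac
}
```

For example, `fcoe_capable "$(esxcli fcoe nic discover -n vmnic0 2>&1)"` returns 0 while the NIC still offers FCoE and 1 once it no longer does.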

Note: If changing the settings as above does not work, you can boot into the normal F9 setup and just change one of the CNA parameters from the normal menu, then put it back and save it from that screen (after the changes above). I had two servers on which the steps above worked the first time, and two others where they didn't seem to work, but that could be due to other changes I made while experimenting with other settings.

I contacted VMware on this and they referenced the KB that you provided below, but didn't have a complete resolution. I also contacted HP and they were not able to provide an immediate resolution either.