wilsonlopes00
Contributor

vSphere 5.5 and Emulex OneConnect 10Gb NIC trouble

I have installed ESXi 5.5 on a server with Emulex OneConnect 10Gb NICs.

I have installed the latest driver for this NIC: elxnet-10.0.575.9-1OEM.550.0.0.1331820.x86_64.vib.

After some network activity from the virtual machines, the interfaces go down, even though the switch ports are up.

vmnic4  0000:05:00.00 elxnet      Down 0Mbps     Half   00:00:c9:e4:13:16 9000   Emulex Corporation OneConnect 10Gb NIC

vmnic5  0000:05:00.01 elxnet      Down 0Mbps     Half   00:00:c9:e4:13:18 9000   Emulex Corporation OneConnect 10Gb NIC
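To spot downed uplinks quickly across several hosts, a listing like the one above can be filtered with a few lines of Python. This is only a sketch: it assumes the nine-column layout shown above (name, PCI address, driver, link, speed, duplex, MAC, MTU, description), and `down_nics` is a hypothetical helper name, not part of any VMware tool.

```python
def down_nics(listing):
    """Return (name, driver, pci_address) for every NIC whose link is Down."""
    rows = []
    for line in listing.splitlines():
        # Split into at most 9 fields so the free-text description stays whole.
        parts = line.split(None, 8)
        if len(parts) < 9 or not parts[0].startswith("vmnic"):
            continue  # skip headers, blank lines, and anything unexpected
        name, pci, driver, link = parts[:4]
        if link.lower() == "down":
            rows.append((name, driver, pci))
    return rows


# The vmnic4/vmnic5 lines are copied from the post; the vmnic0 line is a
# hypothetical "Up" entry added only to show that healthy NICs are skipped.
LISTING = """\
vmnic0  0000:03:00.00 elxnet      Up   10000Mbps Full   00:00:c9:00:00:01 1500   Emulex Corporation OneConnect 10Gb NIC
vmnic4  0000:05:00.00 elxnet      Down 0Mbps     Half   00:00:c9:e4:13:16 9000   Emulex Corporation OneConnect 10Gb NIC
vmnic5  0000:05:00.01 elxnet      Down 0Mbps     Half   00:00:c9:e4:13:18 9000   Emulex Corporation OneConnect 10Gb NIC
"""

for name, driver, pci in down_nics(LISTING):
    print(f"{name} ({driver}, {pci}) is down")
```

If the column layout differs on another ESXi release, the field positions would need adjusting.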

Here are the logs:

2013-11-19T15:49:12.395Z cpu2:33376)WARNING: elxnet: elxnet_detectDumpUe:238: 0000:005:00.0: UE Detected!!

2013-11-19T15:49:12.396Z cpu2:33376)elxnet: elxnet_detectDumpUe:249: 0000:005:00.0: Forcing Link Down as Unrecoverable Error detected in chip/fw.

2013-11-19T15:49:12.396Z cpu2:33376)WARNING: elxnet: elxnet_detectDumpUe:257: 0000:005:00.0: UE lo: MPU bit set

2013-11-19T15:49:12.892Z cpu5:33377)WARNING: elxnet: elxnet_detectDumpUe:238: 0000:005:00.1: UE Detected!!

2013-11-19T15:49:12.892Z cpu5:33377)elxnet: elxnet_detectDumpUe:249: 0000:005:00.1: Forcing Link Down as Unrecoverable Error detected in chip/fw.

2013-11-19T15:49:12.892Z cpu5:33377)WARNING: elxnet: elxnet_detectDumpUe:257: 0000:005:00.1: UE lo: MPU bit set
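For anyone who wants to pick these events out of a larger log, here is a minimal Python sketch. The regular expression is modelled only on the elxnet lines quoted above, so other driver versions may word the messages differently; `find_ue_events` is a hypothetical helper name.

```python
import re

# Pattern based on the elxnet_detectDumpUe lines quoted in this post.
UE_LINE = re.compile(
    r"^(?P<ts>\S+)\s+cpu\d+:\d+\)(?:WARNING: )?elxnet: "
    r"elxnet_detectDumpUe:\d+: (?P<pci>[0-9a-fA-F:.]+): (?P<msg>.*)$"
)


def find_ue_events(lines):
    """Return (timestamp, pci_address) for every 'UE Detected' line."""
    events = []
    for line in lines:
        match = UE_LINE.match(line.strip())
        if match and match.group("msg").startswith("UE Detected"):
            events.append((match.group("ts"), match.group("pci")))
    return events


# Three of the lines from the log excerpt above.
SAMPLE = [
    "2013-11-19T15:49:12.395Z cpu2:33376)WARNING: elxnet: elxnet_detectDumpUe:238: 0000:005:00.0: UE Detected!!",
    "2013-11-19T15:49:12.396Z cpu2:33376)elxnet: elxnet_detectDumpUe:249: 0000:005:00.0: Forcing Link Down as Unrecoverable Error detected in chip/fw.",
    "2013-11-19T15:49:12.892Z cpu5:33377)WARNING: elxnet: elxnet_detectDumpUe:238: 0000:005:00.1: UE Detected!!",
]

for timestamp, pci in find_ue_events(SAMPLE):
    print(f"{timestamp}: unrecoverable error reported on {pci}")
```

On a host, the input lines would typically come from /var/log/vmkernel.log.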

Has anyone had similar trouble?

122 Replies
ctaylor23
Contributor

I am experiencing very similar issues running 5.1 Update 3. I have an Emulex OCm10102-n-x running on 8 hosts. On only one host, the adapters keep disconnecting and going offline. I have tried all combinations of new and old firmware. Does anyone have any other ideas?

jessem
Enthusiast

It's been about 2 weeks now with no issues running the following:

ESXI 5.5 HP BLADE DRIVERS/FIRMWARE

EMULEX 554FLB CARD

ELXNET (DRIVER FOR 5.5 - NEEDED TO BE DOWNGRADED)

DRIVER:   10.0.725.2

FIRMWARE: 4.9.416.0

rheuvink
Contributor

We have exactly the same setup, but 1 host became unresponsive last night. What versions of OA and VC do you use? We have 4.30 (Jul 08 2014).

djciaro
Expert

Hi Guys,

I have had a case open with HP since the end of October about the following issue. After a platform upgrade (blade BIOS / iLO / HP Virtual Connect / ESXi to HP customized 5.1 Update 2), half of our uplinks died anywhere from 5 minutes to 1 hour after booting. After many hours back and forth with HP (recreating server profiles, trying different iLO versions, ROM, Emulex firmware, be2net drivers, etc.), we narrowed the issue down to Gen8 blades with Emulex CNAs (554FLB).

The issue did not affect the mezzanine cards (also 554s). We were running be2net driver version 10.2.293.0 and firmware 10.2.340.19. Everything was as per the HP recipe for VMware (HP_Service_Pack_for_ProLiant_2014.09.0_792934_001_spp & VMware-ESXi-5.1.0-Update2-2000251-HP-5.68.30-Sep2014.iso).

The cause was that Advanced Mode was enabled in the Emulex BIOS on the FLB cards. As soon as we disabled it, all issues disappeared.

I wonder if anyone with these network issues has had a look at the settings in the Emulex BIOS.

To disable Advanced Mode Support through PXE Select:

1. After the BIOS initializes and you have selected your controller, the Controller Configuration screen appears. Select Advanced Mode Support from the drop-down menu. The Controller Configuration Advanced Mode Support dialog box appears.

2. From the drop-down menu, select Disabled and press Enter.

3. Select Save and press Enter.

4. After the Advanced Mode Support setting is saved, the Port Selection screen appears. Select the port you want to configure and press Enter, then continue to configure your controller. At this point you can also exit the BIOS and reboot.

I'd be interested to know if this solves the issue for anyone else. HP have been working with Emulex on this. We have tried 2 test drivers for them, but so far they have been unable to reproduce the issue in their labs, even though they sourced and tested the exact same batch of blades that we have. When it happened to us we had 9 of the same blades in our chassis (so 18 in total - 1 per DC).

The issue happened on all 18 blades and was solved by disabling Advanced Mode. It never happened on the remaining Gen7 and older Gen8 blades in the chassis; on all of these, Advanced Mode was disabled by default on the FLBs and mezzanine cards.

regards

Ciarán

VC 4.30, OA 4.30

If you found this information useful, please consider awarding points for "Correct" or "Helpful". Thanks!
cowellja
Contributor

Hi Ciarán,

We saw something very similar with driver version 10.2.298.5 (elxnet) and firmware 10.2.340.19, but on the rackmount FlexLOM, the 554FLR-SFP+. We were using elxnet rather than be2net because in ESXi 5.5 Emulex moved to the native mode driver, but the symptoms sound the same. The host came online and appeared to be passing traffic until a number of VMs were on it. Then vmnic0 stopped passing traffic (although the link status stayed up), followed shortly afterwards by vmnic1. Strangely, in the case we had open with VMware (where they were working with Emulex), we were told to turn Advanced Mode on, which sounds like the opposite of the advice you had. At the moment we're staying on an older set of drivers and firmware until we have a definitive answer.

I see Emulex have just put out a new version on http://www.emulex.com/downloads/emulex/drivers/vmware/vsphere-55/drivers/ (I appreciate you're using ESXi 5.1 but it may still be of interest) but there's no indication that the issue has been addressed in the release notes.

http://www-dl.emulex.com/support/elx/rt10.4.0/ga/Docs/final/vmware/vmware_relnotes_elx.pdf

The only thing I did notice was under ESXi 5.5 known issues point 15:

On OCe11100-series adapters if you update the driver and firmware, ESXi 5.5 hosts may report large numbers of packet loss and errors in the vmkernel logs. Throughput is not effected, but errors may fill management software logs.

Workaround

None.

However, I'm not sure that it matches, since our throughput was affected to the point of taking the hosts out of service. I'd be very interested to hear how your case goes.

Regards,

Jason

djciaro
Expert

Hi Jason,

After today's discussion with HP I decided not to pursue the point of finding out why the advanced mode was enabled on the FLB for a particular set of Gen8 blades (although they said they will share a document that explains different scenarios for when advanced mode is disabled or enabled).

They will keep looking with Emulex, but we are not going to proceed with further attempts to reproduce the issue. In our case it was definitely turning Advanced Mode off that solved the issue.

The latest SPP for ProLiant servers is planned for release next week. There is also a new version of the customized HP ESXi image (30/03/2015) http://www8.hp.com/us/en/products/servers/solutions.html?compURI=1499005#tab=TAB4 which apparently will have fixes for driver bugs (related to the elxnet drivers for 5.5). The firmware is still version 10.2.340.19, but HP mentioned this will also be updated.

We will start testing as soon as everything is available and, with a bit of luck, plan for a migration to vSphere 6 in July. Fingers crossed there are no issues with the Emulex cards.

I will post if we get any further updates.

Regards

Ciarán

MartynThomas
Contributor

For those still with issues, after working with HP over the last few months, I now have what appears to be a stable platform.

I'm using the following combination, which was released a few weeks ago:

Driver: VMW-ESX-5.5.0-elxnet-10.4.255.13-2555693.zip

https://my.vmware.com/web/vmware/details?downloadGroup=DT-ESXI55-EMULEX-ELXNET-10425513&productId=35...


Firmware: 10.2.477.23

http://h20564.www2.hp.com/hpsc/swd/public/detail?sp4ts.oid=4324631&swItemId=MTX_00b06590d26c4222a0a9...


Everything appears to be stable at the moment. I've introduced the same loads that caused the issues I mentioned previously.


Thanks,


Martyn

jessem
Enthusiast

Martyn,

Has 10.4.255.13 been certified by HP? I can't seem to find it on HP's site.

MartynThomas
Contributor

I'm not sure, to be honest. Those 2 links were provided to me directly by HP.

Cheers,

Martyn

jessem
Enthusiast

OK, well, I'll take them as they are since HP support provided those links.

AK077
Contributor

@MartynThomas --  Thanks for your post.

We recently experienced a similar issue. In our case, our BL460G8 blades abruptly disconnected from both the Ethernet and storage networks serviced by the Emulex OneConnect chipset (NC554FLB).


Our environment regained stability after applying the following:

esxcli software vib install -d /tmp/VMW-ESX-5.5.0-elxnet-10.4.255.13-offline_bundle-2555693.zip  ## Emulex OneConnect Network Driver v10.4.255.13

esxcli software vib install -d /tmp/VMW-ESX-5.5.0-lpfc-10.2.455.0-offline_bundle-2254453.zip  ## Emulex OneConnect FC Driver v10.2.455.0

esxcli software vib install -d /tmp/hp-esxi5.5uX-bundle-2.2-17.zip ## HP ESXi5.5 bundle

(reboot)

Then

./CP025747.scexe    ## Emulex OneConnect Firmware v10.2.477.10

(reboot)

NOTE: The HP-prescribed ESXi 5.5 recipe specifies the elxnet v10.2.445.0 driver. In our experience this has not resolved the issue. We are stable with elxnet 10.4.255.13.
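When checking a fleet of hosts against a baseline like this, the dotted version strings need to be compared numerically, field by field, rather than as plain text. A minimal Python sketch follows. The baseline is simply the elxnet version reported as stable in this post (it may not be stable everywhere), and `parse_version` and `at_least` are hypothetical helper names.

```python
def parse_version(text):
    """Turn a dotted version such as '10.4.255.13' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# elxnet version reported as stable in this post; an assumption for any
# other environment.
BASELINE_ELXNET = parse_version("10.4.255.13")


def at_least(installed, baseline=BASELINE_ELXNET):
    """True if the installed version is the baseline or newer."""
    return parse_version(installed) >= baseline


for version in ("10.2.445.0", "10.4.255.13"):
    status = "meets baseline" if at_least(version) else "older than baseline"
    print(f"elxnet {version}: {status}")
```

Tuples compare element by element, so 10.10.x would correctly sort after 10.4.x, which a plain string comparison would get wrong.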

We're in dialog with L3 (VMW, HDS, HP) to triage further.

Hope this helps someone.

Adam Kupsta.

PaloSelfie
Contributor

That helped me quite a bit, thanks for the thread.

Br,

Anthony J

AraqueFotos

AK077
Contributor

Anthony / Anyone else experiencing this issue,

I'm trying to correlate root cause.   Are you able to share a few details about your storage config? 

- FC/iSCSI?

- Both network/storage attachment serviced by same Emulex interface?

- Array subscribed vs. allocation levels (%)

- Any array tiering policies enabled?

Thanks,

Adam.

elkjop_faye
Contributor

We are having an issue related to this, I suspect.

Our setup is:

- IBM PureFlex x240 nodes.

- FC storage

- FW: 10.4.255.25

- Driver: elxnet: 10.4.255.13 lpfc: 10.4.245.0

Both network and storage go through the same hardware adapters.

We do overprovision datastores. (But it would take quite some time to get any specific number here.)

We are using array tiering on some datastores.

After upgrading to 5.5 U2, the network starts to report drops. (We have only upgraded one host.)

edp4you
Contributor

Dear Colleagues,

The 10Gb network port flapping (up/down) experienced by our customer was solved by using STP (shielded) Cat.7 cables instead of UTP (unshielded) Cat.6e.

The issue was caused by crosstalk (see Wikipedia).

We have solved the issue of ports going up and down on the switches, but we continue to have issues with vSphere 5.1 and 5.5.

We are unable to get iSCSI throughput above 460 MByte/sec, whereas a blade host running Windows 2012 R2 with the same configuration reaches 1.6 GByte/sec.
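To put those two figures on the same scale as the link speed, here is a small Python sketch of the arithmetic. It assumes decimal units (1 GByte = 1000 MByte) and a nominal 10 Gbit/s per port, and it ignores protocol overhead.

```python
def mbyte_per_s_to_gbit_per_s(mbyte_per_s):
    """Convert decimal megabytes per second to gigabits per second."""
    return mbyte_per_s * 8 / 1000


LINK_GBIT = 10.0  # nominal rate of one 10Gb port

esxi_gbit = mbyte_per_s_to_gbit_per_s(460)      # figure reported for ESXi
windows_gbit = mbyte_per_s_to_gbit_per_s(1600)  # figure reported for Windows 2012 R2

print(f"ESXi:    {esxi_gbit:.2f} Gbit/s ({esxi_gbit / LINK_GBIT:.0%} of one port)")
print(f"Windows: {windows_gbit:.2f} Gbit/s ({windows_gbit / LINK_GBIT:.0%} of one port)")
print(f"Windows/ESXi ratio: {windows_gbit / esxi_gbit:.2f}x")
```

The Windows figure works out to more than a single 10Gb port can carry, so that test presumably used more than one port. The post does not say, so treat that as a guess.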

Giovanni Coa

edp4you
Contributor

Our customer found that crosstalk was the issue.

Crosstalk causes problems when multiple UTP (unshielded) cables running at 10 Gbit/s interfere with each other.

Using Cat.7 STP (shielded) cables solved the up/down issue.

Many other issues are now caused by the Emulex elxnet drivers on ESXi (vSphere) 5.5 U2.

We can't get more than 460 MByte/sec, compared with 1.6 GByte/sec on Windows 2012 R2 in the same configuration.

Giovanni Coa

ochmartin
Contributor

The latest reliable combination in our environment (with VMware vDS) seems to be (for 554FLB):

# esxcli network nic get -n vmnic0

   Driver Info:

         Bus Info: 0000:04:00:0

         Driver: elxnet

         Firmware Version: 10.2.477.10

         Version: 10.2.445.0
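To collect the driver and firmware versions from many hosts, the "Driver Info" section above can be turned into a dictionary with a short Python sketch. It assumes the "Key: value" layout shown in this post; `parse_driver_info` is a hypothetical helper name.

```python
def parse_driver_info(text):
    """Collect 'Key: value' pairs from esxcli-style output into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:  # section headers such as 'Driver Info:' have no value
            info[key] = value.strip()
    return info


# Output copied from the post above.
SAMPLE = """\
   Driver Info:
         Bus Info: 0000:04:00:0
         Driver: elxnet
         Firmware Version: 10.2.477.10
         Version: 10.2.445.0
"""

info = parse_driver_info(SAMPLE)
print(f"{info['Driver']} driver {info['Version']}, firmware {info['Firmware Version']}")
```

Splitting on the first ": " keeps values that contain colons, such as the bus address, intact.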

I don't have the courage to test it with the Nexus 1000V in a production environment.

If I install the VIB VMW-ESX-5.5.0-elxnet-10.4.255.13-offline_bundle-2555693.zip, the hypervisor ends in a PSOD after reboot! Strange....

With the Nexus 1000V the Emulex driver is absolutely unusable, and when I switch the driver from elxnet to be2net it leads to corrupted packets.

So in one of our clusters we changed the hypervisors with 554FLB NICs to 534FLB (Broadcom), and they are running without any issues.

We are using the NICs for Ethernet communication only: no FC, no iSCSI. The driver must support VXLANs.

The suggestion is: don't use Emulex, use Broadcom :)

But at this time I have no idea how to upgrade from 5.1 to 5.5...

ochmartin
Contributor

I can't agree, because we are using 10Gb twinax, not UTP, and we have a lot of problems with Emulex on ESXi 5.5.

Brahmzy
Enthusiast

Having a really similar issue. We've been using these BL460 Gen8 blades w/554 Emulex LOMs fine for a year now and decided to update to the latest SPP (2015.04.0).

Now the 2 hosts that got upgraded drop off 1-3 hours after reboots. (The first time, we had an outage with VMs on them.)

I found this thread and disabled Advanced Mode in the Emulex BIOS. That seems to have fixed the issue, but I'm not willing to put the hosts back into production yet.

Strangely, I spot-checked the other blades; they have Advanced Mode set to 'Enabled' by default and have worked fine with the older firmware and drivers.

Does anybody know if there's a problem with disabling Advanced Mode in the Emulex BIOS? Is there some functionality I will be missing?

We are still on 5.1U3 and looking to move to 5.5U3 soon.

This combo has worked for a year, but is very old:

be2net driver:          4.9.288.0

be2net firmware:     10.2.340.19

This combo, which the latest 2015.04.0 SPP upgraded to, broke everything:

be2net driver:          10.2.477.10

be2net firmware:     10.2.453.0

Anybody else have any new findings on this awful problem?

djciaro
Expert

Hi,

There is no problem in disabling Advanced Mode in the Emulex BIOS. We worked extensively with HP to try to find the root of this problem.

Eventually, after months, they published the following customer advisory with the workaround: http://h20564.www2.hp.com/hpsc/doc/public/display?docId=emr_na-c04608235

They claim that the latest version of the driver, 10.2.477.20, resolves this issue. It is not available with the latest SPP, so you will need to download it separately: ftp://ftp.hp.com/pub/softlib2/software1/sc-linux-fw-sys/p930408510/v105241 or https://my.vmware.com/web/vmware/details?downloadGroup=DT-ESXI51-EMULEX-BE2NET-10247720&productId=28...

For some background on the cause of this issue (why some cards have Advanced Mode enabled by default and others do not), here is the feedback from HP on how the Emulex firmware relates to Advanced Mode / SR-IOV being enabled or disabled. With 4.2.x.x firmware and earlier, customers had the ability to manually set the state of SR-IOV in the NIC BIOS. In 4.6.x.x firmware, this ability was removed and the customer could no longer toggle the state of SR-IOV manually. This created major issues due to known compatibility issues between SR-IOV and certain OSes. To resolve that, the ability to disable SR-IOV needed to be given back to the end user in the NIC BIOS. This was accomplished by tying the SR-IOV state to the Advanced Mode Support state with firmware 4.9.x.x and higher. So the firmware version that was initially installed on the NIC determines whether Advanced Mode and SR-IOV are enabled or not.
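The firmware history above can be summarised as a small lookup. This Python sketch encodes only the three ranges HP described (4.2 and earlier, 4.6, 4.9 and later); firmware versions in between are not covered by that description, and `sriov_control` is a hypothetical helper name.

```python
def sriov_control(firmware):
    """Describe how SR-IOV is controlled for a given Emulex firmware version,
    based only on the ranges in HP's feedback quoted above."""
    major, minor = (int(part) for part in firmware.split(".")[:2])
    if (major, minor) <= (4, 2):
        return "manual SR-IOV toggle in NIC BIOS"
    if (major, minor) == (4, 6):
        return "no manual SR-IOV toggle"
    if (major, minor) >= (4, 9):
        return "SR-IOV tied to Advanced Mode Support"
    return "not covered by HP's description"


# 4.9.416.0 and 10.2.340.19 appear in this thread; 4.2.0.0 and 4.6.0.0 are
# hypothetical placeholders for the older firmware families.
for version in ("4.2.0.0", "4.6.0.0", "4.9.416.0", "10.2.340.19"):
    print(f"{version}: {sriov_control(version)}")
```

Note that this describes the control available in a given firmware, not the current setting on a card, which per HP depends on the firmware that was initially installed.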

Regards

Ciarán
