wilsonlopes00
Contributor

vSphere 5.5 and Emulex OneConnect 10Gb NIC trouble

I have installed ESXi 5.5 on a server with Emulex OneConnect 10Gb NICs.

I have installed the latest driver for this NIC: elxnet-10.0.575.9-1OEM.550.0.0.1331820.x86_64.vib.

After some network activity from the virtual machines, the interfaces go down, even though the switch ports are up.

vmnic4  0000:05:00.00 elxnet      Down 0Mbps     Half   00:00:c9:e4:13:16 9000   Emulex Corporation OneConnect 10Gb NIC

vmnic5  0000:05:00.01 elxnet      Down 0Mbps     Half   00:00:c9:e4:13:18 9000   Emulex Corporation OneConnect 10Gb NIC

Here are the logs:

2013-11-19T15:49:12.395Z cpu2:33376)WARNING: elxnet: elxnet_detectDumpUe:238: 0000:005:00.0: UE Detected!!

2013-11-19T15:49:12.396Z cpu2:33376)elxnet: elxnet_detectDumpUe:249: 0000:005:00.0: Forcing Link Down as Unrecoverable Error detected in chip/fw.

2013-11-19T15:49:12.396Z cpu2:33376)WARNING: elxnet: elxnet_detectDumpUe:257: 0000:005:00.0: UE lo: MPU bit set

2013-11-19T15:49:12.892Z cpu5:33377)WARNING: elxnet: elxnet_detectDumpUe:238: 0000:005:00.1: UE Detected!!

2013-11-19T15:49:12.892Z cpu5:33377)elxnet: elxnet_detectDumpUe:249: 0000:005:00.1: Forcing Link Down as Unrecoverable Error detected in chip/fw.

2013-11-19T15:49:12.892Z cpu5:33377)WARNING: elxnet: elxnet_detectDumpUe:257: 0000:005:00.1: UE lo: MPU bit set

Has anyone had a similar problem?
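
For anyone collecting these messages across several hosts, here is a minimal Python sketch that groups the `elxnet_detectDumpUe` lines by PCI address. The regular expression is written only against the log lines quoted in this thread, so the line format is an assumption, and `summarize_ue_events` is a hypothetical helper name, not part of any VMware or Emulex tool.

```python
import re
from collections import defaultdict

# Pattern matched against the elxnet_detectDumpUe lines quoted in this thread.
# The exact layout is an assumption based on those samples only.
UE_LINE = re.compile(
    r"^(?P<ts>\S+)\s+cpu\d+:\d+\)"       # timestamp, then cpu:world prefix
    r"(?:WARNING:\s+)?elxnet:\s+"        # optional WARNING tag, driver name
    r"elxnet_detectDumpUe:\d+:\s+"       # function name and source line number
    r"(?P<pci>[0-9a-fA-F:.]+):\s+"       # PCI address, e.g. 0000:005:00.0
    r"(?P<msg>.*)$"                      # remainder of the message
)


def summarize_ue_events(lines):
    """Group elxnet UE messages by PCI address as (timestamp, message) pairs."""
    events = defaultdict(list)
    for line in lines:
        match = UE_LINE.match(line.strip())
        if match:
            events[match.group("pci")].append((match.group("ts"), match.group("msg")))
    return dict(events)


# Sample lines copied from the log excerpt in this post.
SAMPLE = """\
2013-11-19T15:49:12.395Z cpu2:33376)WARNING: elxnet: elxnet_detectDumpUe:238: 0000:005:00.0: UE Detected!!
2013-11-19T15:49:12.396Z cpu2:33376)elxnet: elxnet_detectDumpUe:249: 0000:005:00.0: Forcing Link Down as Unrecoverable Error detected in chip/fw.
2013-11-19T15:49:12.396Z cpu2:33376)WARNING: elxnet: elxnet_detectDumpUe:257: 0000:005:00.0: UE lo: MPU bit set
2013-11-19T15:49:12.892Z cpu5:33377)WARNING: elxnet: elxnet_detectDumpUe:238: 0000:005:00.1: UE Detected!!
""".splitlines()

if __name__ == "__main__":
    for pci, messages in summarize_ue_events(SAMPLE).items():
        print(pci, len(messages), "message(s); first at", messages[0][0])
```

On a host you would feed it the lines of the vmkernel log (typically /var/log/vmkernel.log on ESXi 5.x) instead of the sample. Adjust the pattern if your driver version prints the messages differently.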

122 Replies
marcomcpap
Contributor

I have the same problem with the Emulex NICs in my HS23 servers. I updated the Emulex firmware and the native Emulex NIC driver, but the problem continued.

I then had the idea to update the legacy Emulex NIC driver, disable the native driver, and enable the legacy driver. I tested again and the problem stopped.

Run the commands to disable the native driver and enable the legacy driver:

esxcli system module set --enabled=false --module=elxnet

esxcli system module set --enabled=true --module=be2net

Then reboot the host.
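
To check after the reboot which driver is actually enabled, the output of `esxcli system module list` can be inspected. A rough Python sketch follows. It assumes the three-column layout (Name, Is Loaded, Is Enabled), the sample text is made up for illustration rather than captured from a host, and `parse_module_list` and `legacy_driver_active` are hypothetical helper names.

```python
def parse_module_list(text):
    """Parse 'Name / Is Loaded / Is Enabled' rows into {name: (loaded, enabled)}."""
    modules = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly three fields, the last two being booleans;
        # the header and separator lines do not match and are skipped.
        if (len(parts) == 3
                and parts[1] in ("true", "false")
                and parts[2] in ("true", "false")):
            modules[parts[0]] = (parts[1] == "true", parts[2] == "true")
    return modules


def legacy_driver_active(modules):
    """True when be2net is enabled and elxnet is disabled (both must be listed)."""
    elxnet = modules.get("elxnet")
    be2net = modules.get("be2net")
    if elxnet is None or be2net is None:
        return False
    return (not elxnet[1]) and be2net[1]


# Hypothetical sample, laid out the way the module list is assumed to print.
SAMPLE = """\
Name      Is Loaded  Is Enabled
--------  ---------  ----------
elxnet        false       false
be2net         true        true
"""

if __name__ == "__main__":
    mods = parse_module_list(SAMPLE)
    print("legacy driver active:", legacy_driver_active(mods))
```

This only reads the module state. It does not change anything on the host, so it can be run against saved command output.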

uzimmermann
Contributor

I have the same problem on a number of HP BL460c G7. I have not tried this on the BL460c Gen8s, which use the same driver/firmware. My scenario right now is:

Eight BL460c Gen8 blades running ESXi 5.1 with a Cisco Nexus 1000V DVS. Trying to add six BL460c G7 blades with ESXi 5.5 to this DVS leads to an unrecoverable error in the elxnet driver. Adding the G7s to a DVS that is not Cisco-based does not trigger the unrecoverable error.

I have engaged Emulex and they are currently trying to reproduce it.

raffic_ncc
Enthusiast

I faced the same issue on my ESX 4.0 host. After I updated the NIC firmware and installed the latest driver version, the issue was fixed for me.

Mohammed Raffic VCP4,VCP5,VCAP4-DCA,VCAP5-DCA,VCP-Cloud, MCSA.MCTS,CCA http://www.vmwarearena.com
uzimmermann
Contributor

I am already running the latest firmware version (as posted by HP) and the latest driver (as posted on VMware.com). I have actually tried three different versions of the driver.

TheLinak
Contributor

This is a known bug: Cisco bug ID CSCuj81943.

- The issue is that the N1Kv is not able to set the NIC speed correctly for the interface with [ethtool -S] due to the new driver on the Emulex card.

- As a workaround, you can either try ESXi 5.1 or use the older be2net driver on the Emulex card.

- The fix for the N1Kv will come out in the next release; the ETA is not determined.

raffic_ncc
Enthusiast

Thanks for the information

SouthSideSeth
Contributor

FWIW, I had the same problem with this NIC in an IBM HS23 blade. The three blades in the chassis I was installing shipped with ESXi 5.1 pre-installed. The customer ordered the blades with the advanced software upgrade that enables vNICs/I/O virtualization (I forget what they call it exactly). I went into the UEFI on the blades and enabled multichannel switch-independent mode with the NIC personality. I rebooted the blades and they came back up showing 6x 10GE + 2x 1GE NICs, all connected and functioning correctly.

Unfortunately for me, the customer wanted to run VMware 5.5 on their new blade center. I did a clean install of 5.5 on the HS23s with the IBM custom VMware ISO. Despite all the firmware and driver updates I could find, plus support calls to VMware and IBM, we were not able to get the vNICs to work with VMware 5.5. I finally gave up and disabled multichannel on the NICs (set them back to physical mode).

This particular blade center had Cisco Nexus 4001i switches. The weird thing was that even though the links showed down on the network adapters tab in VMware, as with the OP, I was still able to see CDP stats from a vSwitch uplink port on the network tab. I confirmed this wasn't cached by disabling the internal ports on the 4Ks and verifying that the CDP information went away.

BTW, for fun I used the IBM VMware ISO to downgrade one of the blades back to 5.1, and the 10GE links still would not come up. My guess is that IBM ships some secret driver on the pre-installed 5.1 USB key that isn't in their custom ISO for 5.5 or 5.1.

Going to follow this thread as I would like to see how this turns out.

Seth

uzimmermann
Contributor

FYI, we have this problem on both Cisco Nexus 1000v versions, 4.2(1)SV2(2.1) and 4.2(1)SV2(2.1a). Even the latest elxnet driver (posted Dec 24 2014 on vmware.com) does not fix the problem. The Cisco Nexus triggers it, but the actual bug is in the driver or firmware of the Emulex chip.


SouthSideSeth
Contributor

I set up another blade center today that again came preloaded with the VMware 5.1 image from IBM. I decided to take a gamble and try vNICs again. The NICs are showing connected, but I'm having some other network problems that I think are caused by this NIC.

Here are the drivers showing:

Name    PCI Device     Driver  Link  Speed  Duplex  MAC Address        MTU   Description
------  -------------  ------  ----  -----  ------  -----------------  ----  --------------------------------------------
vmnic0  0000:016:00.0  be2net  Up    10000  Full    34:40:b5:c8:f0:e8  1500  Emulex Corporation OneConnect 10Gb NIC (be3)
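
When comparing several hosts, NIC listings like the one above can be checked for down links with a short script. The following is a sketch in Python, assuming the nine-column layout seen in this thread (name, PCI address, driver, link, speed, duplex, MAC, MTU, description). `Nic`, `parse_nic_rows`, and `down_nics` are illustrative names, and the sample rows are copied from the posts here.

```python
from typing import List, NamedTuple


class Nic(NamedTuple):
    name: str
    pci: str
    driver: str
    link: str
    speed_mbps: int
    duplex: str
    mac: str
    mtu: int
    description: str


def parse_nic_rows(text: str) -> List[Nic]:
    """Parse vmnic rows; header and separator lines are skipped."""
    nics = []
    for line in text.splitlines():
        parts = line.split(None, 8)  # the description may contain spaces
        if len(parts) < 9 or not parts[0].startswith("vmnic"):
            continue
        name, pci, driver, link, speed, duplex, mac, mtu, desc = parts
        # Speed appears as "0Mbps" in one listing and as "10000" in another.
        if speed.endswith("Mbps"):
            speed = speed[:-4]
        nics.append(Nic(name, pci, driver, link, int(speed), duplex,
                        mac, int(mtu), desc.strip()))
    return nics


def down_nics(nics: List[Nic]) -> List[str]:
    """Names of NICs whose link column is anything other than 'Up'."""
    return [n.name for n in nics if n.link != "Up"]


# Rows taken from the listings posted in this thread.
SAMPLE = """\
Name    PCI Device     Driver  Link  Speed  Duplex  MAC Address        MTU   Description
vmnic4  0000:05:00.00  elxnet  Down  0Mbps  Half    00:00:c9:e4:13:16  9000  Emulex Corporation OneConnect 10Gb NIC
vmnic5  0000:05:00.01  elxnet  Down  0Mbps  Half    00:00:c9:e4:13:18  9000  Emulex Corporation OneConnect 10Gb NIC
vmnic0  0000:016:00.0  be2net  Up    10000  Full    34:40:b5:c8:f0:e8  1500  Emulex Corporation OneConnect 10Gb NIC (be3)
"""

if __name__ == "__main__":
    parsed = parse_nic_rows(SAMPLE)
    for nic in parsed:
        print(nic.name, nic.driver, nic.link, nic.speed_mbps)
    print("down:", down_nics(parsed))
```

As several posts in this thread show, a link reported as down by the driver does not always mean traffic has stopped, so treat the result as a hint rather than proof.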

elxaamaya
Enthusiast

This issue was addressed in N1k release 4.2(1)SV2(2.2), which was released in Jan 2014. It is documented in bug report CSCuj81943. We addressed it by using a different API to retrieve the NIC speed in ESX 5.5.



Nevets01
Contributor

We have the same problem with an Emulex 554FLB 10Gb without a Cisco Nexus 1000v.

The switching side shows everything up and connected. The vSphere Client shows a physical disconnect on both adapters.

Setting the speed for both NICs through the CLI gives a connected state, but after rebooting the ESXi host the vSphere Client shows both NICs disconnected again. The strange thing is that when the NICs show a disconnected state, the host is still pingable on its management IP.

Emulex firmware v4.9.416.0

ESX 5.5.0.

Message was edited by: Nevets01

VMware driver info:

Bus Info:
Driver: elxnet
Firmware Version: 4.9.416.0
Version: 10.2.298.5

After some testing I got both NICs connected (through the CLI) at 10000Mb, but one NIC is showing observed IP ranges and the other NIC isn't. When I disable the NIC with observed IP ranges, network connectivity is lost. I am 100% sure that it isn't the Cisco switching-side config/port channel. The vSwitch config is also OK (Route based on IP hash/Link status only/Notify/Failback).

VMSavvy
Enthusiast

Hey marcomcpap, I had a similar issue and was able to fix it with your esxcli commands. It worked like a charm, and after a reboot all 16 of my NICs started showing up. Thanks for the tip, mate.

fgw
Contributor

Same problem here!

We are using HP ProLiant BL460c G7 servers with the embedded NC553i (10Gb 2-port FlexFabric Converged Network Adapter) as ESXi hosts.

We migrated to vSphere 5.5 U1 some days ago. Yesterday one of our ESXi hosts got disconnected from the cluster. I found the following in the logs:


2014-09-01T13:17:51.499Z cpu13:32852)Uplink: 6530: enabled port 0x2000002 with mac e4:11:5b:e0:1d:d8

2014-09-01T13:17:52.499Z cpu13:32852)NetPort: 1632: disabled port 0x2000002

2014-09-01T13:17:56.169Z cpu14:33444)WARNING: elxnet: elxnet_detectDumpUe:274: 0000:002:00.1: UE Detected!!

2014-09-01T13:17:56.172Z cpu14:33444)elxnet: elxnet_detectDumpUe:285: 0000:002:00.1: Forcing Link Down as Unrecoverable Error detected in chip/fw.

2014-09-01T13:17:56.172Z cpu14:33444)WARNING: elxnet: elxnet_detectDumpUe:302: 0000:002:00.1: UE lo: MPU bit set

2014-09-01T13:17:56.172Z cpu14:33444)WARNING: elxnet: elxnet_detectDumpUe:312: 0000:002:00.1: UE hi: PMEM bit set

2014-09-01T13:17:56.499Z cpu13:32852)NetPort: 1632: disabled port 0x2000002

2014-09-01T13:17:56.499Z cpu13:32852)Uplink: 6530: enabled port 0x2000002 with mac e4:11:5b:e0:1d:d8

2014-09-01T13:17:56.532Z cpu2:33443)WARNING: elxnet: elxnet_detectDumpUe:274: 0000:002:00.0: UE Detected!!

2014-09-01T13:17:56.532Z cpu2:33443)elxnet: elxnet_detectDumpUe:285: 0000:002:00.0: Forcing Link Down as Unrecoverable Error detected in chip/fw.

2014-09-01T13:17:56.532Z cpu2:33443)WARNING: elxnet: elxnet_detectDumpUe:302: 0000:002:00.0: UE lo: MPU bit set

2014-09-01T13:17:56.532Z cpu2:33443)WARNING: elxnet: elxnet_detectDumpUe:312: 0000:002:00.0: UE hi: PMEM bit set

Today the same issue occurred on another ESXi host.

We opened a case with HP and also with VMware. No solution so far. As far as I was told, this has already been observed at other customers. VMware support recommended using the legacy drivers, as described a few posts above:

"Run the commands to disable the native driver and enable the legacy driver:

esxcli system module set --enabled=false --module=elxnet

esxcli system module set --enabled=true --module=be2net

reboot the host."

I will wait for a response from HP before I switch to the legacy drivers on all ESXi hosts.

Does anybody have recommendations other than switching to the legacy drivers? Will the switch to the legacy drivers using the above commands be persistent across reboots?

iefke
Enthusiast

I have a customer who is experiencing the same problem that fgw described. The hardware is an HP G7 with the Emulex NC553i (10Gb 2-port FlexFabric Converged Network Adapter). Please let us know if you have an update on this issue.

Blog: http://www.ivobeerens.nl
VMSavvy
Enthusiast

The following commands worked for me, not on one but on seven hosts, so I'm sure this can work for you as well. Try these:

esxcli system module set --enabled=false --module=elxnet

esxcli system module set --enabled=true --module=be2net

VMSavvy

Armann
Enthusiast

Did not want to start a new thread since we are having very similar problems.

We are running an HP C7000 enclosure with BL460c G8 blades and HP FlexFabric 10Gb 2-port 554FLB NICs.

If I put an OS on the blade, the NIC works fine, so it's a VMware issue.

I tried ESXi 5.5 Update 1 and Update 2 using the HP custom images.

I also tried the ESXi 5.5 U2 driver rollup and non-rollup images, always with the same problem: no network.

The link is up (see attachment). I tried changing to the legacy driver, but that didn't work.

Vmware nic drivers original.JPG

Any ideas?

http://www.kerfisnet.is
Nevets01
Contributor

Make sure the port speeds on the switch side match the speed selected for the NIC in VMware.

Make sure that the switch-side ports aren't SUSPENDED.

I downloaded the latest drivers, set the network adapter to Auto Negotiate, set the Cisco switch-side ports to Auto Negotiate, and made sure that the ports weren't suspended and that the channel was configured correctly.

After rebooting the server, everything worked fine.

Remark: with the latest drivers it's only possible to set the speed to 10000Mb full duplex or Auto Negotiate (use the vSphere Client GUI).

Armann
Enthusiast

Thanks, that made me think and talk to our network guy.

This is how the port was configured when there was no network access:

interface GigabitEthernet0/12
 switchport access vlan 250
 speed 1000
 spanning-tree portfast

Then we changed it to this, and now it works:

interface GigabitEthernet0/12
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 250
 switchport mode trunk

Does the port need to be trunked to work with VMware?

wilber822
Enthusiast

I have exactly the same problem on a BL460c G7 (NC553i) with ESXi 5.5 U1.

Firmware and drivers are up to date. The NIC driver version is elxnet 10.2.298.5. The NIC firmware version is 10.2.340.19.

If your VC version is 4.01 or later, you may see a NIC speed of 10Gbps for all NICs. You have to change it by following this article:

http://h20565.www2.hp.com/portal/site/hpsc/template.PAGE/public/psi/mostViewedDisplay?javax.portlet....

I have changed the VC network settings to allocate the correct speed.

The esxcli network nic list command now reports the correct NIC speed.

But the issue is still there. This looks more like an Emulex problem.

https://www.zhengwu.org