I have installed ESXi 5.5 on a server with Emulex OneConnect 10Gb NICs.
I have installed the latest driver for this NIC - elxnet-10.0.575.9-1OEM.550.0.0.1331820.x86_64.vib.
After some network activity from the virtual machines, the interfaces go down, even though the switch ports stay up.
vmnic4 0000:05:00.00 elxnet Down 0Mbps Half 00:00:c9:e4:13:16 9000 Emulex Corporation OneConnect 10Gb NIC
vmnic5 0000:05:00.01 elxnet Down 0Mbps Half 00:00:c9:e4:13:18 9000 Emulex Corporation OneConnect 10Gb NIC
Here are the logs:
2013-11-19T15:49:12.395Z cpu2:33376)WARNING: elxnet: elxnet_detectDumpUe:238: 0000:005:00.0: UE Detected!!
2013-11-19T15:49:12.396Z cpu2:33376)elxnet: elxnet_detectDumpUe:249: 0000:005:00.0: Forcing Link Down as Unrecoverable Error detected in chip/fw.
2013-11-19T15:49:12.396Z cpu2:33376)WARNING: elxnet: elxnet_detectDumpUe:257: 0000:005:00.0: UE lo: MPU bit set
2013-11-19T15:49:12.892Z cpu5:33377)WARNING: elxnet: elxnet_detectDumpUe:238: 0000:005:00.1: UE Detected!!
2013-11-19T15:49:12.892Z cpu5:33377)elxnet: elxnet_detectDumpUe:249: 0000:005:00.1: Forcing Link Down as Unrecoverable Error detected in chip/fw.
2013-11-19T15:49:12.892Z cpu5:33377)WARNING: elxnet: elxnet_detectDumpUe:257: 0000:005:00.1: UE lo: MPU bit set
Has anyone run into a similar problem?
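For anyone checking whether they're hitting the same thing, something like this (standard ESXi 5.5 log path; adjust if yours differs) should show it:

```shell
# count UE events logged by the elxnet driver
grep -c "elxnet_detectDumpUe" /var/log/vmkernel.log

# list NIC link state - the affected vmnics show as Down
esxcli network nic list
```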
So, after opening a case with HP on this, they advised against disabling Advanced mode in the Emulex BIOS. They explained they're aware that it may fix this issue, but said there could be a performance impact and that they would only recommend it in very particular cases.
They also supplied me with the 10.2.477.20 VMware driver which I have applied to 3 hosts running 10.2.477.10 554FLB firmware.
I have yet to put them back into production (I'll wait a week), but they have not dropped off, whereas before they'd consistently disappear within 1-3 hours.
So, it appears this latest 10.2.477.20 VMware Emulex driver has fixed the issue while still allowing Advanced mode to stay enabled.
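For anyone applying the async driver by hand, this is roughly the sequence - a sketch only; the VIB path below is a placeholder for whatever bundle HP/Emulex actually ships:

```shell
# sketch only - substitute the real path to the elxnet async driver VIB
esxcli system maintenanceMode set --enable true
esxcli software vib install -v /tmp/elxnet-driver.vib
reboot

# after the reboot, confirm the loaded driver version:
esxcli network nic get -n vmnic4
```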
I work for a large company that purchased ~2,500 Gen 8 blades with 554FLB CNAs last October (all manufactured at once) for a worldwide deployment we are just now completing. We are using a single ESX 5.5 Auto Deploy image which is compliant with the HP recipe - 10.2.340.19 Emulex firmware & 10.2.298.5 driver. All hardware is the same everywhere: the same full stack of firmware across all hardware in all locations, the same gold image, etc.
With all of the original hardware, no issues. Thousands of identical stateless ESX 5.5 OSs happily PXE booting 100% of the time. With a deployment this large we did, however, have some DOA hardware and had to replace about a dozen 554FLBs across the world.

Guess what? With just about all of the replacement 554FLBs we started seeing very strange issues, 95% consistent with the litany of problems described throughout this thread. I found this thread by accident after googling the error messages from the ESX debug screen.

The discussion here seems to have revolved around finding the perfect combination of drivers, settings, etc. (everyone is assuming their hardware is good). My simple theory is that a bad batch of 554FLBs is floating around, or perhaps the firmware on these or similar adapters isn't getting applied properly at the factory, nor can a user re-apply it successfully. In my case I have so far had to ship known-good 554FLBs from my lab to the field to replace the replacements. This has worked 100% of the time so far.
Now I need your help. If you are still having these issues in your shop or had them and found a root cause please send me a PM with any information you can share like HP case #s. I am working with HP to find the root cause.
Thanks
Hi domenic10
We had a similar case. I worked with MartynThomas, HP and VMware to figure out which combination was stable.
I would suggest you don't waste time on that; they (HP and Emulex) have tried for about a year with no final RCA.
We are using both Cisco and HP, and Cisco feels very stable, but HP does not. I intend to replace HP with Dell.
Sorry, my post may not be directly related to this thread. Just FYI.
Emulex NICs are nothing but trouble - after fruitless calls between HP & Cisco we moved to Broadcom for our new ESXi deployments and have been happy ever since.
Hi all,
In my experience, the working combination of Emulex driver and firmware, which was tested on 5.5, is:
# esxcli network nic get -n vmnic0
Driver Info:
Driver: elxnet
Firmware Version: 10.5.65.21
Version: 10.5.65.4
This combination is described in http://vibsdepot.hp.com/hpq/recipes/HP-VMware-Recipe.pdf and looks like a good one.
I haven't done all the tests yet, but the combination described above looks stable, VXLAN is OK, and so on.
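To check a batch of hosts against that tested combination, a small helper like this could parse the `esxcli network nic get` output shown above - a sketch; `check_combo` is my own hypothetical function name, and it assumes the output format pasted earlier in this post:

```shell
# hypothetical helper: flags any NIC not on the tested
# 10.5.65.4 driver / 10.5.65.21 firmware combination
check_combo() {
  # $1 = full text of "esxcli network nic get -n vmnicX"
  driver=$(printf '%s\n' "$1" | awk -F': ' '/^ *Version:/ {print $2}')
  fw=$(printf '%s\n' "$1" | awk -F': ' '/Firmware Version:/ {print $2}')
  if [ "$driver" = "10.5.65.4" ] && [ "$fw" = "10.5.65.21" ]; then
    echo "OK: driver $driver / firmware $fw"
  else
    echo "MISMATCH: driver $driver / firmware $fw"
  fi
}

# on a live host you would run something like:
#   for n in $(esxcli network nic list | awk '/elxnet/ {print $1}'); do
#     check_combo "$(esxcli network nic get -n $n)"
#   done
```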
Hey guys and gals,
Would anyone have a current link to firmware version 10.2.477.23?
HP seem to have lost most of their web environment in their transition.
Hi,
We have got a similar problem on ESXi 5.5. We upgraded the firmware and the network adapter driver. No luck.
Finally we called the hardware vendor and replaced the network card. That worked.
Might be helpful.
Thanks.
In case anyone is still following this lengthy thread, VMware recently updated the following article listing elxnet 10.5.65.4 as the solution:
Packet drops and connectivity issues when using Emulex elxnet Driver version 10.2.298.5 or earlier on OCe10102 and OCe11102 adapters or OEM equivalents
http://kb.vmware.com/kb/2091192
The interesting thing is that, previously, the article said this was fixed with elxnet 10.2.445.0.
Hi Guys,
Just to let you guys know.
HP told me the issue was fixed by the elxnet 10.4.255.13 driver, but unfortunately it's not.
I slowly deployed the elxnet 10.4.255.13 driver and 10.2.477.10 firmware across our environment. We didn't observe any issues until today.
It happened again. Same problem. I have it deployed on 50-80 hosts with 1000+ VMs on them, and now it's happening again!!!! I'm tired.
I have to say again, don't use HP.
Same issue upgrading from v5.1 to v5.5U3 on BL685cG7 blades.
Fix worked as described here.
Which fix exactly do you mean?
Please: which fix do you mean?
Hi
If you are running ESXi 5.5 / 6 use the following combination:
Driver: elxnet
Firmware Version: 10.5.65.21
Version: 10.5.65.4
If you are using 5.1 use:
driver: be2net
version: 10.5.65.4
firmware-version: 10.5.65.21
You can check your current versions with the following command: ethtool -i vmnic0
They will work fine.
Ciarán
having the same issue here with 5.5 update 3a
server 5.5 update 3d
This message has repeated 21504 times: 0000:081:00.1: Error in Card Detected! Cannot allocate WRBs hw_error:1|fw_timeout:0
2016-06-27T07:54:20.533Z cpu63:33571)BC: 3423: Pool 0: Blocking due to no free buffers. nDirty = 271 nWaiters = 1
NC553i
Driver Info:
Driver: elxnet
Firmware Version: 10.7.110.31
Version: 10.7.110.13
According to the HP recipe book these are the latest and greatest driver/firmware for the NC553i.
Can't find a solution to this...
We have the exact same config in our other DC and this error does not exist there.
Concerned whether this is a major issue or not 😕
I ran into a similar issue, running:
ESXi 5.5U3b
Emulex OCe11102-NM
I updated the FW to 11.1.38.57 and the VIC to 11.1.145.0, and then the NIC dropped off the network but showed as Enabled and Up with no observed IP ranges. It could vmkping itself but not the vmkernel vMotion ports of other hosts.
I banged my head on it for hours, going over every setting AND rebooting several times, uninstalling drivers, etc. Finally, as a wild shot in the dark, I changed the NIC speed from 10000 to 1000 and then back, and everything came back up all good.
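For reference, that speed flip can be done from the CLI - a sketch, with vmnic0 as a placeholder for your NIC:

```shell
# force the link to 1 Gb full duplex, then back to 10 Gb
esxcli network nic set -n vmnic0 -S 1000 -D full
esxcli network nic set -n vmnic0 -S 10000 -D full

# or hand the link back to auto-negotiation
esxcli network nic set -n vmnic0 -a
```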
dingo and djciaro, are you guys doing the steps below, assuming your server is running HP hardware:
1. Update the firmware using the latest HP SPP 2016.04 (864794_001_spp-2016.04.0-SPP2016040.2016_0317.20.iso)?
2. Upgrade or install ESXi using the latest HPE ESXi image (VMware-ESXi-5.5.0-Update3-3568722-HPE-550.9.5.0.33-Apr2016.iso)?
I had a similar problem and it was the "Personality" in the HPE BIOS. It defaulted the NICs to iSCSI and FCoE. I had to change them to NIC and all the links came up. HPE Support was worthless and no help at all.
We had a similar issue a couple of times as well. On a couple of hosts, a firmware and driver upgrade reduced the problem - a "network stress test" consisting of copying a large VMDK to another ESXi host from the SSH shell revealed whether the issue was fixed or not (usually the UE happened within 15 minutes). But on one server it was definitely a hardware fault, and since we had the NIC replaced, the errors have stopped appearing.
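If anyone wants to reproduce that stress test, it was roughly the following from the ESXi shell - the hostname and datastore paths here are placeholders for your own:

```shell
# watch for the elxnet UE signature while the copy runs
tail -f /var/log/vmkernel.log | grep -i "UE Detected" &

# push a large VMDK to another host over the management network
# (esxi02 and the datastore paths are placeholders)
scp /vmfs/volumes/datastore1/somevm/somevm-flat.vmdk \
    root@esxi02:/vmfs/volumes/datastore1/
```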