Hi all,
We had an issue with one of the ESXi 5.5 hosts in our HA cluster this weekend.
Hardware: Dell Blade M620, Broadcom BCM57810 10Gb
It threw a PSOD reporting that a PCPU had no heartbeat and showing Broadcom errors (see attached picture).
Going through the vmkernel logs, it seems that one of the HBAs failed.
Extract from vmkernel.log:
2014-12-05T23:13:58.377Z cpu8:10592052)WARNING: LinScsi: SCSILinuxAbortCommands:1837: Failed, Driver bnx2i, for vmhba36
...
2014-12-05T23:14:11.151Z cpu0:33497)<1>bnx2i::0x4109c61eab40: ####CID leaked bnx2i_tear_down_conn: sess 0x4109c75f4738 ep 0x4109dc859690 {0x5a, 0x1a}
...
2014-12-05T23:14:11.401Z cpu6:33497)bnx2i::0x4109c61eab40: bnx2i_conn_stop::vmnic3 - sess 0x4109c75f9948 conn 0x4109c75f9d20, icid 41, cmd stats={p=0,a=1,ts=1950037,tc=1950036}, ofld_conns 9
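For anyone who wants to sift a longer vmkernel.log for the same pattern, here is a rough Python sketch that counts the lines mentioning a driver and groups them by the adapter named in each line. The regular expressions are an assumption based only on the three lines quoted above, and the function name `summarize_vmkernel` is made up for illustration; other log lines may not match.

```python
import re
from collections import Counter

# Timestamp, "cpuN:world)" prefix, then the message text.
# Assumption: this layout is inferred only from the three quoted lines.
LINE_RE = re.compile(r"^(?P<ts>\S+Z) cpu(?P<cpu>\d+):(?P<world>\d+)\)(?P<msg>.*)$")
ADAPTER_RE = re.compile(r"\b(vmhba\d+|vmnic\d+)\b")


def summarize_vmkernel(lines, driver="bnx2i"):
    """Count lines mentioning `driver`, grouped by the adapter named in the line."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or driver not in m.group("msg"):
            continue  # skip unparsable lines and lines about other drivers
        adapters = ADAPTER_RE.findall(m.group("msg"))
        counts.update(adapters or ["(no adapter named)"])
    return counts


sample = [
    "2014-12-05T23:13:58.377Z cpu8:10592052)WARNING: LinScsi: SCSILinuxAbortCommands:1837: Failed, Driver bnx2i, for vmhba36",
    "2014-12-05T23:14:11.151Z cpu0:33497)<1>bnx2i::0x4109c61eab40: ####CID leaked bnx2i_tear_down_conn: sess 0x4109c75f4738 ep 0x4109dc859690 {0x5a, 0x1a}",
    "2014-12-05T23:14:11.401Z cpu6:33497)bnx2i::0x4109c61eab40: bnx2i_conn_stop::vmnic3 - sess 0x4109c75f9948 conn 0x4109c75f9d20, icid 41, cmd stats={p=0,a=1,ts=1950037,tc=1950036}, ofld_conns 9",
]
print(summarize_vmkernel(sample))
```

Running the same function with a different `driver` value (for example `"bnx2x"`) over the full log would be one way to check whether the second 10Gb card logged anything at all.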
I will check whether newer drivers are available; I am also including a picture of the current driver details.
Has anyone come across this before?
Am I right in assuming that the issue is with the physical NIC?
What is strange is that the vmkernel log shows no problems from the other 10Gb card (same model; I know it should be from a different vendor), which should have kept working and kept the host from failing (heartbeat).
Comments are appreciated.
The problem seems to be related to the Broadcom drivers, and I recommend you upgrade them to a newer version: https://my.vmware.com/web/vmware/details?downloadGroup=DT-ESXI55-BROADCOM-BNX2-225FV558&productId=35...
Another option is to upgrade the Broadcom firmware as well, but you should look for that on the Dell support page.
I agree with Richard.
PSODs usually happen because of a firmware mismatch. If you can confirm with the server vendor and get the latest firmware, it should also help prevent future failures.
For now, you can also go ahead and upgrade the Broadcom firmware, as that's the component causing the issue. I also suggest you open a ticket with VMware to get a better opinion.
Hi rb51,
As others mentioned, you should upgrade the bnx2x driver to the latest version. According to the Dell Customized Image (from 04 Dec 2014):
VMware ESXi 5.5 Update 2 Driver Details | Dell US
the right driver version is bnx2x - 2.710.39.v55.2
According to the current VMware async driver download list for the BCM57810:
VMware Compatibility Guide: I/O Device Search
there are also some newer builds: bnx2x - 2.710.52.v55.2
It's up to you which one you use, but as Abhilashhb pointed out, it's a good idea to contact VMware support first.
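If you want to script a check of which of the two builds is newer, a minimal Python sketch could look like the one below. It assumes that the leading dot-separated numeric fields (2.710.39 versus 2.710.52) order the builds and that the trailing "v55.2" part is only a platform tag; that is my reading of the version strings, not something documented here. The helper names are made up.

```python
def numeric_prefix(version):
    """Return the leading integer fields, e.g. '2.710.39.v55.2' -> (2, 710, 39)."""
    fields = []
    for part in version.split("."):
        if not part.isdigit():
            break  # stop at the first non-numeric field such as 'v55'
        fields.append(int(part))
    return tuple(fields)


def is_older(installed, candidate):
    """True if `installed` sorts before `candidate` on the numeric prefix (assumed ordering)."""
    return numeric_prefix(installed) < numeric_prefix(candidate)


dell_image = "2.710.39.v55.2"    # version from the Dell Customized Image
vmware_async = "2.710.52.v55.2"  # newer build from the VMware async list
print(is_older(dell_image, vmware_async))
```

Comparing tuples of integers avoids the trap of plain string comparison, which would sort "2.710.9" after "2.710.52".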
Below is a step-by-step guide on how to install async drivers on an existing ESXi installation:
VMware KB: Installing async drivers on VMware ESXi 5.0, 5.1, and 5.5
Here is the latest firmware from Dell (7.10.18) for BCM 57810:
Broadcom NetXtreme I and II Network Device Firmware 7.10.18 Driver Details | Dell US
Message was edited by: vNEX
Thank you guys for the replies so far, much appreciated.
Ticket logged with VMware support so they can be aware of the issue, which may impact other customers.
I am going on holiday from tomorrow afternoon (GMT), so there is not much time for troubleshooting/debugging. Colleagues will be monitoring the host, and we decided not to upgrade the Broadcom drivers/firmware until we have heard from VMware and I am back.
I hope the VMware support team can provide a few answers/clues about this issue.
regards,
rb51