Contributor

HP NC532i (Broadcom 57711E) network adapter in Flex-10 caused a hard crash; which bnx2x driver should I use?

Is anyone else having this issue? We just had 3 servers crash due to a bnx2x_panic_dump. Once the network cards crashed, the ESX server had to be rebooted to recover. Even though only a few vmnics died, the entire server became unreachable, and the VMs became unreachable, even when the failed vmnic wasn't bound to the vSwitch the VM was on.

After researching, it appears that VMware supports three different drivers:

1. bnx2x version 1.45.20

2. bnx2x version 1.48.107.v40.2

3. bnx2x version 1.52.12.v40.3

On 6/10/2010 VMware came out with a patch for 1.45.20, but esxupdate marked it obsolete, since our version (1.52.12.v40.3) was newer. Should I downgrade my driver?

Also the VMware HCL has conflicting information. According to this:

http://www.vmware.com/resources/compatibility/search.php?action=search&deviceCategory=io&productId=1...

1.52.12.v40.3 is supported by vSphere 4 Update 2 but not Update 1, yet the U2 release only has an update for the 1.45.20 driver.

Yet according to this:

http://www.vmware.com/resources/compatibility/search.php?action=search&deviceCategory=io&productId=1...

1.52.12.v40.3 is supported by both vSphere 4 Update 2 and Update 1.

Here are the details of my environment:

HP BL460G6 blade servers, with flex-10 modules.

The individual blades are using HP NC532i Dual Port 10GbE Multifunction BL-c Adapter, firmware bc 5.0.11.

The chassis OA itself is using firmware v3.0.

The Flex-10 module is using firmware v. 2.33.

Crash Dump:

Jun 16 17:03:54 esx-2-6 vmkernel: 0:01:03:09.131 cpu1:4426)VMotionRecv: 1080: 1276732954553852 D: Estimated network bandwidth 75.588 MB/s during page-in

Jun 16 17:03:54 esx-2-6 vmkernel: 0:01:03:09.131 cpu7:4420)VMotion: 3381: 1276732954553852 D: Received all changed pages.

Jun 16 17:03:54 esx-2-6 vmkernel: 0:01:03:09.245 cpu7:4420)Alloc: vm 4420: 12651: Regular swap file bitmap checks out.

Jun 16 17:03:54 esx-2-6 vmkernel: 0:01:03:09.246 cpu7:4420)VMotion: 3218: 1276732954553852 D: Resume handshake successful

Jun 16 17:03:54 esx-2-6 vmkernel: 0:01:03:09.246 cpu3:4460)Swap: vm 4420: 9289: Starting prefault for the migration swap file

Jun 16 17:03:54 esx-2-6 vmkernel: 0:01:03:09.259 cpu0:4460)Swap: vm 4420: 9406: Finish swapping in migration swap file. (faulted 0 pages, pshared 0 pages). Success.

Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_stats_update:4639(vmnic1)]storm stats were not updated for 3 times
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_stats_update:4640(vmnic1)]driver assert
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:658(vmnic1)]begin crash dump -


Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:666(vmnic1)]def_c_idx(0xff5) def_u_idx(0x0) def_x_idx(0x0) def_t_idx(0x0) def_att_idx(0xc) attn_state(0x0) spq_prod_idx(0xf8)
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:677(vmnic1)]fp0: rx_bd_prod(0x6fe7) rx_bd_cons(0x3e9) *rx_bd_cons_sb(0x0) rx_comp_prod(0x7059) rx_comp_cons(0x6c59) *rx_cons_sb(0x6c59)
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:682(vmnic1)] rx_sge_prod(0x0) last_max_sge(0x0) fp_u_idx(0x6afb) *sb_u_idx(0x6afb)
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:693(vmnic1)]fp0: tx_pkt_prod(0x0) tx_pkt_cons(0x0) tx_bd_prod(0x0) tx_bd_cons(0x0) *tx_cons_sb(0x0)
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:697(vmnic1)] fp_c_idx(0x0) *sb_c_idx(0x0) tx_db_prod(0x0)
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[4f]=[0:deda0310] sw_bd=[0x4100b462c940]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[50]=[0:de706590] sw_bd=[0x4100b4697b80]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[51]=[0:deac2810] sw_bd=[0x4100baad8e80]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[52]=[0:de9ae390] sw_bd=[0x4100bda03f40]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[53]=[0:de3e9a90] sw_bd=[0x4100b463ecc0]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[54]=[0:3ea48730] sw_bd=[0x4100bab19100]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[55]=[0:de5b1190] sw_bd=[0x4100bda83980]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[56]=[0:ded48410] sw_bd=[0x4100bdb06080]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[57]=[0:3e3f0d10] sw_bd=[0x4100bca0f480]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[58]=[0:de742110] sw_bd=[0x4100bda35d40]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.230 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[59]=[0:de6ffc90] sw_bd=[0x4100bcab3800]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.230 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[5a]=[0:de619710] sw_bd=[0x4100b4640c40]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.230 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[5b]=[0:de627e10] sw_bd=[0x4100bcaad440]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.230 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[5c]=[0:3e455e10] sw_bd=[0x4100b462a9c0]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.230 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[5d]=[0:de3a6110] sw_bd=[0x4100bdaf1d80]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.230 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[5e]=[0:3e37df90] sw_bd=[0x4100b470d580]

Any thoughts or suggestions?

102 Replies
Immortal

I would open a support ticket.

http://www.vmware.com/support/policies/howto.html

-- David -- VMware Communities Moderator
Contributor

yup, I've opened up tickets with both VMware and HP. I was just hoping someone else has had similar experiences. As soon as I hear back I will post the results/resolution...

Contributor

Hi,

I currently have the same problem on a bunch of BL495c G6 with the Broadcom 57711E. I opened a ticket with HP's VMware support and they told me that VMware is aware of the problem and is currently working on a solution. In the meantime I should downgrade my driver. The support guy mailed me the following URL:

http://downloads.vmware.com/d/details/esx_esxi_40_broadcom_bnx2x_dt/ZHcqYmR3KmVidGR3 which points to version 1.48.107-1.0.4

I'm also running version 1.52.12.v40.3 at the moment, but I've no clue how to perform a downgrade and asked HP to provide a step-by-step guide. I only have 9x5 support, so I'll have to wait until tomorrow.

Lars

Contributor

Hey Lars,

here's some info I pulled off of Broadcom's website that you might find helpful:

*How do I know which driver version is installed in VMware ESX 4.0?

To find out the current version of the bnx2 or bnx2x driver:

  1. esxupdate query --vib-view


*How do I uninstall/reinstall the driver for VMware ESX 4.0?


To load or unload the bnx2 or bnx2x driver manually:


  1. vmkload_mod bnx2x (load driver)

  2. vmkload_mod -u bnx2x (unload driver)


*How do I upgrade a bnx2 or bnx2x driver for VMware ESX 4.0?


To upgrade bnx2 or bnx2x drivers:


  1. esxupdate update --bundle=<filename> --maintenancemode

  2. After the update is complete, reboot ESX.


*How do I remove a bnx2 or bnx2x driver for VMware ESX 4.0?


To remove bnx2 or bnx2x drivers:


  1. esxupdate query --vib-view (to query the driver version)

  2. esxupdate -b <driver filename> remove


I would definitely test on a stage host first though...
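If you end up checking a lot of hosts, the "esxupdate query --vib-view" output can be grepped by hand, or parsed with a small script. Here is a rough Python sketch; the parsing pattern is my own assumption based on the VIB names shown later in this thread, not an official format:

```python
import re

def installed_bnx2x_version(vib_view_output):
    """Return the version string of the *installed* bnx2x driver VIB,
    or None if no installed bnx2x entry is found.

    Expects lines like:
    cross_vmware-esx-drivers-net-bnx2x_400.1.52.12.v40.3-1.0.4.00000 installed <date>
    """
    for line in vib_view_output.splitlines():
        if "bnx2x" in line and "installed" in line:
            # Capture everything between "bnx2x_<esxver>." and the first "-".
            m = re.search(r"bnx2x_\d+\.([\w.]+?)-", line)
            if m:
                return m.group(1)
    return None
```

On an actual host you would feed it the command output, e.g. from `esxupdate query --vib-view`, and compare the result against the driver version you expect after the downgrade.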

Contributor

Hi there,

I am currently using:

#esxupdate query --vib-view | grep bnx

rpm_vmware-esx-drivers-net-bnx2x_400.1.45.20-1.0.7.193498@x86_64 retired 2010-04-10T16:44:01.112482+02:00

cross_vmware-esx-drivers-net-bnx2x_400.1.52.12.v40.3-1.0.4.00000 installed 2010-04-11T09:16:22.490669+02:00

rpm_vmware-esx-drivers-net-bnx2x_400.1.45.20-2vmw.1.9.208167@x86_64 retired 2010-04-10T16:44:01.108305+02:00

and am seeing no issues right now.

Contributor

Hi!

The driver downgrade worked like a charm. Hope that the whole stuff is stable now.

Thanks for your reply!

Lars

Contributor

So after the crash there were a bunch of timeouts, but I didn't include them in the log dump above. I figured it was normal behavior: the bnx2x driver did panic and die, and there would be timeouts if the driver crashed.

But according to VMware, these entries are what is causing the issue.

"On some systems under heavy networking and processor load (large number of virtual machines), some NIC drivers might randomly attempt to reset the device and fail. The VMkernel logs generate the following messages every second:"

Here is the patch that needs to be applied to fix the issue. http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=101745...

here is a snippet from my log files:

vmkwarning:Jun 16 15:34:24 esxserver1 vmkernel: 14:16:50:42.667 cpu8:4239)WARNING: LinNet: netdev_watchdog: NETDEV WATCHDOG: vmnic2: transmit timed out

vmkwarning:Jun 16 15:34:24 esxserver1 vmkernel: 14:16:50:43.667 cpu3:4241)WARNING: LinNet: netdev_watchdog: NETDEV WATCHDOG: vmnic3: transmit timed out

vmkwarning:Jun 16 15:34:24 esxserver1 vmkernel: 14:16:50:44.669 cpu3:4230)WARNING: LinNet: netdev_watchdog: NETDEV WATCHDOG: vmnic3: transmit timed out

and the knowledge base snippet:

Oct 13 05:19:19 vmkernel: 0:09:22:33.216 cpu2:4390)WARNING: LinNet: netdev_watchdog: NETDEV WATCHDOG: vmnic1: transmit timed out

Oct 13 05:19:20 vmkernel: 0:09:22:34.218 cpu8:4395)WARNING: LinNet: netdev_watchdog: NETDEV WATCHDOG: vmnic1: transmit timed out
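If you want to see which vmnics are generating these watchdog resets before they escalate, you can tally the "transmit timed out" lines per NIC. A quick hypothetical Python sketch, with the log format taken from the snippets above:

```python
import re
from collections import Counter

# Matches the watchdog line shown in the vmkernel log / KB snippets.
WATCHDOG = re.compile(r"NETDEV WATCHDOG: (vmnic\d+): transmit timed out")

def timeout_counts(log_lines):
    """Count 'transmit timed out' watchdog events per vmnic."""
    counts = Counter()
    for line in log_lines:
        m = WATCHDOG.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

Running it over /var/log/vmkwarning (or the vmkernel log) would show whether one NIC is flapping or the whole set is going down at once, which was the distinction VMware cared about here.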

So on one of my ESX hosts that is at vSphere 4 Update 2, esxupdate info -b doesn't say anything about the bulletin being installed, but esxupdate query tells me I have Update 2 installed. In VMware Update Manager, it lists the bulletin as being installed.

[root@esxserver1 ~]# esxupdate info -b ESX400-201002401-BG

Unknown bulletin ESX400-201002401-BG

[root@esxserver1 ~]# esxupdate query

Bulletin ID                           Installed            Summary
------------------------------------  -------------------  --------------------------------
hp-classic-mgmt-solution-840.21.2217  2010-05-14T15:11:52  HP SNMP Agents for ESX 4.0
BCM-bnx2x-1.52.12.v40.3               2010-05-14T15:15:23  bnx2x: net driver for VMware ESX
ESX400-Update02                       2010-06-16T20:04:19  VMware ESX 4.0 Complete Update 2

Hope this helps someone. It looks like the issue is resolved.

Contributor

hey ITOSITU

I recommend that you update your ESX servers ASAP if they're not patched. We were running fine for a few weeks, and this just came out of nowhere. This seems to happen when there is a lot of activity, and we're running a very high VM-to-ESX-server ratio (40-60 VMs per ESX host).

Enthusiast

Not sure if this is related, but wanted to mention it. I have been having some issues doing an ESXi 4.0 U1 install on some BL460 G6 servers and the install would not see the NIC. HP sent me an advisory that mentioned there is a known but rare issue with the NIC firmware for the NC532i. I was using NIC firmware BootLoader code of 1.5.7 and they had me downgrade to previous version of 1.4.8. This version can be found on the HP FW CD 8.60. They said that this firmware will support the newer driver, but I haven't gotten there yet. Maybe this can help with the issues you all are seeing. Also, there is now new OA firmware (ver 3.10) and VC firmware (ver 3.01).

Virtually,

VMware Certified Professional

NOTE: If your problem or questions has been resolved, please mark this thread as answered and award points accordingly.

Contributor

Hi all,

I just want to inform you that everything is stable after the driver downgrade. No PSODs so far.

Lars

Contributor

We are having the same issues on ESXi with BL490c G6s. Does 4.1 fix these problems? Or should we roll back the bnx2x driver?

Contributor

I assume you're doing active/active Flex-10.

BTW, a really good solution for vSphere 4 Virtual Connect with Flex-10 can be found here: http://jeffreywolfanger.com (it's two c7000 chassis with vSphere 4). It goes over the whole configuration:

"You need three things to be set:

The 1.52 driver from VMware, the bootcode on your blades has to be updated, and your Flex-10 modules have to be upgraded to 2.32 or higher (I am running 3.x). Without those upgrades you can run into issues.

You don't need to worry about beacon probing either. You can read more about the driver solution here:

http://www.virtualtroll.com/?p=368

The solution is the following three action points:

Verify that the firmware on HP Virtual Connect was running 2.30 as minimum. This setup was running with 2.32 (newest version)

Verify that the NIC driver version was Broadcom NetXtreme II Ethernet Network Controller driver 1.52.12.v40.3 (minimum) for ESX/ESXi 4.0. This was different from what the VMware HCL stated.

Verify that the NC532i/m bootcode is version 5.0.11 (minimum). The bootcode on the NC532 was NOT up-to-date on each blade.

"

Thanks,

Jeffrey Wolfanger

http://jeffreywolfanger.com

Contributor

I would avoid the Broadcom 1.52 driver at all costs. There is definitely a bug with that driver that causes the NICs to "panic". We had this same exact issue with Broadcom 57710 NICs running in IBM 3650 M2 hosts. We ran into a bug with the 1.44 driver (pre Update 2) and jumbo frames, so we upgraded to 1.52. After upgrading, we had hosts disconnecting from vCenter and saw the NIC panic in the vmkernel log. Another issue we had was with NetQueue. By default the 1.52 driver has NetQueue enabled, but set to just 1 queue. We increased the number of queues to 8, and immediately on reboot one of the 2 NICs would panic. We had to scale it back down to 4 for both NICs to work. The 1.48 driver has NetQueue enabled and automatically creates queues based on what it thinks you need; for us it's using 4 queues.

We currently have a CritSit open with IBM to replicate our environment and the issue. We are also working with our VMware reps to find other companies that have had this same issue.

Contributor

We had exactly the same problem with HP ProLiant BL495c G6 servers (newest firmware, and newest Broadcom firmware). We quickly migrated to ESX 4.1, which so far has solved the problem.

There's a new Driver now:

driver: bnx2x

version: 1.54.1.v41.1-1vmw

firmware-version: BC:5.2.7 PHY:baa0:0105

I hope the issue does not come up again!

Cheers Michael

Contributor

Hey all,

I have been fighting this problem as well and have opened support cases with both HP and VMware, neither of which seems interested in finding a solution; both have pointed me back over the fence.

We have a c3000 chassis with the latest OA and VC code. It has two FLEX-10 interconnects. The blades are bl490c blades with both onboard and mezz FLEX-10 interfaces (NC532i/NC532m). We have used the Smart Update CD v9.1 from HP to update the firmware on the FLEX interfaces to v2.2.6 and are running ESXi 4.1 with driver version 1.54.

Unfortunately whenever we build our SQL VM and try to connect to it the FLEX-10 connections stop working throughout the entire chassis and the only way to resolve this is to power cycle the ENTIRE chassis.

ESXi captured errors when it happens and the logs show all FLEX-10 instances being disconnected. We sent this to VMware and their suggestion to us has been to downgrade to ESXi 4.0 u2 and use an experimental driver for the FLEX-10 interfaces. Unfortunately we can't do this because we are supposed to be testing View 4.5 beta and our licenses for the systems and infrastructure components are W2K8 R2 64-bit which are only supported under ESXi 4.1.

So in essence we now have lots of chassis and blades that can't be used because of the driver/firmware incompatibility issue, which seems fairly prevalent based on the number of community posts that address it. I'm curious if anyone else here has tried this using ESXi 4.1 and what your results have been.

V/R

Aaron

Contributor

Why are you using Flex firmware 2.2.6? You should be on 3.x IMO, or at least 2.3. You have to be 2.3 or higher or it will fail.

http://jeffreywolfanger.com/three-requirements-for-hp-flex-10-and-virtual-connect-with-vmware-vspher... that shows you drivers etc....

Contributor

I'm not using Virtual Connect 2.2.6; that is at 3.01. Version 2.2.6 is the Broadcom Linux Upgrade Utility version on HP's website, which updated the boot code/PXE firmware to the latest levels currently available on the HP Smart Update CD v9.1.

In essence I am at the latest levels of firmware for the NC532i/NC532m interfaces, I'm at the latest Virtual Connect firmware, and I'm using the latest VMWare drivers available.

So far the only response from VMware is to downgrade to 4.0 U2 until a new driver comes out for 4.1. Downgrading defeats the purpose in my case, as 4.0 does not support the View 4.5 beta, which we are trying to test to resolve other Common Access Card issues. :(

I asked VMware what version of firmware they tested the 1.54 driver against, but was not given a response other than to contact HP to see if there was a level of firmware that worked better with their driver than the most recent level. Seems odd that a version of hardware firmware would be written to work with an individual OS driver, and not the other way around. :P

Enthusiast

I have been working on a similar system and was working towards getting everything to the latest levels: ESXi 4.1, OA 3.10, Virtual Connect 3.01, and the NC532i at the latest firmware. I had the hosts at ESXi 4.1 and was ready to do the rest any day. Then the whole platform went offline. The hosts were still online (except one, which PSODed) and the VMs were running, but nothing was talking on the network. Once I took down some of the VMs and some hosts cleanly, other hosts that I had not touched at that point were suddenly back on the network. The further interesting point was that when one of the hosts was booting back up and got to loading the bnx2x driver, it locked up and another host suddenly stopped responding again. The end result was that we upgraded all the firmware on the blades, upgraded the OA firmware to 3.10 and the Virtual Connect firmware to 3.01, and then restarted everything. So far it has been running fine, and I hope this is the last time I have to deal with this.

Virtually,

VMware Certified Professional

NOTE: If your problem or questions has been resolved, please mark this thread as answered and award points accordingly.

Enthusiast

Well, I jinxed myself. A hard crash occurred this morning. Speaking with HP Support and then VMware Support (both at a Severity 1 level), the problem all centers around DCC and SmartLink in Flex-10. Here is what I was running in my environment:

- HP BL460 g6 blades: BIOS 2010.03.30, iLO 1.82, NIC firmware 2.2.6 w/ bootcode 5.2.7

- c7000: OA FW 3.10

- VirtualConnect: FW 3.01

- ESXi 4.1 using bnx2x driver 1.54.1.v41.1-1vmw.0.0260247

- 4 x 1Gb uplinks from each Virtual Connect module connected to 2 upstream switches in an LACP trunk (one module connected to one switch)

- VMware management network and Vmotion each on their own dedicated vswitches

- VM networks on 2 different Virtual Distributed Switches (each passing multiple VLANS)

- VirtualConnect configured with networks on each "side" of the chassis (left side & right side) for redundancy

HP gave me the following article (which is not generally available yet - but don't get me started on that one):

***** Insert Begin *****

SUPPORT COMMUNICATION - CUSTOMER ADVISORY

Document ID: c02476622

Version: 1

Advisory: VMware ESX/ESXi 4.1 - Broadcom bnx2x VMware ESX Driver Version 1.54 Does Not Function With Virtual Connect Device Control Channel (DCC) and SmartLink Capability for 10 Gb Broadcom Adapters in VMware ESX/ESXi 4.1

NOTICE: The information in this document, including products and software versions, is current as of the Release Date. This document is subject to change without notice.

Release Date: 2010-08-13

Last Updated: 2010-08-13



DESCRIPTION

The Broadcom bnx2x VMware ESX Driver Version 1.54 does not function with HP Virtual Connect Device Control Channel (DCC) and SmartLink features on ProLiant and Integrity server blades configured with the NC532m or the NC532i adapter running firmware version 2.2.6. After installing or upgrading VMware ESX/ESXi 4.1 the following functionality is either not installed or is lost:

  1. New installation - DCC and SmartLink functionality is unavailable in an HP Virtual Connect environment with the NC532m or NC532i Network Adapters after installing VMware ESX/ESXi 4.1.

  2. Upgrade installation - If the bnx2x Asynchronous Driver Update CD version 1.52 was previously installed on a VMware ESX/ESXi 4.0 host, DCC/SmartLink capabilities will be lost after upgrading to VMware ESX/ESXi 4.1, which will overwrite the bnx2x driver version 1.52 with version 1.54 that is included with the base VMware ESX/ESXi 4.1 operating system.

  3. Network failover - ProLiant and Integrity server blades hosting VMware ESX/ESXi 4.1 may lose network failover capabilities that use the VMware ESX NIC teaming failover policy (vSwitch setting) "Link Status only."

SCOPE

Any ProLiant and Integrity server blade with Virtual Connect Version 2.30 (or later) and configured with the NC532m or NC532i adapter firmware version 2.2.6, after installing VMware ESX/ESXi 4.1 with the Broadcom bnx2x VMware ESX Driver Version 1.54.

RESOLUTION

As a workaround, to allow network failover capabilities, use VMware Beacon Probing to determine proper VM NIC link status as follows:

Reconfigure the VMware ESX/ESXi 4.1 NIC teaming failover policies to "Beacon Probing." This modification will remove the dependency on SmartLink to toggle the VM NICs' failing link status mapped to a FlexNIC.

There is no workaround that supports the Virtual Connect DCC and SmartLink capabilities with VMware ESX/ESXi 4.1 and the Broadcom bnx2x VMware ESX Driver Version 1.54.

This advisory will be updated when an updated driver is released to support DCC and SmartLink capabilities in Virtual Connect on ProLiant server blades.

***** Insert End *****

VMware support reminded me that Beacon Probing requires 3 NICs per vSwitch. That would mean changing the infrastructure design, so that is out. If SmartLink is wanted, the choice is to roll back to ESX 4.0.x and roll back the firmware on the NICs, but all VMs would have to have their 4.1 VMware Tools uninstalled and the 4.0.x versions installed (reboots in most cases). Or stay without SmartLink, where if the side carrying traffic degrades, failover only happens once all links actually go down.

Other choice is to do some redesign of the uplink connectivity, but I will have to ponder that over the weekend.

If anyone else has some insight or thoughts on this, please jump in. Turns out that VMware will be writing a KB on this based on what I have presented to them.

Virtually,

VMware Certified Professional

NOTE: If your problem or questions has been resolved, please mark this thread as answered and award points accordingly.
