VMware Cloud Community
iambrucelee
Contributor

HP NC532i (Broadcom 57711E) network adapter with Flex-10 caused a hard crash. Which bnx2x driver should I use?

Is anyone else having this issue? We just had three servers crash due to a bnx2x_panic_dump. Once the network cards crashed, the ESX server had to be rebooted to recover. Even though only a few vmnics died, the entire server became unreachable, and so did its VMs, even when the failed vmnic wasn't bound to the vSwitch the VM was on.

After researching, it appears that VMware supports three different drivers:

1. bnx2x version 1.45.20

2. bnx2x version 1.48.107.v40.2

3. bnx2x version 1.52.12.v40.3

On 6/10/2010 VMware came out with a patch for 1.45.20, but esxupdate marked it obsolete, since our version (1.52.12.v40.3) was newer. Should I downgrade my driver?
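Before deciding whether to downgrade, it may help to confirm exactly which bnx2x build and firmware each host is actually running. A minimal sketch for the ESX 4.x service console (the vmnic names are placeholders and will differ per host):

```shell
# Version of the loaded bnx2x module (ESX 4.x service console)
vmkload_mod -s bnx2x | grep -i version

# Driver and firmware versions as reported per NIC;
# replace vmnic0/vmnic1 with your Flex-10 FlexNICs
for nic in vmnic0 vmnic1; do
  echo "== $nic =="
  ethtool -i "$nic"
done
```

Comparing the `ethtool -i` output against the HCL entries should show whether a host is on 1.45.20, 1.48.x, or 1.52.x before any downgrade decision.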

Also the VMware HCL has conflicting information. According to this:

http://www.vmware.com/resources/compatibility/search.php?action=search&deviceCategory=io&productId=1...

1.52.12.v40.3 is supported by vSphere 4 Update 2 but not vSphere 4 Update 1, yet the Update 2 release only includes an update for the 1.45.20 driver.

Yet according to this:

http://www.vmware.com/resources/compatibility/search.php?action=search&deviceCategory=io&productId=1...

1.52.12.v40.3 is supported by both vSphere 4 Update 2 and vSphere 4 Update 1.

Here are the details of my environment:

HP BL460G6 blade servers, with flex-10 modules.

The individual blades are using HP NC532i Dual Port 10GbE Multifunction BL-c Adapter, firmware bc 5.0.11.

The chassis OA itself is using firmware v3.0.

The Flex-10 module is using firmware v. 2.33.

Crash Dump:

Jun 16 17:03:54 esx-2-6 vmkernel: 0:01:03:09.131 cpu1:4426)VMotionRecv: 1080: 1276732954553852 D: Estimated network bandwidth 75.588 MB/s during page-in

Jun 16 17:03:54 esx-2-6 vmkernel: 0:01:03:09.131 cpu7:4420)VMotion: 3381: 1276732954553852 D: Received all changed pages.

Jun 16 17:03:54 esx-2-6 vmkernel: 0:01:03:09.245 cpu7:4420)Alloc: vm 4420: 12651: Regular swap file bitmap checks out.

Jun 16 17:03:54 esx-2-6 vmkernel: 0:01:03:09.246 cpu7:4420)VMotion: 3218: 1276732954553852 D: Resume handshake successful

Jun 16 17:03:54 esx-2-6 vmkernel: 0:01:03:09.246 cpu3:4460)Swap: vm 4420: 9289: Starting prefault for the migration swap file

Jun 16 17:03:54 esx-2-6 vmkernel: 0:01:03:09.259 cpu0:4460)Swap: vm 4420: 9406: Finish swapping in migration swap file. (faulted 0 pages, pshared 0 pages). Success.

Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_stats_update:4639(vmnic1)]storm stats were not updated for 3 times
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_stats_update:4640(vmnic1)]driver assert
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:658(vmnic1)]begin crash dump -


Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:666(vmnic1)]def_c_idx(0xff5) def_u_idx(0x0) def_x_idx(0x0) def_t_idx(0x0) def_att_idx(0xc) attn_state(0x0) spq_prod_idx(0xf8)
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:677(vmnic1)]fp0: rx_bd_prod(0x6fe7) rx_bd_cons(0x3e9) *rx_bd_cons_sb(0x0) rx_comp_prod(0x7059) rx_comp_cons(0x6c59) *rx_cons_sb(0x6c59)
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:682(vmnic1)] rx_sge_prod(0x0) last_max_sge(0x0) fp_u_idx(0x6afb) *sb_u_idx(0x6afb)
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:693(vmnic1)]fp0: tx_pkt_prod(0x0) tx_pkt_cons(0x0) tx_bd_prod(0x0) tx_bd_cons(0x0) *tx_cons_sb(0x0)
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:697(vmnic1)] fp_c_idx(0x0) *sb_c_idx(0x0) tx_db_prod(0x0)
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[4f]=[0:deda0310] sw_bd=[0x4100b462c940]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[50]=[0:de706590] sw_bd=[0x4100b4697b80]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[51]=[0:deac2810] sw_bd=[0x4100baad8e80]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[52]=[0:de9ae390] sw_bd=[0x4100bda03f40]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[53]=[0:de3e9a90] sw_bd=[0x4100b463ecc0]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[54]=[0:3ea48730] sw_bd=[0x4100bab19100]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[55]=[0:de5b1190] sw_bd=[0x4100bda83980]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[56]=[0:ded48410] sw_bd=[0x4100bdb06080]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[57]=[0:3e3f0d10] sw_bd=[0x4100bca0f480]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[58]=[0:de742110] sw_bd=[0x4100bda35d40]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.230 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[59]=[0:de6ffc90] sw_bd=[0x4100bcab3800]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.230 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[5a]=[0:de619710] sw_bd=[0x4100b4640c40]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.230 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[5b]=[0:de627e10] sw_bd=[0x4100bcaad440]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.230 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[5c]=[0:3e455e10] sw_bd=[0x4100b462a9c0]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.230 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[5d]=[0:de3a6110] sw_bd=[0x4100bdaf1d80]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.230 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[5e]=[0:3e37df90] sw_bd=[0x4100b470d580]
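For anyone triaging the same symptom, a quick way to see which vmnics are hitting the driver assert is to grep the vmkernel log. A minimal sketch, using a few of the lines above as stand-in input for /var/log/vmkernel:

```shell
# Quick triage: which vmnics are hitting the bnx2x driver assert?
# The sample lines below stand in for /var/log/vmkernel on a real host.
cat > /tmp/vmkernel.sample <<'EOF'
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_stats_update:4639(vmnic1)]storm stats were not updated for 3 times
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_stats_update:4640(vmnic1)]driver assert
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:658(vmnic1)]begin crash dump -
EOF

# Count stats_update/panic_dump hits per vmnic
# (prints one count per vmnic, e.g. "3 vmnic1")
grep -oE '\[bnx2x_(stats_update|panic_dump):[0-9]+\((vmnic[0-9]+)\)' /tmp/vmkernel.sample \
  | sed -E 's/.*\((vmnic[0-9]+)\).*/\1/' \
  | sort | uniq -c
```

On a real host, point the grep at /var/log/vmkernel instead of the sample file; a count against more than one vmnic suggests the assert is not isolated to a single FlexNIC.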

Any thoughts or suggestions?

102 Replies
chrisfmss
Enthusiast

(Note: The driver has not been certified to support iSCSI, and VMware does not provide support for iSCSI applications until iSCSI certification is complete.)

http://downloads.vmware.com/d/details/esx41_broadcom_netxtreme_dt/ZHcqYnRlZHBiZCVodw==

ViFXStu
Contributor

The 4.0 driver got added to the VMware site within the last hour, I think:

http://downloads.vmware.com/d/details/esx4_broadcom_bcm57710_dt/ZHcqYnRldGRiZCVodw==

Message was edited by: ViFXStu - changed link to direct link to driver

vmmoz
Contributor

I am out of the office until 19.11.2010. I will answer your message after my return.

In urgent cases, please contact our hotline at call-sued@acp.at (0316-4603-15999).

Mackopes
Enthusiast

Yep,

I believe this is the bugfix for the PSODs on 4.0. The 1.60 driver is still pending.

Here is the direct download:

http://downloads.vmware.com/d/details/esx4_broadcom_bcm57710_dt/ZHcqYnRldGRiZCVodw==

ViFXStu
Contributor

So are you saying this doesn't enable SmartLink/DCC? If so, it's the same as the driver that ships with 4.0 (1.48.x). I can't see why they would release another driver that doesn't support these features.

Mackopes
Enthusiast

Actually, this should have DCC/SmartLink, since the earlier 1.52 driver on 4.0 already worked with them. It is simply a bugfix for the PSOD.

So for ESX 4.0, use the new 1.52 driver and you will have DCC/SmartLink and no PSOD.

For ESX 4.1 use the new 1.60 driver and you will also have DCC/Smartlink and no PSOD.

At least in theory... We will see in the next few weeks as we all test it.

Aaron

ViFXStu
Contributor

Great news at last (if it works...)

What can we expect from 1.60 for 4.0?

Rabie
Contributor

I will download and install it on my test environment.

I just wish VMware would provide some changelogs; there are none for the drivers, so you have no idea what they're supposed to fix.

Another useful bit of info would be when a bug was introduced, so you know which versions are affected...

Rabie
Contributor

Hi,

Sorry, my ISO mounting utility has gone on the fritz; the driver has been updated.

I have asked my HP representative for a change log on the 1.52 update.

He has however mentioned that they are still waiting on 1.60 to be released for 4.0 soon.

Regards

Rabie

Rabie
Contributor

Hi,

There seems to have been a little bit of confusion: there will be no 1.60 driver for ESXi 4.0. 1.52.12.v40.8 is the fixed driver, which should stop the PSODs and corrupt datagrams.

So far it has been testing fine on our dev cluster.

Regards

Rabie

stefanjansson
Contributor

I am on vacation, back August 16

I'm on vacation, will be back on December 13

Kind regards

// Stefan

musicnieto
Contributor

Just wanted to give everyone an FYI about some pain I went through with the 1.60 driver. Make sure you test thoroughly before going live with this driver. In my environment we have HP BL460c G6's with the Broadcom NICs, and we have the Flex-10 modules along with the 1000v.

The environment in question is not prod yet because we were being held up by the driver issues. After testing 1.60, it appears that we were getting consistent network dropouts from the host whenever we created any type of traffic (vMotion, continuous ping) going to or from the host with the 1.60 driver. It appears the driver may have some type of bug.

My assumption is that the bug may have to do with some combination of the Flex-10 firmware, the NIC card firmware, the OA firmware, and how we have our shared uplinks configured.

We ruled out the Nexus 1000v because we were experiencing this issue when not using the 1000v.

I have a similar environment in another datacenter and it has been running fine for months (knock on wood). In that environment we are using the Flex-10's with firmware 2.32, Broadcom driver 1.54 v1 & firmware 5.2.7, OA 2.60, and the 1000v 4.0(4)SV1(3a).

VMware has been very helpful. It appears that the Express patch that was released on Nov 29th which includes a revised 1.54 driver fixes the PSOD issues and also fixed my issue of network drop outs. I have installed the driver on 4 hosts (2 inside the 1000v and 2 outside the 1000v).

I have been doing vMotions all day now and it appears to be stable.

I will continue to test for about a week to see if I hit any issues.

My original PSOD issues were being caused by doing NetApp SnapManager for SQL work on SQL VMs.

I am going to have my SQL DBA run some SQL jobs on a few VMs as a test to make sure I don't experience the PSOD issues again.

I am also going to test the SmartLink issue, as I was told by VMware that this was corrected with the revised 1.54 Broadcom driver.

slogan8r
Contributor

I recently ran into this problem starting last week: randomly, after months of being fine, the network connectivity started dropping. The problem seemed to come from DRS migrations and backups.

We are running 6 x HP BL460c G6's with ESX 4.1 and had the 1.54 driver for the Broadcom NICs. Originally we were seeing the crashes about every 24 hours, so we updated 3 of the 6 hosts to the 1.60 drivers as of 12/6/2010. Since then (2 days now) we have not seen a failure on these hosts (knock on wood); however, the hosts that were not updated have crashed. As of today all the hosts are now on 1.60, and I will report back in a couple of days to let you all know how it's working out.
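For anyone doing the same update, the usual route is the offline bundle applied from the service console with esxupdate. A sketch of that procedure (the bundle filename below is a placeholder; use the actual name from the VMware download page, and put the host in maintenance mode first):

```shell
# Apply the downloaded driver offline bundle (ESX 4.x service console).
# BNX2X-offline_bundle.zip is a placeholder filename.
esxupdate --bundle=BNX2X-offline_bundle.zip update

# After the reboot, confirm the new driver version took effect
vmkload_mod -s bnx2x | grep -i version
```

For ESXi hosts without a service console, the equivalent is vihostupdate from the vSphere CLI against the host.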

As an FYI, we are on OA 3.11.

Thanks

sayahdo
Contributor

Thank you for your email, I am working out of the office until Friday 10th Dec with limited access to email and phone. If this is urgent please contact Axon directly on 0800 806090

Regards

Mike

bebman
Enthusiast

A question I have for everyone/anyone working this issue:

Are we supposed to use the 1.60 driver or the 1.54 patch?

Covered in this KB: VMware ESXi 4.1 Patch ESXi410-201011401-BG: Updates Firmware (1029398) http://kb.vmware.com/kb/1029398

Virtually,

VMware Certified Professional

NOTE: If your problem or questions has been resolved, please mark this thread as answered and award points accordingly.

slogan8r
Contributor

I should have clarified before, but I spoke to VMware support on the phone for about 3.5 hours the other day before the upgrade (I talked to two systems engineers and two network engineers), and what I did is what they suggested.

Long story short, they could not come up with any reason why this was happening, and all of them said to just upgrade to v1.60 (they never mentioned a patch to v1.54). I find it odd that none of them knew about this issue, but they were very good about helping me out.

Mackopes
Enthusiast

We are using the 1.60 driver for ESXi 4.1 and the 'new' 1.52.12.v40.8 driver for ESX 4.0 U2.

Both seem to be working fine. There is very little load on the ESXi boxes yet, but we have an 8-node BL460c G6 cluster running ESX 4.0 U2 with the 1.52.12.v40.8 driver, hosting 222 virtual machines, and it has been running flawlessly for 3 weeks now.

musicnieto
Contributor

Hey Slogan8r & Mackopes,

Would you be able to tell me what firmware levels you are at for the Virtual Connect modules (are they Flex-10?), the OA, and the NIC cards? Any chance you guys are using a 1000v? Also, what does your Virtual Connect domain look like? Are you stacking chassis? How many shared uplink sets do you have, and what do they consist of?

The only reason I ask is that I am trying to find similarities and differences between your environment and mine. What worries me is how you said your environment was working fine and then out of nowhere you started having issues.

I have an environment that has been stable for 5 months, and it houses many production systems. I can't afford for that to go down.

In terms of the 1.60 driver, it just didn't work for me. The 1.54 v2 driver seems to be working, but... I have tested taking down one of my shared uplinks to the Virtual Connect domain, and it always seems to cause one host to lose connectivity to the network, and it's always a different host. I still have to do a little more testing, but I have tried it twice and both times I lose a host.

I will keep everyone posted.

Mackopes
Enthusiast

@musicnieto:

No problem!

OA Firmware: 3.11

VC Firmware 3.10

bnx2x Firmware: 5.2.7

bnx2x Driver: 1.52.12.v40.8 (ESX 4.0 Classic)

iLO Firmware: 2.10

BL460c G6 Firm: 5/19/2010

We are NOT using 1000v. We ARE using Distributed Virtual Switch

We do have Flex-10.

We have 3 enclosures but do not stack. We use Virtual Connect Enterprise Manager to manage the VC Domains using VCEM pre-defined MAC ranges

We use "Tunnel VLAN Tag" mode for our VC Domains

Uplink configuration:

3 x 10gbit uplinks PER Flex Module

1 x 10gbit uplink is used for a Shared Uplink Set which contains 2 networks (Service Console, and VMotion)

2 x 10gbit uplinks are trunked and enable VLAN Tunneling (we have a LOT of VLANs)

Smartlink enabled on all

Flex NIC Configuration:

1 NIC is used for SC at 1gbit

1 NIC is used for VMotion at 3gbit

1 NIC is used for VM data networks at 5.9gbit (VLANs are then configured at the vDS level)

1 NIC is set at 100mbit, connected to LOOPBACK, and unused (this is so HP SIM doesn't complain)

slogan8r
Contributor

musicnieto,

On the VC Flex-10's we are using v2.33

OA v3.11

We are NOT using a 1000v

We are not stacking chassis, just a c7000 with 6 blades. We are also running a c7000 with 8 x BL460c G1's, WITHOUT any Flex-10's, and we haven't had a single problem with that environment. That's what led me to believe it has to do with the combination of Flex-10's and the Broadcom network adapters.

So far, though, we haven't had any issues since upgrading, except for an unknown hardware failure on one host last night!! Talk about bad luck :{

Now we are in the process of upgrading everything to the latest versions and playing the waiting game.
