VMware Cloud Community
iambrucelee
Contributor

HP NC532i (Broadcom 57711E) network adapter on Flex-10 caused a hard crash: which bnx2x driver to use?

Is anyone else having this issue? We just had 3 servers crash due to a bnx2x_panic_dump. Once the network cards crashed, the ESX server had to be rebooted to recover. Even though only a few vmnics died, the entire server became unreachable, and so did the VMs, even those on vSwitches that the failed vmnics weren't bound to.

After researching it appears that VMware supports 3 different drivers:

1. bnx2x version 1.45.20

2. bnx2x version 1.48.107.v40.2

3. bnx2x version 1.52.12.v40.3

On 6/10/2010 VMware came out with a patch for 1.45.20, but esxupdate marked it obsolete, since our version (1.52.12.v40.3) was newer. Should I downgrade my driver?
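To make the "newer version obsoletes the patch" point concrete, here is a minimal Python sketch that orders the three driver version strings listed above. It only compares the numeric parts of each string; the helper names `version_key` and `is_obsoleted_by` are made up for illustration, and this is an assumption about how such a comparison could work, not how esxupdate actually decides.

```python
import re

def version_key(version: str) -> tuple:
    """Turn a string such as '1.52.12.v40.3' into (1, 52, 12, 40, 3)."""
    return tuple(int(n) for n in re.findall(r"\d+", version))

def is_obsoleted_by(candidate: str, installed: str) -> bool:
    """True if the installed driver compares newer than the candidate patch."""
    return version_key(installed) > version_key(candidate)

# The three bnx2x versions mentioned in this post.
drivers = ["1.45.20", "1.48.107.v40.2", "1.52.12.v40.3"]

print(sorted(drivers, key=version_key))
print(is_obsoleted_by("1.45.20", "1.52.12.v40.3"))
```

Under this naive ordering the 1.45.20 patch sorts below the installed 1.52.12.v40.3 driver, which is consistent with the patch being reported as obsolete.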

Also the VMware HCL has conflicting information. According to this:

http://www.vmware.com/resources/compatibility/search.php?action=search&deviceCategory=io&productId=1...

1.52.12.v40.3 is supported by vSphere4 Update2, and not vSphere Update1, yet the U2 release only has an update for the 1.45.20 driver.

Yet according to this:

http://www.vmware.com/resources/compatibility/search.php?action=search&deviceCategory=io&productId=1...

1.52.12.v40.3 is supported by both vSphere4 Update2 and vSphere Update1.

Here are the details of my environment:

HP BL460G6 blade servers, with flex-10 modules.

The individual blades are using HP NC532i Dual Port 10GbE Multifunction BL-c Adapter, firmware bc 5.0.11.

The chassis OA itself is using firmware v3.0.

The Flex-10 module is using firmware v. 2.33.

Crash Dump:

Jun 16 17:03:54 esx-2-6 vmkernel: 0:01:03:09.131 cpu1:4426)VMotionRecv: 1080: 1276732954553852 D: Estimated network bandwidth 75.588 MB/s during page-in

Jun 16 17:03:54 esx-2-6 vmkernel: 0:01:03:09.131 cpu7:4420)VMotion: 3381: 1276732954553852 D: Received all changed pages.

Jun 16 17:03:54 esx-2-6 vmkernel: 0:01:03:09.245 cpu7:4420)Alloc: vm 4420: 12651: Regular swap file bitmap checks out.

Jun 16 17:03:54 esx-2-6 vmkernel: 0:01:03:09.246 cpu7:4420)VMotion: 3218: 1276732954553852 D: Resume handshake successful

Jun 16 17:03:54 esx-2-6 vmkernel: 0:01:03:09.246 cpu3:4460)Swap: vm 4420: 9289: Starting prefault for the migration swap file

Jun 16 17:03:54 esx-2-6 vmkernel: 0:01:03:09.259 cpu0:4460)Swap: vm 4420: 9406: Finish swapping in migration swap file. (faulted 0 pages, pshared 0 pages). Success.

Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_stats_update:4639(vmnic1)]storm stats were not updated for 3 times
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_stats_update:4640(vmnic1)]driver assert
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:658(vmnic1)]begin crash dump -


Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:666(vmnic1)]def_c_idx(0xff5) def_u_idx(0x0) def_x_idx(0x0) def_t_idx(0x0) def_att_idx(0xc) attn_state(0x0) spq_prod_idx(0xf8)
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:677(vmnic1)]fp0: rx_bd_prod(0x6fe7) rx_bd_cons(0x3e9) *rx_bd_cons_sb(0x0) rx_comp_prod(0x7059) rx_comp_cons(0x6c59) *rx_cons_sb(0x6c59)
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:682(vmnic1)] rx_sge_prod(0x0) last_max_sge(0x0) fp_u_idx(0x6afb) *sb_u_idx(0x6afb)
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:693(vmnic1)]fp0: tx_pkt_prod(0x0) tx_pkt_cons(0x0) tx_bd_prod(0x0) tx_bd_cons(0x0) *tx_cons_sb(0x0)
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:697(vmnic1)] fp_c_idx(0x0) *sb_c_idx(0x0) tx_db_prod(0x0)
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[4f]=[0:deda0310] sw_bd=[0x4100b462c940]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[50]=[0:de706590] sw_bd=[0x4100b4697b80]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[51]=[0:deac2810] sw_bd=[0x4100baad8e80]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[52]=[0:de9ae390] sw_bd=[0x4100bda03f40]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[53]=[0:de3e9a90] sw_bd=[0x4100b463ecc0]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[54]=[0:3ea48730] sw_bd=[0x4100bab19100]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[55]=[0:de5b1190] sw_bd=[0x4100bda83980]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[56]=[0:ded48410] sw_bd=[0x4100bdb06080]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[57]=[0:3e3f0d10] sw_bd=[0x4100bca0f480]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[58]=[0:de742110] sw_bd=[0x4100bda35d40]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.230 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[59]=[0:de6ffc90] sw_bd=[0x4100bcab3800]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.230 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[5a]=[0:de619710] sw_bd=[0x4100b4640c40]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.230 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[5b]=[0:de627e10] sw_bd=[0x4100bcaad440]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.230 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[5c]=[0:3e455e10] sw_bd=[0x4100b462a9c0]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.230 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[5d]=[0:de3a6110] sw_bd=[0x4100bdaf1d80]
Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.230 cpu1:4280)<3>[bnx2x_panic_dump:712(vmnic1)]fp0: rx_bd[5e]=[0:3e37df90] sw_bd=[0x4100b470d580]
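For anyone sifting through a dump like the one above, here is a minimal Python sketch that pulls the driver function, the vmnic name, and any `name(0x...)` counters out of a vmkernel line. The regular expressions and the name `parse_driver_line` are assumptions based only on the lines pasted here, not on any Broadcom or VMware tooling.

```python
import re

# Driver part of a vmkernel line, e.g.
# "<3>[bnx2x_panic_dump:666(vmnic1)]def_c_idx(0xff5) ..."
DRIVER_MSG = re.compile(
    r"<\d>\[(?P<func>\w+):(?P<line>\d+)\((?P<nic>\w+)\)\](?P<msg>.*)"
)
# Counters printed as name(0xHEX); some names carry a leading '*'.
HEX_FIELD = re.compile(r"(\*?\w+)\((0x[0-9a-fA-F]+)\)")

def parse_driver_line(log_line):
    """Return a dict for a driver message, or None if the line is not one."""
    m = DRIVER_MSG.search(log_line)
    if m is None:
        return None
    fields = {name: int(value, 16)
              for name, value in HEX_FIELD.findall(m.group("msg"))}
    return {"func": m.group("func"), "nic": m.group("nic"),
            "msg": m.group("msg").strip(), "fields": fields}

# Two lines copied from the dump above.
sample = [
    "Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)"
    "<3>[bnx2x_stats_update:4640(vmnic1)]driver assert",
    "Jun 16 17:09:42 esx-2-6 vmkernel: 0:01:08:57.229 cpu1:4280)"
    "<3>[bnx2x_panic_dump:677(vmnic1)]fp0: rx_bd_prod(0x6fe7) rx_bd_cons(0x3e9) "
    "*rx_bd_cons_sb(0x0) rx_comp_prod(0x7059) rx_comp_cons(0x6c59) *rx_cons_sb(0x6c59)",
]

parsed = [parse_driver_line(line) for line in sample]
for entry in parsed:
    print(entry["nic"], entry["func"], entry["fields"])
```

Run against a full log, this makes it easier to see which vmnic asserted first and to compare producer/consumer counters between hosts. It does not interpret what the counters mean.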

Any thoughts or suggestions?

0 Kudos
102 Replies
bebman
Enthusiast

HP Customer Advisory has gone public:

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c02476622&jumpi...

Virtually,

VMware Certified Professional

NOTE: If your problem or questions has been resolved, please mark this thread as answered and award points accordingly.
0 Kudos
fishface
Contributor

We are having the exact same issues with vSphere 4.1 on a brand new HP Blade Chassis system with Flex 10 virtual connect ethernet modules (rev 3.01).

It's been running fine for 8 weeks, and just this week decided to die. We now get a PSOD which seems to occur when any load is added (yet we can't easily replicate the issue). We have 4 vSphere BL460c G6 blades and 3 non-virtualized blades in the enclosure running Windows 2008. Of the physical OS blades, one or all may lose network connectivity. It's a bit random, so it seems to be the entire Flex-10 connection that dies. iLO was lost during the first outage we had, but has not died in the subsequent incidents since we upgraded the OA firmware to 3.11 on the Flex 10.

We've replaced the Chassis OA controller and Midplane, but it's made no difference.

Our bnx2x driver version is 1.54.

Did anyone else with these issues find a satisfactory fix?

0 Kudos
aaron757
Contributor

We still have an open ticket with VMware, I suggest you open one as well. We are able to reproduce by simply connecting to a SQL server through the network connection as opposed to connecting locally. When we do it all ESXi 4.1 blades in the chassis lose Flex-10 connectivity. Curiously, the ESXi 4.0u2 blades that also reside in the same chassis continue operating fine.


0 Kudos
bebman
Enthusiast

I ended up with the following configuration:

Onboard Admin FW = 3.11

Virtual Connect FW = 3.01

iLO FW = 2.00

BL460G6 BIOS = 2010.05.20 (A) (24 Jun 2010)

ESXi version = 4.0 Update2 - clean install

The ESXi version means that we are using the 1.52 Broadcom NIC driver.

Items I am curious about:

For any of you who are still having problems: did you do a clean install of 4.1 or an upgrade?

Is anyone running 4.1 with VC FW of 2.32 or 2.33? And what version of OA?

Items to note:

I am not confident in HP at this point when it comes to VC Flex-10 and their VMware design and support decisions. Even their latest VC Cookbook says that Beacon Probing can be used with 2 NICs. Also, the cookbook makes no reference to 4.1, which was released at least 2 months before the current version of the cookbook.

Virtually,

VMware Certified Professional

NOTE: If your problem or questions has been resolved, please mark this thread as answered and award points accordingly.
0 Kudos
sayahdo
Contributor

Re fishface's environment, the details are:

Onboard Admin FW = 3.11

Virtual Connect FW = 3.01

iLO 2 Firmware = 1.82

BL460G6 BIOS = 2010.05.20 (A) (24 Jun 2010)

ESXi version = 4.1 - clean install

1.54 Broadcom NIC driver.

When the ESX server hits a PSoD, approximately 4 minutes later all other ESX servers and their associated VMs are disconnected.

One of the 3 dedicated Windows servers (Windows Server 2008 x64) is also disconnected.

We have a case logged with VMware and HP. What's the bet they point fingers at each other?

Cheers

Mike

#############

Sorry, I was logged into VMware with my colleague's account and posted this under his name. I've removed and updated my post.

#############

VMware support are now telling us that we need to install this patch to make the HP BL460c G6 with Intel 56xx processors supported:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=101746...

They think it will remove the Exception 6 PSoD.

It's an ESXi 4.0 patch for ESXi 4.1????

Has anyone done this?

I've imported it into vCenter 4.1 Update Manager, but it's telling me I'm compliant... grrrrrr. Getting VMware support to call me back.

Cheers

Mike


0 Kudos
vigen
Contributor

.

0 Kudos
bebman
Enthusiast

In my environment we are running 55xx procs, so this patch is not of concern for me, but I also found that HP has released an updated BIOS for the 56xx processors that has to do with memory errors. You can read that here: http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=37...

Virtually,

VMware Certified Professional

NOTE: If your problem or questions has been resolved, please mark this thread as answered and award points accordingly.
0 Kudos
sayahdo
Contributor

Hi,

As noted in my post (a couple above), I was logged into VMware with my colleague's account to access the VMware Service Request and made a post under his name. I've removed and updated my post.

VMware support has come back to me. During a WebEx session they told me that we didn't need the patch, because we were running VMware ESXi 4.1.0 GA, which already included it.

Will check out the BIOS firmware.

Cheers

Mike

0 Kudos
sayahdo
Contributor

So HP Level 2 Support pointed the finger at VMware.

VMware support pointed the finger at HP.

Wicked.

Anyhow, we have an inside HP blade specialist who has said the following config will work:

Onboard Admin FW = 3.11

Virtual Connect FW = 3.01

iLO 2 Firmware = 1.82

BL460G6 BIOS = 2010.08.16 (9 Sep 2010) NEW

NC532i = 2.2.4 LOWER

ESXi version = 4.1 - clean install

bnx2x-1.48

BUT.....

we are getting an error trying to get the driver into ESXi 4.1. It tells us, very politely, that it is obsolete.

Has anyone injected the 1.48 or 1.52 driver into ESXi 4.1?

Cheers

Mike

0 Kudos
bebman
Enthusiast

And today, I find this:

Again HP is just not getting it. Not surprising after reading this:

Virtually,

VMware Certified Professional

NOTE: If your problem or questions has been resolved, please mark this thread as answered and award points accordingly.
0 Kudos
robert_s
Contributor

We have had the same problem for 4 weeks.

ESXi 4.1 installed on HP BL460c G6 with VC.

The server crashes to a PSOD every time somebody in a Windows VM does some MS SQL work, such as running the configuration wizard or using SQL Management Studio to import/export data to an MS SQL Server running on a VM.

The problem has been reported to VMware and HP. VMware first told us that this is a new bug and an unknown problem. We opened an SR on 19th August. The first VMware and HP forum entries are from the beginning of August.

We have ESX 4.0 servers in the same blade enclosure, and there we have no problem with the same VMs and the same actions.

Yesterday we started loading the 1.45 and 1.48 bnx2x drivers from ESX 4.0 U2 onto some test ESXi 4.1 servers, and the first tests were fine: no PSOD or crash.

We have to do some more tests with the MS SQL tools which were used to crash an ESXi 4.1 host with the 1.54 bnx2x driver.

This can be a first workaround until VMware gets a new working Broadcom driver for the 57711 chips and HP Virtual Connect.

It lets us continue our project work, which had to be interrupted because the server crashed every time we tried to configure our application database.

0 Kudos
sayahdo
Contributor

VMware support have officially told us that Broadcom have told them not to downgrade the driver or firmware with ESX 4.1.

It didn't help that ESXi 4.1 GA won't allow you to load the downgraded driver. Even the VMware tech couldn't do it!

This is a massive problem and has been escalated within VMware, Broadcom and HP.

I've heard from a few people that the 1.6 driver is due out soon, as well as a new release of 4.1. Any day now.

We are expecting a call from VMware escalation Point of contact soon

Cheers

Mike

0 Kudos
sayahdo
Contributor

Hey Robert,

What versions are you running for your:

O/A

VC

If you use ESX 4.0, it states that it has only been tested with driver v1.52.12.v40.3, firmware v2.2.6, and VC version 2.33.

Cheers

0 Kudos
stefanjansson
Contributor

Hi all,

Any news or progress on this problem? We are about to upgrade to vSphere 4.1 and are running VC and HP BL490 G6 blades, but won't do that until I have heard that this has been fixed. I have installed vSphere 4.1 in our test environment and haven't seen this problem there, but we are only running a couple of test servers in that environment, and they are pretty much idle all the time. We are not running the latest VC firmware (we are still on 2.31), but that was one of the things we would upgrade during this upgrade to vSphere 4.1. Any new input on this matter is appreciated.

Regards, Stefan

0 Kudos
sayahdo
Contributor

We have been running the following configuration stably for nearly 2 weeks:

BL460c G6 Blade BIOS – 2010.08.16 (9 Sep 2010)

Flex 10 NIC NC532i Driver – 1.54 (inbox)

Flex 10 NIC NC532i Firmware – 2.2.6

VC Flex 10 Eth Firmware – 3.01

OA Firmware – 3.10

Smart Link – TURNED OFF

Beacon Probing - on

These are the recommendations from VMware and HP Support.

There are 2 changes from the original deployment:

Blade BIOS has been updated

Smart Link has been turned off

It would seem that the combination of these 2 has fixed the problem. Touch wood!
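If you want to compare your own hosts against the combination listed above, a minimal Python sketch could look like this. The dictionary keys and the function name `diff_against_reported` are made up for illustration, the values are simply the ones reported in this post, and the example host is hypothetical, so treat it as a checklist helper rather than a supported baseline.

```python
# Values as reported in this post; key names are illustrative only.
REPORTED_STABLE = {
    "blade_bios": "2010.08.16",
    "nc532i_driver": "1.54",
    "nc532i_firmware": "2.2.6",
    "vc_firmware": "3.01",
    "oa_firmware": "3.10",
    "smart_link": "off",
    "failover_detection": "beacon probing",
}

def diff_against_reported(current):
    """Return {key: (current_value, reported_value)} for every mismatch."""
    return {
        key: (current.get(key), reported)
        for key, reported in REPORTED_STABLE.items()
        if current.get(key) != reported
    }

# Hypothetical host that still has the older BIOS and Smart Link enabled.
example_host = dict(REPORTED_STABLE, blade_bios="2010.05.20", smart_link="on")

for key, (have, want) in diff_against_reported(example_host).items():
    print(f"{key}: have {have!r}, reported stable config has {want!r}")
```

An empty result only means the host matches what was reported here. It says nothing about whether the combination is supported for your environment.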

Cheers

Mike

0 Kudos
stefanjansson
Contributor

Hi Mike,

Thanks for the fast response.

When you say Smart Link is TURNED OFF, are you referring to unchecking the Smart Link box in VC? And have you then changed the network failover detection method in the vSwitch from "Link status only" to "Beacon Probing"?

I also hope that these changes are a workaround that helps. I would have preferred to run with "Link status only", but that obviously isn't working, so...

Regards, Stefan

0 Kudos
sayahdo
Contributor

Hi,

Yes, this is a workaround.

Yes, you must not use Smart Link. It must be turned off/disabled/unchecked.

See the following HP advisory for more info. While it states that Smart Link doesn't work, it doesn't state that it must be turned off/disabled/unchecked.

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c02476622

And yes, when disabling Smart Link you need to change the vSwitch failover detection from link status to Beacon Probing.

We found that the combination of the BIOS update and disabling Smart Link removed our PSoD. And I'm not going to revert either change to see which one it was :)

There is a new driver being worked on at the moment, 1.6. It's not due out for about another 5-6 weeks, so I've been told.

This should remedy the problem.

There are other workarounds:

Use QLogic Flex-10 mezzanine cards instead of the onboard NICs. A bit of an expensive workaround, unless it's a greenfield deployment!

HP came back to me with an alternative to Beacon Probing.

If your VC Flex-10 Eth modules each had 2 10 GbE uplinks to 2 switches (4 uplinks in total), you could create a channel across each pair of connections for diversity. See below.

In our case we only had 1 uplink from each VC, 2 in total, so HP suggested this config using 1 Gb SFPs as a workaround.

We asked HP for QLogic cards and 2 more VCs at no cost to us, as the equipment is not fit for purpose :). They said no, and stated that if we deviated from the HP Advisory we would fall out of support. Bit humbug if you ask me; they just didn't want to stump up for new kit...

So we have decided to stay with Beacon Probing at this stage. NO Smart Link: turned off/disabled/unchecked.

Hope this helps you all.

Cheers

Mike

0 Kudos
stefanjansson
Contributor

Hi Mike,

Thanks for the detailed input on this, really appreciated.

Cheers

Stefan

0 Kudos
NAz0GuL
Contributor

Hi:

We have a BL680c with the HP NC364m Quad Port 1GbE BL-c Adapter. I'm not sure whether this issue will have an impact on our ESX 4.1?

Thank you!

0 Kudos
KFM
Enthusiast

As far as I know, the problems only stem from using the NC532i Flex-10 adapter. As the NC364m is NOT a Flex-10 device, I doubt it exhibits the same problems.

0 Kudos