VMware Cloud Community
beezleinc
Enthusiast

bnxtnet fails to load on 3 of 4 identical servers

I have four brand-new, identical Dell PowerEdge R730s with BCM57406 10G NIC adapters (listed on the 6.7U1 HCL):

Model : BCM57406

Device Type : Network

Brand Name : DELL

Number of Ports: 2

DID : 16d2

SVID : 14e4

SSID : 4060

VID : 14e4

One of the four servers loads the bnxtnet driver and brings up the NIC just fine. The other three do not, and I am stumped. I have checked every BIOS and NIC setting. The firmware is identical, the PCI slots are identical, and ESXi 6.7U1 is installed identically, yet I cannot get three of them past this error.

vmkernel.log from the server that works:

2019-01-31T12:19:13.436Z cpu1:2097664)Loading module bnxtnet ...

2019-01-31T12:19:13.437Z cpu1:2097664)Elf: 2101: module bnxtnet has license BSD

2019-01-31T12:19:13.441Z cpu1:2097664)Device: 192: Registered driver 'bnxtnet' from 22

2019-01-31T12:19:13.441Z cpu1:2097664)Mod: 4962: Initialization of bnxtnet succeeded with module ID 22.

2019-01-31T12:19:13.441Z cpu1:2097664)bnxtnet loaded successfully.

2019-01-31T12:19:13.442Z cpu6:2097620)bnxtnet: bnxtnet_initialize_devname:61: [0000:06:00.0 : 0x4309fd3bfe10] PCI device 16d2:14e4:4060:14e4 detected

2019-01-31T12:19:13.442Z cpu6:2097620)bnxtnet: bnxtnet_dev_probe:1275: [0000:06:00.0 : 0x4309fd3bfe10] Starting Cumulus device probe

2019-01-31T12:19:13.442Z cpu6:2097620)DMA: 679: DMA Engine 'cumulus-0000:06:00.0' created using mapper 'DMANull'.

2019-01-31T12:19:13.442Z cpu6:2097620)DMA: 679: DMA Engine 'cumulus-co-0000:06:00.0' created using mapper 'DMANull'.

2019-01-31T12:19:13.442Z cpu6:2097620)VMK_PCI: 914: device 0000:06:00.0 pciBar 0 bus_addr 0x91c20000 size 0x10000

2019-01-31T12:19:13.442Z cpu6:2097620)bnxtnet: bnxtnet_map_pci_mem:784: [0000:06:00.0 : 0x4309fd3bfe10] mapped pci bar 0 at vaddr  0x450196a40000

2019-01-31T12:19:13.442Z cpu6:2097620)VMK_PCI: 914: device 0000:06:00.0 pciBar 2 bus_addr 0x91c30000 size 0x10000

2019-01-31T12:19:13.442Z cpu6:2097620)bnxtnet: bnxtnet_map_pci_mem:784: [0000:06:00.0 : 0x4309fd3bfe10] mapped pci bar 2 at vaddr  0x450196a60000

2019-01-31T12:19:13.442Z cpu6:2097620)VMK_PCI: 914: device 0000:06:00.0 pciBar 4 bus_addr 0x91dc2000 size 0x2000

2019-01-31T12:19:13.442Z cpu6:2097620)bnxtnet: bnxtnet_map_pci_mem:784: [0000:06:00.0 : 0x4309fd3bfe10] mapped pci bar 4 at vaddr  0x450196468000

2019-01-31T12:19:13.443Z cpu6:2097620)bnxtnet: dev_init_device_info:1113: [0000:06:00.0 : 0x4309fd3bfe10] PHY is AutoGrEEEn capable

2019-01-31T12:19:13.479Z cpu6:2097620)WARNING: bnxtnet: bnxtnet_alloc_mem_probe:933: [0000:06:00.0 : 0x4309fd3bfe10] Disable VXLAN/Geneve RX filter due to firmware bug. Refer to VMware Compatibilit

2019-01-31T12:19:13.479Z cpu6:2097620)bnxtnet: bnxtnet_alloc_intr_resources:899: [0000:06:00.0 : 0x4309fd3bfe10] The intr type set to MSIX

2019-01-31T12:19:13.479Z cpu6:2097620)VMK_PCI: 764: device 0000:06:00.0 allocated 16 MSIX interrupts

2019-01-31T12:19:13.479Z cpu6:2097620)bnxtnet: bnxtnet_dev_probe:1352: [0000:06:00.0 : 0x4309fd3bfe10] Interrupt mode: MSIX, max fastpaths: 16 max roce irqs: 0

2019-01-31T12:19:13.479Z cpu6:2097620)bnxtnet: bnxtnet_dev_probe:1358: [0000:06:00.0 : 0x4309fd3bfe10] Ending successfully cumulus device probe

2019-01-31T12:19:13.479Z cpu6:2097620)bnxtnet: bnxtnet_attach_device:235: [0000:06:00.0 : 0x4309fd3bfe10] Driver successfully attached cumulus device (0x2d544305d9cc7d46) with Chip ID=0x16D2 Rev/Me

2019-01-31T12:19:13.480Z cpu6:2097620)Device: 327: Found driver bnxtnet for device 0x2d544305d9cc7d46

2019-01-31T12:19:13.480Z cpu6:2097620)CpuSched: 697: user latency of 2097666 netpoll-backup 0 changed by 2097620 vmkdevmgr -6

2019-01-31T12:19:13.480Z cpu6:2097620)CpuSched: 697: user latency of 2097667 netpoll-backup 0 changed by 2097620 vmkdevmgr -6

2019-01-31T12:19:13.480Z cpu6:2097620)CpuSched: 697: user latency of 2097668 netpoll-backup 0 changed by 2097620 vmkdevmgr -6

2019-01-31T12:19:13.480Z cpu6:2097620)CpuSched: 697: user latency of 2097669 netpoll-backup 0 changed by 2097620 vmkdevmgr -6

2019-01-31T12:19:13.480Z cpu6:2097620)CpuSched: 697: user latency of 2097670 netpoll-backup 0 changed by 2097620 vmkdevmgr -6

2019-01-31T12:19:13.480Z cpu6:2097620)CpuSched: 697: user latency of 2097671 netpoll-backup 0 changed by 2097620 vmkdevmgr -6

2019-01-31T12:19:13.480Z cpu6:2097620)CpuSched: 697: user latency of 2097672 netpoll-backup 0 changed by 2097620 vmkdevmgr -6

2019-01-31T12:19:13.480Z cpu6:2097620)CpuSched: 697: user latency of 2097673 netpoll-backup 0 changed by 2097620 vmkdevmgr -6

2019-01-31T12:19:13.480Z cpu6:2097620)CpuSched: 697: user latency of 2097674 netpoll-backup 0 changed by 2097620 vmkdevmgr -6

2019-01-31T12:19:13.480Z cpu6:2097620)CpuSched: 697: user latency of 2097675 netpoll-backup 0 changed by 2097620 vmkdevmgr -6

2019-01-31T12:19:13.480Z cpu6:2097620)CpuSched: 697: user latency of 2097676 netpoll-backup 0 changed by 2097620 vmkdevmgr -6

2019-01-31T12:19:13.480Z cpu6:2097620)CpuSched: 697: user latency of 2097677 netpoll-backup 0 changed by 2097620 vmkdevmgr -6

2019-01-31T12:19:13.480Z cpu6:2097620)CpuSched: 697: user latency of 2097678 netpoll-backup 0 changed by 2097620 vmkdevmgr -6

2019-01-31T12:19:13.480Z cpu6:2097620)CpuSched: 697: user latency of 2097679 netpoll-backup 0 changed by 2097620 vmkdevmgr -6

2019-01-31T12:19:13.480Z cpu6:2097620)CpuSched: 697: user latency of 2097680 netpoll-backup 0 changed by 2097620 vmkdevmgr -6

2019-01-31T12:19:13.480Z cpu6:2097620)CpuSched: 697: user latency of 2097681 netpoll-backup 0 changed by 2097620 vmkdevmgr -6

2019-01-31T12:19:13.480Z cpu6:2097620)bnxtnet: bnxtnet_start_device:389: [0000:06:00.0 : 0x4309fd3bfe10] Driver successfully started cumulus device (0x2d544305d9cc7d46)

2019-01-31T12:19:13.480Z cpu6:2097620)Device: 1466: Registered device: 0x4305d9cc0070 pci#s00000005.00#0 com.vmware.uplink (parent=0x2d544305d9cc7d46)

2019-01-31T12:19:13.480Z cpu6:2097620)bnxtnet: bnxtnet_scan_device:559: [0000:06:00.0 : 0x4309fd3bfe10] Successfully registered uplink device

vmkernel.log from the other three servers, which don't work:

2019-01-31T12:18:56.545Z cpu4:2097664)Loading module bnxtnet ...

2019-01-31T12:18:56.546Z cpu4:2097664)Elf: 2101: module bnxtnet has license BSD

2019-01-31T12:18:56.550Z cpu4:2097664)Device: 192: Registered driver 'bnxtnet' from 22

2019-01-31T12:18:56.550Z cpu4:2097664)Mod: 4962: Initialization of bnxtnet succeeded with module ID 22.

2019-01-31T12:18:56.550Z cpu4:2097664)bnxtnet loaded successfully.

2019-01-31T12:18:56.551Z cpu7:2097620)bnxtnet: bnxtnet_initialize_devname:61: [0000:05:00.0 : 0x4309fd3bfe10] PCI device 16d2:14e4:4060:14e4 detected

2019-01-31T12:18:56.552Z cpu7:2097620)bnxtnet: bnxtnet_dev_probe:1275: [0000:05:00.0 : 0x4309fd3bfe10] Starting Cumulus device probe

2019-01-31T12:18:56.552Z cpu7:2097620)DMA: 679: DMA Engine 'cumulus-0000:05:00.0' created using mapper 'DMANull'.

2019-01-31T12:18:56.552Z cpu7:2097620)DMA: 679: DMA Engine 'cumulus-co-0000:05:00.0' created using mapper 'DMANull'.

2019-01-31T12:18:56.552Z cpu7:2097620)VMK_PCI: 914: device 0000:05:00.0 pciBar 0 bus_addr 0x91c20000 size 0x10000

2019-01-31T12:18:56.552Z cpu7:2097620)bnxtnet: bnxtnet_map_pci_mem:784: [0000:05:00.0 : 0x4309fd3bfe10] mapped pci bar 0 at vaddr  0x450196540000

2019-01-31T12:18:56.552Z cpu7:2097620)VMK_PCI: 914: device 0000:05:00.0 pciBar 2 bus_addr 0x91c30000 size 0x10000

2019-01-31T12:18:56.552Z cpu7:2097620)bnxtnet: bnxtnet_map_pci_mem:784: [0000:05:00.0 : 0x4309fd3bfe10] mapped pci bar 2 at vaddr  0x450196560000

2019-01-31T12:18:56.552Z cpu7:2097620)VMK_PCI: 914: device 0000:05:00.0 pciBar 4 bus_addr 0x91c42000 size 0x2000

2019-01-31T12:18:56.552Z cpu7:2097620)bnxtnet: bnxtnet_map_pci_mem:784: [0000:05:00.0 : 0x4309fd3bfe10] mapped pci bar 4 at vaddr  0x450196468000

2019-01-31T12:18:56.552Z cpu7:2097620)bnxtnet: dev_init_device_info:1113: [0000:05:00.0 : 0x4309fd3bfe10] PHY is AutoGrEEEn capable

2019-01-31T12:18:58.068Z cpu7:2097620)WARNING: bnxtnet: hwrm_send_msg:168: [0000:05:00.0 : 0x4309fd3bfe10] HWRM cmd resp_len timeout, cmd_type 0x11(HWRM_FUNC_RESET) seq 5

2019-01-31T12:18:59.583Z cpu7:2097620)WARNING: bnxtnet: hwrm_send_msg:168: [0000:05:00.0 : 0x4309fd3bfe10] HWRM cmd resp_len timeout, cmd_type 0x11(HWRM_FUNC_RESET) seq 6

2019-01-31T12:18:59.583Z cpu7:2097620)DMA: 724: DMA Engine 'cumulus-0000:05:00.0' destroyed.

2019-01-31T12:18:59.583Z cpu7:2097620)DMA: 724: DMA Engine 'cumulus-co-0000:05:00.0' destroyed.

2019-01-31T12:18:59.583Z cpu7:2097620)WARNING: bnxtnet: bnxtnet_attach_device:208: [0000:05:00.0 : 0x4309fd3bfe10] failed to find cumulus device (status: Failure)

2019-01-31T12:18:59.583Z cpu7:2097620)Device: 2628: Module 22 did not claim device 0x1bd34305d9cc7d46.

2019-01-31T12:18:59.584Z cpu7:2097620)bnxtnet: bnxtnet_initialize_devname:61: [0000:05:00.1 : 0x4309fd3bfe10] PCI device 16d2:14e4:4060:14e4 detected

2019-01-31T12:18:59.584Z cpu7:2097620)bnxtnet: bnxtnet_dev_probe:1275: [0000:05:00.1 : 0x4309fd3bfe10] Starting Cumulus device probe

2019-01-31T12:18:59.585Z cpu7:2097620)DMA: 679: DMA Engine 'cumulus-0000:05:00.1' created using mapper 'DMANull'.

2019-01-31T12:18:59.585Z cpu7:2097620)DMA: 679: DMA Engine 'cumulus-co-0000:05:00.1' created using mapper 'DMANull'.

2019-01-31T12:18:59.585Z cpu7:2097620)VMK_PCI: 914: device 0000:05:00.1 pciBar 0 bus_addr 0x91c00000 size 0x10000

2019-01-31T12:18:59.585Z cpu7:2097620)bnxtnet: bnxtnet_map_pci_mem:784: [0000:05:00.1 : 0x4309fd3bfe10] mapped pci bar 0 at vaddr  0x450196500000

2019-01-31T12:18:59.585Z cpu7:2097620)VMK_PCI: 914: device 0000:05:00.1 pciBar 2 bus_addr 0x91c10000 size 0x10000

2019-01-31T12:18:59.585Z cpu7:2097620)bnxtnet: bnxtnet_map_pci_mem:784: [0000:05:00.1 : 0x4309fd3bfe10] mapped pci bar 2 at vaddr  0x450196520000

2019-01-31T12:18:59.585Z cpu7:2097620)VMK_PCI: 914: device 0000:05:00.1 pciBar 4 bus_addr 0x91c40000 size 0x2000

2019-01-31T12:18:59.585Z cpu7:2097620)bnxtnet: bnxtnet_map_pci_mem:784: [0000:05:00.1 : 0x4309fd3bfe10] mapped pci bar 4 at vaddr  0x45019469c000

2019-01-31T12:19:00.090Z cpu7:2097620)WARNING: bnxtnet: hwrm_send_msg:168: [0000:05:00.1 : 0x4309fd3bfe10] HWRM cmd resp_len timeout, cmd_type 0x0(HWRM_VER_GET) seq 0

2019-01-31T12:19:00.090Z cpu7:2097620)DMA: 724: DMA Engine 'cumulus-0000:05:00.1' destroyed.

2019-01-31T12:19:00.090Z cpu7:2097620)DMA: 724: DMA Engine 'cumulus-co-0000:05:00.1' destroyed.

2019-01-31T12:19:00.090Z cpu7:2097620)WARNING: bnxtnet: bnxtnet_attach_device:208: [0000:05:00.1 : 0x4309fd3bfe10] failed to find cumulus device (status: Failure)

2019-01-31T12:19:00.090Z cpu7:2097620)Device: 2628: Module 22 did not claim device 0x602e4305d9cc7eef.
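For anyone comparing hosts the same way, here is a minimal Python sketch that pulls the HWRM timeout warnings out of a vmkernel.log excerpt and groups them by PCI address. The regular expression is inferred only from the log lines quoted in this post, so other bnxtnet versions may word the message differently, and the function name `find_hwrm_timeouts` is made up for illustration.

```python
import re
from collections import defaultdict

# Pattern inferred from the warning lines quoted in this thread only
# (an assumption; it is not taken from any driver documentation).
HWRM_TIMEOUT = re.compile(
    r"\[(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])\s*:[^\]]*\]\s+"
    r"HWRM cmd resp_len timeout, cmd_type\s+(?P<code>0x[0-9a-fA-F]+)"
    r"\((?P<name>\w+)\)\s+seq\s+(?P<seq>\d+)"
)


def find_hwrm_timeouts(log_text):
    """Return {pci_address: [(command_name, seq), ...]} for each timeout line."""
    timeouts = defaultdict(list)
    for line in log_text.splitlines():
        match = HWRM_TIMEOUT.search(line)
        if match:
            timeouts[match.group("pci")].append(
                (match.group("name"), int(match.group("seq")))
            )
    return dict(timeouts)


# Sample input copied from the failing hosts' log above.
sample = """\
2019-01-31T12:18:56.552Z cpu7:2097620)bnxtnet: dev_init_device_info:1113: [0000:05:00.0 : 0x4309fd3bfe10] PHY is AutoGrEEEn capable
2019-01-31T12:18:58.068Z cpu7:2097620)WARNING: bnxtnet: hwrm_send_msg:168: [0000:05:00.0 : 0x4309fd3bfe10] HWRM cmd resp_len timeout, cmd_type 0x11(HWRM_FUNC_RESET) seq 5
2019-01-31T12:18:59.583Z cpu7:2097620)WARNING: bnxtnet: hwrm_send_msg:168: [0000:05:00.0 : 0x4309fd3bfe10] HWRM cmd resp_len timeout, cmd_type 0x11(HWRM_FUNC_RESET) seq 6
2019-01-31T12:19:00.090Z cpu7:2097620)WARNING: bnxtnet: hwrm_send_msg:168: [0000:05:00.1 : 0x4309fd3bfe10] HWRM cmd resp_len timeout, cmd_type 0x0(HWRM_VER_GET) seq 0
"""

for pci, events in find_hwrm_timeouts(sample).items():
    print(pci, events)
```

Running this against the log from each host makes it easy to see which PCI functions timed out and on which command, without reading the full probe sequence line by line.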

The server with the working NIC is actually running the older driver:

bnxtnet                        20.6.101.7-11vmw.670.0.0.8169922      VMW     VMwareCertified   2019-01-16

bnxtroce                       20.6.101.0-20vmw.670.1.28.10302608    VMW     VMwareCertified   2019-01-16

But I have tried both the older and the newest version on the other three:

bnxtnet                        212.0.119.0-1OEM.670.0.0.8169922      BCM                    VMwareCertified   2019-01-31

bnxtroce                       212.0.114.0-1OEM.670.0.0.8169922      BCM                    VMwareCertified   2019-01-31

I have swapped NICs between the servers and the results are the same: the working server works with any of the NICs and the other three servers work with none of them, so the physical NIC cards are fine.

I don't know if this is a VMware or a Dell issue.

Any ideas or thoughts on possible causes or other things to try? My next step is to swap the Dell PCI riser to see whether that is the issue.

1 Solution

Accepted Solutions
beezleinc
Enthusiast

I've tried multiple versions of ESXi and the bnxtnet driver. With every combination, ESXi fails to recognize the card.

I swapped the Broadcom 57406 for an Intel X540, and it was recognized and worked perfectly in all of the servers.

Bottom line: the BCM57406 card is on the VMware HCL, but it clearly does not work correctly in the PowerEdge R730, and I don't have time to troubleshoot further.

I hope this helps someone else avoid a 6+ hour Dell tech support call.

*** FOLLOW UP ***

Dell escalation team has confirmed that the BCM57406 has issues with Linux SLI and esxi.   They have agreed to swap mine for Intel X550-T2 cards.


3 Replies
HassanAlKak88
Expert

Hello,

Try removing the new driver version installed on the broken servers, then reinstall the old working version as follows:

  1. List the installed VIBs: esxcli software vib list
  2. Remove the driver over SSH on the ESXi host: esxcli software vib remove --vibname=name
  3. List again to check that the driver was successfully removed: esxcli software vib list
  4. Upload the replacement driver to a temp directory on the ESXi host and install it: esxcli software vib install -v {VIBFILE} or esxcli software vib install -d {OFFLINE_BUNDLE} (depending on how the driver is packaged).
  5. List again to check the installed driver version: esxcli software vib list
  6. Reboot the server and check connectivity.
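
Steps 1, 3 and 5 all come down to checking which driver version is installed. A small Python sketch of that check, meant to run on a workstation against saved `esxcli software vib list` output, might look like the following. The function names are hypothetical, and the five-column row layout (name, version, vendor, acceptance level, install date) is assumed from the listings quoted in this thread.

```python
def parse_vib_list(output):
    """Map VIB name -> version from saved `esxcli software vib list` text."""
    versions = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows in this thread have exactly five whitespace-separated
        # columns; header and separator lines are skipped (assumed layout).
        if len(fields) != 5 or fields[0].startswith("-"):
            continue
        name, version = fields[0], fields[1]
        versions[name] = version
    return versions


def driver_matches(output, vib_name, expected_version):
    """True if vib_name is installed at exactly expected_version."""
    return parse_vib_list(output).get(vib_name) == expected_version


# Rows copied from the working server's listing earlier in the thread.
saved_output = """\
bnxtnet                        20.6.101.7-11vmw.670.0.0.8169922      VMW     VMwareCertified   2019-01-16
bnxtroce                       20.6.101.0-20vmw.670.1.28.10302608    VMW     VMwareCertified   2019-01-16
"""

print(driver_matches(saved_output, "bnxtnet", "20.6.101.7-11vmw.670.0.0.8169922"))
```

Comparing the parsed versions from each host is a quick way to confirm that the remove and install steps actually took effect before rebooting.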

Please consider marking this answer "CORRECT" or "Helpful" if you think your question has been answered correctly.

Cheers,

VCIX6-NV|VCP-NV|VCP-DC|

@KakHassan

linkedin.com/in/hassanalkak


arunat
Contributor

beezleinc - I have a question about the statement below that you posted.

>Dell escalation team has confirmed that the BCM57406 has issues with Linux SLI and esxi.   They have agreed to swap mine for Intel X550-T2 cards.

I also see similar command timeouts on my PowerEdge servers using Broadcom NICs. My understanding is that these command timeouts would also be logged when the NIC becomes unresponsive. How did the Dell team figure out that the issue was with Linux SLI (what is SLI, by the way?) and not the NICs?

Thanks
