VMware Cloud Community
Glenn7
Contributor

ESXi 5.0 & DELL R720 Network Connectivity Loss

OK, there have been a lot of threads about getting the Dell R720 working with the Broadcom 5720 daughter card. You can either inject the drivers after the build or use the Dell recovery CD (http://ftp.dell.com/FOLDER00609866M/1/) to build the server.

The problem I am highlighting in this post occurs once the servers are commissioned. We put 4 Dell R720 servers into production, each configured with 4 ports on the 5720 in an EtherChannel and using a vDS. Within 2-3 weeks, at different times, all 4 hosts experienced a network failure that caused production outages. All VMs were unresponsive and offline; rebooting each physical host resolved the issue temporarily, but it would recur at a later date. Calls to both VMware and Dell were not very productive, and it took 2 months and a lot of time and effort on my part to finally work out the problem. Basically, specific to the Dell R720 server, the criteria below must be enforced to ensure the hosts do not experience random network loss.

The above was implemented on all 4 hosts, and they have been stable for the last 8 weeks.

79 Replies
JProos
Contributor

Glenn,

Have you experienced any recurrence of the issue since the tg3 driver reconfiguration? I haven't, but I finally decided to upgrade my 3 R820s to the Dell 5.1 image.

So far I've done one and I expect to do the other 2 on Sunday. 

FYI: there's a command (or 2) you have to execute on the host or the upgrade from the Dell 5.0 image to the Dell 5.1 image will fail.  The commands are:

     esxcli software vib remove -n Dell-Configuration-VIB
     esxcli software vib remove -n Dell-License-VIB

I only needed the first one as the upgrade didn't complain to me about the license-vib so I didn't bother even issuing the second command.
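Since only one of the two VIBs was present in this case, a rough sketch of a guard that checks before removing might look like the following. The helper name `vib_installed` is made up for illustration, and it assumes the VIB name appears verbatim in the first column of `esxcli software vib list`.

```shell
#!/bin/sh
# vib_installed NAME (hypothetical helper): reads `esxcli software vib list`
# output on stdin and succeeds only if NAME appears in the first column.
vib_installed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Possible use on the host before the upgrade (assumption, not run here):
#   for vib in Dell-Configuration-VIB Dell-License-VIB; do
#       if esxcli software vib list | vib_installed "$vib"; then
#           esxcli software vib remove -n "$vib"
#       fi
#   done
```

This only avoids issuing a remove for a VIB that is not there; it does not change what the upgrade itself requires.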

Here's a reference for those commands (you can also find a reference to them here on VMware's site):

http://en.community.dell.com/techcenter/b/techcenter/archive/2012/06/28/upgrade-from-dell-customized...

Finally, I see that the tg3 driver reconfiguration survived the upgrade to 5.1.

Jason

ChrisGurley
Enthusiast

I did fresh rebuilds last week, including my R820. After the install, I disabled NetQueue and interrupt remapping again, and it's been stable since then. Just FYI.

Chris

klima
Contributor

Hello, I also have problems with the Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet network adapter on ESXi 5.1, on an R620 server.

Before installing the driver bundle tg3-3.124c.v50.1-offline_bundle-841079.zip, the output was:

~ # lspci | grep BCM
00:01:00.0 Network controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe
00:01:00.1 Network controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe
00:02:00.0 Network controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe
00:02:00.1 Network controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe
00:07:00.0 Network controller: Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T [vmnic0]
00:07:00.1 Network controller: Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T [vmnic1]
00:08:00.0 Network controller: Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T [vmnic2]
00:08:00.1 Network controller: Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T [vmnic3]

After install and reboot:

~ # lspci | grep BCM
00:01:00.0 Network controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet [vmnic4]
00:01:00.1 Network controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet [vmnic5]
00:02:00.0 Network controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet [vmnic6]
00:02:00.1 Network controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet [vmnic7]
00:07:00.0 Network controller: Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T [vmnic0]
00:07:00.1 Network controller: Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T [vmnic1]
00:08:00.0 Network controller: Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T [vmnic2]
00:08:00.1 Network controller: Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T [vmnic3]

But I still have a problem:

~ # ethtool -i vmnic4
Can not get control fd: No such file or directory

The BCM5720 network card is not displayed in the vSphere Client.

Does anybody have ideas on how to fix this problem? Thanks.
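The two lspci listings in this post differ in whether each port carries a [vmnicN] alias. As a small sketch (the function name `unclaimed_nics` is invented here), the ports without an alias can be picked out of that output like this. Note that it would print nothing for the "after" listing, so it does not explain the ethtool error by itself.

```shell
#!/bin/sh
# unclaimed_nics (hypothetical helper): reads lspci output on stdin and prints
# the PCI address of every network controller line lacking a [vmnicN] alias.
unclaimed_nics() {
    awk '/Network controller/ && !/\[vmnic[0-9]+\]/ { print $1 }'
}

# Possible use on the host (not run here):
#   lspci | unclaimed_nics
```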

Glenn7
Contributor

Hi

That will be due to the ISO you used to build the OS: VMware doesn't include the BCM5720 driver inbox (i.e. as part of the image).

You will find articles on how to build your own ISO with the driver, but it is easier and quicker to go to the Dell website and download their modified ISO, which has the Broadcom driver injected into it.

klima
Contributor

I installed ESXi from Dell's ISO, VMware-VMvisor-Installer-5.1.0-799733.x86_64-Dell_Customized_RecoveryCD_A00.iso

Glenn7
Contributor

Hi

I think that driver version tg3-3.124c.v50.1-offline_bundle-841079.zip is not supported yet. If you read back through this thread you will see a detailed process for confirming whether your NIC driver is on the VMware HCL.

The driver version on the Dell ISO should work: driver tg3, version 3.123b.v50.1

vcocaud
Contributor

I have no problems with the tg3-3.124c.v50 driver.

ESXi 5.0 Update 1 DELL Custom + updated tg3-3.124c.v50 driver + NetQueue Disabled.

klima
Contributor

/vmfs/volumes/50bc59e1-ea996480-3c77-001018791af8 # esxcli software vib install /vmfs/volumes/50bc59e1-ea996480-3c77-001018791af8/tg3-3.123b.v50.1-offline_bundle-682322.zip
Error: Unknown command or namespace software vib install /vmfs/volumes/50bc59e1-ea996480-3c77-001018791af8/tg3-3.123b.v50.1-offline_bundle-682322.zip

/vmfs/volumes/50bc59e1-ea996480-3c77-001018791af8 # esxcli software vib install -d /vmfs/volumes/50bc59e1-ea996480-3c77-001018791af8/tg3-3.123b.v50.1-offline_bundle-682322.zip
Installation Result
   Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
   Reboot Required: true
   VIBs Installed: Broadcom_bootbank_net-tg3_3.123b.v50.1-1OEM.500.0.0.472560
   VIBs Removed: Broadcom_bootbank_net-tg3_3.124c.v50.1-1OEM.500.0.0.472560
   VIBs Skipped:


/vmfs/volumes/50bc59e1-ea996480-3c77-001018791af8 # reboot

~ # lspci | grep Net
00:01:00.0 Network controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet [vmnic4]
00:01:00.1 Network controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet [vmnic5]
00:02:00.0 Network controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet [vmnic6]
00:02:00.1 Network controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet [vmnic7]
00:07:00.0 Network controller: Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T [vmnic0]
00:07:00.1 Network controller: Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T [vmnic1]
00:08:00.0 Network controller: Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T [vmnic2]
00:08:00.1 Network controller: Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T [vmnic3]
~ #
~ #
~ #
~ # ethtool -i vmnic7
Can not get control fd: No such file or directory

~ # esxcli system settings kernel list | grep -i netqueue
netNetqueueEnabled               Bool    Enable/Disable NetQueue support.    FALSE       FALSE    TRUE

;(
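To make the last line of that output easier to read, here is a minimal sketch that labels the three trailing values. It assumes the trailing columns of `esxcli system settings kernel list` are Configured, Runtime and Default, in that order; the function name `kernel_setting_state` is made up for this example.

```shell
#!/bin/sh
# kernel_setting_state NAME (hypothetical helper): reads
# `esxcli system settings kernel list` output on stdin and labels the last
# three columns, assumed to be Configured, Runtime and Default.
kernel_setting_state() {
    awk -v name="$1" '$1 == name {
        printf "configured=%s runtime=%s default=%s\n", $(NF-2), $(NF-1), $NF
    }'
}

# Possible use on the host (not run here):
#   esxcli system settings kernel list | kernel_setting_state netNetqueueEnabled
```

Under that column assumption, the line above would mean NetQueue is already disabled both in the saved configuration and at runtime, so the missing vmnics would have some other cause.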

Glenn7
Contributor

Hi

What hardware are you using, and what BIOS version? Also, can you post the output of this command:

esxcli software vib list | grep Broadcom

klima
Contributor

Server Dell PowerEdge R620

BIOS Version: 1.1.2
Firmware Version: 1.06.06 (Build 15)

Operating System: VMware ESXi 5.1.0 build-799733
Operating System Version: 5.1.0 GA (build-799733) Kernel 5.1.0 (x86_64)

~ # esxcli software vib list | grep Broadcom
misc-cnic-register             1.72.1.v50.2-1OEM.500.0.0.472560   Broadcom  VMwareCertified   2012-12-02
net-bnx2                       2.2.3e.v50.1-1OEM.500.0.0.472560   Broadcom  VMwareCertified   2012-12-02
net-bnx2x                      1.74.17.v50.1-1OEM.500.0.0.472560  Broadcom  VMwareCertified   2012-12-02
net-cnic                       1.74.04.v50.1-1OEM.500.0.0.472560  Broadcom  VMwareCertified   2012-12-02
net-tg3                        3.123b.v50.1-1OEM.500.0.0.472560   Broadcom  VMwareCertified   2012-12-03
scsi-bnx2fc                    1.74.02.v50.2-1OEM.500.0.0.406165  Broadcom  VMwareCertified   2012-12-02
scsi-bnx2i                     2.74.07.v50.1-1OEM.500.0.0.472560  Broadcom  VMwareCertified   2012-12-02

Thanks for your help, Glenn.

Glenn7
Contributor

The BIOS needs to be at least version 1.2.6, as Dell ProSupport advised us while troubleshooting the original issue.
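For anyone checking several hosts, a minimal sketch of the comparison against that 1.2.6 minimum is below. It handles plain dotted numeric versions only, the function name `version_ge` is invented for this example, and the BIOS version string is assumed to be supplied by hand (for instance from the server's management interface).

```shell
#!/bin/sh
# version_ge A B (hypothetical helper): succeeds if dotted numeric version A
# is greater than or equal to B. Missing components are treated as 0.
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}                      # leading component of each version
        y=${b%%.*}
        if [ "$a" = "$x" ]; then a=; else a=${a#*.}; fi
        if [ "$b" = "$y" ]; then b=; else b=${b#*.}; fi
        x=${x:-0}
        y=${y:-0}
        if [ "$x" -gt "$y" ]; then return 0; fi
        if [ "$x" -lt "$y" ]; then return 1; fi
    done
    return 0
}

# Example with the version reported earlier in the thread:
if version_ge 1.1.2 1.2.6; then
    echo "BIOS meets the 1.2.6 minimum"
else
    echo "BIOS is older than 1.2.6"
fi
```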

VMCHIS
Enthusiast

I am experiencing this issue with some R620s we have. I noticed that there is a new Dell vSphere 5.0 Update 1 A02 ISO available for download, posted on 12/26/2012. Does anyone know if you still have to disable NetQueue with this version? It looks like Dell has revised this ISO 3 times now. I'm not sure what changed.

VMCHIS
Enthusiast

Just as an FYI, the tg3 driver was updated in the Dell vSphere 5.0 Update 1 A02 image.

Running ethtool -i vmnic0 shows the following:

driver: tg3
version: 3.124c.v50.1
firmware-version: FFV7.2.20 bc 5720-v1.25
bus-info: 0000:01:00.0

I am not sure what Dell build I was running before, but when I ran the same command on it, it stated that my tg3 driver version was 3.123b.v50.1.

I just need to know whether we still need to disable NetQueue with this driver.
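When comparing builds like this across hosts, it can help to pull a single field out of the `ethtool -i` output. A small sketch, assuming the "key: value" layout shown above (the function name `ethtool_field` is made up for this example):

```shell
#!/bin/sh
# ethtool_field KEY (hypothetical helper): reads `ethtool -i` output on stdin
# and prints the value of the line that starts with "KEY: ".
ethtool_field() {
    awk -v key="$1" 'index($0, key ": ") == 1 {
        print substr($0, length(key) + 3)
    }'
}

# Possible use on the host (not run here):
#   ethtool -i vmnic0 | ethtool_field version
```

Matching on the start of the line keeps "version" from also matching the "firmware-version" line.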

Glenn7
Contributor

Hi

The NetQueue fix is the solution to the NICs dropping off the network, so if you have the BCM5720 card installed then you should definitely disable the feature.

robincm
Contributor

I noticed some interesting strangeness this week, to do with applying updates to a host via VUM. It seems consistent (it happened on all three of the R720s I tried).

I've found that on hosts built using the A00 Dell ESXi 5.0 Update 1 ISO, where I used the command

esxcfg-module -s force_netq=0,0,0,0,0,0,0,0 tg3

to disable NetQueue and then applied patch ESXi500-201212210-UG (KB2033752), all the tg3 (in my case BCM5719) vmnics are missing after the reboot because the tg3 driver fails to load.

So now I'm using the other method to disable NetQueue, i.e.

esxcfg-advcfg -k FALSE netNetqueueEnabled

Details here if anyone's interested: http://rcmtech.wordpress.com/2013/02/04/missing-broadcom-5719-tg3-nics-after-updates/

I've not had a chance to log this with VMware yet.

Also, I'm still testing a new version of the tg3 driver for Broadcom which they hope has fixed the NetQueue issue. My NICs were going pop after about a month so it's taking a while to discover if the issue is actually fixed or not... Does anyone get the loss of connectivity sooner than that if they re-enable NetQueue on their tg3 driver?
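As a rough sketch only, moving a host from the first NetQueue method to the second could be scripted as below. It assumes that setting an empty option string is the right way to undo the force_netq module option, which this thread does not confirm. The `run` wrapper and the `DRY_RUN` variable are made up for this example so the commands can be previewed before anything is changed.

```shell
#!/bin/sh
# run CMD... (hypothetical wrapper): print the command when DRY_RUN=1,
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

switch_netqueue_method() {
    # Assumption: an empty option string clears the earlier force_netq setting.
    run esxcfg-module -s "" tg3
    # Disable NetQueue host-wide through the kernel setting instead.
    run esxcfg-advcfg -k FALSE netNetqueueEnabled
}

# Preview without touching the host:
#   DRY_RUN=1; switch_netqueue_method
```

A reboot would presumably still be needed for either change to take effect.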

athrunn0510
Contributor

Hi Glenn7,

Just want to verify if this is the fix you are saying...

Broadcom 5719/5720 NICs using tg3 driver become unresponsive and stop traffic in vSphere

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=203570...

Thanks! :)

dlee2654
Enthusiast

Just a reply to let you know that today, 3/4/2013, that KB article was updated and now lists the resolution! I also just got word back from my Dell rep that this has been resolved: "This issue has been resolved for ESXi 5.x in version 3.129d.v50.1 of the async tg3 driver released by Broadcom." I just verified the download is available, and I will begin testing this updated driver ASAP!
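A quick sketch for checking whether an installed tg3 version is at or past 3.129d.v50.1 is below. It assumes releases are ordered by the number and then the letter in the second field (123b, then 124c, then 129d), which is a guess from the versions seen in this thread; the function name `tg3_has_fix` is made up for this example.

```shell
#!/bin/sh
# tg3_has_fix VERSION (hypothetical helper): succeeds if VERSION, such as
# "3.123b.v50.1" or a full VIB version with a "-1OEM..." suffix, is at or
# past 3.129d. Returns 2 if the version cannot be parsed.
tg3_has_fix() {
    ver=${1%%-*}          # drop any "-1OEM..." VIB suffix
    rel=${ver#*.}         # e.g. "129d.v50.1"
    rel=${rel%%.*}        # e.g. "129d"
    num=${rel%%[a-z]*}    # e.g. "129"
    sfx=${rel#"$num"}     # e.g. "d"
    case "$num" in
        ''|*[!0-9]*) return 2 ;;
    esac
    if [ "$num" -gt 129 ]; then return 0; fi
    if [ "$num" -lt 129 ]; then return 1; fi
    case "$sfx" in
        ''|a|b|c) return 1 ;;
        *) return 0 ;;
    esac
}

# Possible use on the host (not run here):
#   tg3_has_fix "$(esxcli software vib list | awk '$1 == "net-tg3" { print $2 }')"
```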

athrunn0510
Contributor

Hi dlee2654,

Thanks for letting us know. :)

ChrisGurley
Enthusiast

Hey folks,

I'm a bit late to the new driver release celebration, but wanted to ask whether interrupt remapping still needs to be disabled with the new driver. It might be an entirely separate element/issue, but I wanted to see if this new driver eliminates the other tweaks needed for stability.

Thanks,

Chris

Glenn7
Contributor

Hi Chris

I haven't tested with the new drivers, but until someone confirms it I would say keep interrupt remapping disabled. We did not experience any performance hit by disabling IR.

We moved away from the BCM5720s and used Intel cards in any new Dell R720s, and have since upgraded our farm to Intel 10Gb cards.

Thanks

Glenn
