Box293
Enthusiast

Broadcom 57711 + ESX/ESXi 4.1 + Jumbo Frames

I'm having some issues getting the Broadcom 57711 NICs working on ESX/ESXi 4.1 with an MTU of 9000.

On ESX/ESXi 4.0 U2 I have no problems getting an MTU of 9000 working; I can push our EqualLogic PS6010 SAN pretty hard and get about 550,000 KBps with IOMeter running inside 4 x VMs.

I have read the documentation and found that only an MTU of 1500 is supported for the 57711 NICs on ESX/ESXi 4.1. This has something to do with the fact that these NICs have hardware iSCSI offloading; additional iSCSI adapters appear under the Storage Adapters section on 4.1 hosts.

I have it all properly configured with 1:1 binding of the iSCSI NICs to the iSCSI VMkernel ports using an MTU of 1500. Configured this way, the most I can get out of the SAN is about 280,000 KBps. If I try the same process using an MTU of 9000, the VMs / host seem to stop responding.

While I understand there is a 1500 limitation when using the hardware iSCSI adapters, I am also unable to get software iSCSI to work with an MTU of 9000 on ESX/ESXi 4.1; the VMs / host seem to stop responding there too.
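For reference, the software iSCSI side is configured along the lines of the standard ESX/ESXi 4.x commands below (the vSwitch, port group, IP and vmhba names are just examples from my lab):

# set the vSwitch MTU to 9000
esxcfg-vswitch -m 9000 vSwitch1
# VMkernel ports have to be created with the jumbo MTU (it can't be changed in place on 4.x)
esxcfg-vmknic -a -i 10.10.10.21 -n 255.255.255.0 -m 9000 iSCSI1
# bind each iSCSI vmknic 1:1 to the software iSCSI adapter, then confirm the binding
esxcli swiscsi nic add -n vmk1 -d vmhba33
esxcli swiscsi nic list -d vmhba33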

Is this also a limitation, or am I missing something?

Is there anyone else out there experiencing the same problems as me?

VCP3 & VCP4 32846 VSP4 VTSP4

RobFisher
Contributor

So, I think maybe this issue isn't completely fixed. I'm seeing some really slow performance when I do 64KB sequential read tests, and lots of lag. When I go above 32KB, my configuration falls apart. Is anyone else still seeing this issue? I'm using SQLIO to test with.

My setup:

ESXi 4.1 hosts:

Dell R810s

Broadcom 57711 10Gb NICs (4 ports per server), 1:1 NIC-to-vmk binding, Jumbo Frames across the board.

driver: bnx2x

version: 1.54.1.v41.1-2vmw (it looks like you're running something newer? 1.60+?  I only see this for ESX 4.0)
firmware-version: BC:6.0.35 PHY:0aa0:0406
bus-info: 0000:11:00.0
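(For anyone wondering, that driver/firmware info comes from something like the command below, run in Tech Support Mode; the vmnic number is just an example.)

ethtool -i vmnic4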

Dell EqualLogic MEM (Multipath Extension Module) is installed

VMGuests:

Paravirtualized SCSI Controller

Storage:

Dell EqualLogic PS6010XV Firmware 5.0.4

Network:

Brocade TurboIron 24x Switches - Jumbo Frames, Flow Control, Egress-buffer-threshold Max enabled

Test Results:

64KB sequential read test:
Average I/O Size: 64KB
Average IOPS: 401.8
Average Latency: 14.4ms
Average I/O Rate: 25.1MB/sec
Average Queue Depth: 2

32KB sequential read test:
Average I/O Size: 32KB
Average IOPS: 19,266.7
Average Latency: 0.30ms
Average I/O Rate: 602.1MB/sec
Average Queue Depth: 2

http://communities.vmware.com/servlet/JiveServlet/download/1720977-58678/bad-perf.jpg

Box293
Enthusiast

Hi Rob,

It sounds and looks like the same problem.


Updated drivers are the answer to your problem.

VMware ESX/ESXi 4.1 Driver CD for Broadcom NetXtreme II Ethernet Network Controllers

http://downloads.vmware.com/d/details/esx41_broadcom_netextremeii_dt/ZHcqYnRlaHRiZHRAag==

There are notes in this thread with step-by-step instructions on how to install them from vMA.
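In short, from vMA it boils down to something like this (the host name and bundle filename below are placeholders; the exact bundle name depends on the driver CD you download):

# put the host in maintenance mode first, then from vMA:
vihostupdate --server <esxi-host> --username root --install --bundle ./BCM-bnx2x-offline_bundle.zip
# reboot the host, then confirm the driver bulletin is listed:
vihostupdate --server <esxi-host> --username root --query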

Some other things to consider:

  • You can confirm your switches are correctly configured for jumbo frames by doing an ESX 4.0 install (with the relevant drivers) and then running the same tests. This just rules out ESXi 4.1 as the source of the problem.
  • Make sure you are within the networking configuration maximums.

Hope this gets your problem sorted.

VCP3 & VCP4 32846 VSP4 VTSP4
RobFisher
Contributor

Ah! That was the problem. I was running 1.54 instead of 1.60.50.v41.2. I thought Update Manager would have installed the newer one, but for me it didn't; I had to manually import the offline .zip into Update Manager and install the patch that way. It seems that with the latest 4.1 update there is a bug where the /scratch drive disappears after a while; the fix is to restart the ESXi host (I'm using embedded ESXi), then scan and apply updates again. (Hope they fix that one soon too.)

Anyway, I've updated the bnx2x driver to 1.60. It looks like there is a 1.62 driver for 4.0; I wonder if/when it will come to 4.1?

My results, while better than before, are still not what I expected. It seems there is a 127.1KB cap on I/O size, and for my 64KB sequential read test I'm only getting around 350MB/sec; my 32KB test still gets closer to 600MB/sec. I was expecting the 64KB test to reach around 700MB/sec.

HPCMPO-bnx2x-driverupdate-second.png

Box293
Enthusiast

Glad to hear you've got it sorted.

VCP3 & VCP4 32846 VSP4 VTSP4
RobFisher
Contributor

I think there is still a bit of an issue, but it's a lot better, and maybe even usable now. 🙂 The 1.62 driver for ESX 4.0 was released not too long ago. I hope the 1.62 driver for ESX/ESXi 4.1 is released soon; I wonder if it will help more?

Thanks!

Rob

enderox
Contributor

Hi there,

One very very interesting thread.

v1.62.15 drivers for ESXi 4.1.0 update1 are now available.

Has anyone had any success using these with Jumbo Frames and the ESXi iSCSI s/w initiator?

We've lost about 3 days (& evenings) with this latency issue.

  • Initially seeing latency up to 16,600ms with Jumbo Frames enabled.
  • Latency down to the hundreds with Jumbo Frames disabled.
  • Now below 70ms with the new bnx2x driver, with Jumbo Frames disabled. The system is now usable but obviously not performance-optimised.
  • Still unusable with Jumbo Frames enabled.

I really didn't think we were using bleeding edge equipment, but the performance woes have caused a lot of heartache.

I suspect we would have spec'd Intel 10GbE cards if we'd seen this thread earlier, or if Dell had made us aware of the issues.

Box293
Enthusiast

I haven't tried the new drivers yet, might wait a while :smileylaugh:

What is the hardware you are using?

How many NICs in total in the ESXi HOST?

Are you using distributed vSwitches?

VCP3 & VCP4 32846 VSP4 VTSP4
RobFisher
Contributor

The v1.60.x driver made a huge difference for me, and the v1.62 driver improved my 64KB sequential read tests further. Overall, the driver update made my environment usable.

I'd say if you're not in production yet, try out the new driver, or even try it on one of your production ESX/ESXi 4.1 hosts to see if it makes a difference. Just move your guests over to another host and give it a try, or quickly build a test system.

If you're using EqualLogic, make sure you have the Dell EqualLogic Multipath Extension Module (MEM) PSP installed and configured. You can also configure the SCSI controller as a Paravirtual SCSI controller on non-OS volumes to see some improvement. What kind of switches are you using? Have you configured them per their recommended iSCSI configuration? Are you using them for iSCSI traffic only?
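A quick way to sanity-check that the MEM PSP is actually claiming your EqualLogic volumes (ESX/ESXi 4.1 esxcli syntax; device names will obviously differ):

# from the Tech Support Mode shell, or via vCLI/vMA with --server
esxcli nmp device list
# for each EqualLogic volume, check the "Path Selection Policy" line:
# with MEM installed it should read DELL_PSP_EQL_ROUTED rather than one of the built-in VMW_PSP_* policies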

enderox
Contributor

Hi,


apologies for not being more descriptive with our setup:
4 x Dell R910
   quad 8-core Xeon 7550, 128GB RAM, mirrored SD cards for the ESXi install,
   12 NICs: 1 x 57711 10GbE (dual port), 4 x 5709c dual-port GbE, 4 x onboard 5709c GbE

2 x PowerConnect 8024F
PS6510x 48-disk SAS
PS6510e 48-disk SATA

ESXi 4.1.0 update1 (build 348481):

completely separate iSCSI network using the 10GbE cards
using ESXi's iSCSI s/w initiator

no Distributed vSwitches

no OpenManage (iDRAC on these servers only)

Performance issues were only discovered once we went live and started P2V'ing servers.

Dell have shipped Intel 10GbE cards to replace the Broadcoms. Lots of work still to do.


Box293
Enthusiast

I agree with your rant 1,000,000% :smileysilly:

Without a lot of proof at this point in time, I would NOT install the "Dell OpenManage Offline Bundle and VIB for ESXi".

I have two scenarios that have caused me problems, and once I removed OpenManage from the equation they both stopped. I am waiting for EqualLogic to finish testing and/or for a spare EqualLogic shelf to re-test with.

  1. ESXi hosts disconnect from vCenter during vMotions; the host usually reconnects, but not always. This does not occur when the host does not have OpenManage installed.
  2. The 6.4.0 version of OpenManage, when installed on ESXi 4.1 Update 1 (2 x hosts), caused our 2 x PS6010XV units to become separated from each other (split brain). Everything returned to normal when I powered off these 2 x ESXi hosts. Additionally, the ESXi hosts could not see any volumes on the SAN immediately before the problem occurred.

Once again I must stress that problem 2 has only occurred once, and because it occurred on a production system I have not had the desire to test it again. I have logged the case with EqualLogic; however, there have been delays in their testing because they are having trouble getting a Dell OpenManage expert assigned to them.

Just a heads up.

But please get back to us with your specs; there may be something that stands out.

VCP3 & VCP4 32846 VSP4 VTSP4
RobFisher
Contributor

To clarify, the Dell EqualLogic Multipath Extension Module (MEM) PSP is not the Dell OpenManage Bundle. I haven't installed the OpenManage Bundle, so I don't have any experience with it.

FunRaiser
Contributor

Hello All,

We are currently using the 57711 in our test lab, but not yet in production. I called Dell to ask if there really are problems with the 57711 cards and referred to this post, but they replied as follows:

There are a few articles concerning Jumbo frames & VMWare, ID:71219  but nothing about
Broadcom/Intel cards.  As you know, unless there is an internal memo, we can't be
replacing parts based on 3rd party vendor forums that factually doesn't say what the
customer is saying.  If down the road they have problems then we can do a break-fix.

Can anyone tell me exactly what the problems are with these 57711 cards?

Is there any way I can reproduce these problems before bringing the cards into production?

Thanks,

RobFisher
Contributor

As long as you install the latest driver pack posted on VMware's site you shouldn't see the issue as much. I'm currently getting about 400MB/s on 64KB sequential read tests with the latest drivers installed, where I was getting 8MB/s with the drivers included in the base install of ESXi 4.1.

I'm curious to know how the Intel 10Gb cards compare.

chimera
Contributor

Hi,

Well I'm glad I ran into this post... I've been having exactly the same issues and have been making endless changes with Dell EqualLogic support. We're running a single Dell EqualLogic PS6010XV, Dell Broadcom NetXtreme II 57711 NICs and Dell PowerConnect 8024F switches. Read latency was through the roof! One big improvement initially came from upgrading the firmware on the PowerConnects to 3.1.4.8 A3 (Dell didn't recommend going to the next major release yet). Also ensure that:

  • iSCSI connectivity is in anything BUT the default VLAN (jumbo frames are not supported in the default VLAN; Dell standardised on VLAN 11 internally, so I went with the same)
  • the switch MTU is 9216 (not to be confused with the MTU of 9000 on the hosts and SAN)
  • flow control is enabled and the switches' iSCSI optimisation feature is disabled
  • your LAGs are configured correctly (and trunked in VLAN 11)
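One quick end-to-end check of the jumbo frame path that I'd run before any benchmarking (the SAN group IP below is just an example; drop the -d flag if your build of vmkping doesn't accept it):

# 8972-byte payload = 9000 minus the IP/ICMP headers; -d sets "don't fragment"
vmkping -d -s 8972 10.10.10.100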

Anyway, my read latency was still intermittent, jumping up to 200ms+ (prior to that it was intermittently 1,500ms or more). It would explain why the SAN-attached (Windows) Veeam backup server still managed to back up pretty quickly.

I've just upgraded to the latest driver, 1.62.15, which seems to have resolved some immediate performance issues (it's a lot better than it was). Clearly Broadcom/VMware are aware there are issues, considering 1.60 was released on the 3rd of March and 1.62 only 20 days later...

For those not too VMware savvy, do this to check and upgrade your driver:

1. On the ESX console, list your NICs and confirm the Broadcom ones use the "bnx2x" driver; if not, replace any instance of it below with the correct name.

esxcfg-nics -l

vmnic10 0000:0e:00.00 bnx2x       Up   10000Mbps Full   00:10:18:9f:8c:44 9000   Broadcom Corporation NetXtreme II 57711 10Gigabit Ethernet
vmnic11 0000:0e:00.01 bnx2x       Up   10000Mbps Full   00:10:18:9f:8c:46 9000   Broadcom Corporation NetXtreme II 57711 10Gigabit Ethernet

2. Check your current driver version with:

esxupdate query --vib-view | grep "bnx2x"

rpm_vmware-esx-drivers-net-bnx2x_400.1.54.1.v41.1-2vmw.1.4.348481 @x86_64          installed     2011-04-18T17:33:02.337116+12:00

The driver version is embedded in the package name above (1.54.1 in this case), i.e. it's old!

3. To update the driver, first migrate the VMs off, then put the host into maintenance mode.

4. Download the vSphere CLI for your ESX version from here: http://www.vmware.com/support/developer/vcli/ and install it (if not already installed).

5. Download the latest Broadcom drivers, currently these ones for me:

http://downloads.vmware.com/d/details/dt_esx41_broadcom_netxtremeii_032311/ZHcqYnR0anBiZHRAag==

6. Extract them to a folder on the same server you put the CLI on, e.g. into a folder C:\Broadcom\

7. Open a command prompt, change into "C:\Program Files (x86)\VMWare\VMWare CLI\bin" and run:

vihostupdate.pl --server ipaddress --install --bundle C:\Broadcom\offline-bundle\BCM-bnx2x-1.62.15.v41.2-offline_bundle-380522.zip

...where ipaddress is the IP address (or hostname) of the ESX server. You will be prompted for a username and password; enter the root credentials. Successful output looks as follows:

Please wait patch installation is in progress ...
The update completed successfully, but the system needs to be rebooted for the changes to be effective.

8. Restart the host.

9. Connect to the console again and run the same command to confirm the new driver version, e.g.:

esxupdate query --vib-view | grep "bnx2x"

cross_vmware-esx-drivers-net-bnx2x_400.1.62.15.v41.2-1vmw.0.0.00000               installed     2011-05-27T13:52:18.509539+12:00


10. Exit maintenance mode, migrate some VMs back and test it!

Hope this helps someone.

RobFisher wrote: "If you're using EqualLogic, make sure you have the Dell EqualLogic Multipath Extension Module (MEM) PSP installed and configured"

That's only available in VMware Enterprise edition or higher, which for us is not possible as our client is only on VMware Advanced. I'm running the software iSCSI initiator with jumbo frames enabled, and it's running a lot better so far.
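For what it's worth, I double-check the port binding on the software initiator with the command below (the vmhba number is just whatever your iSCSI Software Adapter shows up as):

# lists the vmk ports bound to the software iSCSI adapter
esxcli swiscsi nic list -d vmhba33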

Update...

I've noted that latency, according to SAN HQ, has still jumped up intermittently in the past 24 hours. However, where it was 200ms+ before, it's now peaking at 50ms (and it only did that twice in the last 24 hours). While latency should really be < 5ms for best performance, the real test I can take from the driver update is a Linux backup that runs each night. The Linux VM has 2 x VMDKs that both reside on the SAN, and the process backs up from the first (the main disk with the database) to the second (the backup disk). When the client had an HP EVA4000 (4Gbps Fibre Channel), this process took about 18-20 minutes to run each night. When we went to the EqualLogic (with the default 4.1 bnx2x drivers) the time was consistently around 60-65 minutes = 3 times longer! Last night the backup process with the 1.62.15 drivers took 12 minutes to run! Happy client again... but I still think the Broadcom driver could be improved, looking at the random latency spikes 🙂

Cheers,

Chimera

(VCP3/4, VTSP4)

chimera
Contributor

UPDATE: Performance is still not as good as it should be. I have tried various benchmark tools to rule out the tool itself being at fault, and it's performing similarly to a 1Gbps PS4000 SAN we have at another client. If anyone's reading this: FORGET ABOUT 10GbE BROADCOM NICs... waste of bloody time...

enderox
Contributor

chimera wrote:

... FORGET ABOUT 10GbE BROADCOM NIC's... waste of bloody time...

Hear, hear.

I checked yesterday - no new drivers since the v1.62.15 that we had already tried unsuccessfully - at least none approved by and available from VMware.

Our move to Intel NICs resolved our performance headaches.  I pity those with blade setups where the NICs cannot be changed.

If I remember correctly, we had to disable JumboFrames to get usable performance.

I lost way too much time and sleep messing with these stupid NICs.

craigcollings
Contributor

Hi Chimera,

I attempted to upgrade the Broadcom driver on three of our ESXi 4.1 servers over the weekend using the command:

esxupdate --bundle=<offline-bundle>.zip update

The update worked OK, and I verified that the correct VMkernel module was loaded and that there were no errors in the messages log file, as per:

http://download3.vmware.com/software/esx/bnx2x-README.TXT

I can still see the vmnics, but I can no longer see the iSCSI storage adapters in vSphere or via the console.
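For reference, the checks I ran from the console were along these lines (the grep pattern is just meant to catch both the NIC and iSCSI modules):

# the dependent hardware iSCSI adapters rely on the bnx2i module being loaded alongside bnx2x
vmkload_mod -l | grep bnx2
# list the storage adapters the host can currently see
esxcfg-scsidevs -a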

Any thoughts?

Craig

Box293
Enthusiast

How many physical NICs do you have in the server? At the moment the maximum allowed is 4 x 10Gb NICs only (no mixing with 1Gb NICs).

In the past we've had 4 x 1Gb NICs (onboard) and 8 x 10Gb NICs, and only 5 of the 10Gb NICs showed up after a specific driver update. We ended up having to disable the onboard 1Gb NICs and remove 4 of the 10Gb NICs.

VCP3 & VCP4 32846 VSP4 VTSP4
craigcollings
Contributor

Hi.

Just the 4 onboard 1Gb NICs, and the 2 ports on the Broadcom 57711.

They appeared fine as iSCSI storage adapters until I updated the driver to 1.62.15 and rebooted; then they disappeared.

Craig

ShannonZ
VMware Employee

VMware currently only supports Jumbo Frames for the SW iSCSI initiator.

In the MN.next (5.x) release, it will support Jumbo Frames for dependent/independent HW iSCSI initiators, such as BCM 570x/5771x (dependent) and qla4xxx (independent).

So if you configure MTU 9000 with your dependent HW iSCSI adapter (BCM57711), VMware cannot guarantee that issues won't occur with an unsupported configuration.

My suggestion here is: don't use MTU 9000 with BCM 570x/5771x NIC adapters.
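If you need to drop an existing jumbo configuration back to the default MTU on 4.x, it is roughly the commands below (the vSwitch, port group and IP are examples, and note that the VMkernel port generally has to be removed and re-created because its MTU cannot be changed in place):

esxcfg-vswitch -m 1500 vSwitch1
esxcfg-vmknic -d iSCSI1
esxcfg-vmknic -a -i 10.10.10.21 -n 255.255.255.0 iSCSI1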

Thanks,

Shannon
