Having some issues getting the Broadcom 57711 NIC working on ESX/ESXi 4.1 with MTU 9000.
On ESX/ESXi 4.0 U2 I have no problems getting an MTU of 9000 working; I can push our EQL PS6010 SAN pretty hard and get about 550,000 KBps with IOMeter running inside 4 VMs.
I have read the documentation and found that only an MTU of 1500 is supported for the 57711 NICs on ESX/ESXi 4.1. This has something to do with the fact that these NICs have hardware iSCSI offloading; additional iSCSI adapters appear under the storage adapters section on 4.1 hosts.
I have it all properly configured with 1:1 binding of the iSCSI NICs to the iSCSI VMkernel ports using an MTU of 1500. Configured this way, the most I can get out of the SAN is about 280,000 KBps. If I try the same process using an MTU of 9000, the VMs/host seem to stop responding.
While I understand that there is a limitation of 1500 when using the hardware iSCSI NICs, I am also unable to get software iSCSI to work with an MTU of 9000 on ESX/ESXi 4.1; there too, the VMs/host seem to stop responding.
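For reference, the binding and MTU setup on my hosts was done roughly like this (vSwitch1, vmk1, vmhba33 and the IPs are placeholders for my environment; the MTU flag is 1500 or 9000 depending on the test):

```shell
# Set the MTU on the iSCSI vSwitch first (must be done before creating vmknics)
esxcfg-vswitch -m 9000 vSwitch1

# VMkernel ports have to be created with the MTU; it can't be changed in place on 4.x
esxcfg-vmknic -a -i 10.10.5.11 -n 255.255.255.0 -m 9000 iSCSI1
esxcfg-vmknic -a -i 10.10.5.12 -n 255.255.255.0 -m 9000 iSCSI2

# 1:1 bind each iSCSI vmknic to the software iSCSI adapter
esxcli swiscsi nic add -n vmk1 -d vmhba33
esxcli swiscsi nic add -n vmk2 -d vmhba33

# Verify the bindings took
esxcli swiscsi nic list -d vmhba33
```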
Is this also a limitation, or am I missing something?
Is there anyone else out there experiencing the same problems as me?
VCP3 & VCP4 32846
VSP4
VTSP4
So, I think maybe this issue isn't completely fixed. I'm seeing some really slow performance when I do 64KB sequential read tests, and lots of lag. When I get above 32KB, my configuration falls apart. Is anyone else still seeing this issue? I'm using SQLIO to test with.
My setup:
ESXi 4.1 hosts:
Dell R810s
Broadcom 57711 10Gb nics (4 ports per server) 1:1 nic to vmk, Jumbo Frames across the board.
driver: bnx2x
Dell EqualLogic Mem Module is installed
VMGuests:
Paravirtualized SCSI Controller
Storage:
Dell EqualLogic PS6010XV Firmware 5.0.4
Network:
Brocade TurboIron 24x Switches - Jumbo Frames, Flow Control, Egress-buffer-threshold Max enabled
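Before each test run I do a quick sanity check from the host: an end-to-end jumbo ping to the SAN (the group IP below is a placeholder; 8972 bytes = 9000 minus the 28-byte IP/ICMP headers):

```shell
# Confirm the vmknics actually picked up MTU 9000
esxcfg-vmknic -l

# Ping the SAN group IP with a jumbo-sized payload over the VMkernel stack
vmkping -s 8972 10.10.5.10
```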
Test Results:
Average I/O Size: 64KB
Average IOPS: 401.8
Average Latency: 14.4ms
Average I/O Rate: 25.1MB/sec
Average Queue Depth: 2

Average I/O Size: 32KB
Average IOPS: 19,266.7
Average Latency: 0.30ms
Average I/O Rate: 602.1MB/sec
Average Queue Depth: 2
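For anyone wanting to reproduce these numbers, the SQLIO runs were along these lines, executed inside a guest (the test file path, duration, and thread count here are illustrative, not my exact command lines):

```shell
:: 64KB sequential read, 2 outstanding I/Os, 120 seconds, latency stats, no buffering
sqlio -kR -fsequential -b64 -o2 -t1 -s120 -LS -BN C:\testfile.dat

:: 32KB variant of the same test
sqlio -kR -fsequential -b32 -o2 -t1 -s120 -LS -BN C:\testfile.dat
```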
Hi Rob,
It sounds and looks like the same problem.
Updated drivers are the answer to your problem.
VMware ESX/ESXi 4.1 Driver CD for Broadcom NetXtreme II Ethernet Network Controllers
http://downloads.vmware.com/d/details/esx41_broadcom_netextremeii_dt/ZHcqYnRlaHRiZHRAag==
There are notes in this thread that give you step by step instructions on how to install them from vMA.
Some other things to consider:
Hope this gets your problem sorted.
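If you go the vMA/vCLI route, the update boils down to something like this (the hostname and bundle filename below are examples; use the zip from the download link above):

```shell
# Check which bulletins/drivers are currently installed on the host
vihostupdate --server esx01.example.com --username root --query

# Put the host in maintenance mode first, then push the offline bundle
vihostupdate --server esx01.example.com --username root --install \
    --bundle BCM-bnx2x-1.60.50.v41.2-offline_bundle.zip

# Reboot the host afterwards so the new driver loads
```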
Ah! That was the problem. I was running 1.54 instead of 1.60.50.v41.2. I thought Update Manager would have installed the newer one, but it didn't for me. I had to manually install the patch by importing the offline .zip into Update Manager. It seems the latest 4.1 update has a bug with the /scratch drive: it disappears after a while, and the fix is to restart the ESXi host (I'm using embedded), then scan and apply updates again. (Hope they fix that one soon too.)
Anyway, so I've updated to the 1.60 bnx2x driver. I noticed there is a 1.62 driver for 4.0; I wonder if/when it will come to 4.1?
My results, while better than before, are still not what I expect. There seems to be a 127.1KB cap on I/O size, and for my 64KB sequential read test I'm only getting around 350MB/sec. My 32KB test still gets closer to 600MB/sec. I was expecting the 64KB test to reach around 700MB/sec.
Glad to hear you've got it sorted.
I think there is still a bit of an issue, but it's a lot better, and maybe even usable now. The 1.62 driver for ESX 4.0 was released not too long ago. I hope the 1.62 driver for ESX/ESXi 4.1 is released soon; I wonder if it will help more?
Thanks!
Rob
Hi there,
One very very interesting thread.
v1.62.15 drivers for ESXi 4.1.0 update1 are now available.
Anyone had any success using these with Jumbo Frames and ESXi iSCSI s/w initiator ?
We've lost about 3 days (& evenings) with this latency issue.
We initially saw latency of up to 16,600ms with Jumbo Frames enabled.
Latency dropped to the hundreds with Jumbo Frames disabled.
Now we're below 70ms with the new bnx2x driver, Jumbo Frames still disabled. The system is now usable but obviously not performance-optimised.
It is still unusable with Jumbo Frames enabled.
I really didn't think we were using bleeding edge equipment, but the performance woes have caused a lot of heartache.
I suspect we would have spec'd Intel 10GbE cards if we'd seen this thread earlier, or if Dell had made us aware of the issues.
I haven't tried the new drivers yet, might wait a while :smileylaugh:
What is the hardware you are using?
How many NICs in total in the ESXi HOST?
Are you using distributed vSwitches?
The v.1.60.x driver made a huge difference for me, and the v1.62 driver made more improvement on my 64KB Sequential Read tests. Overall, the driver seemed to make my environment usable.
I'd say if you're not in production yet, try out the new driver, or even try it on one of your production ESX/ESXi 4.1 hosts to see if it makes a difference. Just move your guests over to the other one and give it a try. Maybe build a test system really quickly.
If you're using EqualLogic, make sure you have the Dell EqualLogic Multipath Extension Module (MEM) PSP installed and configured. You can also configure the SCSI controller as a ParaVirtualized SCSI controller on non-OS volumes to see some improvement. What kind of switches are you using? Have you configured them per their recommended iSCSI configuration? Are you using them for iSCSI traffic only?
Hi,
apologies for not being more descriptive with our setup:
4 x Dell R910
quad 8-core Xeon 7550, 128GB memory, mirrored SD cards for the ESXi install,
12 NICs: 1 x 57711 10GbE (dual port), 4 x 5709c dual-port GbE, 4 x onboard 5709c GbE
2 x PowerConnect 8024F
PS6510x 48disk SAS
PS6510e 48disk SATA
ESXi 4.1.0 update1 (build 348481):
completely separate iSCSI network using the 10GbE cards
using ESXi's iSCSI s/w initiator
no Distributed vSwitches
no OpenManage (iDRAC on these servers only)
The performance issues were only discovered once we went live and started P2V'ing servers.
Dell has shipped Intel 10GbE cards to replace the Broadcoms. Lots of work still to do.
I agree with your rant 1,000,000% :smileysilly:
Without a lot of proof at this point in time, I would NOT install the "Dell OpenManage Offline Bundle and VIB for ESXi".
I have two scenarios that have caused me problems, and once I removed OpenManage from the equation they both stopped. I am waiting for EqualLogic to finish testing and/or a spare EqualLogic shelf to re-test with.
Once again I must stress that problem 2 has only occurred once, and because it occurred on a production system I have not had the desire to test again. I have logged the case with EqualLogic, but there have been delays in their testing because they are having trouble getting a Dell OpenManage expert assigned to them.
Just a heads up.
But please get back to us with your specs, there may be something that stands out.
To clarify, the Dell EqualLogic Multipath Extension Module (MEM) PSP is not the Dell OpenManage Bundle. I haven't installed the OpenManage Bundle, so I don't have any experience with it.
Hello All,
We are currently using the 57711 in our test lab, but not yet in production. I called Dell to ask if there really are problems with the 57711 cards, and I referred to this post, but they replied as follows:
There are a few articles concerning Jumbo frames & VMWare, ID:71219 but nothing about
Broadcom/Intel cards. As you know, unless there is an internal memo, we can't be
replacing parts based on 3rd party vendor forums that factually doesn't say what the
customer is saying. If down the road they have problems then we can do a break-fix.
Can anyone tell me what the problems exactly are with these 57711 cards?
Is there any way that I can reproduce these problems before bringing these cards into production?
Thanks,
As long as you install the latest driver pack posted on VMware's site, you shouldn't see the issue as much. I'm currently getting about 400MB/s with the latest drivers installed doing 64KB sequential read tests, where I was getting 8MB/s with the drivers included in the base install of ESXi 4.1.
I'm curious to know how the Intel 10Gb cards compare.
Hi,
Well, I'm glad I ran into this post... I've been having exactly the same issues and have been making endless changes with Dell EqualLogic support. We're running a single Dell EqualLogic PS6010XV, Dell Broadcom NetXtreme II 57711 NICs and Dell PowerConnect 8024F switches. Read latency was through the roof! One big improvement initially came from upgrading the firmware on the PowerConnects to 3.1.4.8 A3 (Dell didn't recommend going to a major release yet). Also ensure that:
- iSCSI connectivity is in anything BUT the default VLAN (jumbo frames are not supported in the default VLAN; Dell standardised on VLAN 11 internally, so I went with the same)
- the switch MTU is 9216 (not to be confused with the MTU of 9000 on hosts and SAN)
- flow control is enabled and the iSCSI feature is disabled on the switches
- your LAGs are configured correctly (and trunked in VLAN 11)
Anyway, my read latency was still intermittent, jumping up to 200ms+ (prior to this it was intermittently 1,500ms or more). It would explain why the (Windows) Veeam backup server, SAN-attached, still managed to back up pretty quickly.
I've just upgraded to the latest driver, 1.62.15, which seems to have resolved some immediate performance issues (it's a lot better than it was). Clearly Broadcom/VMware are aware there are issues, considering 1.60 was released on 3 March and 1.62 only 20 days later...
For those not too VMware savvy, do this to check and upgrade your driver:
1. On the ESX console, list your NICs and confirm that the Broadcom ones use "bnx2x" - if not, replace any instance below with the correct name.
esxcfg-nics -l
vmnic10 0000:0e:00.00 bnx2x Up 10000Mbps Full 00:10:18:9f:8c:44 9000 Broadcom Corporation NetXtreme II 57711 10Gigabit Ethernet
vmnic11 0000:0e:00.01 bnx2x Up 10000Mbps Full 00:10:18:9f:8c:46 9000 Broadcom Corporation NetXtreme II 57711 10Gigabit Ethernet
2. Check your current driver version with:
esxupdate query --vib-view | grep "bnx2x"
rpm_vmware-esx-drivers-net-bnx2x_400.1.54.1.v41.1-2vmw.1.4.348481 @x86_64 installed 2011-04-18T17:33:02.337116+12:00
The driver version is the 1.54.1.v41.1 portion of the output above - i.e. it's old!
3. To update the driver, first migrate VMs off, then put the host in maintenance mode
4. Download the VSphere CLI for your ESX version from here: http://www.vmware.com/support/developer/vcli/ and install it (if not already)
5. Download the latest Broadcom drivers, currently for me these ones here...
http://downloads.vmware.com/d/details/dt_esx41_broadcom_netxtremeii_032311/ZHcqYnR0anBiZHRAag==
6. Extract them to a folder on the same server as you put the CLI on, eg: into a folder C:\Broadcom\
7. Open a command prompt, change into "C:\Program Files (x86)\VMWare\VMWare CLI\bin" and run:
vihostupdate.pl --server ipaddress --install --bundle C:\Broadcom\offline-bundle\BCM-bnx2x-1.62.15.v41.2-offline_bundle-380522.zip
...where ipaddress is the IP address (or hostname) of the ESX server. You will be prompted for a username and password; enter the root credentials. Successful output will look as follows:
Please wait patch installation is in progress ...
The update completed successfully, but the system needs to be rebooted for the changes to be effective.
8. Restart the host
9. Connect to the console again and run the same command to confirm the new driver version, eg:
esxupdate query --vib-view | grep "bnx2x"
cross_vmware-esx-drivers-net-bnx2x_400.1.62.15.v41.2-1vmw.0.0.00000 installed 2011-05-27T13:52:18.509539+12:00
10. Exit maintenance mode, migrate some VMs back, and test it!
Hope this helps someone.
If you're using EqualLogic, make sure you have the Dell EqualLogic Multipath Extension Module (MEM) PSP installed and configured
That's only available with VMware Enterprise edition or higher, which is not possible for us as our client is only on VMware Advanced. I'm running the software iSCSI initiator with jumbo frames enabled; it's running a lot better so far.
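Without MEM, you can still get multipathing from the built-in Round Robin PSP. On 4.1 that's something like the following per EqualLogic volume (the naa ID is a placeholder, and the IOPS value of 3 is a commonly suggested tweak, not an official EqualLogic number):

```shell
# List devices and their current path selection policy
esxcli nmp device list

# Switch one EqualLogic volume to Round Robin (repeat per device)
esxcli nmp device setpolicy -d naa.6090a01234567890 -P VMW_PSP_RR

# Optionally lower the IOPS-per-path switching threshold from the default of 1000
esxcli nmp roundrobin setconfig -d naa.6090a01234567890 --type iops --iops 3
```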
Update...
I've noticed that latency, according to SAN HQ, has still jumped up intermittently in the past 24 hours. However, where it was 200ms+ before, it's now peaking at 50ms (and only did so twice in the last 24 hours). While latency should really be < 5ms for best performance, the true test I can take from the driver update is a Linux backup that runs each night. The Linux VM has 2 vmdk's that both reside on the SAN, and the process backs up from the first (the main disk, with the database) to the second (the backup disk). When the client had an HP EVA4000 (4Gbps Fibre Channel), this process took about 18-20 minutes each night. When we went to the EqualLogic (with the default 4.1 bnx2x drivers), the time was consistently around 60-65 minutes = 3 times longer! Last night, with the 1.62.15 drivers, the backup took 12 minutes! Happy client again... but I still think the Broadcom driver could be improved, looking at the random latency spikes 🙂
Cheers,
Chimera
(VCP3/4, VTSP4)
UPDATE: Performance is still not as good as it should be. I have tried various benchmark tools to rule out the tool itself being at fault, and it's running similarly to a 1Gbps PS4000 SAN that we have at another client. If anyone's reading this: FORGET ABOUT 10GbE BROADCOM NICs... a waste of bloody time...
chimera wrote:
... FORGET ABOUT 10GbE BROADCOM NIC's... waste of bloody time...
Hear, hear.
I checked yesterday - no new drivers since the v1.62.15 that we had already tried unsuccessfully - at least none approved by and available from VMware.
Our move to Intel NICs resolved our performance headaches. I pity those with blade setups where the NICs cannot be changed.
If I remember correctly, we had to disable Jumbo Frames to get usable performance.
I lost way too much time and sleep messing with these stupid NICs.
Hi Chimera,
I attempted to upgrade the Broadcom driver on three of our ESXi 4.1 servers over the weekend using the command:
esxupdate --bundle=<offline-bundle>.zip update
The update worked OK, and I verified that the correct vmkernel module was loaded and that there were no errors in the message log file, as per
http://download3.vmware.com/software/esx/bnx2x-README.TXT
I can still see the vmnic, but can no longer see the iSCSI storage adaptor in vSphere or via the console.
Any thoughts?
Craig
How many physical NICs do you have in the server? At the moment the maximum allowed is 4 x 10GbE only (no mix with 1GbE NICs).
In the past we had 4 x 1GbE NICs (onboard) and 8 x 10GbE NICs, and only 5 of the 10GbE NICs showed up after a specific driver update. We ended up having to disable the onboard 1GbE NICs and remove 4 of the 10GbE NICs.
Hi.
Just the 4 onboard 1GbE NICs, and the 2 ports on the Broadcom 57711.
They appeared as iSCSI storage adaptors fine until I updated the driver to 1.62.15 and rebooted; then they disappeared.
Craig
VMware currently only supports Jumbo Frames for the software iSCSI initiator.
In the MN.next (5.x) release, Jumbo Frames will be supported for dependent/independent hardware iSCSI initiators, such as BCM 570x/5771x (dependent) and qla4xxx (independent).
So if you configure MTU 9000 with your dependent hardware iSCSI (BCM 57711), VMware cannot guarantee that issues won't occur with an unsupported config.
My suggestion here is: don't use MTU 9000 with BCM 570x/5771x NIC adapters.
Thanks,
Shannon
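In case it helps anyone following Shannon's advice: dropping back to MTU 1500 on 4.x means recreating the vmknics, since the MTU can't be changed in place (the IPs and portgroup/vSwitch names below are placeholders):

```shell
# Check current vmknic MTUs
esxcfg-vmknic -l

# Recreate each iSCSI vmknic with the default MTU
esxcfg-vmknic -d iSCSI1
esxcfg-vmknic -a -i 10.10.5.11 -n 255.255.255.0 -m 1500 iSCSI1

# Return the vSwitch MTU to 1500 as well
esxcfg-vswitch -m 1500 vSwitch1
```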