Box293
Enthusiast

Broadcom 57711 + ESX/ESXi 4.1 + Jumbo Frames

Having some issues getting Broadcom 57711 NIC working on ESX/ESXi 4.1 with MTU 9000.

On ESX/ESXi 4.0 U2 I have no problems getting an MTU of 9000 working; I can push our EQL PS6010 SAN pretty hard and get about 550,000 KBps with IOMeter running inside 4 VMs.

I have read the documentation and found that only an MTU of 1500 is supported for the 57711 NICs on ESX/ESXi 4.1. This has something to do with the fact that these NICs have hardware iSCSI offloading; additional iSCSI adapters appear under the Storage Adapters section on 4.1 hosts.

I have it all properly configured with 1:1 binding of the iSCSI NICs to the iSCSI VMkernel ports using an MTU of 1500. Configured this way, the most I can get out of the SAN is about 280,000 KBps. If I try the same process using an MTU of 9000, the VMs / host seem to stop responding.

While I understand that there is a limitation of 1500 when using the hardware iSCSI NICs, I am unable to get the software iSCSI to work with an MTU of 9000 on ESX/ESXi 4.1 either. If I try software iSCSI using an MTU of 9000, the VMs / host seem to stop responding.
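
For reference, the jumbo frame side of the config follows the standard ESX/ESXi 4.x CLI steps, roughly like the following (the vSwitch / port group names, IP address and vmhba number below are just example values):

~ # esxcfg-vswitch -m 9000 vSwitch1

Sets the vSwitch MTU to 9000.

~ # esxcfg-vswitch -A iSCSI1 vSwitch1

~ # esxcfg-vmknic -a -i 10.10.10.11 -n 255.255.255.0 -m 9000 iSCSI1

Creates a VMkernel port group and a jumbo frame vmknic (repeated once per path).

~ # esxcli swiscsi nic add -n vmk1 -d vmhba33

~ # esxcli swiscsi nic list -d vmhba33

Binds each vmknic 1:1 to the software iSCSI adapter and confirms the binding.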

Is this also a limitation, or am I missing something?

Is there anyone else out there experiencing the same problems as me?

VCP3 & VCP4 32846 VSP4 VTSP4

Accepted Solutions
jeffa35
Contributor

Looks like an updated Broadcom driver CD (1.60.50.v41.2) was recently posted in the vSphere Downloads / Drivers & Tools area. I have not tried it yet; hopefully it will resolve this issue.
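
If anyone gets to it before I do: a driver CD like that is normally installed as an offline bundle with the host in maintenance mode, along the lines of the following from the vSphere CLI (the bundle file name is just a placeholder for whatever the download contains):

vihostupdate --server <esxi-host> --username root --install --bundle <bnx2x-offline-bundle.zip>

On classic ESX the equivalent from the service console is esxupdate --bundle=<bnx2x-offline-bundle.zip> update, followed by a reboot.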


60 Replies
jeffa35
Contributor

Yes, I am having similar issues.

I am using the Broadcom 57711 NICs in Dell servers (R710, 2950), an EQL PS6510E array, and PC 8024F switches. Everything runs perfectly under ESX 4.0; VMs will do 500 MB/s. If I do a fresh install of ESX or ESXi 4.1, the volumes connect and the servers boot up. However, if I put a significant load on the SAN using IOMeter, the disk latency jumps up to 6000 ms and throughput drops to 1 MB/s.

I am using the iSCSI Software Adapter. I am not doing VMkernel 1:1 port binding, just 1 active / 1 standby NIC. For me, the problem exists with both standard and jumbo frames on the iSCSI Software Adapter. I have not yet tried the new hardware (bnx2i) iSCSI initiator that is available with these cards. My reasoning is that the performance benefit of jumbo frames far outweighs TCP offload; we have plenty of CPU, so I'd prefer the increased throughput.

I suspect the Broadcom bnx2x driver that comes with 4.1 is the culprit. When I have some free time, I'll try running the hardware iSCSI adapter to see if my results are consistent with yours.
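
In the meantime, the driver and firmware a host is actually running are easy to confirm from the service console (or ESXi tech support mode); something like this, with the vmnic number adjusted to whichever uplink is the 57711:

~ # ethtool -i vmnic2

driver: bnx2x
version: ...
firmware-version: ...

That should make it possible to compare a working 4.0 host against a 4.1 host before blaming the bnx2x driver outright.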

Box293
Enthusiast

Great to find someone else with the same problem. I have not made any progress. Right now I have a case open with VMware.

Initially they asked me to try unloading and disabling the bnx2i module, as this is the Broadcom hardware iSCSI module.

If I try to unload it, I am told it cannot be unloaded because it is busy.

~ # esxcfg-module -u bnx2i

Unable to load module bnx2i: Busy

If I disable it, it doesn't return any output.

~ # esxcfg-module -d bnx2i

I then reboot the server; however, nothing has changed. There are still Broadcom hardware iSCSI adapters appearing under Storage Adapters.

We looked through the BIOS etc. trying to find out how to disable the iSCSI offloading engine. The BIOS only shows the 1 Gb Fabric A NICs, not the 10 Gb NICs. They asked me to log a case with Dell.

Dell talked me through powering off the server, removing it from the chassis and removing the iSCSI offloading key (the green thing). This only turns off iSCSI on the 1 Gb NICs. I removed one of the 10 Gb modules, but it has no key to remove to disable iSCSI offloading.

Dell are going to get back to me today / tomorrow to see if they have come up with anything.

What we need is a way to either stop the VMware hardware iSCSI driver from loading at boot-up OR disable the iSCSI offloading engine on the cards themselves.

I will continue to persist with Dell and VMware until we get a working solution. There's all this hype about 4.1, but there's no way I will be going to 4.1 without having jumbo frames enabled; the performance difference is significant.

I think this is one of those classic cases of using the latest technology: we are the ones hitting all these annoying problems that need to get ironed out. In 12 months' time people will be raving about 10Gb iSCSI and ESX 4.1 .... but not right now ;o)

VCP3 & VCP4 32846 VSP4 VTSP4
Box293
Enthusiast

OK, I really wanted to make sure I wasn't missing something when trying to disable the bnx2i module, so I removed the 4 x 10 Gb NIC daughter cards and the 1 Gb iSCSI offloading key from the M710.

I then installed ESXi from scratch. Once installed the following information is shown about the module:

~ # esxcfg-module -l

Name Used Size (kb)

bnx2i 0 88

Here I can see the module is loaded but not used.

So now I unload the module.

~ # esxcfg-module -u bnx2i

Module bnx2i unloaded successfully

When I run the command "esxcfg-module -l" it no longer shows the module in the list.

Then I disabled the module from next boot.

~ # esxcfg-module -d bnx2i

Rebooted server and then listed the modules again:

~ # esxcfg-module -l

Name Used Size (kb)

bnx2i 0 88

So it appears the module is still being loaded even after disabling it. The documentation for the command states:

-d|--disable Disable a given module, indicating it should not be loaded on boot.

So VMware, what am I doing wrong here? I am disabling the module so it is not loaded on next boot; however, it is still being loaded on next boot.

After powering off the M710, re-installing the cards and iSCSI key, and powering back on, the host still shows the additional hardware storage adapters.
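
For anyone following along, a quick way to double-check what the host itself thinks is loaded, rather than relying on the vSphere Client, is roughly:

~ # vmkload_mod -l | grep bnx2i

Shows whether the bnx2i module is actually loaded in the VMkernel.

~ # esxcfg-scsidevs -a

Lists every vmhba together with the driver that owns it, so the Broadcom iSCSI adapters show up here bound to bnx2i.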

I have a case open with VMware on this and they asked me to consult Dell. I have consulted Dell and yesterday I got an engineer on the phone who actually wanted to resolve this. I have sent him documentation on how to reproduce my problem (detailed, easy to follow steps). They are going to reproduce the problem in-house and get back to me.

I have also responded to VMware with today's results showing that disabling a module for next boot does not work.

VCP3 & VCP4 32846 VSP4 VTSP4
VMmatty
Virtuoso

Not sure if this applies to either of you but wanted to make you aware of the following KB:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=102936...

Some Broadcom 57711 models are causing purple screen crashes on hosts as described above. Hopefully you're not affected by this issue.

Matt

My blog:

Matt | http://www.thelowercasew.com | @mattliebowitz
Box293
Enthusiast

Thanks for that; luckily I've not experienced a PSOD while troubleshooting this one.

VCP3 & VCP4 32846 VSP4 VTSP4
ORRuss
Enthusiast

Same for me with the gigabit version of that NIC (the 5709) on a DL380 G7. Here's the response I got from VMware:

Looking internally, the bnx2i module is not a module which can be disabled on boot. The unload command is only applicable to certain modules; others are deemed 'core' and will not be disabled. The bnx2i is one of these. Even if you were able to disable the module, it will load again on next boot.

Still searching for an answer...

orRuss

Box293
Enthusiast

Thanks for the input.

I'm all for new technologies like hardware offloading, but it baffles me that VMware releases this new technology without a way to turn it on or off.

This is preventing us from migrating to 4.1 because we don't want a 50% performance drop.

VCP3 & VCP4 32846 VSP4 VTSP4
ORRuss
Enthusiast

A firmware update seems to have resolved my issue. I now have four vmhba iSCSI storage adapters showing up by default, and all NICs are available for networking as well.

My coworker discovered an update for this NIC that was released on 9/30. This is the 5709 (HP's NC382i), mind you.

Oddly, all four onboard NICs are the same model, but two had a different firmware version from the other two out of the box.

orRuss

randybw1
Contributor

I also have the 57711 cards with Dell 810s and I cannot get jumbo frames to work either. Please let me know what you find out from Dell/VMware; so far I've got nothing.

Box293
Enthusiast

A post with information but no resolution.

OK, so at the moment VMware are asking me to ask Dell how to disable the iSCSI offloading on these 57711 NICs. They are asking if there is a way to get into a network card BIOS at boot-up so we can configure them.

At this point I had already gone through all of this with Dell and they said no. After seeing orRuss's response about firmware, I thought I might double-check Dell's website for any firmware updates. Nothing there yet.

However, I did download the Broadcom DOS diagnostics utilities. I have since downloaded the latest ones from the Broadcom website.

Looking through the manual, there is no option to disable the iSCSI offloading engine, which would have been the ideal solution (maybe not ideal, but a solution nonetheless).

There is an option to enable or disable management firmware. I assume this would be something like a BIOS screen you enter on bootup.

3.9 Command line option -mfw

cmd -mfw

Description: enable (1) / disable (0) management firmware

Syntax: -mfw <1|0>

1: enable; 0: disable

Example: at the DOS prompt enter:

uediag -mfw 1 -dev 2

- Enables management firmware on device 2.

The commands I used to try to enable the management firmware are as follows:

uediag -mfw 1 -dev 1

uediag -mfw 1 -dev 2

And so on for all 8 devices I have.

Each command takes about 3 minutes to complete; it appears to run through four separate tests (A, B, C, D).

After rebooting the host I don't see any new BIOS option to enter a management interface :o(

Additionally, you can run the following command, which skips the four tests:

uediag -t abcd -mfw 1 -dev 2

Currently I have a Dell engineer assigned to this case who is pretty keen to resolve the problem. They are at the stage of escalating this to their EqualLogic team and so on. Will keep you all updated as to how this goes.

VCP3 & VCP4 32846 VSP4 VTSP4
Box293
Enthusiast

I urge you to log a case with both VMware and Dell about this.

The more people that report the problem the sooner we will get it resolved.

I'll keep you posted none the less.

VCP3 & VCP4 32846 VSP4 VTSP4
randybw1
Contributor

I agree, I will contact both tomorrow. Thanks for all the info and updates.

randybw1
Contributor

Here's where I'm at with Dell:

I haven't heard anything new on the EqualLogic end, but could you check one other thing for me on this host: this Broadcom controller supports iSCSI offloading. I assume you are using iSCSI to access this SAN, but even if not, could you disable that feature? This NIC controller likely has a BIOS that comes up during POST (CTRL+C if I recall correctly); make sure the option is turned off in there.

ESX should have its own (software) iSCSI offloading it can provide (should you want to use that feature). The vSphere Client should allow that option, if I recall.

Lastly, if that doesn’t work, or if the connection still seems a bit fuzzy, let’s ensure the NICs are running the latest firmware. The link for it is here:

http://support.dell.com/support/downloads/download.aspx?c=us&cs=555&l=en&s=biz&releaseid=R270088&Sys...


The update package is listed for RHEL, but will likely apply if you use the service console. If it doesn’t, you can use our OMSA Live CD to apply it as well. The link for that is here: http://linux.dell.com/files/openmanage-contributions/omsa-63-live/OMSA63-CentOS55-x86_64-LiveCD.iso

Applying the firmware is necessary because if this doesn’t resolve the issue, I will likely need to engage Broadcom for this issue.

Thanks!

I'm trying to update the firmware now, even though I don't think this is going to help.
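
In case it helps anyone else, the Dell Update Package route from the ESX service console is roughly: copy the .BIN package over, make it executable and run it, then reboot (the file name below is a placeholder for whatever the download page provides):

chmod +x Network_Firmware_XXXXX.BIN
./Network_Firmware_XXXXX.BIN

If the package refuses to run on ESX, the OMSA Live CD mentioned above is the fallback.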

joergriether
Hot Shot

Having exactly the same issue: a 57711 in ESXi 4.1 against EqualLogic causes extreme latency. I am thinking about ripping out the 57711s and replacing them with Intel 10 Gb cards.

Any solution by now?

Best regards,

Joerg

randybw1
Contributor

Last message I got from the Dell support rep I've been working with: "We are working with Broadcom on this issue, but we're still trying to see if VMware will introduce a better version of the driver, or if Broadcom will come out with improved firmware. No telling yet which one comes out first."

I guess it's a waiting game at this point.

joergriether
Hot Shot

I am ripping out all the 57711s right now and replacing them with Intel X520s.

I cannot afford this huge latency problem.

best regards,

Joerg

joergriether
Hot Shot

Just finished the first swap. I can confirm the latency problem is gone with the Intel X520 card.

joergriether
Hot Shot

Follow-up: I isolated an ESX machine with a 57711 and wanted to dig deeper into this. I generated heavy network activity by transferring large amounts of data to the VM and also ran a Storage vMotion. It appears that the 57711 driver on ESXi 4.1 will, at some point under high network activity, do weird things. In my test lab, which ran ESXi 4.1 with a 57711 configured as software iSCSI against an EqualLogic with 10 Gb ports, the latency suddenly went up to 8000 ms (!!!!). And I didn't even use jumbo frames. I have never seen a read/write latency of more than 300 ms before, from what I can remember, and even that was a bad experience. Now THIS is awesome. Awesomely awful. It has to be related to the driver or the 57711 firmware, or both. It doesn't really matter for me any more; I have removed these adapters from production and will wait until a new driver/firmware is available.
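
For anyone who wants to watch this happen in real time, esxtop on the host (or resxtop from the vSphere CLI) makes the spike obvious; roughly:

~ # esxtop

Press 'd' for the disk adapter view or 'u' for the disk device view. DAVG/cmd is the latency reported back from the device/array and KAVG/cmd is time spent in the VMkernel; presumably it is DAVG/cmd on the iSCSI vmhba that goes through the roof when this hits.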

milanmbk
Contributor

ESX 4.1 supports jumbo frames. I have attached the ESX 4.1 Configuration Guide; please review the networking section in the guide.

However, the document also mentions enabling TSO inside the VMs, and only a limited set of guest operating systems is supported with TSO at this point. It does not list Windows 2008 VMs. This is something VMware Engineering has to answer.

TCP Segmentation Offload and Jumbo Frames

You must enable jumbo frames at the host level by using the command-line interface to configure the MTU size for each vSwitch. TCP Segmentation Offload (TSO) is enabled on the VMkernel interface by default, but must be enabled at the virtual machine level.

Enabling TSO

To enable TSO at the virtual machine level, you must replace the existing vmxnet or flexible virtual network adapters with enhanced vmxnet virtual network adapters. This replacement might result in a change in the MAC address of the virtual network adapter.

TSO support through the enhanced vmxnet network adapter is available for virtual machines that run the following guest operating systems:

- Microsoft Windows 2003 Enterprise Edition with Service Pack 2 (32 bit and 64 bit)

- Red Hat Enterprise Linux 4 (64 bit)

- Red Hat Enterprise Linux 5 (32 bit and 64 bit)

- SUSE Linux Enterprise Server 10 (32 bit and 64 bit)

(ESX Configuration Guide, page 60)

http://www.vmware.com/pdf/vsphere4/r41/vsp_41_esx_server_config.pdf
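
As a sanity check that the MTU settings have actually taken effect on the host, they can be listed from the console, roughly:

~ # esxcfg-vswitch -l

The MTU column for the iSCSI vSwitch should read 9000.

~ # esxcfg-vmknic -l

The MTU column for each iSCSI vmknic should also read 9000.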
