ESX 3 NIC Teaming not working on IBM BladeCenter

omslaw
Contributor

I have LS20 Blades in my IBM BladeCenter with 2 Cisco switches in the chassis. The blades have two NICs on them - one to each Cisco switch. I am successfully running NIC teaming with ESX 2.5.x; both NICs are in the same 'bond'.

My issue is when I install ESX 3 and set up NIC teaming, my connectivity drops; I can't ping or access the Service Console or any of the VMs. To resolve the issue, I've had to go in and set ONE of the NICs as Primary and the other as Standby. Once I do this, all connectivity is restored.

Is this the only way that I'm going to get this to work? I would like to take advantage of both NICs. Has anyone else run across this? What's the trick?

17 Replies
nadger
Enthusiast

Have you made any switch config changes?

Try changing the load-balancing setting on the vSwitch to 'Route based on IP hash', and make sure Network Failover Detection is set to 'Link status only'. I don't think beacon probing works with VLANs (if you are using them).
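
If you want to double-check what the vSwitch looks like before and after the change, the service console can list it for you (standard ESX 3 commands):

# List vSwitches with their uplinks and portgroups
esxcfg-vswitch -l
# List physical NICs with driver, speed, and link state
esxcfg-nics -l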

omslaw
Contributor

I'll make the change and try. The vSwitch was set at the defaults of 'route based on originating virtual port ID'.

Why change to IP hash? Wouldn't the default setting be better?

omslaw
Contributor

Oh, I also forgot...yes, I am using VLANs. No, I haven't made any changes to the Cisco switches in the chassis. I currently have 3 blades running ESX 2.5.x with the 'bonding' working.

I have 2 blades with ESX 3 (one with 3.0 and one with 3.0.1), and they both have the same problem. Using the default settings, NIC teaming doesn't work. If I assign only one vmnic to the vSwitch, then I'm able to get to the SC and the VMs.

I'll make the changes to the vSwitch and see if that helps.

Monoman
Enthusiast

> haven't made any changes to the Cisco switches in the chassis. I currently have 3 blades running ESX 2.5.x with the 'bonding' working.

If you haven't changed the switch configuration, then that is probably the cause. You probably need to configure each of the interfaces as trunk ports:

switchport mode trunk
spanning-tree portfast trunk

I hope that helps.

omslaw
Contributor

They are trunk ports. The issue is with NIC Teaming.

If I only assign ONE vmnic to the vswitch, everything works fine; VLANs work, VMs can be accessed, etc.

When I add the second vmnic to the vswitch...that's when the problems start. It's like all network connectivity stops.
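
(For anyone wanting to reproduce this safely: the second uplink can be added and removed from the service console, so the change can be backed out when the network drops. vSwitch0 and vmnic1 below are just my names; substitute yours.)

# Add the second uplink to the vSwitch -- this is when connectivity dies
esxcfg-vswitch -L vmnic1 vSwitch0
# Remove it again to restore connectivity
esxcfg-vswitch -U vmnic1 vSwitch0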

asyntax
Enthusiast

You might want to try leaving the native VLAN untagged in the configuration. I have seen this cause issues.

Monoman
Enthusiast

> They are trunk ports. The issue is with NIC Teaming.

Ok, good. You originally said you did not change the switch config, so I just wanted to make sure you did set up trunk ports.

> If I only assign ONE vmnic to the vswitch, everything works fine; VLANs work, VMs can be accessed, etc.

> When I add the second vmnic to the vswitch...that's when the problems start. It's like all network connectivity stops.

Have you checked the stats and the logs on the switch interfaces? How about the ESX logs? Start with the basics.

Also, make sure your BC firmware is all up to date. Whatever issue we have with a BC, they always have us update to the latest firmware, and the issue magically goes away.
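
On the ESX side, something like this from the service console would be a start (standard ESX 3 log locations):

# Watch the VMkernel log while adding the second uplink
tail -f /var/log/vmkernel
# Watch the service console messages
tail -f /var/log/messages
# Confirm both NICs still report a link
esxcfg-nics -l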

andrew_hald
Hot Shot

Are you trying to use an aggregated-bandwidth EtherChannel? If you want to do teaming, and not just failover, both NICs must connect to the same physical switch on the back side. We only want failover, so we have not enabled teaming, and we attach our blade NICs to separate back-end switches.
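
For reference, if you did want aggregation, both ports would have to land on the same switch module and be channeled together, something like this on the Cisco side (a sketch only; the interface and group numbers are made up, and ESX 3 needs a static channel, mode on, not PAgP/LACP):

interface GigabitEthernet0/1
 channel-group 1 mode on
!
interface GigabitEthernet0/2
 channel-group 1 mode on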

Paul_Lalonde
Commander

Andrew is correct... both blade NICs must go to the same switch module to achieve link aggregation.

Your best bet is to use the standard teaming option, 'Route based on originating virtual port ID'. You'll only get outbound load balancing, but it's better than nothing.

Paul

omslaw
Contributor

Unless I'm overlooking something, I can't have both blade NICs going to the same switch...IBM has them physically connected (via the mid-plane) to two separate switches. Each chassis switch is then connected (via EtherChannel) to two separate core switches.

Given that I only have two NICs on the blade and each connects to a separate switch in the chassis, what setting should I use in ESX 3 for NIC teaming?

I've tried the default of 'Route based on originating virtual port ID' and 'Route based on IP hash'. Both of those options cause loss of connectivity on the VMs. Which would be the best option for the blades and ESX 3?

andrew_hald
Hot Shot

Exactly. :)

We have our blades configured exactly the same way: "Route based on originating virtual port ID", "Link status only", and Notify Switches set to "Yes". This configuration does not support IEEE 802.3ad link aggregation (aka EtherChannel). I am thinking that you may still have a problem with your switch config.

The correct switchport setup is EtherChannel disabled, ports configured as trunks, and no DTP negotiation. Just straight 802.1q tagged VLANs.
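
On a Cisco IOS switch that would look roughly like this (a sketch; the interface name is an example, and the encapsulation line may not exist on modules that only support 802.1q):

interface GigabitEthernet0/1
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport nonegotiate
 spanning-tree portfast trunk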

andrew_hald
Hot Shot

Also, how are your portgroups set up? What portgroup(s) are your VMs attached to? Thanks.

omslaw
Contributor

The blade ports on the switch are setup as follows:

interface GigabitEthernet0/1
 description blade1
 switchport trunk native vlan 4000
 switchport trunk allowed vlan 2-12,14,15
 switchport mode trunk
 link state group 1 downstream
 spanning-tree portfast trunk
 spanning-tree bpdufilter enable

omslaw
Contributor

Portgroup setup was kept simple...just the Network label and a VLAN ID. All other settings were left at the defaults.

I have 12 portgroups on the vSwitch. Which PG a VM connects to depends on the VM.

As long as I have only ONE vmnic assigned as 'Active' in the vSwitch, everything works fine. When I add the second vmnic as an 'Active' adapter as well, that's when the problems start.
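
(If it helps, the portgroups can also be created from the service console; the portgroup name and VLAN number here are just examples:)

# Create a portgroup on vSwitch0 and tag it with VLAN 10
esxcfg-vswitch -A "VLAN10" vSwitch0
esxcfg-vswitch -v 10 -p "VLAN10" vSwitch0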

The_Ether
Enthusiast

I had a similar problem with an HP 2824 switch.

I had to get the trunk type right and set the Load Balancing to "Route based on IP hash".

I believe you can use Fast EtherChannel across switches, but that is a question for the network guys.

Here is what I found: http://theether.net/kb/100014

egeoffman
Contributor

Excellent help, thanks guys - found this and sorted my issues straight away!

andreas_fatum
Contributor

It's fine that it solved the issues for you, but it doesn't really answer the blade server dual-NIC / dual-ESM (Ethernet switch module) problem.

We have a similar setup: a BladeCenter with two Nortel 20-port L2-3 switch modules, LS21 blades with dual NICs (each NIC hardwired to one of the ESMs), and one external Nortel Passport 8600 core switch.

From each ESM we have a VLAN-tagged trunk (MLT) to this core switch.

The blades have ESX 3.0.1 installed with one virtual switch, including portgroups for several VLANs, the service console, and a VMkernel interface for VMotion.

However, when adding the second physical NIC to the virtual switch, all connectivity breaks completely, and it's necessary to log in to the server console via the BladeCenter management remote console and manually remove the 2nd NIC from the vSwitch configuration by issuing esxcfg-vswitch -U vmnic1 vSwitch0. (Confusing side effect: on the second blade, only vmnic1 works, and when adding vmnic0 everything drops, even though the servers have been set up identically and all switchports are configured 100% identically! Isn't that strange?)
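
For reference, the recovery sequence from the local console in our case (vSwitch0 and vmnic1 as above):

# Check which physical NICs exist and whether they show a link
esxcfg-nics -l
# Remove the offending uplink to restore connectivity (our workaround)
esxcfg-vswitch -U vmnic1 vSwitch0
# Re-add it later to retest
esxcfg-vswitch -L vmnic1 vSwitch0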

Are we missing something here? How should the physical switching be done to achieve active-active load balancing and failover for the blades?

Any good hints are welcome.

Regards,

Andreas
