omslaw
Contributor

ESX 3 NIC Teaming not working on IBM BladeCenter

I have LS20 blades in my IBM BladeCenter with 2 Cisco switches in the chassis. The blades have two NICs each - one to each Cisco switch. I am successfully running NIC teaming with ESX 2.5.x; both NICs are in the same 'bond'.

My issue is when I install ESX 3 and setup NIC teaming, my connectivity will drop; I can't ping/access the Service Console or any of the VMs. To resolve the issue, I've had to go in and set ONE of the NICs as Primary and the other as a Standby. Once I do this, all connectivity is restored.

Is this the only way that I'm going to get this to work? I would like to take advantage of both NICs. Has anyone else run across this? What's the trick?

17 Replies
nadger
Enthusiast

Have you made any switch config changes?

Try changing the load-balancing setting on the vSwitch to 'Route based on IP hash', and make sure Network Failover Detection is set to 'Link status only'. I don't think beacon probing works with VLANs (if you are using them).

omslaw
Contributor

I'll make the change and try it. The vSwitch was set to the default of 'Route based on originating virtual port ID'.

Why change to IP hash? Wouldn't the default setting be better?

omslaw
Contributor

Oh, also forgot...yes, I am using VLANs. No, I haven't made any changes to the Cisco switches in the chassis. I currently have 3 blades running ESX 2.5.x with the 'bonding' working.

I have 2 blades with ESX 3 (one with 3.0 and one with 3.0.1) and they both have the same problem. Using the default settings, NIC teaming doesn't work. If I assign only one vmnic to the vSwitch, then I'm able to get to the SC and the VMs.

I'll make the changes to the vSwitch and see if that helps.

Monoman
Enthusiast

"haven't made any changes to the Cisco switches in the chassis. I currently have 3 blades running ESX 2.5.x with the 'bonding' working."

If you haven't changed the switch configuration, then that is probably the cause. You probably need to configure each of the interfaces as trunk ports:

switchport mode trunk

spanning-tree portfast trunk

I hope that helps.

omslaw
Contributor

They are trunk ports. The issue is with NIC Teaming.

If I only assign ONE vmnic to the vswitch, everything works fine; VLANs work, VMs can be accessed, etc.

When I add the second vmnic to the vswitch...that's when the problems start. It's like all network connectivity stops.

asyntax
Enthusiast

You might want to leave the native VLAN untagged in the configuration. I have seen this cause issues.

Monoman
Enthusiast

"They are trunk ports. The issue is with NIC Teaming."

Ok good. You originally said you did not change the switch config so I just wanted to make sure you did setup trunk ports.

"If I only assign ONE vmnic to the vswitch, everything works fine; VLANs work, VMs can be accessed, etc. When I add the second vmnic to the vswitch...that's when the problems start. It's like all network connectivity stops."

Have you checked the stats on the switch interface and the logs? How about the ESX logs? Start with the basics.

Also, make sure your BladeCenter firmware is all up to date. Whatever issue we have with a BladeCenter, support always has us update to the latest firmware, and the issue magically goes away.

andrew_hald
VMware Employee

Are you trying to use an aggregated-bandwidth EtherChannel? If you want to do teaming, and not just failover, both NICs must connect to the same physical switch on the back side. We only want failover, so we have not enabled teaming, and we attach our blade NICs to separate back-end switches.

Paul_Lalonde
Commander

Andrew is correct... both blade NICs must go to the same switch module to achieve link aggregation.

Your best bet is to use the standard teaming option of 'Route based on originating virtual port ID'. You'll only get outbound load balancing, but it's better than nothing.

Paul
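To make the difference between the two policies concrete, here is a rough sketch - not VMware's actual code, with simplified names and hash math - of how each policy picks a physical uplink:

```python
def uplink_by_port_id(virtual_port_id: int, uplinks: list) -> str:
    """'Route based on originating virtual port ID' (sketch):
    each VM's virtual port is pinned to one uplink, so the physical
    switches never see one MAC flapping between links."""
    return uplinks[virtual_port_id % len(uplinks)]


def ip_to_int(ip: str) -> int:
    """Pack a dotted-quad IPv4 address into a 32-bit integer."""
    a, b, c, d = (int(x) for x in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def uplink_by_ip_hash(src_ip: str, dst_ip: str, uplinks: list) -> str:
    """'Route based on IP hash' (sketch): the uplink depends on the
    source/destination pair, so a single VM's traffic is spread across
    both NICs - which only works if the physical switch treats the
    links as one EtherChannel."""
    return uplinks[(ip_to_int(src_ip) ^ ip_to_int(dst_ip)) % len(uplinks)]


uplinks = ["vmnic0", "vmnic1"]
print(uplink_by_port_id(7, uplinks))                        # vmnic1
print(uplink_by_ip_hash("10.0.0.5", "10.0.1.9", uplinks))   # vmnic0
print(uplink_by_ip_hash("10.0.0.5", "10.0.2.8", uplinks))   # vmnic1
```

The point: with port-ID teaming, two independent switch modules are fine; with IP hash, different flows from the same VM land on different NICs, so both links must terminate on one switch (or channel) - which a BladeCenter's two separate ESMs can't provide.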

omslaw
Contributor

Unless I'm overlooking something, I can't have both blade NICs into the same switch...IBM has them physically connected (via the mid-plane) to two separate switches. Each chassis switch is then connected (via etherchannel) to two separate core switches.

Given that I only have two NICs on the blade and each connects to a separate switch in the chassis, what setting should I use in ESX3 for NIC teaming?

I've tried the default of 'Route based on originating virtual port ID' and also 'Route based on IP hash'. Both of those options cause loss of connectivity on the VMs. Which would be the best option for the blades and ESX 3?

andrew_hald
VMware Employee

Exactly. :)

We have our blades configured exactly the same way: "Route based on originating virtual port ID", "Link status only", and Notify Switches set to "Yes". This configuration does not support IEEE 802.3ad link aggregation (aka EtherChannel). I am thinking that you may still have a problem with your switch config.

The correct switchport setup is EtherChannel disabled, ports configured as trunks, no DTP negotiation. Just straight 802.1Q tagged VLANs.
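For what it's worth, a sketch of what that looks like on a Catalyst-style CLI (the interface name is just an example):

```
interface GigabitEthernet0/1
 switchport mode trunk
 switchport nonegotiate           ! disables DTP negotiation
 spanning-tree portfast trunk
 ! note: no "channel-group" command, so EtherChannel stays disabled
```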

andrew_hald
VMware Employee

Also, how are your portgroups setup? What portgroup(s) is/are your VMs attached to? Thanks.

omslaw
Contributor

The blade ports on the switch are set up as follows:

interface GigabitEthernet0/1
 description blade1
 switchport trunk native vlan 4000
 switchport trunk allowed vlan 2-12,14,15
 switchport mode trunk
 link state group 1 downstream
 spanning-tree portfast trunk
 spanning-tree bpdufilter enable

omslaw
Contributor

Portgroup setup was kept simple: just the network label and a VLAN ID. All other settings were left at defaults.

I have 12 portgroups on the vSwitch; which portgroup a VM connects to depends on the VM.

As long as I have only ONE vmnic assigned as 'Active' in the vSwitch, everything works fine. When I add the second vmnic as an 'Active' adapter as well, that's when the problems start.

The_Ether
Enthusiast

I had a similar problem with an HP 2824 switch.

I had to get the trunk type right and set Load Balancing to 'Route based on IP hash'.

I believe you can use Fast EtherChannel across switches, but that is a question for the network guys.

Here is what I found: http://theether.net/kb/100014

egeoffman
Contributor

Excellent help, thanks guys - found this and sorted my issues straight away!

andreas_fatum
Contributor

Fine that it solved the issues for you, but it doesn't really answer the blade-server dual-NIC / dual-ESM (Ethernet Switch Module) problem.

We have a similar setup: a BladeCenter with two Nortel 20-port L2-3 switch modules and LS21 blades with dual NICs (each NIC hardwired to one of the ESMs), plus one external Nortel Passport 8600 core switch.

From each ESM we have a VLAN tagged trunk (MLT) to this core switch.

The blades have ESX 3.0.1 installed, with one virtual switch including portgroups for several VLANs, the service console, and a VMkernel interface for VMotion.

However, when adding the second physical NIC to the virtual switch, all connectivity breaks completely, and it's necessary to log in to the server console via the BladeCenter management remote console and manually remove the 2nd NIC from the vSwitch configuration by issuing esxcfg-vswitch vSwitch0 -U vmnic1. (Confusing side effect: on the second blade only vmnic1 works, and when adding vmnic0 everything drops, although the servers have been set up identically and all switch ports are configured 100% identically. Isn't that strange?)
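For anyone trying the same recovery from the console, these are the esxcfg-vswitch invocations as I remember them from ESX 3 (flags from memory, so double-check with esxcfg-vswitch --help):

```
esxcfg-vswitch -l                  # list vSwitches and their uplinks
esxcfg-vswitch -U vmnic1 vSwitch0  # unlink vmnic1 from vSwitch0
esxcfg-vswitch -L vmnic1 vSwitch0  # re-link it once the switch side is fixed
```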

Are we missing a point here? How should the physical switching be done to achieve active-active load-balancing and failover for the Blades?

Any good hints are welcome.

Regards,

Andreas
