m0ther
Contributor

Nic teaming not working

Hi,

I'm having problems with nic teaming.

I have an ESX 3.0.1 server with two NICs that I am trying to team. Each connection goes into a separate physical switch. All the physical connections have been verified, and I know both NICs are OK because if I configure them separately as individual Service Console ports they both work.

It's vSwitch0, with two physical adapters, vmnic0 and vmnic1, carrying a VM network and the Service Console. I can only ping the Service Console when the team is configured for failover; any other config causes me to lose the VirtualCenter connection to the ESX host.

I have a linux machine (running vmware server) on the same network, in the same setup with bonded interfaces and it's working as expected.

Is there anything simple I could be missing as far as ESX config? I mean, this is supposed to be easy!

Thanks,

Curtis.

21 Replies
depping
Leadership

What type of load balancing did you set up?

m0ther
Contributor

I've tried all of them.

marvinb
Enthusiast

If you are using LACP with IP Hash you have to have both connections to the same switch.

m0ther
Contributor

Is that documented somewhere?

marvinb
Enthusiast

I am pulling this from memory, but LACP requires that both links be on the same switch. It's a switch-side issue, not a VMware issue. Support for this setup is light, and expertise at the first level of engineering is weak.

If you email me, I'll try to pull up what we did.

marvinb
Enthusiast

We gave up on using LACP on our HP switches and went to trunking the ports instead, with load balancing set to IP Hash. This gives us both failover and load balancing on the system side, but since trunking on our switches requires that the ports be on the same switch, that is how it's set up.
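If it helps, the switch side for a ProCurve-style static (non-LACP) trunk looks roughly like this; port numbers and the VLAN ID are examples, not our actual config:

```
; define a static trunk (trk1) on the two ESX-facing ports
trunk 1-2 trk1 trunk
; the trunk interface then carries the tagged VLAN(s)
vlan 10 tagged trk1
```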

hhandersson
Enthusiast

As far as I know it's correct that LACP requires all NICs to be on the same switch. I configured LACP on our HP switch and had problems with VMs migrating from one host to another. We lost connectivity from the clients to some ports not running on the ESXs.

I opened a case with HP, and they told me that LACP is an active/passive configuration. ESX is not able to be the passive side, so you need to configure NIC teaming on the ESX servers and nothing on the switches. I configured IP Hash based teaming and would have expected the switch to report a dynamic LACP trunk, but that is not happening.

I can tell you I'm confused.

Rumple
Virtuoso

As far as I know this is how it works...

You basically don't touch anything on the switch side: plug in both NICs, assign them to the same vSwitch, and you are "teaming" in a sense. Really it's failover, not teaming...

In ESX 2.5 the VM would pick which NIC to use and stay on that NIC until failure time; then it needed to be rebooted to use the other pNIC.

In ESX 3 I think you define active and standby NICs within the port group to allow for the failover.

If you are connected to the same switch then you can do the trunking, etc., and get true teaming/load balancing with the NICs, with only about a single packet dropped at failover time.

You cannot trunk across physical switches (although I heard Cisco is coming out with the idea of virtual switches within IOS, so you can trick multiple switches into behaving as if they are in the same chassis and get full capabilities).
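For reference, attaching both pNICs to the same vSwitch can be done from the service console with esxcfg-vswitch (names taken from the original post); the active/standby order itself is set per port group in the VI Client:

```shell
# show the current vSwitch / uplink layout
esxcfg-vswitch -l
# link both physical NICs to vSwitch0
esxcfg-vswitch -L vmnic0 vSwitch0
esxcfg-vswitch -L vmnic1 vSwitch0
```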

m0ther
Contributor

It's funny, because it works fine with bonding in Linux. In this config it's not redundant unless the whole switch dies, but I expected ESX to be able to do it. Guess not.
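For comparison, the Linux setup that works across two switches is active-backup bonding, which needs nothing configured on the switch side; a minimal RHEL-era sketch (device name is an example):

```
# /etc/modprobe.conf -- mode 1 (active-backup) is the bonding mode
# that is safe across two unstacked switches; miimon polls link state
alias bond0 bonding
options bond0 mode=active-backup miimon=100
```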

bggb29
Expert

ESX cannot run dynamic LACP; it does not negotiate the link that way.

m0ther
Contributor

So none of the teaming options, except maybe failover, can be done across separate physical switches?

vm2i
Contributor

We have a similar network setup:

Each ESX server has connections to two physical switches (Cisco 4948s), which are connected to each other via an EtherChannel. The team is set up with MAC Hash and seems to load balance OK outbound. It handles a switch failure fine and re-routes to the second path/switch; however, when we fail or disable the EtherChannel between the physical switches, we can no longer see anything connected to the other physical switch.

Any ideas on this scenario?

m0ther
Contributor

That makes sense though, doesn't it? If the EtherChannel (between the two physical switches?) is turned off, it doesn't work.

How about this post from a while ago:

http://www.vmware.com/community/message.jspa?messageID=381277

killjoy1
Enthusiast

We use 4948s as well. Even though ESX is supposedly LACP compatible when using mode active, it doesn't work. We build an EtherChannel on the Ciscos with mode on, then configure the ESX boxes for IP Hash. This works better for us than just using the ESX teaming, since that isn't really LACP. As for having the NICs as part of two channel groups on different switches, we have experienced host flapping when trying that.
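The Cisco side of that ("mode on", i.e. a static EtherChannel with no LACP negotiation) looks something like this; interface numbers are examples, and the src-dst-ip hash is the load-balance method that pairs with ESX's IP Hash policy:

```
! global: hash on source + destination IP to match ESX IP Hash
port-channel load-balance src-dst-ip
!
interface range GigabitEthernet1/1 - 2
 channel-group 1 mode on
```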

Monoman
Enthusiast

I believe load balancing requires you to plug the physical NICs into the same physical switch. However, I believe I saw on the forums that there is an exception: Cisco Catalyst 3750 switches can be connected together with StackWise cables and act as one switch.

vm2i
Contributor

"That makes sense though doesn't it? If etherchannel (between two physical switches?) is turned off it doesn't work."

Well, that's the bit I may be misunderstanding. Each ESX server has connectivity to both physical switches (load-balanced team, MAC Hash), and additionally there is an EtherChannel link between the two switches. Note: there is no EtherChannel between the ESX servers and the switches. So when the EtherChannel link is taken down, each ESX server still has a physical connection to each switch, and I was assuming (maybe wrongly?) that new paths would get used over those connections. That doesn't seem to be the case, but I'm still unclear whether this is how it should work.

bggb29
Expert

When you connect a vSwitch with two pNICs to separate pSwitches (i.e. not 3750s with StackWise cables), the MAC addresses of your guests are seen on both switches. This is the problem: each pSwitch then believes a MAC address is associated with one of its own ports, so when it checks its MAC tables it will not forward those packets to another device.

The 3750s share their MAC tables when stacked.

m0ther
Contributor

OK, now our network guys are telling me that stacking the switches turns them into one big physical switch and if one goes down then the other one does too.

bggb29
Expert

That is not correct. I have had 3750s reboot and the remainder of the stack keeps running. And if you build an EtherChannel back to your core, you can lose one switch and keep your connection to the core, provided you split the EtherChannel across a minimum of two switches. We have had five switches in a stack with fibre connections split across switches.

It does turn them into one big logical switch for ease of management: one IP address and one SNMP monitor.

As long as you use two StackWise cables and connect them correctly, you will have redundancy.

I have powered down a single switch in the past to verify my connectivity.
