mcguirej
Contributor

Unable to bond Etherchannel on ESXi 5.5

I can't get an EtherChannel working properly on an ESXi 5.5 host. I'm not using vCenter, and I'm trying to aggregate the links to a Cisco 3750.

Following the KB article below, I configured IP hash on the ESXi host, set the SRC-DST-IP aggregation algorithm on the switch, and disabled LACP. As soon as I disabled LACP per the guide, I lost connectivity to the ESXi host altogether, even though all of the ports in the EtherChannel showed '(P)' and the bundle had established correctly.

If I reconfigure the EtherChannel to enable LACP, I can ping and connect to the host, but all of the ports in the EtherChannel show '(I)', which means they are operating independently.

So I presume it's the host configuration rather than the switch configuration that is incorrect. Can anyone advise?

KB Article: VMware KB: Sample configuration of EtherChannel / Link Aggregation Control Protocol (LACP) with ESXi...

Switch config:

Cisco IOS Software, C3750 Software (C3750-IPBASEK9-M), Version 12.2(50)SE1, RELEASE SOFTWARE (fc2)

CM_CORE#sh run int po1
Building configuration...

Current configuration : 122 bytes
!
interface Port-channel1
 description PG1 to IBM x3650
 switchport trunk encapsulation dot1q
 switchport mode trunk
end

CM_CORE#sh run int fa1/0/1
Building configuration...

Current configuration : 125 bytes
!
interface FastEthernet1/0/1
 switchport trunk encapsulation dot1q
 switchport mode trunk
 channel-group 1 mode active
end

Group  Port-channel  Protocol    Ports
------+-------------+-----------+-----------------------------------------------
1      Po1(SD)         LACP      Fa1/0/1(I)  Fa1/0/2(I)  Fa1/0/3(I)
                                 Fa1/0/4(I)

a_p_
Leadership

Does it work with

channel-group 1 mode on (rather than active, see KB article)

André

mcguirej
Contributor

Hi there,

Thanks for the reply. I have actually tried setting it to on, and it doesn't seem to make a difference, I'm afraid. I'll be back on-site this weekend, so I'll see if I can gather some more information. Is there anything in particular I should collect from the host or the switch?

Note that although I'm not noticing any particular connectivity issues at present, the switch is reporting that the ports in the EtherChannel are flapping.

a_p_
Leadership

There's basically not much more to it than configuring the physical switch, setting the policy to IP hash, and making sure all vmnics are active. Anyway, what's the reason you are channeling the ports? IMO there's no real benefit in doing this except in very specific configurations, where outgoing traffic from VMs goes to a large number of targets.

André

mcguirej
Contributor

Hi,

Unfortunately the customer's core switch is 100Mb/s, and the host is directly attached to this running multiple servers.  Therefore if the bond is capable of delivering up to 4 individual 100Mb/s sessions the performance should be acceptable. In addition there is always the resiliency aspect to consider.
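
To illustrate the "up to 4 individual 100Mb/s sessions" point, here is a minimal Python sketch of per-flow hashing. It assumes a simplified hash (XOR of the two IPv4 addresses, modulo the number of links); the real hash functions on ESXi and on the 3750 may differ in detail, and the function names and addresses are made up for illustration.

```python
import ipaddress


def ip_hash_uplink(src: str, dst: str, n_uplinks: int) -> int:
    """Simplified model (an assumption): XOR both IPv4 addresses, modulo link count."""
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    return (s ^ d) % n_uplinks


def spread(src: str, dsts: list[str], n_uplinks: int) -> list[int]:
    """Count how many src->dst flows land on each link."""
    counts = [0] * n_uplinks
    for dst in dsts:
        counts[ip_hash_uplink(src, dst, n_uplinks)] += 1
    return counts


if __name__ == "__main__":
    server = "10.0.0.10"  # hypothetical VM address
    clients = ["10.0.0.20", "10.0.0.21", "10.0.0.22", "10.0.0.23"]
    # One src/dst pair always maps to the same link, so a single
    # session can never exceed one link's 100Mb/s.
    print(ip_hash_uplink(server, clients[0], 4))
    # Several different peers can spread across the four links.
    print(spread(server, clients, 4))
```

In this model the aggregate only helps when there are several distinct source/destination pairs; one session between two addresses stays on one link.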

Since on this occasion I am only using the free hypervisor product, is there any way I can raise this as a potential bug with VMware?

Is there perhaps an interoperability matrix I can use to verify that the switch firmware is compatible?

a_p_
Leadership

It should be possible to open a per incident case with VMware. However, before doing this I would consider other solutions. I'm no dedicated network guy, but IMO the bond doesn't really have an advantage over multiple individual trunk ports, used with the default round-robin assignment (Route based on the originating virtual switch port ID). Using the default settings will distribute the VMs across the uplinks and thus kind of load balance the traffic.

Another option you may consider is to get a Gigabit switch, connect the host to this switch (again, using the default settings) and create an active LACP/PAgP bond between the physical switches.

André

bayupw
Leadership

Hi

I had this issue because the IP hash load-balancing configuration was not propagated to the Management PortGroup.

Try following the procedure described in this KB; it worked in my case:

VMware KB: NIC teaming using EtherChannel leads to intermittent network connectivity in ESXi

Note: To reduce the chance of losing network connectivity, change the vSwitch load balancing to Route based on IP hash using this method:

  1. Shut down all ports in the team from the physical switch leaving a single port as active. - (Management PortGroup e.g. Fa1/0/1)
  2. Change the load balancing to Route based on IP hash on the vSwitch and the Management PortGroup.
  3. Configure the port channel on the physical switch. - (Fa1/0/1 - Fa1/0/4)
  4. Enable the ports on the physical switch. - (no shut Fa1/0/2 - Fa1/0/4)
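
The four steps above can be sketched as an ordered command plan. This is only a rough Python sketch that prints the commands in order; it does not talk to any device. The vSwitch name `vSwitch0` and the port group name `Management Network` are assumptions (the usual defaults), the static `mode on` channel follows the advice elsewhere in this thread, and the `esxcli` lines are meant to be run on the host itself (console or SSH over the one remaining link).

```python
def etherchannel_cutover_plan(first=1, last=4, group=1,
                              vswitch="vSwitch0",
                              mgmt_pg="Management Network"):
    """Return (where, commands) pairs in the order the steps should be run.

    vswitch and mgmt_pg are assumed default names; adjust to the real host.
    """
    prefix = "FastEthernet1/0/"
    rest = f"interface range {prefix}{first + 1} - {last}"
    all_ports = f"interface range {prefix}{first} - {last}"
    return [
        # 1. Leave only the first port up on the switch.
        ("switch", [rest, " shutdown"]),
        # 2. IP hash on the vSwitch and on the management port group.
        ("host", [
            f"esxcli network vswitch standard policy failover set -v {vswitch} -l iphash",
            f'esxcli network vswitch standard portgroup policy failover set -p "{mgmt_pg}" -l iphash',
        ]),
        # 3. Static port channel across all member ports.
        ("switch", [all_ports, f" channel-group {group} mode on"]),
        # 4. Bring the remaining ports back up.
        ("switch", [rest, " no shutdown"]),
    ]


if __name__ == "__main__":
    for step, (where, commands) in enumerate(etherchannel_cutover_plan(), start=1):
        print(f"Step {step} ({where}):")
        for line in commands:
            print(f"  {line}")
```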

Thanks,

Bayu

Bayu Wibowo | VCIX6-DCV/NV
Author of VMware NSX Cookbook http://bit.ly/NSXCookbook
https://github.com/bayupw/PowerNSX-Scripts
https://nz.linkedin.com/in/bayupw | twitter @bayupw
KBRGeek
Contributor

Are you using standard switches or dvSwitches and have you enabled LACP via the web client?

chriswahl
Virtuoso

The KB article you reference is quite old. LACP is configured rather differently in vSphere 5.5. The "IP Hash" thing is from the days of static Etherchannels. :)

First, make sure you have enabled the LACP feature on the VDS uplink port group which allows the VDS to respond to LACPDUs. Example below:

[Screenshot: uplink-pgroup-lacp-enabled.png]

For more details and a video, check out this post I wrote on LACP in 5.5: Exploring Enhanced LACP Support with vSphere 5.5 | Wahl Network

VCDX #104 (DCV, NV) ஃ WahlNetwork.com ஃ @ChrisWahl ஃ Author, Networking for VMware Administrators
mcguirej
Contributor

Hi Chris,

Thanks for your input but I'm unable to access the web interface due to using only the free edition of ESXi for this particular installation.  Is there a means of accessing this setting via the VI Client instead?  Thank you.

chriswahl
Virtuoso

Using LACP requires a vSphere Distributed Switch (VDS) and Enterprise Plus licensing. The feature is also only available from the vSphere Web Client.

If you are using the free edition (and are thus limited to the Standard Switch) your only choice is a static port channel (Cisco IOS "mode on").

KBRGeek
Contributor

The only GUI where you can make this change is the Web Client; the alternative is to do it via the CLI.

VMware vSphere 5.1

chriswahl
Virtuoso

Just to be clear:

While ESXCLI will provide many "list" type of commands for the VDS, it is an ESXi host CLI; as such, it is unable to configure the required settings for LACP on a VDS due to the necessity of vCenter (control plane).

If the OP wants to use a static LAG, either ESXCLI or the vSphere Client should suffice.

mcguirej
Contributor

So in summary, the information in the first KB is actually valid and is the only method of enabling an EtherChannel on 5.5 without vCenter. Chances are the problem is caused by the issue described in the KB article Bayu referenced.

chriswahl
Virtuoso

Yup. Every vSwitch port group connected to the port channel must be in "IP Hash" for a static LAG (Etherchannel). Otherwise the MAC address may end up flapping as the host will pick one uplink and the upstream switch may pick another, causing traffic to drop.
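
A rough way to picture the mismatch, as a Python sketch under stated assumptions: the host pins a VM to one uplink by virtual port ID (modelled here as port ID modulo link count), while the switch picks the return link with a src-dst-ip hash (modelled as an XOR of the two addresses modulo link count). Both formulas are simplifications, and all names, port IDs, and addresses are hypothetical.

```python
import ipaddress


def src_dst_ip_link(src: str, dst: str, n_links: int) -> int:
    """Simplified switch-side src-dst-ip hash (an assumption, not the real ASIC hash)."""
    return (int(ipaddress.IPv4Address(src)) ^ int(ipaddress.IPv4Address(dst))) % n_links


def port_id_uplink(virtual_port_id: int, n_links: int) -> int:
    """Simplified host-side 'originating virtual port ID' pinning."""
    return virtual_port_id % n_links


def return_path_mismatches(vm_ip: str, vm_port_id: int,
                           peers: list[str], n_links: int) -> list[str]:
    """Peers whose return traffic the switch sends on a link the host did not pick."""
    host_link = port_id_uplink(vm_port_id, n_links)
    return [p for p in peers if src_dst_ip_link(p, vm_ip, n_links) != host_link]


if __name__ == "__main__":
    peers = ["10.0.0.20", "10.0.0.21", "10.0.0.22", "10.0.0.23"]
    # Port group left on port-ID policy while the switch runs a static channel:
    print(return_path_mismatches("10.0.0.10", 8, peers, 4))
```

In this model three of the four peers come back on a different link than the one the host chose for the VM. With IP hash on every port group, both ends would apply the same kind of per-flow decision instead.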

My comments are only in reference to LACP based on your earlier trial and error. :)
