VMware Cloud Community
johnnybox78
Contributor

ESXi 4.1 problem with Cisco EtherChannel and LACP

Hi everybody, this is my first post, so I apologize in advance if I've posted in the wrong section or made some other mistake (and for my awful English).

My issue is with ESXi 4.1: as mentioned in the subject, I'm trying to configure link aggregation between an HP ProLiant DL380 G7 and a Cisco Catalyst 3750G.

I've already done similar configurations with other servers (HP, IBM, Fujitsu and Dell) without problems, but in this case, as soon as I enable the second port of the link aggregation, I completely lose the connection. I've tried both LACP and Cisco EtherChannel, but it seems the virtual switch doesn't even try to negotiate the channel.

After four hours I gave up and installed ESX 4.1, as in all my other installations, and magically everything worked.

The configuration is simple: six NICs (some onboard and some on a PCI Express card) in a single vSwitch set to "Route based on IP hash", plus this config on the Cisco side:

interface Port-channel11
description ESX01 - AGGREGATION
switchport trunk encapsulation dot1q
switchport mode trunk
!
interface GigabitEthernet5/0/1
description ESX01 - VNIC0
switchport trunk encapsulation dot1q
switchport mode trunk
channel-group 11 mode on
!
interface GigabitEthernet5/0/2
description ESX01 - VNIC1
switchport trunk encapsulation dot1q
switchport mode trunk
channel-group 11 mode on
!
interface GigabitEthernet5/0/3
description ESX01 - VNIC2
switchport trunk encapsulation dot1q
switchport mode trunk
channel-group 11 mode on
!
...and so on for the remaining three interfaces.
The problem is the same if I use "channel-group 11 mode active" (LACP) or "channel-group 11 mode auto" (PAgP), and it's obviously still present if I try without trunking.
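For reference, these are the negotiation protocols behind the modes I tried:

channel-group 11 mode on        (static EtherChannel, no negotiation protocol)
channel-group 11 mode active    (LACP, 802.3ad)
channel-group 11 mode auto      (PAgP, passive side)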
Everything is identical between ESXi and ESX, but the first one doesn't work! Any ideas?
Thanks in advance
Johnny
I wish I could be a Virtual Machine!
7 Replies
v1rtual
Contributor

Unless you are using the Cisco Nexus 1000V vSwitch, LACP and PAgP are not available features on your vSwitches, so any aggregated links need to be configured in static mode on the Cisco switches.

On a standard vSwitch the ONLY supported method for link aggregation (802.3ad) is IP hash based. This needs to match on the Cisco end, but the default on a Cisco switch is src-mac. The load-balance setting is a global Cisco command and will affect ALL the EtherChannels on the switch, so change it with caution! Just to be clear, the source-MAC-based algorithm on the vSwitch is not 802.3ad compliant, so don't try to use that setting either.

port-channel load-balance {dst-ip | dst-mac | src-dst-ip | src-dst-mac | src-ip | src-mac}

Also, best practice for configuring EtherChannels on a Cisco switch is to configure all settings at the port-channel level, as they will propagate to all aggregated ports (including shutdown and no shutdown). If you already have IP hash aggregation configured as the global setting on the Cisco switch, then try a shut followed by a no shut at the port-channel level.
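Something along these lines, reusing port-channel 11 from your config as the example:

! global setting - changes the hash for EVERY EtherChannel on this switch
port-channel load-balance src-dst-ip
!
interface Port-channel11
shutdown
no shutdown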

I came across similar problems when we were first configuring new environments, and because we were unable to change the Cisco switch global command (due to multiple other uplinks to other switches), we settled for multiple individual trunks from the Cisco switch to the hosts. We still load balance at the vSwitch level using port-based teaming, and in our experience this loads the trunks fine, although our maximum bandwidth on any given traffic flow is of course limited to 1Gb - but that would be a restriction in an IP-hash-based implementation anyway.

Alternatively upgrade to NEXUS!

From the attached VMware networking concepts doc (NIC Teaming section):

Route based on IP hash

— Choose an uplink based on a hash of the source and destination IP addresses of each packet. (For non-IP packets, whatever is at those offsets is used to compute the hash.)

Evenness of traffic distribution depends on the number of TCP/IP sessions to unique destinations. There is no benefit for bulk transfer between a single pair of hosts.

You can use link aggregation — grouping multiple physical adapters to create a fast network pipe for a single virtual adapter in a virtual machine.

When you configure the system to use link aggregation, packet reflections are prevented because aggregated ports do not retransmit broadcast or multicast traffic.

The physical switch sees the client MAC address on multiple ports. There is no way to predict which physical Ethernet adapter will receive inbound traffic.

All adapters in the NIC team must be attached to the same physical switch or an appropriate set of stacked physical switches. (Contact your switch vendor to find out whether 802.3ad teaming is supported across multiple stacked chassis.) That switch or set of stacked switches must be 802.3ad-compliant and configured to use that link-aggregation standard in static mode (that is, with no LACP). All adapters must be active. You should make the setting on the virtual switch and ensure that it is inherited by all port groups within that virtual switch.

johnnybox78
Contributor

Thank you v1rtual! I already knew some of those things about the "port-channel load-balance" command; indeed, I've set it to src-dst-ip.

The strange thing is that with the same configuration ESX works fine with the port-channel team, while ESXi doesn't.

Is there some difference in the way ESX and ESXi manage the vSwitch?

Next week I should be able to arrange a lab to test the issue more deeply!

Thank you again! Have a good week!

I wish I could be a Virtual Machine!
johnnybox78
Contributor

As planned, today I did some more tests with ESXi and ESX on a lab infrastructure.

I tried the configuration on an IBM x3650 M2 with 4 NICs, linked to a classic Cisco Catalyst 3750G running the Advanced IP Services IOS.

The results are the same. In both the ESXi and ESX configurations, all NICs were joined in a vSwitch set to "Route based on IP hash". On the Cisco side I set up a port-channel interface with 4 ports (trying mode on, mode auto and mode active) and different load-balance settings.

Only with ESX does the channel group work properly, with PAgP and even with LACP (I also checked a configuration on another cluster where I have an HP ProCurve 2900 switch; everything is OK there without a Nexus). With ESXi it seems the IP hash algorithm is applied, but it just doesn't work. I enabled some debugs on the Cisco side but nothing shows up: when I turn on each single port, it comes up and joins the port-channel, but I completely lose the connection to the host.
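For anyone who wants to reproduce this, commands along these lines show the relevant state (port-channel 11 as in my config; the esxcfg ones run in Tech Support Mode on the host):

show etherchannel 11 summary      <- member port states and protocol of the channel
show etherchannel load-balance    <- hashing algorithm currently in use
esxcfg-vswitch -l                 <- vSwitches, uplinks and port groups on the host
esxcfg-nics -l                    <- link state and speed of each vmnic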

I think the problem is specific to ESXi: maybe the algorithm doesn't work, or maybe something is missing in the ESXi configuration.

I've run out of ideas! Does anyone have any better clues?

TNX

Johnny

I wish I could be a Virtual Machine!
5thfishie
Contributor

Below is an example of a working config I have for an ESXi 4.1 host on a 3650G. Also, ESX does not support DTP (Dynamic Trunking Protocol), so when configuring a trunk port, set it to nonegotiate. Hope that helps.

interface Port-channel30
switchport
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 2-220,300-350
switchport mode trunk
switchport nonegotiate
spanning-tree portfast trunk
!
interface GigabitEthernet0/1
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 2-220,300-350
switchport mode trunk
switchport nonegotiate
channel-group 30 mode on
spanning-tree portfast trunk
!
interface GigabitEthernet0/2
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 2-220,300-350
switchport mode trunk
switchport nonegotiate
channel-group 30 mode on
spanning-tree portfast trunk

johnnybox78
Contributor

Thank you 5thfishie, I know this type of configuration because I've used it many times. But that is not link aggregation; it's the default way VMware virtual switches work (the port-based default, a kind of round robin across the uplinks), which is not really balanced and is ineffective if you have a VM that needs more than 1 Gb of bandwidth (as in my case).

Trunking with tagged VLANs versus an untagged port on a specific VLAN shouldn't matter, because link aggregation works at a different level. I've tried both dot1q trunk and access ports (e.g. VLAN 1), but the problem is the same: the port channel doesn't work with ESXi. Am I the only one using link aggregation with VMware? :smileygrin:

Thanks to everybody

Johnny

I wish I could be a Virtual Machine!
jordan57
Enthusiast

Not sure if you have read this KB article yet, but it might be worth your time.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=102275...

Blog: http://www.virtualizetips.com Twitter = @bsuhr
LunThrasher
Enthusiast

Hi mate, I did a blog post on this, with screenshots, that may help you out:

http://www.sysadmintutorials.com/tutorials/vmware-vsphere-4/vcenter4/network-teaming-with-cisco-ethe...

Also make sure the switches are set for "port-channel load-balance src-dst-ip".
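A quick check on the switch, plus the global config command to set it if needed:

show etherchannel load-balance          <- shows the current hashing setting
port-channel load-balance src-dst-ip    <- global config mode command to change it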

Tutorials for System Admins www.sysadmintutorials.com