The Ciscos these ESX servers are connected to are configured with "src-mac" load balancing on the Etherchannels. We can't change this because of how it will affect other services so we need to work with it.
Of the 3 load balancing algorithms available on the virtual switches, only one seems to work reliably when Etherchannels are enabled, that being the IP hash mechanism. In a way, this is good because it's arguably the best of the 3 choices. When "port id" and "source mac" are used, we can either see nothing on the network or just bits of it (i.e. some hosts but not others).
However, is "IP hash" supposed to work with "src-mac" on the physical switches? That is, is this combination unsupported, or is it fine to use?
This should work just fine. AFAIK the load balancing algorithms used on either side of an EtherChannel don't need to match - the algorithm set only controls frames outbound from the device it's set on; it doesn't have any impact on the way the device expects them back. You may encounter references on the forums to setting the vSwitch to use src-mac, but I believe these were only valid for ESX 2.x.
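To make the "outbound only" point concrete, here is a rough Python sketch. Both hash functions are simplified stand-ins (XOR of the two IP addresses, and the source MAC taken as an integer), not the exact algorithms ESX or the Cisco switches use, and all addresses are made up for illustration.

```python
import ipaddress

def ip_hash_uplink(src_ip, dst_ip, n_uplinks):
    """ESX side (simplified assumption): XOR source and destination IP, modulo uplink count."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % n_uplinks

def src_mac_link(src_mac, n_links):
    """Switch side (simplified assumption): source MAC as an integer, modulo link count."""
    return int(src_mac.replace(":", ""), 16) % n_links

# Hypothetical VM and two hypothetical peers on a 2-link channel.
vm_ip = "10.0.0.5"
peers = {"10.0.0.20": "00:11:22:33:44:02", "10.0.0.21": "00:11:22:33:44:03"}

for peer_ip, peer_mac in peers.items():
    out_link = ip_hash_uplink(vm_ip, peer_ip, 2)  # chosen by ESX for VM -> peer
    back_link = src_mac_link(peer_mac, 2)         # chosen by the switch for peer -> VM
    print(f"{peer_ip}: ESX sends on uplink {out_link}, switch returns on link {back_link}")
```

Each side picks its link from its own inputs without consulting the other, so in this model the return traffic can land on a different link from the one the request left on.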
According to the ESX 3.5 Server Configuration Guide (page 56), the only load balancing option that supports EtherChannel is "Route based on IP hash". In other words, if you use mac-hash or port-id on the ESX side, ESX will not expect the frames for a single VM to arrive on multiple uplinks and will drop some of them.
Also see this excellent whitepaper: http://www.vmware.com/files/pdf/virtual_networking_concepts.pdf
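As a rough illustration of why port-id or mac-hash on the ESX side can lose traffic behind an EtherChannel, here is a small Python model of the behaviour described above. It assumes the VM is pinned to one uplink, that the switch picks a link from the sender's MAC (a simplified stand-in for src-mac), and that ESX drops anything arriving for that VM on another uplink. The function names and MAC addresses are hypothetical.

```python
def switch_link_for(sender_mac, n_links):
    """Switch side (simplified assumption): pick a channel member from the sender's MAC."""
    return int(sender_mac.replace(":", ""), 16) % n_links

def esx_accepts(arrival_uplink, pinned_uplink):
    """ESX side (assumed model): only frames on the VM's pinned uplink are delivered."""
    return arrival_uplink == pinned_uplink

def reachable_senders(sender_macs, pinned_uplink, n_links):
    """Return the senders whose frames would reach the VM in this model."""
    return [
        mac for mac in sender_macs
        if esx_accepts(switch_link_for(mac, n_links), pinned_uplink)
    ]

# Three hypothetical hosts talking to a VM pinned to uplink 0 of a 2-link channel.
senders = ["00:11:22:33:44:10", "00:11:22:33:44:11", "00:11:22:33:44:12"]
print(reachable_senders(senders, pinned_uplink=0, n_links=2))
```

In this model only the senders whose MAC happens to hash to the pinned uplink get through, which would look like "some hosts but not others" from the VM's point of view.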
It's been my (hard-learned) experience that IP-Hash needs to be set on both sides. When I had it on one side but not the other, I definitely had random and unpredictable drops.
Edit - My post refers specifically to 3.5
Message was edited by: ExCon
Thanks for the correction. I've never had to explicitly set the switch-side load balancing algorithm to get this to work on the mix of Cisco, Nortel and HP switches I've done it on, hence my assumption.
Chris: It would be interesting if you could confirm whether you have problems with src-mac on the Cisco side. Maybe this is a CatOS/IOS revision-specific thing?
If you do have issues, the alternative is to use Source Port-based load balancing. This is usually fine; it spreads the VMs over available physical adapters. This is almost (but not quite) the equivalent of using an src-mac Etherchannel link anyway...
Suggestion: Remove the Etherchannels and use PortID.
IP_Hash with src_mac is not supported by Cisco, and I doubt it's supported by VMware.
Usually, when I see setups where the LACP configurations do not match, IP addresses just disappear and become unreachable. So I would have expected that, in your case, ip_hash with src_mac would result in missing networks.
However, to make it simpler: PortID does not require Etherchannels or any form of LACP, but will still load balance based on a simple VM to port ID to pNIC mapping.
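The VM to port ID to pNIC idea can be sketched in a few lines of Python. The modulo placement below is an assumption chosen for illustration, not necessarily how ESX assigns ports to uplinks, and the VM and vmnic names are hypothetical.

```python
def port_id_uplink(virtual_port_id, uplinks):
    """Pin a virtual switch port to one physical NIC (assumed placement: modulo)."""
    return uplinks[virtual_port_id % len(uplinks)]

# Hypothetical host with two uplinks and four VMs on consecutive virtual ports.
uplinks = ["vmnic0", "vmnic1"]
vm_ports = {"vm-a": 0, "vm-b": 1, "vm-c": 2, "vm-d": 3}

placement = {vm: port_id_uplink(port, uplinks) for vm, port in vm_ports.items()}
for vm, nic in placement.items():
    print(f"{vm} -> {nic}")
```

Because every VM sends and receives through a single pNIC in this model, the physical switch sees each VM MAC on exactly one port, which is why no channel configuration is needed on the switch side.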
I'm using port id with no Etherchannels at the moment. We won't really notice the difference between that and ip hash with Etherchannels, as the network throughput won't be particularly high on these hosts.
We might get switches just for the ESX servers at some point, in which case I will revisit as we will be able to adjust the settings there too.