VMware Cloud Community
hypercat
Contributor

Odd link aggregation problem

I'm having an odd problem with link aggregation between an ESXi 5 host and a 3COM Baseline 2924-SFP Plus switch. This is a standalone ESXi host with a dual-port NIC. The management network and the VM Network are on the same virtual switch. I set up the ESXi host for static link aggregation as per the EtherChannel link aggregation article in the knowledge base: Sample configuration of EtherChannel / Link Aggregation Control Protocol (LACP) with ESXi/ESX and Cisco/HP switches (1004048).

On the 3COM switch, I created a static link aggregation group of two ports.  The parameters on the switch for these ports are: 

STP - enabled

Port Fast - disabled

Root Guard - disabled

Port State - Forwarding

Port Role - Designated

RSTP Link Type - Auto

Duplex Mode - Full

Flow Control - disabled

I'm not a switch maven, so I don't know what all of these parameters mean, but it's set up the same as all of the other ports on the switch.

The issue is that the management network becomes unavailable after a period of time. I can't connect to the host through any method and can't ping it. However, the virtual machines are operating normally and are fully accessible. The other day, we did a test by disconnecting one of the ports from the switch, and the host became available again. Then I removed and recreated the LAG and was once again able to connect to the ESXi host through the vSphere Client. I left a command line window open for a couple of hours constantly pinging the host IP address, and never saw a failed ping. However, a few days later I tried connecting to the host through the vSphere Client again and got a connection error, and the host no longer responds to ping either.
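Since the outage only shows up after hours or days, a long-running check that logs state changes with timestamps may help pin down when the management address drops. The following is only a rough sketch: the `tcp_probe`, `state_changes` and `monitor` names are made up for illustration, the address `192.0.2.10` is a placeholder, and it assumes the host's management interface answers on TCP port 443 (the HTTPS port the vSphere Client connects to).

```python
import socket
import time
from datetime import datetime


def tcp_probe(host, port=443, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def state_changes(samples):
    """Reduce (timestamp, reachable) samples to the points where the state flips."""
    changes = []
    last = None
    for timestamp, reachable in samples:
        if reachable != last:
            changes.append((timestamp, reachable))
            last = reachable
    return changes


def monitor(host, interval=30.0, probe=tcp_probe):
    """Probe the host forever and print a line only when reachability changes."""
    last = None
    while True:
        reachable = probe(host)
        if reachable != last:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {host} is {'reachable' if reachable else 'UNREACHABLE'}")
            last = reachable
        time.sleep(interval)


# Example (placeholder address, runs until interrupted):
# monitor("192.0.2.10")
```

Comparing the logged timestamps against the switch log could show whether the drop coincides with anything on the switch side.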

I'm at a loss as to what's going on.  Any ideas?

13 Replies
vmroyale
Immortal

Note: Discussion successfully moved from VMware ESXi 5 to VMware vSphere™ vNetwork

Brian Atkinson | vExpert | VMTN Moderator | Author of "VCP5-DCV VMware Certified Professional-Data Center Virtualization on vSphere 5.5 Study Guide: VCP-550" | @vmroyale | http://vmroyale.com
rickardnobel
Champion

hypercat wrote:

I'm at a loss as to what's going on.  Any ideas?

The basic conclusion is that the link aggregation is not working as it should, and is likely misconfigured all the time. Because of the specific way the traffic is balanced across the links, it can look functional for "some" traffic and not for other traffic.

Before starting any troubleshooting on the setup, I would like to ask why you are using the IP Hash load balancing option. Do you have a specific need for the small advantage it gives?

My VMware blog: www.rickardnobel.se
hypercat
Contributor

I don't understand the point of your question, but the answer is pretty much that I'll take any advantage I can get. Perhaps you can explain in more detail what you mean by it being a "small advantage." It's entirely possible that I don't fully understand the pros and cons of link aggregation as related to VMware.

rickardnobel
Champion

hypercat wrote:

Perhaps you can explain in more detail what you mean by it being a "small advantage."

It is quite a small advantage over the default policy, called Port ID.

With the Port ID NIC teaming policy you get decent load balancing across the VMs, you get fault tolerance, you can connect to two physical switches for increased network redundancy, and you need no specific link aggregation setup on the physical switches.

With IP Hash, the only advantage is that a single VM can use both vmnics (physical NIC ports) at the same time, provided it has multiple connections to different outside IP hosts; with Port ID a single VM only uses a single vmnic. The disadvantage is that it does not allow standards-based connections to more than one physical switch, and it is critical that both the vSwitch and the physical switch are correctly configured as a static LAG.
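The difference between the two policies can be illustrated with a small sketch. This is a simplified model, not the exact ESXi implementation: it assumes IP Hash picks an uplink from the XOR of the source and destination IPv4 addresses modulo the number of uplinks, and that Port ID pins each virtual port to one uplink. The function names, the IP addresses and the virtual port number 7 are made up for the example.

```python
import ipaddress


def ip_hash_uplink(src_ip, dst_ip, n_uplinks):
    """Simplified IP Hash model: XOR both IPv4 addresses, then modulo uplink count."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % n_uplinks


def port_id_uplink(virtual_port_id, n_uplinks):
    """Simplified Port ID model: a virtual port always maps to the same uplink."""
    return virtual_port_id % n_uplinks


# One VM (hypothetical address) talking to four different clients over two vmnics.
vm_ip = "192.168.1.50"
clients = ["192.168.1.10", "192.168.1.11", "192.168.1.12", "192.168.1.13"]

ip_hash_choice = {c: ip_hash_uplink(vm_ip, c, 2) for c in clients}
port_id_choice = {c: port_id_uplink(7, 2) for c in clients}

for client in clients:
    print(client, "IP Hash -> vmnic", ip_hash_choice[client],
          "| Port ID -> vmnic", port_id_choice[client])
```

In this model the single VM spreads its conversations over both uplinks with IP Hash, while with Port ID all of its traffic stays on one uplink regardless of the destination. It also hints at why a faulty LAG can break only some conversations: only the source/destination pairs that hash to the affected link would be hit.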

In your case it does seem that the physical switch is not set up the way it should be to work with vSphere IP Hash, which causes these strange connection losses. That is why my first question was whether you really need IP Hash, or whether going back to Port ID would be a good enough setup.

Josh26
Virtuoso

Being a 3Com, it's probably a switch that predates a lot of the implementation guides out there.

To make this easier, can you post a screenshot of your VMware vSwitch configuration and a copy of the config on the switch relating to the team?

hypercat
Contributor

I can't connect to this host right now, but IIRC, there is no option to route based on virtual port ID. Is it possibly because this is ESXi 5.0, not 5.1? Or because this is a standalone server without vCenter?

rickardnobel
Champion

hypercat wrote:

I can't connect to this host right now, but IIRC, there is no option to route based on virtual port ID. Is it possibly because this is ESXi 5.0, not 5.1? Or because this is a standalone server without vCenter?

The Port ID load balancing option is available in 5.0 and on standalone hosts; it is actually the default setting.

hypercat
Contributor

I looked at another ESXi 5.x host where I'm having similar aggregation issues. I see that it does indeed have the option you mentioned. However, following the article I used about static link aggregation, it says to use the Route based on IP Hash option. It even says: "Note: The only load balancing option for vSwitches or vDistributed Switches that can be used with EtherChannel is IP HASH." Is this just an old article, or is it perhaps inaccurate for 3COM switches (as it refers specifically only to Cisco and HP switches)? And if so, is there a newer/better tech article I can use to understand the options when setting up link aggregation for ESXi 5.x?

rickardnobel
Champion

hypercat wrote:

However, following the article I used about static link aggregation, it says to use the Route based on IP Hash option. It even says:  "Note: The only load balancing option for vSwitches or vDistributed Switches that can be used with EtherChannel is IP HASH." 

The text is actually correct, since it only relates to (static) EtherChannel on the physical switch, where you must use IP Hash.

However, if you just make sure the VLAN tagging is correct on the switch ports, you could leave them without any link aggregation and use Port ID with the advantages above.

hypercat
Contributor

I think my stubborn brain is finally understanding what you're saying.  I guess maybe my problem is that I'm too accustomed to thinking about hardware aggregation where both ends of the aggregation have to be set up exactly the same way.  So, if I am in fact getting it finally, what you're saying is to set up the virtual switch to do link aggregation with routing based on the Port ID, and then the hardware switch doesn't have any aggregation at all.  So, the virtual switch handles the routing and the physical switch just has two independent ports hooked up to the NICs on the ESXi host. Right?

Does this mean that I need to have two virtual NICs on the virtual machine and set them up for some sort of aggregation in order for that machine to benefit from the link aggregation, or is that a moot point also? I've always assumed that any virtual machine only needed one NIC to communicate with the virtual switch and would still get the benefit of any aggregation between the host and the physical network.

hypercat
Contributor

I did as you suggested, removing the link aggregation on the physical switch and setting the virtual switch to route based on virtual port ID. However, now I seem to be having some browsing and authentication issues.  Browsing seems to be very slow, particularly on Windows XP workstations, or when two or more users are browsing the same set of folders at the same time.  Also, users are getting some messages indicating that they don't have permissions to access resources that they clearly do have permissions to access.  If they close and reopen their computer browsing window, they can get access to those same folders.  Any ideas on these issues?

rickardnobel
Champion

Issues of that kind seem very strange in themselves, but they should not really be caused by any remaining error in the switch setup. Error messages like access denied in the Windows file system should originate at a much higher level.

But, just to verify the physical and virtual setup again: you have Port ID on your vSwitch and all port groups? No active/standby settings or anything else?

Could you post the configuration from the physical switch on the ports connecting to the ESXi host?

hypercat
Contributor

Well, it may be unusual to see those symptoms, but what I finally did, because it was getting worse and worse, was to remove one of the physical NICs from the vSwitch. So, now the vSwitch has only one physical NIC allocated to it, and everything is working fine. However, the reason I wanted to do aggregation in the first place on this system was that this server is ultimately going to be serving out very large image files to a number of different users through a SQL database that runs on a different server. I want to be sure that the server hosting the image files has very good response time. So, in other words, I want to be able to make this aggregation work.

Here is the configuration on the physical switch for those ports:

Port State: Enabled
Flow Control: Disabled
Speed: Auto (1000M)
Duplex: Auto (Full)

Spanning Tree:
Port: 23
STP: Enable
Port Fast: Disable
Root Guard: Disable
Port State: Forwarding
Port Role: Designated
Speed: 1000M
Path Cost: 100
Priority: 128
RSTP Link Type: Auto
Designated Bridge ID: 32768-00:22:57:f5:91:c0
Designated Port ID: 128-23
Designated Cost: 0
Forward Transitions: 1

There is only one VLAN.
