A quick question about link aggregation. How do we PXE boot and build an ESXi server that is connected to the network via an aggregate link?
We have ESXi 4.0 servers connected via a pair of 10GE NICs to Cisco Nexus switch infrastructure. We want to run link aggregation between the servers and the network, so we configure "Route based on IP hash" on the ESX host and the Cisco virtual Port-Channel (vPC) on the Nexus switches. The Cisco switches are configured for "static" link aggregation as the ESX servers do not support the Link Aggregation Control Protocol (LACP) i.e., dynamic link aggregation. The use of static link aggregation means that as soon as "link" is seen on the Nexus switch ports, the link is inserted into the port-channel.
This works fine and we can send traffic and receive traffic for a single VM across both VMNICs.
Now the server team want to update the ESX server. They reboot the host and select to PXE boot the server, a process which only uses one of the two 10GE NICs, i.e. the first physical NIC on the host. The second interface is up, i.e. "link" is established between the host and the Nexus switch, but the PXE client on the host does nothing with the NIC.
All traffic from the server is received by the Nexus switch on the physical port that connects to the first physical NIC, but the MAC associated with that NIC will be learnt by the Nexus via its port-channel interface. When the switch sends traffic to the server, it is just as likely to use the physical port that connects to the server's second physical NIC, which has link but is not being serviced by the PXE client during the PXE boot / build process. This traffic will be dropped and the build process will fail.
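To make the failure mode concrete, here is a rough sketch of what happens to return traffic while the host is in PXE. The hash function, the addresses and the names (`member_port`, `delivered`) are all made up for illustration; the real load-balance hash on the Nexus is configurable and will differ. The only point is that a static port-channel keeps hashing some flows onto the member port that the PXE client ignores.

```python
import ipaddress

def member_port(src: str, dst: str, n_ports: int) -> int:
    """Pick a port-channel member for a flow.

    Hypothetical hash: XOR of the two IPv4 addresses modulo the member
    count. A real switch uses its own configurable hash.
    """
    src_int = int(ipaddress.IPv4Address(src))
    dst_int = int(ipaddress.IPv4Address(dst))
    return (src_int ^ dst_int) % n_ports

def delivered(src: str, dst: str, listening_ports: set, n_ports: int = 2) -> bool:
    """True if the flow lands on a member port the host is actually servicing."""
    return member_port(src, dst, n_ports) in listening_ports

# Assumed addresses for the host being built and the build servers.
host = "192.168.1.50"
servers = {"dhcp": "192.168.1.2", "tftp": "192.168.1.3"}

# During PXE only the first NIC (member 0) is serviced by the host.
pxe_listening = {0}

for name, ip in servers.items():
    status = "delivered" if delivered(ip, host, pxe_listening) else "dropped"
    print(f"{name} -> host: member {member_port(ip, host, 2)}, {status}")
```

With these example addresses one of the two server flows hashes to member 1 and never reaches the PXE client, which is enough to break the build.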
Cisco have added functionality in IOS and NX-OS to suspend an individual link of a port-channel if LACP PDUs are not received on that link, such that the remaining link acts as if it were a single switch port. This is a useful feature, apart from the point above that ESX does not support LACP.
So this raises two questions for me:
1. Does this mean that, if we utilise link aggregation between the ESX host and the Nexus switch, every time the server team want to update the ESX build version they first have to contact the network team to have them disable vPC?
2. When will ESX support LACP i.e., dynamic Link Aggregation?
Thanks in advance.
While VMware doesn't support LACP natively, you can opt to use LACP in conjunction with a distributed switch and Nexus 1000V appliance. This is assuming Enterprise Plus licensing.
What is your goal for using etherchannel on a pair of 10GbE uplinks? The "route by IP hash" teaming policy will still only use a single uplink.
The Nexus 1000V is an option, but as you point out it requires licensing, and with that comes a cost that the client would rather avoid.
The rationale for using link aggregation is to achieve optimal load balancing. Your point that "route by IP hash teaming policy will still only use a single uplink" is true for a single session, e.g. IP A to IP B can only use a single uplink, but different sessions from a single VM can be balanced to different uplinks, e.g. IP A to IP B uses uplink 1 and IP A to IP C uses uplink 2. The KB article at http://kb.vmware.com/kb/1007371 explains how the hash is calculated and the uplink selected.
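As a rough illustration of the idea (not a copy of the ESX implementation), the selection can be sketched as an XOR of the source and destination addresses taken modulo the number of uplinks. The function name `ip_hash_uplink` and the addresses are invented for the example; the KB article remains the authoritative description of the calculation.

```python
import ipaddress

def ip_hash_uplink(src: str, dst: str, n_uplinks: int) -> int:
    """Return a zero-based uplink index for a source/destination IP pair.

    Simplified assumption: XOR the two IPv4 addresses as integers and
    take the result modulo the number of uplinks in the team.
    """
    src_int = int(ipaddress.IPv4Address(src))
    dst_int = int(ipaddress.IPv4Address(dst))
    return (src_int ^ dst_int) % n_uplinks

vm = "10.0.0.10"  # "IP A", an assumed VM address
for peer in ("10.0.0.20", "10.0.0.21"):  # "IP B" and "IP C"
    print(f"{vm} -> {peer}: uplink index {ip_hash_uplink(vm, peer, 2)}")
```

The same address pair always maps to the same uplink, so a single session never spans both links, while the two peers in this example land on different uplinks.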
In the presence of a large number of VMs, good load balancing can be achieved with "Route based on originating virtual port ID", but when there are only a few VMs with large volume transfers, the load balancing might not be so good.
I really would refrain from using link aggregation on the physical uplinks.
To achieve optimal load balancing you can also use the VMware distributed switch with its "Load Based Teaming" algorithm.
Ok, I get what you're driving at.
Unfortunately, neither routing by virtual port ID nor setting up etherchannel accomplishes active load balancing. The uplinks can still easily become saturated regardless of the number of VMs if an uplink is picked too often by either algorithm, or if a single VM's load is able to max out throughput on a link - these teaming policies won't rebalance the load to accommodate.
The only ways to achieve active load balancing are LACP (as we discussed earlier) and "load based teaming", which peetz points out above (a new feature in vSphere 4.1 using a dvSwitch).
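The difference between a static policy and a load-aware one can be sketched with a toy model. This is not VMware's algorithm: the function names, the VM loads and the 75% threshold are assumptions chosen for illustration. `static_assign` pins each VM by its port index and ignores load; `rebalance` moves the smallest VM off an uplink that is over the threshold, as long as the move actually evens things out.

```python
def static_assign(loads, n_uplinks):
    """Port-ID style placement: pin each VM by its index, ignoring its load."""
    uplinks = [[] for _ in range(n_uplinks)]
    for port_id, load in enumerate(loads):
        uplinks[port_id % n_uplinks].append(load)
    return uplinks

def rebalance(uplinks, capacity, threshold=0.75):
    """Toy load-aware pass over a copy of the placement.

    While the busiest uplink is above threshold * capacity, move its
    smallest VM to the least-loaded uplink, but only if that leaves the
    target below the busiest uplink's current load.
    """
    uplinks = [list(u) for u in uplinks]
    moved = True
    while moved:
        moved = False
        busiest = max(uplinks, key=sum)
        idlest = min(uplinks, key=sum)
        if sum(busiest) > threshold * capacity and len(busiest) > 1:
            vm = min(busiest)
            if sum(idlest) + vm < sum(busiest):
                busiest.remove(vm)
                idlest.append(vm)
                moved = True
    return uplinks

# Assumed per-VM loads in Gbit/s on two 10 Gbit/s uplinks.
loads = [6, 1, 3, 1]
pinned = static_assign(loads, 2)
balanced = rebalance(pinned, capacity=10)
print("static  :", [sum(u) for u in pinned])
print("balanced:", [sum(u) for u in balanced])
```

In this example the static placement leaves one uplink at 9 Gbit/s and the other at 2 Gbit/s, while the load-aware pass ends at 6 and 5. A single VM that can fill a link on its own still cannot be split by either approach.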
I was curious as I normally see etherchannel used with IP storage, not with regular VM traffic (or Management / vMotion). Here's a writeup I've done on using etherchannel with NFS, for example.
Thanks for the answers guys. Some food for thought there.
Interesting you'd avoid link aggregation. It's a technology we've used in the network space for many years (I'm a network engineer in case you hadn't worked it out already). It's pretty simple to set up, has fast recovery times, is an open standard and, with modern switches able to balance based on L4 (TCP/UDP) information, gives very good balancing of traffic across all links of the aggregate.
In terms of the load based teaming I'll talk with the server team to see if we might add this as a roadmap item. We can't use LBT currently as we're still on ESXi 4.0, but there's also the fact that vDS, as with the Nexus 1000V, comes at a cost which we'd rather avoid.
Perhaps LBT and some of the other features, e.g. Network I/O Control, together with the scaling improvements that mean fewer vDS instances are needed, might make it a more attractive proposition.
So going back to the original question, I think the following summarises it:
- There's no simple way to overcome the issue I identified with building via PXE boot when using link aggregation.
- There's no commitment from VMware to incorporate LACP into ESXi at this stage.