Soon we will be replacing the hosts in our small, fully collapsed, stretched vSAN cluster that runs NSX-T. The existing networking hardware will not be updated at this time.
The new hosts will have two or four 25Gb connections, compared to the two 10Gb connections on the old hosts. Since we would like vSAN to be able to use more bandwidth than a single link can provide, we plan to continue using LACP on the ESXi hosts.
We are considering moving our edge gateways from virtual machines to bare metal. These would have two 100Gb connections. With these systems we have the option to use LACP or not. My question is: what are the pros and cons of using LACP vs. not using it? Here's what I've come up with so far:
Using LACP requires additional configuration and is more error-prone: This seems to be a general community opinion, but it has never been an issue for us and it isn't a long-term concern.
Without LACP, to get redundancy and use the bandwidth of all ports, the edge should have multiple TEPs and transit networks. As with the previous item, this is more work and has more potential for errors, but it is not a long-term concern.
With LACP, network flows are balanced across the links. Without it, network devices (MAC addresses) are balanced. Since the network flows to and from the hosts are limited by the hosts' lower link speeds, and we have no plans to implement edge services that need more bandwidth than a single link can supply, this difference does not matter to us.
We are using Aruba 8325 switches configured with VSX multi-chassis LAGs. It is my understanding that these devices have optimizations that prevent traffic from being forwarded from one switch to the other over the inter-switch link unless that is required to reach its destination. Without LACP, some traffic will flow through both switches, increasing latency and switch load.
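To make the balancing difference in my third point concrete, here is a rough Python sketch of the two behaviours as I understand them. It is only a toy model: the CRC32 hash, the helper names, and the addresses are my own placeholders, and real LAG hashing is vendor-specific.

```python
import zlib
from collections import Counter

LINKS = 2  # assumption: two uplinks on the edge node


def lag_link(src_ip, dst_ip, src_port, dst_port):
    """Flow-based: pick a link by hashing the flow tuple.

    CRC32 is a stand-in here; the actual hash used by a LAG is vendor-specific.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % LINKS


def pinned_link(source_index):
    """Device-based: every flow from a given source (MAC/TEP) uses one uplink."""
    return source_index % LINKS


# Hypothetical traffic: 1000 flows from a single source that differ only in source port.
flows = [("10.0.0.1", "10.0.1.1", 40000 + i, 443) for i in range(1000)]

lag_counts = Counter(lag_link(*f) for f in flows)
pinned_counts = Counter(pinned_link(0) for _ in flows)

print("flow-hashed:", dict(lag_counts))
print("pinned:     ", dict(pinned_counts))
```

In the pinned case, all flows from the one source stay on a single uplink, while the hashed case can place them on either link. Any individual flow is still limited to one link's bandwidth in both models.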
Are there other issues for me to consider when designing the configuration for our edge gateways? For example, I have no idea if LACP impacts the ability of NSX to offload functionality to the NIC.
Thanks,
Erik