VMware Networking Community
Erik_Horn
Contributor

Bug/Issue with VXLAN and LACP

I have encountered an issue with VXLAN and LACP, and I'm wondering whether anybody else has seen it or can replicate it.

The issue appears to be that the VXLAN network stack does not monitor LACP port status. If one of the ports goes down, outbound traffic is not rerouted to a working link. In my case this affected ARP, which caused all communication to fail. The other (non-VXLAN) port groups on the same LACP trunk do not exhibit this problem.

Steps to reproduce the problem (in general):

  1. Configure a VDS with LACP using 3 trunked ports.
  2. Migrate the management, vSAN, and vMotion port groups into the LACP trunk.
  3. Configure the cluster for VXLAN using the VDS.
  4. Confirm LACP link status with “esxcli network vswitch dvs vmware lacp status get”
  5. Confirm proper operation of each port group (ping something from the ESXi host).
  6. Unplug one of the LACP ports.
  7. Repeat steps 4 and 5.
  8. Plug the port back in.
  9. Repeat steps 6-8 for each of the remaining ports in the trunk group.

When an LACP port goes down, I expect a momentary loss of communication, as long as at least one port remains up. Instead, I see a permanent loss of communication until the affected port comes back online.
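To make the expected behaviour concrete, here is a minimal sketch of how LAG-style load distribution is generally supposed to react to a link failure: each flow is hashed onto one of the links currently marked up, so when a link drops, its flows land on the survivors. This is a simplified model under my own assumptions (SHA-256 over the flow tuple, modulo the number of up links); it is not VMware's actual hashing algorithm, and the uplink names and IP addresses are hypothetical.

```python
import hashlib

def pick_link(flow, links_up):
    """Assign a flow to one of the links that are up (simplified model, not VMware's hash)."""
    if not links_up:
        return None  # no usable link: traffic for this flow is lost
    key = "|".join(flow).encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    # Sort so the assignment is deterministic for a given set of up links.
    return sorted(links_up)[digest % len(links_up)]

# Hypothetical uplink names and flows (source IP, destination IP).
links = {"uplink1", "uplink2", "uplink3"}
flows = [("10.0.0.1", f"10.0.0.{i}") for i in range(2, 12)]

before = {f: pick_link(f, links) for f in flows}
links.discard("uplink2")  # simulate unplugging one LACP port
after = {f: pick_link(f, links) for f in flows}

for f in flows:
    print(f, before[f], "->", after[f])
```

In this model no flow is ever left on the removed link. The symptom described above looks as though VXLAN traffic kept using a link that had already dropped out of the set.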

Software:

NSX 6.3.3

vCenter, ESXi 6.5U1

Network Hardware:

Brocade VDX6740

Intel X710, Intel 82599EB, Mellanox ConnectX-4 Lx (each tested separately)

Thanks,

Erik

Erik_Horn
Contributor

The nice people at support had a quick answer for my problem.

Planning for future growth, I had configured four network ports for LACP on the server side. The network switch, however, was configured for only three ports, and only three cables were plugged in.

In other places where I've done this, I've never had a problem. It seems that VXLAN does not handle a mismatch between the LACP port count on the server and the port count on the switch.
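A quick sanity check for this kind of misconfiguration is to compare the LAG definitions on both ends before enabling VXLAN. The sketch below assumes you enter the numbers by hand (for example, from the host's LACP status output and the switch's port-channel configuration). It does not parse any real command output, and the `LagConfig` type and `lag_mismatches` helper are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LagConfig:
    side: str          # hypothetical label, e.g. "server" or "switch"
    member_ports: int  # ports configured as members of the LAG
    connected: int     # ports actually cabled and showing link

def lag_mismatches(server: LagConfig, switch: LagConfig) -> list[str]:
    """Return human-readable warnings about LAG settings that disagree."""
    problems = []
    if server.member_ports != switch.member_ports:
        problems.append(
            f"port count differs: server={server.member_ports}, "
            f"switch={switch.member_ports}"
        )
    if server.connected < server.member_ports:
        problems.append(
            f"server has {server.member_ports - server.connected} "
            f"configured port(s) with no link"
        )
    return problems

# The setup described in this thread: 4 ports in the server LAG, 3 on the switch.
for warning in lag_mismatches(LagConfig("server", 4, 3), LagConfig("switch", 3, 3)):
    print(warning)
```

Matching the member count on both sides (or leaving unused server ports out of the LAG) makes this check come back empty.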

Thanks,

Erik