VMware Networking Community
RussBurden
Contributor

NSX Connectivity issues for VXLANs

Oh, where to begin.

I have a lab setup that consists of:

          3x Dell R220 servers
          1x Synology NAS
          1x Cisco SG300-20 L3 switch

I have ESX 6.0 installed on the hosts, with VCSA 6.0 as well. NSX 6.1 is installed and was working on a couple of occasions. The problem is that we have power outages here (mostly very short brownouts), and the first time all three nodes went down, one node went completely belly up. I ended up having to remove that node from the cluster, reset its networking, and then re-add it to the cluster.

I have since added a UPS to keep things up... mostly. An extended power outage outlasted the UPS, so my systems went down again. Everything came back up, and after some troubleshooting it seemed that all regular dVS networking worked fine, but no VM connected to a VXLAN worked at all. After further troubleshooting, I narrowed it down to one of the three hosts: if I vMotion a VM off that host to another host, it starts working; if I vMotion it back, it stops working.

I have tried removing the host from the cluster, rebooting, re-adding it, and rebooting again to reinstall the VIBs. I have also tried manually removing the VIBs and rebooting to get them reinstalled.

On the controller, that particular node does not show up as connected at all:

htb-1n-eng-dhcp10 # show control-cluster logical-switches connection-table 5001
Host-IP         Port  ID
10.1.2.5        13827 2
10.1.2.7        30655 3

The node that is not working is 10.1.2.6.
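With more than a handful of hosts, comparing the controller's connection table against the hosts you expect is easy to script. A minimal Python sketch, assuming you have pasted the controller output into a string; EXPECTED_HOSTS and the helper names are hypothetical, and since the table is per logical switch (VNI 5001 here), a real check would repeat this for each VNI:

```python
import ipaddress

# Controller output from the post above (blank lines removed).
CONNECTION_TABLE = """\
Host-IP         Port  ID
10.1.2.5        13827 2
10.1.2.7        30655 3
"""

# Hypothetical: every host IP you expect the controller to list.
EXPECTED_HOSTS = {"10.1.2.5", "10.1.2.6", "10.1.2.7"}


def connected_hosts(table: str) -> set[str]:
    """Return the Host-IP values from a connection-table listing."""
    hosts = set()
    for line in table.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            ipaddress.IPv4Address(fields[0])
        except ValueError:
            continue  # header row or other non-data line
        hosts.add(fields[0])
    return hosts


def missing_hosts(table: str, expected: set[str]) -> list[str]:
    """Hosts we expect to be connected that the controller does not list."""
    return sorted(expected - connected_hosts(table))


if __name__ == "__main__":
    print(missing_hosts(CONNECTION_TABLE, EXPECTED_HOSTS))  # ['10.1.2.6']
```

Matching on a parsed IPv4 address rather than on line position keeps the sketch tolerant of the header row and stray blank lines that show up when CLI output is copied around.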

I then tried resetting the controller, and now VXLANs are not working on ANY node. I am reaching the end of what I know how to troubleshoot short of removing NSX and rebuilding the environment (I would really rather not; I want to figure this out in case I run into it again).

Does anyone have any suggestions, or has anyone seen or run into this before?

4 Replies
lvschie
Enthusiast

There is a really nasty bug in NSX 6.1: a race condition at boot in which the VIBs and the vDS configuration are not always pushed to the ESX node in the right order (see here).


The easiest way to verify this is to SSH to the node and check /var/run/log/vmkernel.log for "Would block" errors.
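On the host itself, `grep "Would block" /var/run/log/vmkernel.log` is the quickest check. If you want to review a copy of the log elsewhere (for example after each of several reboots), a small Python sketch that reports matching lines with their line numbers could look like this; the function names are hypothetical, and the only thing taken from the reply is the log path and the "Would block" text:

```python
import sys

PATTERN = "Would block"
DEFAULT_LOG = "/var/run/log/vmkernel.log"


def find_would_block(lines):
    """Yield (line_number, line) for every log line containing PATTERN."""
    for number, line in enumerate(lines, start=1):
        if PATTERN in line:
            yield number, line.rstrip("\n")


def main(path: str) -> int:
    """Print each matching line and return how many were found."""
    with open(path, encoding="utf-8", errors="replace") as log:
        hits = list(find_would_block(log))
    for number, line in hits:
        print(f"{path}:{number}: {line}")
    print(f"{len(hits)} '{PATTERN}' line(s) found in {path}")
    return len(hits)


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_LOG
    try:
        main(path)
    except OSError as err:
        print(f"Cannot read {path}: {err}")
```

Reading with `errors="replace"` avoids aborting on the occasional non-UTF-8 byte in a kernel log; the match is case-sensitive, exactly as the text appears in the reply.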

If you see them, the only thing to do is restart the ESX node and check again (I've had occasions where more than 3 reboots were required).

From there verify all your ESX nodes are working (check with vmkping ++netstack=vxlan) and then restore controller functionality (redeploy them if necessary).
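A plain `vmkping ++netstack=vxlan` only proves basic reachability between VTEPs. A common extra step is to send a full-size packet with the don't-fragment bit set (`-d`) and an explicit payload size (`-s`), so that an MTU mismatch on the transport network also shows up. The payload is the VTEP MTU minus the 20-byte IPv4 header and the 8-byte ICMP header; for the 1600-byte MTU shown for LabNetwork-Data later in the thread, that is 1572 bytes. A minimal sketch of the arithmetic, where the remote VTEP address 10.1.210.20 is purely hypothetical:

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def vmkping_command(remote_vtep_ip: str, mtu: int) -> str:
    """Build a vmkping command on the VXLAN netstack with DF set."""
    return (f"vmkping ++netstack=vxlan -d -s {max_ping_payload(mtu)} "
            f"{remote_vtep_ip}")


if __name__ == "__main__":
    # Hypothetical remote VTEP address in the 10.1.210.0 segment.
    print(vmkping_command("10.1.210.20", 1600))
    # prints: vmkping ++netstack=vxlan -d -s 1572 10.1.210.20
```

If the small ping succeeds but the 1572-byte one fails, the problem is more likely the physical switch MTU than the host's VXLAN configuration.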

RussBurden
Contributor

I appreciate the response, but I am not seeing the "Would block" errors in vmkernel.log. I have gone through the suggestions in the KB, though, and yes, when I first checked with esxcli, the vxlan option was not present until I rebooted. The option is present and shows data now, but the controller still does not show a connection to the host, and if I vMotion a guest to that host, it has absolutely no connectivity if its network is a VXLAN. That is the same result as before.

It looks like the VTEP is not getting initialized or something...

[root@ZESX-02:/var/log] esxcli network vswitch dvs vmware vxlan list
VDS ID                                           VDS Name          MTU  Segment ID  Gateway IP  Gateway MAC        Network Count  Vmknic Count
-----------------------------------------------  ---------------  ----  ----------  ----------  -----------------  -------------  ------------
2f 1b 2d 50 78 2e e6 0b-0e f3 6d ef e8 13 1b 10  LabNetwork-Data  1600  10.1.210.0  10.1.210.1  bc:c4:93:ee:d0:95              1             4

[root@ZESX-02:/var/log] esxcli network vswitch dvs vmware vxlan network vtep list --vds-name=LabNetwork-Data --vxlan-id=5000
Unable to find VTEP with specified parameter
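When comparing several hosts, it can help to pull fields such as the MTU and vmknic count out of the `esxcli network vswitch dvs vmware vxlan list` output programmatically. The sketch below assumes columns are separated by at least two spaces, which holds for the output above because no single value contains two consecutive spaces; it is not a general parser for every esxcli table, and the function name is hypothetical:

```python
import re

# Output copied from the post above.
SAMPLE = """\
VDS ID                                           VDS Name          MTU  Segment ID  Gateway IP  Gateway MAC        Network Count  Vmknic Count
-----------------------------------------------  ---------------  ----  ----------  ----------  -----------------  -------------  ------------
2f 1b 2d 50 78 2e e6 0b-0e f3 6d ef e8 13 1b 10  LabNetwork-Data  1600  10.1.210.0  10.1.210.1  bc:c4:93:ee:d0:95              1             4
"""

# Two or more spaces separate columns; single spaces stay inside a value.
COLUMN_GAP = re.compile(r"\s{2,}")


def parse_vxlan_list(text: str) -> list[dict[str, str]]:
    """Turn the vxlan list table into one dict per VDS, keyed by header."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = COLUMN_GAP.split(lines[0].strip())
    rows = []
    for line in lines[1:]:
        if set(line.replace(" ", "")) == {"-"}:
            continue  # dashed separator under the header
        values = COLUMN_GAP.split(line.strip())
        rows.append(dict(zip(header, values)))
    return rows


if __name__ == "__main__":
    for row in parse_vxlan_list(SAMPLE):
        print(row["VDS Name"], "MTU", row["MTU"], "vmknics", row["Vmknic Count"])
```

Values are kept as strings; convert with `int()` where a numeric comparison is needed.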

VCDX159
VMware Employee

Can you be more specific about the NSX 6.1 version? NSX 6.1.x?

RussBurden
Contributor

I am running NSX 6.1.4.2691049 on VCSA 6.0.0.2997665 and ESX 6.0.0.2809209.
