VMware Networking Community
bobbyccie
Contributor

Nested NSX Problem - Am I missing something?

Hi all,

I'm looking for some assistance with a nested NSX lab I'm trying to build. I have followed the examples on the net but seem to be hitting a silly problem getting basic VXLAN up and running. I'm sure it's something simple. Any help would be greatly appreciated.

I have a single ESXi 5.5 host running:

- vCenter Server Appliance

- Nested ESXi host 1 (compute node 1)

  - Windows Server 1

- Nested ESXi host 2 (compute node 2)

  - Windows Server 2

- Nested ESXi host 3 (management and edge node)

  - NSX Manager

I have kept the base networking simple; the vCenter and nested ESXi hosts have a single vNIC (VM Network) and everything is sitting in 192.168.1.0/24. I then put all three hosts in a single cluster and set up a single Distributed Switch.

I have managed to:

- Register the vCenter with the NSX Manager

- Deploy a single NSX Controller (installed on nested host 3, also on 192.168.1.0/24)

- Prepare the hosts by installing the VIBs

- Create VTEP VMkernel interfaces (again, I kept these on the same subnet, 192.168.1.0/24)

- Set the Segment ID Pool (5000-5999)

- Create a new Transport Zone (unicast mode)

- Create a new Logical Switch (unicast mode)

- Connect the Windows Server vNICs to the Logical Switch

Ping tests between VTEP IPs work fine, but VM traffic over VXLAN is not working.

Can anyone see anything obviously wrong with the above, or point me in the direction of what to check? I have hit a wall.

Many thanks

Bobby

1 Solution

Accepted Solutions
SpasKaloferov
VMware Employee

Also keep in mind the following issue:

NSX VXLAN Enable Agent fails on ESXi hosts with error “Cannot complete the operation.”

ESXi host Enable Agent error "Cannot complete the operation." | Spas Kaloferov's ...

BR,

Spas Kaloferov


13 Replies
rbenhaim
Enthusiast

Note: in a nested lab, the VTEP NIC teaming policy must be "failover".

You need to check what the NSX Controller sees in your lab.

Let's say you connect VM1 and VM2 to VXLAN 5001.

VM1 IP: 172.16.10.11

VM2 IP: 172.16.10.12

Find which controller manages the VXLAN for VM1 and VM2 (if you have 3 controllers). SSH to one of your 3 NSX Controllers and type:

nvp-controller # show control-cluster logical-switches vni 5001

VNI      Controller      BUM-Replication ARP-Proxy Connections VTEPs

5001     192.168.110.201 Enabled         Enabled   6           3

So now we know that 192.168.110.201 is the controller that manages VXLAN 5001.

SSH to 192.168.110.201 and type:

nvp-controller # show control-cluster logical-switches arp-table 5001

VNI      IP              MAC               Connection-ID

5001     172.16.10.11    00:50:56:a6:7a:a2 3

5001     172.16.10.12    00:50:56:a6:a1:e3 4

Does the controller in your lab know the IP/MAC/VXLAN of VM1 and VM2?
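
As a rough illustration of that check, the Python sketch below parses the arp-table output and reports which expected VM IPs the controller has not learned. The function names and the idea of pasting the CLI output into a string are assumptions made for illustration, not part of NSX; the sample text is the output shown above.

```python
def parse_arp_table(output):
    """Return {ip: mac} from 'show control-cluster logical-switches arp-table <vni>' output."""
    entries = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows have four columns and start with the numeric VNI;
        # the header row and blank lines are skipped.
        if len(fields) == 4 and fields[0].isdigit():
            _vni, ip, mac, _connection_id = fields
            entries[ip] = mac
    return entries


def missing_vms(output, expected_ips):
    """List the expected VM IPs that do not appear in the controller's ARP table."""
    known = parse_arp_table(output)
    return [ip for ip in expected_ips if ip not in known]


# Sample copied from the controller output in this post.
SAMPLE = """\
VNI      IP              MAC               Connection-ID
5001     172.16.10.11    00:50:56:a6:7a:a2 3
5001     172.16.10.12    00:50:56:a6:a1:e3 4
"""

if __name__ == "__main__":
    # An empty list means the controller knows both VMs.
    print(missing_vms(SAMPLE, ["172.16.10.11", "172.16.10.12"]))
```

If a VM's IP shows up as missing, the controller has not learned it for that VNI, which points at the host-to-controller side rather than at the VTEP underlay.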

Richard__R
Enthusiast

It sounds like you did from the way you phrased your question, but just to check: when you say ping tests between VTEPs are working, is that with the minimum packet size or with a VXLAN-sized packet?
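
To make the packet-size question concrete, here is a small Python sketch of the arithmetic. It assumes IPv4 VTEPs, no inner VLAN tag, and the standard header sizes (outer IPv4 20 bytes, UDP 8, VXLAN 8, inner Ethernet 14, ICMP 8). The 1600-byte MTU in the usage is only a commonly used value for VXLAN transport networks, not something stated in this thread.

```python
# Header sizes in bytes (assumptions: IPv4, no options, no inner VLAN tag).
IPV4_HEADER = 20
ICMP_HEADER = 8
UDP_HEADER = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14


def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet of size `mtu`."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def required_transport_mtu(guest_mtu=1500):
    """Transport MTU needed to carry a full guest frame inside VXLAN without fragmentation."""
    return guest_mtu + INNER_ETHERNET + VXLAN_HEADER + UDP_HEADER + IPV4_HEADER


if __name__ == "__main__":
    print(required_transport_mtu(1500))  # 1550
    print(max_ping_payload(1500))        # 1472
    print(max_ping_payload(1600))        # 1572
```

A ping that succeeds only at the default size proves basic reachability. Repeating it with the don't-fragment bit set and the larger payload (for example with vmkping's -d and -s options) shows whether the transport network can actually carry full-size VXLAN frames.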

bobbyccie
Contributor

Hi all,

Many thanks for the responses.

It turned out to be the agents not starting on the hosts because the NSX Manager was not available at boot time. After restarting the hosts, things started working. It seems the boot order of the nested hosts is important.

Thanks also for the useful controller verification commands. I can now see the VM MACs.

nsx-controller # show control-cluster logical-switches arp-table 5000

VNI      IP              MAC               Connection-ID

5000     172.16.10.1     00:0c:29:fe:3c:20 1

5000     172.16.10.2     00:0c:29:6a:ff:4b 2

Regards,

Bobby

SpasKaloferov
VMware Employee

Hi,

The boot order itself is not that important. It fails because the hosts need access to the NSX Manager when they boot, while they reinitialize the NSX agent. So if you are rebooting hosts, make sure the NSX Manager VM is always available. You might want to reboot in groups and migrate the NSX Manager VM so that it is always on. If not, you can use the workaround I pointed to in the article above. The same will happen if you remove a host from the cluster or add a new host to the cluster while the NSX Manager is not accessible.

BR,
Spas Kaloferov

Javel1n
Enthusiast

Hello all,

I also built a nested lab that runs entirely in Workstation 10. Inside Workstation I use:

6 ESXi 5.5

1 vCenter Appliance

1 Win 7 (for FTP server and Oracle DB)

1 Vyatta router

1 Openfiler (for NFS)

My story: every time I worked on the NSX lab, I ended up reinstalling NSX Manager the next day, so it was always the same lab over and over again. My topology is simple:

VM guest --- NSX distributed router --- NSX edge router --- vyatta router --- internet

After the first installation, everything worked perfectly. But when I started the lab again the next day, the VM guest could not ping its default gateway on the NSX distributed router, and the NSX Edge router could not ping the distributed router.

First I checked connectivity. I am using OSPF. All the routes were in the routing table, but from the distributed router I could not ping 8.8.8.8, even though there was a default route in the routing table. So routing was not the issue, and I checked something else.

Then I suspected the controller nodes. I searched Google and found this blog >> Some useful NSX Troubleshooting Tips | CormacHogan.com

I checked with the CLI, and this is what I found:

~ # esxcli network vswitch dvs vmware vxlan network list --vds-name=VM_NSX_VXLAN

VXLAN ID  Multicast IP               Control Plane  Controller Connection  Port Count  MAC Entry Count  ARP Entry Count

--------  -------------------------  -------------  ---------------------  ----------  ---------------  ---------------

    5000  N/A (headend replication)  Enabled ()     0.0.0.0 (down)                  2                0                0

I did what he suggested, switching from unicast to multicast and then switching back again, but the problem was still there.

Then I searched this forum and found this thread.

Using this blog >> ESXi host Enable Agent error "Cannot complete the operation." | Spas Kaloferov's ...

I followed the steps, and it worked!

~ # esxcli network vswitch dvs vmware vxlan network list --vds-name=VM_NSX_VXLAN

VXLAN ID  Multicast IP               Control Plane                        Controller Connection  Port Count  MAC Entry Count  ARP Entry Count

--------  -------------------------  -----------------------------------  ---------------------  ----------  ---------------  ---------------

    5000  N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  10.0.99.1 (up)                  1                1                0

Now the situation is "normal" again.

TL;DR:

If you run a lab, or maybe production servers, on VMware and suddenly all your machines and the NSX Manager go down, and afterwards everything you check looks as it should, the problem may not be in your interconnection; it is most likely in the controller connection.

The first thing to do is SSH to the ESXi host and run this command: esxcli network vswitch dvs vmware vxlan network list --vds-name=<YOUR VDS NAME>


If the output looks like this:

~ # esxcli network vswitch dvs vmware vxlan network list --vds-name=VM_NSX_VXLAN

VXLAN ID  Multicast IP               Control Plane  Controller Connection  Port Count  MAC Entry Count  ARP Entry Count

--------  -------------------------  -------------  ---------------------  ----------  ---------------  ---------------

    5000  N/A (headend replication)  Enabled ()     0.0.0.0 (down)                  2                0                0


Then run this on your ESXi host >> /etc/init.d/netcpad restart


And now it should look like this:


~ # esxcli network vswitch dvs vmware vxlan network list --vds-name=VM_NSX_VXLAN

VXLAN ID  Multicast IP               Control Plane                        Controller Connection  Port Count  MAC Entry Count  ARP Entry Count

--------  -------------------------  -----------------------------------  ---------------------  ----------  ---------------  ---------------

    5000  N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  10.0.99.1 (up)                  1                1                0

Troubleshooting steps:

1. Check whether NSX Manager is up or down.

2. Check the Controller status.

3. Check the installation status under Host Preparation: is it green, or does it show a red "Resolve"?

4. SSH to the ESXi host and use this command to determine whether the controller connection is up or down >> esxcli network vswitch dvs vmware vxlan network list --vds-name=<YOUR VDS NAME>
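
As a hedged sketch of step 4, the Python below parses the esxcli table and lists the VXLAN IDs whose controller connection is reported as down. The function names are made up for illustration, and the parser simply assumes the column layout seen in the outputs pasted in this post (seven columns separated by two or more spaces); other NSX versions may print something different.

```python
import re


def parse_vxlan_networks(output):
    """Parse 'esxcli network vswitch dvs vmware vxlan network list' output into dicts."""
    rows = []
    for line in output.splitlines():
        # Columns are separated by runs of two or more spaces; single spaces
        # occur inside values such as "N/A (headend replication)".
        fields = re.split(r"\s{2,}", line.strip())
        # Skip the header, the dashed separator, and blank lines.
        if len(fields) == 7 and fields[0].isdigit():
            rows.append({
                "vxlan_id": int(fields[0]),
                "multicast_ip": fields[1],
                "control_plane": fields[2],
                "controller": fields[3],
                "port_count": int(fields[4]),
                "mac_entries": int(fields[5]),
                "arp_entries": int(fields[6]),
            })
    return rows


def down_networks(output):
    """Return the VXLAN IDs whose controller connection column ends with '(down)'."""
    return [row["vxlan_id"] for row in parse_vxlan_networks(output)
            if row["controller"].endswith("(down)")]


# Samples copied from the outputs in this post.
DOWN_SAMPLE = """\
VXLAN ID  Multicast IP               Control Plane  Controller Connection  Port Count  MAC Entry Count  ARP Entry Count
--------  -------------------------  -------------  ---------------------  ----------  ---------------  ---------------
    5000  N/A (headend replication)  Enabled ()     0.0.0.0 (down)                  2                0                0
"""

UP_SAMPLE = """\
VXLAN ID  Multicast IP               Control Plane                        Controller Connection  Port Count  MAC Entry Count  ARP Entry Count
--------  -------------------------  -----------------------------------  ---------------------  ----------  ---------------  ---------------
    5000  N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  10.0.99.1 (up)                  1                1                0
"""

if __name__ == "__main__":
    print(down_networks(DOWN_SAMPLE))  # [5000]
    print(down_networks(UP_SAMPLE))    # []
```

Any VXLAN ID returned by down_networks is a candidate for the netcpad restart described above.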

SpasKaloferov
VMware Employee

Hi,

I'm glad the post helped.

BR,

Spas Kaloferov

snoopia
Contributor

Hi rbenhaim,

>Note: for Nested LAB  the VTEP NIC teaming must be "failover"

Do you mean the VTEP NICs must be teamed *AND* set to "failover"?

OR

VTEP NIC teaming must be "failover" if there are more than 2 NICs?

I'm trying to set up nested NSX on my desktop PC, and it has only 2 physical NICs. Is it possible somehow?

Thanks in advance.

rbenhaim
Enthusiast

Regardless of the number of links you have, VTEP NIC teaming must be "failover" in a nested environment.

snoopia
Contributor

Thank you sir.

rbudavari
Community Manager

In a nested environment you can also use Load Balance - SRC ID or SRC MAC for the VXLAN teaming policy. You just can't use EtherChannel/LACP.

SerkanUstundag
Contributor

Hello everyone!

I also plan to build an NSX lab but could not find any evaluation copies.

I even contacted my VMware partner, but they did not have it either.

How did you find the NSX installers?

Thanks in advance.

AndreTheGiant
Immortal

There isn't one yet.

One simple way is to attend the ICM course, after which you will be given access on the Nicira web site.

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro