VMware Networking Community
bobbyccie
Contributor

NSX-V 6.2 VXLAN Logical Switch Troubleshooting

Hi all,


I am having problems with a new install of NSX-V 6.2. I am unable to ping between VMs on the same Logical Switch.


Here are my build steps:


- Registered NSX Manager with vCenter

- Deployed three NSX Controllers

- Prepared hosts by installing the VIBs

- Created VTEP VMkernel interfaces

- Set Segment ID Pool (5000-5999)

- Created a new Transport Zone (unicast mode)

- Created a new Logical Switch (unicast mode)

- Migrated VM vNICs to Logical Switch

Everything looks OK in the vSphere Web Client. For example, the Controller status is normal and the VXLAN ping tests between hosts succeed.

There is just no communication between the VMs. Here is some output from the controller:


nsx-controller # show control-cluster status

Type                Status                                       Since
--------------------------------------------------------------------------------
Join status:        Join complete                                09/29 16:24:25
Majority status:    Connected to cluster majority                09/29 16:23:58
Restart status:     This controller can be safely restarted      09/29 16:24:20
Cluster ID:         ef3e531e-cc5a-4086-a1f4-d9f3b69077e7
Node UUID:          ef3e531e-cc5a-4086-a1f4-d9f3b69077e7

Role                Configured status   Active status
--------------------------------------------------------------------------------
api_provider        enabled             activated
persistence_server  enabled             activated
switch_manager      enabled             activated
logical_manager     enabled             activated
directory_server    enabled             activated
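When comparing this output across all three controllers, it can help to check the Role table mechanically. The following is a minimal Python sketch, assuming the output has been copied into a string and keeps the three-column layout (role, configured status, active status) shown above; the function name inactive_roles is hypothetical.

```python
# Sketch: flag any role in the "Role" table of `show control-cluster status`
# that is not both "enabled" and "activated".
# Assumes the CLI output was pasted into a string with the layout shown above.

STATUS_OUTPUT = """\
Role                Configured status   Active status
--------------------------------------------------------------------------------
api_provider        enabled             activated
persistence_server  enabled             activated
switch_manager      enabled             activated
logical_manager     enabled             activated
directory_server    enabled             activated
"""


def inactive_roles(text: str) -> list[str]:
    """Return the names of roles whose status is not enabled/activated."""
    problems = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("Role"):
            in_table = True  # everything after this header is the role table
            continue
        if not in_table or not line.strip() or line.startswith("-"):
            continue  # skip text before the table, blank lines, separators
        fields = line.split()
        if len(fields) != 3:
            continue  # not a role row
        role, configured, active = fields
        if configured != "enabled" or active != "activated":
            problems.append(role)
    return problems


if __name__ == "__main__":
    bad = inactive_roles(STATUS_OUTPUT)
    print("All roles activated" if not bad else f"Check roles: {bad}")
```

In the output posted here every role is activated, so the cluster itself looks healthy and the problem is more likely between the hosts and the controllers.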

nsx-controller # show control-cluster connections

role                port            listening open conns
--------------------------------------------------------
api_provider        api/443         Y         2
--------------------------------------------------------
persistence_server  server/2878     -         0
                    client/2888     Y         0
                    election/3888   Y         0
--------------------------------------------------------
switch_manager      ovsmgmt/6632    Y         0
                    openflow/6633   Y         0
--------------------------------------------------------
system              cluster/7777    Y         0

nsx-controller # show control-cluster logical-switches vni 5000

VNI      Controller      BUM-Replication ARP-Proxy Connections
5000     192.168.100.50   Enabled         Enabled   0


nsx-controller # show control-cluster logical-switches connection-table 5000 <—— Empty output

nsx-controller # show control-cluster logical-switches arp-table 5000 <—— Empty output

nsx-controller # show control-cluster logical-switches vtep-table 5000 <—— Empty output
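The VNI output above reports 0 in the Connections column, which matches the empty tables. As a minimal Python sketch, the row can be parsed to pull out the owning controller IP and that count, so the same check is quick to repeat on each controller. Reading Connections as the number of host connections the controller has for that VNI is my interpretation of the column, not something this thread confirms; the function name vni_summary is hypothetical.

```python
# Sketch: parse `show control-cluster logical-switches vni <VNI>` output.
# Assumption: the five columns are VNI, Controller, BUM-Replication,
# ARP-Proxy and Connections, as in the output posted above.

VNI_OUTPUT = """\
VNI      Controller      BUM-Replication ARP-Proxy Connections
5000     192.168.100.50   Enabled         Enabled   0
"""


def vni_summary(text: str) -> dict[int, dict]:
    """Map each VNI row to its controller IP and connection count."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have five columns and start with a numeric VNI;
        # this also skips the header row.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        vni, controller, _bum, _arp, conns = fields
        rows[int(vni)] = {"controller": controller, "connections": int(conns)}
    return rows


if __name__ == "__main__":
    for vni, info in vni_summary(VNI_OUTPUT).items():
        if info["connections"] == 0:
            print(f"VNI {vni}: no connections reported by {info['controller']}")
```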

Can anyone suggest any troubleshooting tips?

Thanks,

Bobby

4 Replies
dhanarajramesh

There are a few things to check. Have you set the MTU on your physical LAN to at least 1600? Which load-balancing (teaming) policy are you using on the DVS uplinks you chose as the VTEP transport? How many NICs are you using per host?


If you use multiple VTEPs, each one needs its own IP address.
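The 1600-byte figure comes from the VXLAN encapsulation overhead. A minimal arithmetic sketch, assuming IPv4 outer headers, an untagged inner Ethernet frame and a standard 1500-byte guest MTU; the constant and function names are hypothetical.

```python
# Header sizes in bytes for VXLAN carried over IPv4.
OUTER_IPV4_HEADER = 20
OUTER_UDP_HEADER = 8
VXLAN_HEADER = 8
INNER_ETHERNET_HEADER = 14  # untagged inner frame, FCS not carried


def required_transport_mtu(guest_mtu: int = 1500) -> int:
    """Smallest IP MTU on the VTEP network that carries a full guest frame."""
    inner_frame = guest_mtu + INNER_ETHERNET_HEADER
    return inner_frame + VXLAN_HEADER + OUTER_UDP_HEADER + OUTER_IPV4_HEADER


if __name__ == "__main__":
    needed = required_transport_mtu()
    print(f"Required MTU: {needed}, headroom at 1600: {1600 - needed}")
```

A 1500-byte guest packet therefore needs 1550 bytes on the transport network, and 1600 leaves some headroom (for example, for an inner VLAN tag).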

bobbyccie
Contributor

Hi,

I am using 1 NIC per host. Teaming policy is set to Failover.

Any more suggestions?

Thanks,

Bobby

p0wertje
Hot Shot

Try logging in to the other two controllers and running the commands that returned empty output.

Only one of the controllers is responsible for the VTEP and MAC tables of a given logical switch, so in your case it could be controller-2.

You can also log in to the ESXi host and ping the other VTEP (vmk2 is the VTEP interface and 192.168.249.241 is the VTEP IP on the other host):

ping ++netstack=vxlan -I vmk2 192.168.249.241 -s 1572 -d
PING 192.168.249.241 (192.168.249.241): 1572 data bytes
1580 bytes from 192.168.249.241: icmp_seq=0 ttl=64 time=0.563 ms
1580 bytes from 192.168.249.241: icmp_seq=1 ttl=64 time=0.447 ms

If this ping fails, the MTU on the physical switch is probably set incorrectly.
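The -s 1572 value is chosen so that, with -d (don't fragment) set, the packet is exactly 1600 bytes: 1572 bytes of payload plus an 8-byte ICMP header (hence "1580 bytes" in the replies) plus a 20-byte IPv4 header. A small Python sketch that derives the payload size for any target MTU and builds the same command string for pasting into the ESXi shell; the helper names are hypothetical and the script only formats text, it does not run the ping.

```python
ICMP_HEADER = 8
IPV4_HEADER = 20


def df_ping_payload(mtu: int) -> int:
    """Largest -s value that fits in `mtu` when don't-fragment (-d) is set."""
    return mtu - ICMP_HEADER - IPV4_HEADER


def vtep_ping_command(vmk: str, remote_vtep: str, mtu: int = 1600) -> str:
    """Build the ESXi VXLAN-netstack ping used in this reply."""
    return (f"ping ++netstack=vxlan -I {vmk} {remote_vtep} "
            f"-s {df_ping_payload(mtu)} -d")


if __name__ == "__main__":
    print(vtep_ping_command("vmk2", "192.168.249.241"))
```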

From version 6.2 onward there are also central CLI commands, which you run from the NSX Manager CLI.

For example:

manager> show controller list all
NAME                 IP                                   State
controller-3         192.168.249.251                      RUNNING
controller-1         192.168.249.250                      RUNNING
nsx-controller-node3 192.168.249.252                      RUNNING
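Because only one controller holds the tables for a given logical switch, it helps to translate the Controller IP from the VNI output into a controller name before logging in. A minimal Python sketch, assuming the three-column layout of `show controller list all` above; controllers_by_ip is a hypothetical helper.

```python
# Sketch: map controller IP -> (name, state) from `show controller list all`.
CONTROLLER_LIST = """\
NAME                 IP                                   State
controller-3         192.168.249.251                      RUNNING
controller-1         192.168.249.250                      RUNNING
nsx-controller-node3 192.168.249.252                      RUNNING
"""


def controllers_by_ip(text: str) -> dict[str, tuple[str, str]]:
    """Return a mapping of controller IP to (controller name, state)."""
    mapping = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) == 3:
            name, ip, state = fields
            mapping[ip] = (name, state)
    return mapping


if __name__ == "__main__":
    # 192.168.249.250 is the controller reported for VNI 5002 in this thread.
    print(controllers_by_ip(CONTROLLER_LIST)["192.168.249.250"])
```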

manager> show logical-switch list vni 5000 host
ID                   HostName                               VdsName
host-83              192.168.249.15                         management dvs
host-11              192.168.249.25                         management dvs
host-186             192.168.249.20                         compute dvs 1
host-176             192.168.249.10                         compute dvs 1
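One way to use this output is to check that every host running a VM on the logical switch actually shows up in it. A minimal Python sketch, assuming the ID/HostName/VdsName layout above; because VDS names such as "compute dvs 1" contain spaces, everything after the second column is joined back together. The helper names and the host 192.168.249.99 are hypothetical.

```python
# Sketch: parse `show logical-switch list vni <VNI> host` and report
# expected hosts that are missing from it.
HOST_LIST = """\
ID                   HostName                               VdsName
host-83              192.168.249.15                         management dvs
host-11              192.168.249.25                         management dvs
host-186             192.168.249.20                         compute dvs 1
host-176             192.168.249.10                         compute dvs 1
"""


def hosts_for_vni(text: str) -> dict[str, str]:
    """Map host name to VDS name (the VDS name may contain spaces)."""
    hosts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].startswith("host-"):
            hosts[fields[1]] = " ".join(fields[2:])
    return hosts


def missing_hosts(text: str, expected: list[str]) -> list[str]:
    """Expected hosts (e.g. those running the VMs) absent from the list."""
    return sorted(set(expected) - set(hosts_for_vni(text)))


if __name__ == "__main__":
    # 192.168.249.99 is a hypothetical host that is not in the list.
    print(missing_hosts(HOST_LIST, ["192.168.249.10", "192.168.249.99"]))
```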

manager> show logical-switch list all
NAME                 UUID                                 VNI        Trans Zone Name      Trans Zone ID
transport            9ce11b50-9af4-4d96-b3bf-65fdf847a8e5 5000       compute              vdnscope-2
web_tier             08bde95a-1e72-4ff2-b0de-bdd293610b41 5001       compute              vdnscope-2
network_test_lab01   dea2a8d0-fde3-468f-aa96-163fae13be70 5002       compute              vdnscope-2
network_test_lab02   afad1b08-d3d9-4393-ace5-a28a3d8b65c7 5003       compute              vdnscope-2
test_lab_management_network 61d52086-3f75-42f0-ac2f-7ae76536d4c0 5004       compute              vdnscope-2
network_test_lab03   6a1afc92-25d0-45d7-bf68-d9d8cc14432b 5005       compute              vdnscope-2

manager> show logical-switch controller controller-1 vni 5002 brief
VNI      Controller      BUM-Replication ARP-Proxy Connections
5002     192.168.249.250 Enabled         Enabled   1

Cheers,
p0wertje | VCIX6-NV | JNCIS-ENT | vExpert
Please kudo helpful posts and mark the thread as solved if solved
Richard__R
Enthusiast

Hi - I would also check that the netcpa agent on the hosts is communicating with the Controller cluster over TCP port 1234. Running 'esxcli network ip connection list | grep 1234' on a host shows whether there are established sessions from that host. It seems a bit suspicious that your tables on the Controllers are empty.

Also check /var/log/netcpa.log on the hosts: on the hosts running the VMs on your logical switch you should see some VTEP joins.
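A plain grep for 1234 also matches unrelated numbers (a world ID such as 12345, for example). A slightly stricter filter, sketched in Python, checks the foreign port and the connection state. It assumes the esxcli output starts with the Proto, Recv Q, Send Q, Local Address, Foreign Address and State columns in that order; the sample below is illustrative (made-up addresses, trailing columns trimmed), not real output from this thread, and controller_sessions is a hypothetical helper.

```python
# Sketch: find ESTABLISHED TCP sessions to a given remote port (1234 here)
# in text captured from `esxcli network ip connection list`.
# The sample is hypothetical; real output has extra trailing columns.
SAMPLE = """\
Proto  Recv Q  Send Q  Local Address        Foreign Address      State        World ID
-----  ------  ------  -------------------  -------------------  -----------  --------
tcp         0       0  10.10.10.21:43210    10.10.10.50:1234     ESTABLISHED     35120
tcp         0       0  10.10.10.21:443      10.10.10.5:51000     ESTABLISHED     12345
"""


def controller_sessions(text: str, port: int = 1234) -> list[str]:
    """Return foreign addresses of ESTABLISHED TCP sessions to `port`."""
    sessions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] != "tcp":
            continue  # header, separator, or non-TCP row
        foreign, state = fields[4], fields[5]
        if foreign.rsplit(":", 1)[-1] == str(port) and state == "ESTABLISHED":
            sessions.append(foreign)
    return sessions


if __name__ == "__main__":
    found = controller_sessions(SAMPLE)
    print(found if found else "No established sessions on port 1234")
```

An empty result on a host that runs VMs on the logical switch would point at the host-to-controller channel rather than the VXLAN data path.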
