Re: NSX-V 6.2 VXLAN Logical Switch Troubleshooting

bobbyccie · ‎09-30-2015

Hi all,

I am having problems with a new install of NSX-V 6.2. I am unable to ping between VMs on the same Logical Switch.

Here are my build steps:

- Registered NSX Manager with vCenter

- Deployed three NSX Controllers

- Prepared hosts by installing the VIBs

- Created VTEP VMkernel interfaces

- Set Segment ID Pool (5000-5999)

- Created a new Transport Zone (unicast mode)

- Created a new Logical Switch (unicast mode)

- Migrated VM vNICs to Logical Switch

Everything looks OK in the vSphere Web Client, for example the Controller status is normal and VXLAN ping tests are fine between hosts.

There is just no communication between VMs. Here are some outputs from the controller:

nsx-controller # show control-cluster status

Type Status Since

--------------------------------------------------------------------------------

Join status: Join complete 09/29 16:24:25

Majority status: Connected to cluster majority 09/29 16:23:58

Restart status: This controller can be safely restarted 09/29 16:24:20

Cluster ID: ef3e531e-cc5a-4086-a1f4-d9f3b69077e7

Node UUID: ef3e531e-cc5a-4086-a1f4-d9f3b69077e7

Role Configured status Active status

--------------------------------------------------------------------------------

api_provider enabled activated

persistence_server enabled activated

switch_manager enabled activated

logical_manager enabled activated

directory_server enabled activated

nsx-controller # show control-cluster connections

role port listening open conns

--------------------------------------------------------

api_provider api/443 Y 2

--------------------------------------------------------

persistence_server server/2878 - 0

client/2888 Y 0

election/3888 Y 0

--------------------------------------------------------

switch_manager ovsmgmt/6632 Y 0

openflow/6633 Y 0

--------------------------------------------------------

system cluster/7777 Y 0

nsx-controller # show control-cluster logical-switches vni 5000

VNI Controller BUM-Replication ARP-Proxy Connections

5000 192.168.100.50 Enabled Enabled 0

nsx-controller # show control-cluster logical-switches connection-table 5000 <—— Empty output

nsx-controller # show control-cluster logical-switches arp-table 5000 <—— Empty output

nsx-controller # show control-cluster logical-switches vtep-table 5000 <—— Empty output

Can anyone suggest any troubleshooting tips?

Thanks,

Bobby

dhanarajramesh · ‎09-30-2015

There are a few things to check. Is the MTU on your physical LAN set to at least 1600? Which load balancing (teaming) policy are you using on the DVS uplinks you chose as the transport for the VTEP? How many NICs are you using per host?

Multiple VTEPs per host need multiple IP addresses.

bobbyccie · ‎09-30-2015

Hi,

I am using 1 NIC per host. Teaming policy is set to Failover.

Any more suggestions?

Thanks,

Bobby

p0wertje · ‎09-30-2015

Try logging in to the other two controllers and running the commands you have in red there.

Only one of the controllers is responsible for the VTEP and MAC tables, so in your case it may be controller-2.

You can also log in to the ESXi host and ping the other VTEP (vmk2 is the VTEP interface and 192.168.249.241 is the VTEP IP on the other host):

ping ++netstack=vxlan -I vmk2 192.168.249.241 -s 1572 -d

PING 192.168.249.241 (192.168.249.241): 1572 data bytes

1580 bytes from 192.168.249.241: icmp_seq=0 ttl=64 time=0.563 ms

1580 bytes from 192.168.249.241: icmp_seq=1 ttl=64 time=0.447 ms

If this does not ping, the MTU is probably incorrect on the physical switch.
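
To make the numbers in that ping concrete: -s 1572 is the ICMP payload, and the 8-byte ICMP header plus the 20-byte IPv4 header bring the packet to 1600 bytes. With -d (don't fragment) set, it only gets through if every hop carries a 1600-byte MTU. The "1580 bytes from" in the reply is the payload plus the ICMP header. Below is a rough Python sketch of the arithmetic. The function names are made up for illustration, and the header sizes assume IPv4 without options and no inner VLAN tag.

```python
# Header sizes in bytes (IPv4 without options).
ICMP_HEADER = 8
IPV4_HEADER = 20
# Extra bytes VXLAN adds on top of the guest's IP packet, counted at the
# outer IP layer: inner Ethernet 14 + VXLAN 8 + UDP 8 + outer IPv4 20.
VXLAN_OVERHEAD = 14 + 8 + 8 + 20


def ip_packet_size(ping_payload: int) -> int:
    """Size of the IP packet produced by 'ping -s <ping_payload>'."""
    return ping_payload + ICMP_HEADER + IPV4_HEADER


def max_ping_payload(mtu: int) -> int:
    """Largest -s value that fits in one unfragmented packet for a given MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def required_transport_mtu(guest_mtu: int) -> int:
    """Smallest transport MTU that carries a full-size guest packet over VXLAN."""
    return guest_mtu + VXLAN_OVERHEAD


print(ip_packet_size(1572))          # 1600
print(max_ping_payload(1600))        # 1572
print(required_transport_mtu(1500))  # 1550
```

So a guest MTU of 1500 needs at least 1550 on the transport network, and testing with -s 1572 confirms the commonly recommended 1600 with some headroom.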

From version 6.2 you also have centralized CLI commands (run them from the NSX Manager CLI).

For example:

manager> show controller list all

NAME IP State

controller-3 192.168.249.251 RUNNING

controller-1 192.168.249.250 RUNNING

nsx-controller-node3 192.168.249.252 RUNNING

manager> show logical-switch list vni 5000 host

ID HostName VdsName

host-83 192.168.249.15 management dvs

host-11 192.168.249.25 management dvs

host-186 192.168.249.20 compute dvs 1

host-176 192.168.249.10 compute dvs 1

manager> show logical-switch list all

NAME UUID VNI Trans Zone Name Trans Zone ID

transport 9ce11b50-9af4-4d96-b3bf-65fdf847a8e5 5000 compute vdnscope-2

web_tier 08bde95a-1e72-4ff2-b0de-bdd293610b41 5001 compute vdnscope-2

network_test_lab01 dea2a8d0-fde3-468f-aa96-163fae13be70 5002 compute vdnscope-2

network_test_lab02 afad1b08-d3d9-4393-ace5-a28a3d8b65c7 5003 compute vdnscope-2

test_lab_management_network 61d52086-3f75-42f0-ac2f-7ae76536d4c0 5004 compute vdnscope-2

network_test_lab03 6a1afc92-25d0-45d7-bf68-d9d8cc14432b 5005 compute vdnscope-2

manager> show logical-switch controller controller-1 vni 5002 brief

VNI Controller BUM-Replication ARP-Proxy Connections

5002 192.168.249.250 Enabled Enabled 1
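
If you have a lot of VNIs to check, the Connections column is the one to watch: 0 means no host has joined that VNI on the controller, which matches the empty tables in the first post. Here is a minimal Python sketch that flags such rows in pasted output. The function name is hypothetical, and it assumes the five-column layout shown in this thread (VNI, Controller, BUM-Replication, ARP-Proxy, Connections).

```python
def vnis_without_connections(table_text: str) -> list[tuple[int, str]]:
    """Return (vni, controller_ip) for every row whose Connections count is 0."""
    idle = []
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows have exactly five fields and start with a numeric VNI;
        # this skips the header row and any blank or separator lines.
        if len(fields) == 5 and fields[0].isdigit() and fields[4].isdigit():
            vni, controller, _bum, _arp_proxy, conns = fields
            if int(conns) == 0:
                idle.append((int(vni), controller))
    return idle


# Rows copied from the outputs posted in this thread.
pasted = """
VNI Controller BUM-Replication ARP-Proxy Connections
5000 192.168.100.50 Enabled Enabled 0
5002 192.168.249.250 Enabled Enabled 1
"""

print(vnis_without_connections(pasted))  # [(5000, '192.168.100.50')]
```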

Cheers,
p0wertje | VCIX6-NV | JNCIS-ENT | vExpert
Please kudo helpful posts and mark the thread as solved if solved

Richard__R · ‎10-01-2015

Hi - I would probably check to see that the netcpa agent on the hosts is communicating with the Controller cluster as well over TCP port 1234. You can do an 'esxcli network ip connection list|grep 1234' to see if there are some established sessions from one of the hosts in question. Seems a bit suspicious that your tables are empty on the Controllers. Check out /var/log/netcpa.log on the hosts as well - for those with the VMs on your logical switch you should see some VTEP joins.
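
If you want to script that check across several hosts, something along these lines could work. It is only a sketch: the sample text is invented for illustration, and the column order (Proto, Recv Q, Send Q, Local Address, Foreign Address, State, ...) is an assumption about what 'esxcli network ip connection list' prints, so compare it with the output on your own hosts first. The function name is hypothetical.

```python
def controller_sessions(esxcli_output: str, port: int = 1234) -> list[str]:
    """Return the remote IPs of ESTABLISHED TCP sessions to the given port."""
    peers = []
    for line in esxcli_output.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator, short lines and non-TCP rows.
        if len(fields) < 6 or fields[0] != "tcp":
            continue
        foreign, state = fields[4], fields[5]
        remote_ip, _, remote_port = foreign.rpartition(":")
        if remote_port == str(port) and state == "ESTABLISHED":
            peers.append(remote_ip)
    return peers


# Invented sample output (assumed layout), not taken from a real host.
SAMPLE = """
Proto  Recv Q  Send Q  Local Address         Foreign Address       State        World ID  CC Algo  World Name
-----  ------  ------  --------------------  --------------------  -----------  --------  -------  ----------
tcp         0       0  192.168.249.10:52311  192.168.249.250:1234  ESTABLISHED     35185  newreno  netcpa-worker
tcp         0       0  192.168.249.10:52312  192.168.249.251:1234  SYN_SENT        35185  newreno  netcpa-worker
tcp         0       0  192.168.249.10:443    192.168.249.5:60211   ESTABLISHED     34012  newreno  rhttpproxy
"""

print(controller_sessions(SAMPLE))  # ['192.168.249.250']
```

An empty list from a host that has VMs on the logical switch would point at the host-to-controller path rather than at the VXLAN data plane.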
