Unable to get VMs communicating in NSX environment

huchord · ‎03-07-2018

I'm trying to get a VM on one host to ping a VM on another host across a VXLAN tunnel.

u1 is on host 10.100.28.48; knoppix2 is on host 10.100.28.115.

I ping from one VM to the other, but don't even get an ARP response.

Anything else I can provide?

Any suggestions are welcome.

bayupw · ‎03-07-2018

Did you have VXLAN working previously?

There's an exclamation mark on host 10.100.28.48; you may want to check that too.

You can perform the checks below and see if any of them fail.

I think you should be able to troubleshoot further based on these checks.

Check the NSX Dashboard to see if there is any issue with host preparation or logical switch status

Check the channel communication of both ESXi hosts

Do a logical switch ping test from the UI using both minimum and VXLAN standard packet sizes

Do a vmkping test for the VTEPs from the ESXi host CLI:

vmkping ++netstack=vxlan <vmknic IP> -d -s <packet size>

See VMware KB 1003728: Testing VMkernel network connectivity with the vmkping command
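As a rough sketch of the size arithmetic behind the -s value: -s sets the ICMP payload size, so the IP packet on the wire is 28 bytes larger (a 20-byte IPv4 header, assuming no IP options, plus an 8-byte ICMP header). With -d (don't fragment) set, the ping can only succeed if that packet fits within the path MTU. The function names and the 192.0.2.10 address below are hypothetical, for illustration only; substitute the VTEP vmknic IP of the remote host.

```python
IPV4_HEADER = 20  # bytes, assuming an IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def ip_packet_size(payload_size: int) -> int:
    """Size of the IP packet produced by a ping with the given -s payload."""
    return payload_size + IPV4_HEADER + ICMP_HEADER


def fits_without_fragmentation(payload_size: int, mtu: int) -> bool:
    """True if the ping fits in the given MTU when the DF bit (-d) is set."""
    return ip_packet_size(payload_size) <= mtu


def vmkping_command(vtep_ip: str, payload_size: int) -> str:
    """Build the vmkping command line from the post for a given VTEP and size."""
    return f"vmkping ++netstack=vxlan {vtep_ip} -d -s {payload_size}"


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder address, not taken from the thread.
    for size in (1470, 1570):
        print(vmkping_command("192.0.2.10", size),
              "-> IP packet of", ip_packet_size(size), "bytes")
```

So a 1470-byte payload fits in a standard 1500 MTU, while a 1570-byte payload only passes if the transport network carries frames larger than 1500 bytes (for example an MTU of 1600).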

To validate the MTU, ping using a packet size smaller than 1500, e.g. 1470, then try again using a packet size higher than 1500, e.g. 1570.

If the ping works with the smaller size (1470) but not with 1570, then you have an MTU issue in your physical network, most likely on the physical switch.
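The interpretation of the two vmkping results can be sketched as a small decision function. Only the "small works, large fails" case comes from the advice above; the wording for the other three cases is my own assumption about what they would usually suggest, and the function name is hypothetical.

```python
def diagnose_vtep_ping(small_ok: bool, large_ok: bool) -> str:
    """Interpret vmkping results for a small (e.g. 1470) and a large (e.g. 1570) size."""
    if small_ok and not large_ok:
        # The case described in the post.
        return "MTU issue on the physical network"
    if small_ok and large_ok:
        # Assumption: the transport network passes large frames, so look elsewhere.
        return "VTEP connectivity and MTU look fine"
    if not small_ok and not large_ok:
        # Assumption: no basic VTEP-to-VTEP reachability (VLAN, IP, or cabling).
        return "no basic VTEP connectivity"
    # Large works but small fails: not an expected result, so retest.
    return "inconsistent result, retest"


if __name__ == "__main__":
    print(diagnose_vtep_ping(small_ok=True, large_ok=False))
```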

Do a Traceflow

Bayu Wibowo | VCIX6-DCV/NV
Author of VMware NSX Cookbook http://bit.ly/NSXCookbook
https://github.com/bayupw/PowerNSX-Scripts
https://nz.linkedin.com/in/bayupw | twitter @bayupw

2cool2touch · ‎03-10-2018

You can automate the validation checks Bayu described using the health checks in NSX-PowerOps: www.nsx-powerops.com

Sounds like you need to do VTEP-to-VTEP tests.

Mid_Hudson_IT · ‎03-10-2018

One thing I don't see in all of this: are the problems purely within NSX, or is there a problem at the physical layer?

One thing to make sure of is that the 10.100.28.x subnet is not the native VLAN; from the looks of your NICs, they all ride over one network.

I mean, from the looks of things in the pictures, everything is in the 10.100.28.x subnet, so one thing to check is the physical switch. Case in point: are the links from the physical layer to the vmnics set up as access ports or as trunk uplinks, and what is the native VLAN in relation to the 10.100.28.x subnet? If the 10.100.28.x subnet is the native VLAN, try making the native VLAN something like 4000. If it's not the native VLAN, make sure you have encapsulation enabled on the uplink, since it should be a trunk. I guess it's also worth asking: are you using unicast, hybrid, or multicast?

Is your 10.100.28.x super-subnetted?

Just some observations.

VCP5/6-DCV, VCP6-NV, vExpert 2015/2016/2017, A+, Net+, Sec +, Storage+, CCENT, ICM NSX 6.2, 70-410, 70-411

huchord · ‎03-11-2018

So decided to start fresh and it now works fine.

Had been using nested ESXi - now doing no nesting.

Think I'll stay clear of nested ESXi for a while.

Bayu - many thanks for the suggestions.
