sreeve3939
Enthusiast

Unable to get VMs communicating in NSX environment

I'm trying to get a VM on one host to ping a VM on another host across a VXLAN tunnel.

u1 is on host 10.100.28.48; knoppix2 is on host 10.100.28.115.

I ping from one VM to the other, but don't even get an ARP response.

nsx1.JPG

nsx2.JPG

nsx3.JPG

nsx4.JPG

Anything else I can provide?

Any suggestions are welcome.

4 Replies
bayupw
Leadership

Did you have VXLAN working previously?

There's an exclamation mark on host 10.100.28.48, you may want to check that too.

You can perform the checks below and see whether any of them fail.

I think you should be able to troubleshoot further based on these checks.

Check the NSX Dashboard to see if there is any issue with host preparation or logical switch status

pastedImage_1.png

Check the channel communication of both ESXi hosts

pastedImage_2.png

pastedImage_3.png

Do a logical switch ping test from the UI using both minimum and VXLAN standard packet sizes

pastedImage_4.png

Do a vmkping test for the VTEPs from ESXi host CLI

     vmkping ++netstack=vxlan <vmknic IP> -d -s <packet size>

See this KB: Testing VMkernel network connectivity with the vmkping command (1003728) | VMware KB

pastedImage_7.png

To validate this, ping with a packet size smaller than 1500, e.g. 1470, then try again with a packet size higher than 1500, e.g. 1570.

If the ping works with the smaller size (1470) but not with 1570, then you have an MTU issue on your physical switch.
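
As a rough worked example of the sizes involved: the `-s` value passed to vmkping is the ICMP payload, so the packet on the wire is 28 bytes larger (a 20-byte IPv4 header plus an 8-byte ICMP header). The Python sketch below computes the largest `-s` value that fits a given MTU without fragmentation and prints the matching command line. The 1600-byte MTU is an assumption (the commonly recommended minimum for VXLAN transport), and the VTEP address is a placeholder, not an address from this thread.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP payload (-s value) that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def vmkping_command(vtep_ip: str, mtu: int) -> str:
    """Build the vmkping line for testing a VTEP at the given MTU (-d = don't fragment)."""
    return f"vmkping ++netstack=vxlan {vtep_ip} -d -s {max_ping_payload(mtu)}"


if __name__ == "__main__":
    # 192.0.2.10 is a documentation placeholder; substitute the remote VTEP vmknic IP.
    for mtu in (1500, 1600):  # 1600 is an assumed VXLAN transport MTU
        print(vmkping_command("192.0.2.10", mtu))
```

The suggested test sizes of 1470 and 1570 sit just under the two limits this computes (1472 and 1572), so the first should pass on any standard network and the second only where the larger MTU is configured end to end.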

Do a Traceflow

pastedImage_8.png

Bayu Wibowo | VCIX6-DCV/NV Author of VMware NSX Cookbook http://bit.ly/NSXCookbook https://github.com/bayupw/PowerNSX-Scripts https://nz.linkedin.com/in/bayupw | twitter @bayupw
2cool2touch
Contributor

You can automate the validation steps Bayu described using the health checks in NSX-PowerOps: www.nsx-powerops.com

Sounds like you need to do VTEP to VTEP tests

Mid_Hudson_IT
Contributor

One thing I don't see in all of this: are the problems purely within NSX, or are you having problems with the physical layer?

One thing to make sure of is that the 10.100.28.x subnet is not the native VLAN. From the looks of your NICs, they all ride over one network.

From the pictures, everything appears to be in the 10.100.28.x subnet, so one thing to check is the physical switch. Are the links from the physical layer to the vmnics set up as access ports or as trunk uplinks, and what is the native VLAN in relation to the 10.100.28.x subnet? If the 10.100.28.x subnet is the native VLAN, try changing the native VLAN to something like 4000. If it's not the native VLAN, make sure you have encapsulation enabled on the uplink, since it should be a trunk. It's also worth asking: are you using unicast, hybrid or multicast?

Is your 10.100.28.x network super-subnetted?
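
One way to reason about the subnetting question: whether the two host addresses from the original post share a subnet depends on the prefix length, which the thread does not state. Below is a small Python sketch using the standard `ipaddress` module. The /24 and /26 masks are assumptions for illustration only, not values taken from the poster's environment.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, prefix_len: int) -> bool:
    """True if both addresses fall in the same network for the given prefix length."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{prefix_len}").network
    return net_a == net_b


if __name__ == "__main__":
    hosts = ("10.100.28.48", "10.100.28.115")  # host addresses from the original post
    for prefix_len in (24, 26):  # assumed masks, not from the thread
        print(prefix_len, same_subnet(*hosts, prefix_len))
```

With a /24 mask the two hosts are on the same network, but with a /26 they are not, in which case the VTEP traffic would have to be routed between them.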

Just some observations.

VCP5/6-DCV, VCP6-NV, vExpert 2015/2016/2017, A+, Net+, Sec +, Storage+, CCENT, ICM NSX 6.2, 70-410, 70-411
sreeve3939
Enthusiast

So decided to start fresh and it now works fine.

Had been using nested ESXi - now doing no nesting.

Think I'll stay clear of nested ESXi for a while.

Bayu - many thanks for the suggestions.
