Cannot ping VMs on different ESXi hosts

abhisheksha · ‎02-11-2018

Hi,

I'm running a nested NSX homelab on vCloud Director. I've completed the installation and host preparation, as you'll see from the images below. I'm trying to ping between 2 VMs that are connected to the same VNI. The ping fails when the VMs are on different hosts. When they're on the same host, the ping works, which suggests that something is wrong with the VTEPs. Can you please have a look at the config below and let me know what I'm doing wrong?

Many thanks for your help!

All the ESXi hosts are connected to MGMT_NW on VCD.

Settings Of MGMT_NW

All NSX controllers are seeing each other here.

Host Preparation is complete.

TRANSPORT VXLAN VLAN is set to 0.

VMs M1 and M2 are connected to the same VNI.

VNI is set to UNICAST.

bayupw · ‎02-11-2018

Make sure to enable promiscuous mode on the portgroups, see these 2 blog posts:

Why is Promiscuous Mode & Forged Transmits required for Nested ESXi? https://www.virtuallyghetto.com/2013/11/why-is-promiscuous-mode-forged.html

How To Enable Nested ESXi Using VXLAN In vSphere & vCloud Director

If you are still experiencing the issue, you might have similar situation as explained in this blog post: NSX and nested ESXi environments: caveats and layer-2 troubleshooting – vLenzker

Bayu Wibowo | VCIX6-DCV/NV
Author of VMware NSX Cookbook http://bit.ly/NSXCookbook
https://github.com/bayupw/PowerNSX-Scripts
https://nz.linkedin.com/in/bayupw | twitter @bayupw

abhisheksha · ‎02-12-2018

Hi bayupw,

Thank you for your reply. I think the settings you're talking about are already enabled, because I connected the VMs to a non-NSX PG on the same DSwitch and they were able to talk to each other even though they were on different ESXi hosts.

abhisheksha · ‎02-13-2018

UPDATE - So, there was an issue with the default gateway. I fixed it, and now the VTEPs on the individual hosts are able to communicate with each other, but the VMs themselves are still unable to communicate. I'm stumped.

Hi,

I think the issue is that, from any of the ESXi hosts, I'm not able to ping the other host's VTEP IP. The pings themselves are failing. What do you reckon?

I'm not even able to ping my own VTEP.

Here are the settings from the PG which was created when preparing the host.

I have already tried tweaking the security settings, and I don't think it's an issue with the VCD underlay, as it already allows traffic on non-NSX PGs between VMs on different hosts, even with packet sizes up to 1700 bytes.
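For anyone checking the same thing, here is a minimal Python sketch of the arithmetic behind the packet-size test, assuming an IPv4 underlay and no inner VLAN tag. The function names and the remote VTEP address 192.0.2.12 are made up for illustration; the 1600 figure is the commonly cited minimum MTU for an NSX VXLAN transport network.

```python
# Bytes that VXLAN encapsulation adds on top of the guest's IP MTU
# (IPv4 underlay, no inner VLAN tag assumed).
INNER_ETHERNET = 14   # guest frame's own Ethernet header, carried inside the tunnel
VXLAN_HEADER = 8
OUTER_UDP = 8
OUTER_IPV4 = 20

# Headers that a ping adds around its payload.
ICMP_HEADER = 8
IPV4_HEADER = 20


def vxlan_overhead() -> int:
    """Total bytes added to a guest IP packet by VXLAN encapsulation."""
    return INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4


def required_underlay_mtu(guest_mtu: int = 1500) -> int:
    """Smallest underlay MTU that carries a full-size guest packet unfragmented."""
    return guest_mtu + vxlan_overhead()


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ping payload that fits in one unfragmented IPv4 packet of size mtu."""
    return mtu - IPV4_HEADER - ICMP_HEADER


if __name__ == "__main__":
    transport_mtu = 1600          # assumed MTU configured on the VXLAN transport
    remote_vtep = "192.0.2.12"    # hypothetical address, replace with a real VTEP IP
    print("VXLAN overhead:", vxlan_overhead(), "bytes")
    print("Underlay MTU needed for a 1500-byte guest MTU:", required_underlay_mtu())
    size = icmp_payload_for_mtu(transport_mtu)
    # -d sets "don't fragment", so the ping fails if the path MTU is too small.
    print(f"vmkping ++netstack=vxlan -d -s {size} {remote_vtep}")
```

The last line prints a vmkping command to run from the ESXi shell. A test on a non-NSX PG only shows that the underlay can carry large frames; the ping has to be sent from the VXLAN netstack to exercise the VTEP vmkernel ports themselves.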

Thank you,

Abhishek

abhisheksha · ‎02-13-2018

UPDATE 2:

I have 2 clusters - Management and Compute. On the Management cluster, the communication channel health check is coming up clear.

On the Compute cluster, I can see the below:

Based on this, I moved M1 to Host1 of the Management cluster and M2 to Host2 of the Management cluster, and they were able to ping each other. As far as I understand, L2 is fine, but there's some network issue on the Compute cluster, as shown above.

Mid_Hudson_IT · ‎03-10-2018

Can I ask your reasoning for setting the VLAN setting to none?

As for the physical L2 layer, is your management layer on the default VLAN 1?

If you have a physical layer, what is your native VLAN?

Are the links for the VXLAN going over the management NICs, or over their own vDS and vmnics?

VCP5/6-DCV, VCP6-NV, vExpert 2015/2016/2017, A+, Net+, Sec +, Storage+, CCENT, ICM NSX 6.2, 70-410, 70-411

bayupw · ‎03-11-2018

Hi, what did you tweak on the security settings?

As explained in the blog posts in my previous reply, in a nested environment you would want to set promiscuous mode to Accept, rather than leaving everything at Reject as in your screenshot.


jimmyeys · ‎03-11-2018

Please check the subnet mask of the vmkernel ports created by NSX, and check the MTU of the switch.
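As a rough illustration of the subnet mask check, here is a small Python sketch using the standard ipaddress module. It assumes the VTEPs are meant to share one L2 segment, as in this lab; all addresses and function names are made up for the example and should be replaced with the values from the VTEP vmkernel ports.

```python
import ipaddress


def same_subnet(vtep_a: str, vtep_b: str) -> bool:
    """True if two 'address/prefix' strings resolve to the same network."""
    a = ipaddress.ip_interface(vtep_a)
    b = ipaddress.ip_interface(vtep_b)
    return a.network == b.network


def gateway_in_subnet(vtep: str, gateway: str) -> bool:
    """True if the default gateway lies inside the VTEP's own network."""
    return ipaddress.ip_address(gateway) in ipaddress.ip_interface(vtep).network


if __name__ == "__main__":
    # Hypothetical VTEP addresses; the second host has a mistyped /25 mask.
    host1 = "192.0.2.11/24"
    host2 = "192.0.2.140/25"
    gateway = "192.0.2.1"

    print("Same subnet:", same_subnet(host1, host2))
    print("Gateway reachable from host1:", gateway_in_subnet(host1, gateway))
    print("Gateway reachable from host2:", gateway_in_subnet(host2, gateway))
```

A mismatched mask like this one makes each host disagree about which peers are on-link, and it can also leave the default gateway outside the VTEP's own subnet.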

sarvp · ‎03-12-2018

I'm new, so please pardon me if someone has already mentioned this. Did you check the DFW rules? Maybe a rule is blocking this traffic, even though the VMs are on the same subnet, for example a deny-all rule if DFW is enabled.

In addition, are both clusters part of the same transport zone?

Thanks

Sarvjit