I'm running a nested NSX homelab on vCloud Director. I've completed the installation and host preparation, as you'll see from the images below. I'm trying to ping between two VMs that are connected to the same VNI but sit on different hosts. The ping fails when they're on different hosts; when they're on the same host, the ping works, which suggests something is wrong with the VTEPs. Can you please have a look at the config below and let me know what I'm doing wrong?
Many thanks for your help!
All the ESXi hosts are connected to MGMT_NW on VCD.
Settings Of MGMT_NW
All NSX controllers are seeing each other here.
Host Preparation is complete.
TRANSPORT VXLAN VLAN is set to 0.
VMs M1 and M2 are connected to the same VNI.
VNI is set to UNICAST.
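For reference, VTEP-to-VTEP reachability can be checked from the ESXi shell like this. This is a sketch: `vmk3` and the remote VTEP IP are placeholders for whatever host preparation created in your lab; the first command tells you the actual vmknic name.

```shell
# List the VTEP vmkernel interfaces that host preparation created
esxcli network ip interface list --netstack=vxlan
esxcli network ip interface ipv4 get --netstack=vxlan

# Ping the remote host's VTEP from the VXLAN TCP/IP stack, with
# don't-fragment set and a payload sized to exercise a 1600-byte MTU.
# vmk3 and 192.168.250.52 are placeholders - substitute your own
# VTEP vmknic and the remote VTEP IP.
ping ++netstack=vxlan -I vmk3 -d -s 1572 192.168.250.52
```

If the large ping fails but a small one (`-s 64`) works, the underlay MTU is too small for VXLAN.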
Make sure to enable promiscuous mode on the port groups; see these two blog posts:
If you're still experiencing the issue, you might have a similar situation to the one explained in this blog post: NSX and nested ESXi environments: caveats and layer-2 troubleshooting – vLenzker
Thank you for your reply. I think the settings you're talking about are already enabled, because when I connected the VMs to a non-NSX PG on the same DSwitch, they were able to talk to each other even though they were on different ESXi hosts.
UPDATE - There was an issue with the default gateway. After fixing it, the VTEPs on the individual hosts are able to communicate with each other, but the VMs themselves still can't. I'm stumped. :smileyconfused:
I think the issue is that, from any of the ESXi hosts, I'm not able to ping the other host's VTEP IP. The pings themselves are failing. What do you reckon?
I'm not even able to ping my own VTEP.
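If even the local VTEP doesn't answer, it's worth checking what the VXLAN netstack actually has configured on the host. A sketch of the checks from the ESXi shell (names and addresses will differ in your lab):

```shell
# Routing table of the VXLAN TCP/IP stack - the default gateway here
# must be reachable on the transport VLAN
esxcli network ip route ipv4 list -N vxlan

# Summary of the host's VXLAN configuration: VDS, VTEP vmknic,
# segment ID range and gateway, as seen by the VXLAN module
net-vdl2 -l
```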
Here are the settings from the PG which was created when preparing the host.
I already tried tweaking the security settings, and I don't think it's a VCD underlay issue, since the underlay already passes traffic on non-NSX PGs between VMs on different hosts, even with packet sizes up to 1700 bytes.
I have 2 clusters - Management and Compute. On the Management cluster, the communication channel health check is clean.
On the Compute cluster, I can see the below:
Based on this, I moved M1 to Host1 of the Management cluster and M2 to Host2 of the Management cluster, and they were able to ping each other. As far as I understand, the L2 side is fine, but there's some network issue as shown above.
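That matches the symptoms of a control-plane problem on the Compute cluster: in unicast mode each host needs a working controller connection for the VNI. A sketch of how to check that per host from the ESXi shell - `Compute_VDS` is a placeholder for your actual VDS name:

```shell
# List the VXLAN networks (VNIs) this host knows about, including
# the controller connection state for each one
esxcli network vswitch dvs vmware vxlan network list --vds-name Compute_VDS
```

If the connection shows as down on the Compute hosts but up on the Management hosts, that points at controller reachability from the Compute cluster rather than the data plane.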
Can I ask your reasoning for setting the VLAN setting to none?
As for the physical L2 layer, your management layer - is that the default VLAN 1?
If you have a physical layer, what is your native VLAN?
Are the links for the VXLAN going over the management NICs, or over their own vDS switch and vmnics?
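To answer those questions from the host side, the VXLAN VDS configuration can be dumped from the ESXi shell - a sketch, assuming host preparation has loaded the VXLAN module:

```shell
# Show the VDS used for VXLAN, its MTU, teaming policy and VTEP count
esxcli network vswitch dvs vmware vxlan list

# Show the DVS itself, including which vmnic uplinks back it
esxcli network vswitch dvs vmware list
```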
Hi, what did you tweak on the security settings?
As explained in the blog posts linked in the previous reply, in a nested environment you want to set Promiscuous Mode to Accept, not leave it at Reject as shown in your screenshot.
I'm new, so please pardon me if someone has already mentioned this. Did you check the DFW rules? Maybe some rule is blocking this even though the VMs are on the same subnet - perhaps a deny-all rule, if DFW is enabled.
In addition, are both clusters part of the same Transport zone?
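To check the DFW angle from the CLI, you can list the firewall rules actually applied to the VM's vNIC on the host it runs on. A sketch - the filter name below is a placeholder; use whatever `summarize-dvfilter` reports for your VM:

```shell
# Find the dvfilter attached to the VM's vNIC (look for vmware-sfw
# in the output near the VM's name)
summarize-dvfilter | grep -A 9 M1

# Dump the DFW rules on that filter; nic-12345-eth0-vmware-sfw.2 is
# a placeholder taken from the summarize-dvfilter output
vsipioctl getrules -f nic-12345-eth0-vmware-sfw.2
```

If the last rule is a block-all and nothing above it matches the two VMs, that would explain VM-to-VM failures even with a healthy VTEP path.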