skroesen
Contributor

Unable to ping TEP interfaces

I am working on an NSX-T lab. I am unable to ping the vmkernel IPs of the TEPs (even from the host itself; it can't ping its own TEP). The IPs are assigned from an IP pool. When I run the command "esxcli network ip interface ipv4 get", I see the assigned IP and that it is on vmk10, but the gateway is 0.0.0.0. Shouldn't the gateway IP set in the IP pool be configured here?

[Screenshot: 2020-04-16_18-08-25.jpg]

Here is the IP Pool:

[Screenshot: 2020-04-16_18-10-23.jpg]

Here is the Transport Node Profile (Overlay). I assigned physical NIC vmnic1.

[Screenshot: 2020-04-16_18-11-58.jpg]

I am out of ideas. Any hints to get me moving again?

sk

19 Replies
nipanwar
Enthusiast

What you observe is expected. Since these are different TCP/IP stacks, the ping will not be handled locally; it will go out and route back in.

Can you try to ping the TEP gateway IP from the host CLI: vmkping ++netstack=vxlan -I vmk10 10.1.1.1
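
If you end up running this check repeatedly, a small helper can assemble the command consistently. This is a hypothetical sketch (`build_vmkping_cmd` is not a VMware tool). It emits only the vmkping options already used in this thread (`++netstack=`, `-I`, `-d`, `-s`, `-c`), with vmk10 and the vxlan netstack as defaults, and the resulting list could be passed to `subprocess.run` wherever vmkping is available.

```python
from typing import List, Optional


def build_vmkping_cmd(target: str,
                      vmk: str = "vmk10",
                      netstack: str = "vxlan",
                      size: Optional[int] = None,
                      dont_fragment: bool = False,
                      count: Optional[int] = None) -> List[str]:
    """Assemble a vmkping argument list sourced from a TEP vmkernel port.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = ["vmkping", f"++netstack={netstack}", "-I", vmk]
    if dont_fragment:
        cmd.append("-d")            # set the Don't Fragment bit
    if size is not None:
        cmd += ["-s", str(size)]    # ICMP payload size in bytes
    if count is not None:
        cmd += ["-c", str(count)]   # number of echo requests to send
    cmd.append(target)
    return cmd


if __name__ == "__main__":
    print(" ".join(build_vmkping_cmd("10.1.1.1")))
```

With the defaults this reproduces the command above; passing `size`, `dont_fragment` and `count` gives a don't-fragment MTU probe.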

Also check that vmnic1 is physically up.

If this doesn't work, it is most likely a TEP VLAN misconfiguration. Check the TEP VLAN configured in your uplink profile and make sure the same VLAN is trunked on the physical infrastructure.
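
The VLAN check boils down to a simple rule, sketched here with a hypothetical helper. It assumes the uplink profile's Transport VLAN uses 0 for untagged TEP traffic (the NSX-T convention) and that `switchport_vlans` lists the VLANs allowed on the physical trunk port. Untagged traffic is treated as passing, which only holds if the port's access/native VLAN is the network your TEPs live on.

```python
from typing import Iterable

UNTAGGED = 0  # Transport VLAN 0 in an NSX-T uplink profile: TEP traffic is not tagged


def tep_vlan_carried(transport_vlan: int, switchport_vlans: Iterable[int]) -> bool:
    """Rough check that the uplink profile's TEP VLAN can reach the physical network.

    transport_vlan: Transport VLAN from the uplink profile.
    switchport_vlans: VLANs allowed (trunked) on the physical switch port.
    """
    if transport_vlan == UNTAGGED:
        # Untagged frames land on the port's access/native VLAN, so no trunking
        # is needed; this only helps if that VLAN is the TEP network.
        return True
    return transport_vlan in set(switchport_vlans)


print(tep_vlan_carried(150, [100, 150]), tep_vlan_carried(150, [100]))
```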

skroesen
Contributor

Thanks for the help. The ping failed with that command as well. I will take a closer look at the underlying networking and configs to confirm.

skroesen
Contributor

I have built this out in VMware Workstation, and I just cannot find where my issue is: I am unable to get VMs to communicate across the overlay network. I have looked everything over. Since it is Workstation, I am not using any VLANs or trunks. Here is a simplified version of the setup. The vCenter and NSX Managers run directly on Workstation, plugged into the "management segment".

Both ESXi hosts' management vmkernel ports are in the management segment in Workstation.

On both ESXi hosts, the TEP physical NIC is vmnic1, plugged into the "Underlay Segment" in Workstation.

In NSX Manager, I am using the default single-NIC uplink profile.

I have posted my transport node profiles above. 

Am I missing something? I have confirmed communication across all segments (host to host to router, etc.), and everything responds.

[Screenshot: 2020-04-19_10-34-44.jpg]

daphnissov
Immortal

I recall a defect in VMware Workstation where the MTU setting is broken. If that is the case, the ping isn't working because there isn't enough room in the frame for the GENEVE encapsulation. I'm not 100% sure about that, but nested labs like these are always problematic. It would be far better if you could model this on vSphere.
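
To see why a broken MTU setting matters, here is the GENEVE overhead as a quick calculation. The sketch counts the outer IPv4 header (20 bytes), the UDP header (8), the fixed GENEVE header (8, per RFC 8926) and the encapsulated guest Ethernet header (14). The size of any GENEVE options NSX-T adds is left as a parameter because it is an assumption here, and IPv6 underlays are ignored. `required_underlay_mtu` is a hypothetical helper.

```python
# Header sizes in bytes (IPv4 underlay assumed).
OUTER_IPV4 = 20      # outer IPv4 header, no IP options
OUTER_UDP = 8        # outer UDP header (GENEVE's registered port is 6081)
GENEVE_BASE = 8      # fixed GENEVE header, before variable-length options
INNER_ETHERNET = 14  # the guest's Ethernet header travels inside the tunnel
DOT1Q_TAG = 4        # extra bytes if the guest frame carries an 802.1Q tag


def required_underlay_mtu(guest_mtu: int = 1500,
                          geneve_options: int = 0,
                          inner_vlan_tag: bool = False) -> int:
    """Smallest underlay IP MTU that fits a full-size guest packet inside GENEVE."""
    if geneve_options < 0 or geneve_options % 4:
        raise ValueError("GENEVE option length must be a non-negative multiple of 4")
    inner_frame = guest_mtu + INNER_ETHERNET + (DOT1Q_TAG if inner_vlan_tag else 0)
    return inner_frame + GENEVE_BASE + geneve_options + OUTER_UDP + OUTER_IPV4


print(required_underlay_mtu())  # 1550: already more than a 1500-byte underlay carries
```

Even with no options, a standard 1500-byte guest MTU needs 1550 bytes on the underlay. Full-size guest packets therefore cannot cross a 1500-byte underlay (or one whose MTU setting is silently ignored), and a 1600 MTU leaves headroom for options.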

nipanwar
Enthusiast

Can you share the configuration of the uplink profile used in your transport node profile?

Also, run the vmkping command above and confirm whether esxi-1 can ping the esxi-2 TEP IP.

nipanwar
Enthusiast

It will also work with a lower MTU, as long as your actual unencapsulated frame is smaller than 1400 bytes.

A normal ping packet is 50 to 100 bytes, and with the GENEVE header added it will be around 200 bytes, so we should not see an MTU problem with a plain ping in this scenario.
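
The same arithmetic can be applied to the ping case. This sketch assumes a 56-byte ICMP payload (the Linux `ping` default) and no GENEVE options; `encapsulated_ping_size` is a hypothetical helper, and the header sizes are the standard IPv4, ICMP, UDP and fixed GENEVE header lengths.

```python
ICMP_HEADER = 8
IPV4_HEADER = 20
INNER_ETHERNET = 14
# Outer IPv4 (20) + UDP (8) + fixed GENEVE header (8); GENEVE options not included.
GENEVE_OVERHEAD = 20 + 8 + 8


def encapsulated_ping_size(icmp_payload: int, geneve_options: int = 0) -> int:
    """Outer IP packet size of a guest ping after GENEVE encapsulation."""
    inner_ip_packet = icmp_payload + ICMP_HEADER + IPV4_HEADER
    inner_frame = inner_ip_packet + INNER_ETHERNET
    return inner_frame + GENEVE_OVERHEAD + geneve_options


for payload in (56, 1472):
    print(payload, "->", encapsulated_ping_size(payload))
```

A default ping comes out at 134 bytes before options, far below any MTU, while a 1472-byte payload (the largest that fits a 1500-byte guest MTU) reaches 1550.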

skroesen
Contributor

Workstation 15.5 resolved that issue, but I have been trying to confirm whether MTU 9000 is on by default, as I cannot find a setting anywhere in Workstation to change it.

skroesen
Contributor

Thanks again, virtuallyme. I had to report to the office today for a bit, but I will try later when I return home, where the computer running the lab is located.

daphnissov
Immortal

It was *said* to resolve the issue, but I heard it didn't. The test to do here is to ping with an explicit packet size, starting low and walking up. Bottom line: regardless of what Workstation says, if you cannot ping between TEPs with a 1572-byte payload (a 1600-byte IP packet once the ICMP and IPv4 headers are added), your tunnels are not going to come up and it won't work. An easy way to check: vmkping -S vxlan <TEP> -d -s 1572 -c 10
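
The "start low and walk up" test can be scripted. In this sketch `largest_passing_size` and `probe` are hypothetical names: `probe` stands in for running the don't-fragment vmkping above with a given `-s` size, and `fake_probe` simulates a path with a 1500-byte MTU so the example runs anywhere. The `-s` value is the ICMP payload, so 28 bytes of ICMP and IPv4 headers sit on top of it; 1572 is the size that produces exactly a 1600-byte IP packet.

```python
from typing import Callable, Iterable, Optional

ICMP_AND_IPV4_HEADERS = 28  # 8-byte ICMP header + 20-byte IPv4 header


def ip_packet_size(icmp_payload: int) -> int:
    """Size of the IP packet produced by a ping with this -s payload size."""
    return icmp_payload + ICMP_AND_IPV4_HEADERS


def largest_passing_size(probe: Callable[[int], bool],
                         sizes: Iterable[int]) -> Optional[int]:
    """Walk up through payload sizes; return the last one that got replies.

    probe(size) should send don't-fragment pings with that payload size and
    return True if they were answered. None means even the first size failed.
    """
    best = None
    for size in sizes:
        if not probe(size):
            break  # first failure: the path MTU sits between best and this size
        best = size
    return best


def fake_probe(size: int) -> bool:
    """Stand-in for a real vmkping run: simulates a path with a 1500-byte MTU."""
    return ip_packet_size(size) <= 1500


best = largest_passing_size(fake_probe, [64, 512, 1024, 1472, 1572, 8972])
print(best, ip_packet_size(best))  # 1472 1500
```

On the simulated 1500-byte path the walk stops at 1472, one step short of the 1572 the tunnels need.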

skroesen
Contributor

I am unable to ping between the TEP interfaces, or from the router to a TEP interface. I don't think it is MTU-related at this point. Here is my uplink profile, the default configuration from the install:

[Screenshot: 2020-04-20_13-58-42.jpg]

daphnissov
Immortal

This uplink profile says you're not tagging the VLAN used for that traffic. The profile defines an MTU of 1600, but that has to be a capability of whatever the underlying network is. At this point, focus on testing between the TEPs on the ESXi hosts, which this uplink profile doesn't apply to. As I said, if you cannot ping between TEPs on the ESXi hosts using the command I gave you, nothing is going to work. So don't pass Go and don't collect $200 until that works.
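
Put another way, the usable MTU between two TEPs is the smallest MTU of any hop on the path, and every hop has to meet the 1600 in the uplink profile. A sketch of that check follows; `underlay_bottleneck` and `underlay_supports_profile` are hypothetical helpers, and the hop names and MTU values are made up, chosen only to illustrate a Workstation segment that stays at 1500.

```python
from typing import Dict, Tuple

UPLINK_PROFILE_MTU = 1600  # MTU defined in the default uplink profile, per this thread


def underlay_bottleneck(hop_mtus: Dict[str, int]) -> Tuple[str, int]:
    """Return the hop with the smallest MTU; the end-to-end path carries no more."""
    hop = min(hop_mtus, key=lambda h: hop_mtus[h])
    return hop, hop_mtus[hop]


def underlay_supports_profile(hop_mtus: Dict[str, int],
                              required: int = UPLINK_PROFILE_MTU) -> bool:
    """True if every hop can carry packets of the required size."""
    return underlay_bottleneck(hop_mtus)[1] >= required


# Hypothetical lab path: illustrative values, not measurements.
lab_path = {
    "esxi-1 vmnic1": 9000,
    "Workstation 'Underlay Segment'": 1500,
    "esxi-2 vmnic1": 9000,
}
print(underlay_bottleneck(lab_path), underlay_supports_profile(lab_path))
```

Setting the profile to 1600 changes nothing if one hop in the middle still stops at 1500.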

nipanwar
Enthusiast

Well, the NSX-T configuration looks perfect.

I don't have Workstation experience, but I am assuming that for the overlay network (which is on the default VLAN) you are connecting ESXi-1 vmnic1 and ESXi-2 vmnic1 via some network in Workstation.

skroesen
Contributor

Thanks for the input. No VLAN tagging is used, since the connection is not coming in on a trunk port, so it is equivalent to a default access port on VLAN 1. As a next step, I am going to remove the NSX install from these hosts, set up a distributed switch between the hosts on that network segment, uplinked to the same pNICs I am currently using for NSX, and see if two VMs can communicate across it. Basically, verify the network.

skroesen
Contributor

virtuallyme, thanks for verifying my configuration.  Yes, both ESXi hosts are using vmnic1. 

daphnissov
Immortal

That would be a good place to start. If you can ping across it, increase the MTU on the vDS to 1600 and try a ping with a larger packet size. Your goal is to reach at least 1572, which leaves enough space to account for the encapsulation overhead.

skroesen
Contributor

I did this. I set up two Windows VMs, one on each host, using a distributed switch and the same NIC I had configured in NSX, and enabled jumbo frames in Windows. I can do a standard ping between the VMs, but I cannot ping between them with a 1572-byte packet size. I need to dig into Workstation.

[Screenshot: 2020-04-20_16-34-49.jpg]

daphnissov
Immortal

This seems to reinforce that Workstation either isn't setting a higher MTU despite being asked to, or is broken in some regard.

skroesen
Contributor

Hold on! I forgot to increase the MTU on the distributed switch in my test. With that fixed, I can ping with an 8000-byte packet size, so it does appear I am getting jumbo frames between the host servers.

[Screenshot: 2020-04-20_17-22-52.jpg]

skroesen
Contributor

So I was able to prove communication between the two ESXi hosts over the physical NICs using jumbo frames. I just noticed, buried in the status, that the GENEVE tunnel is down. Are there any logs I can look at to determine why it is down? I have a running VM on each host on the segment, trying to communicate with each other, so the tunnel should be up.

[Screenshot: 2020-04-20_18-53-24.jpg]
