VMware Networking Community
TryllZ
Expert

VM not reaching Internet via Edge?!

Hi All,

My NSX network setup is as follows; there are no firewall restrictions anywhere.

TryllZ_0-1692393854208.png

Both VMs can ping each other, which tells me the T1 Gateway is working.

Both Edge nodes can ping the firewall interface and the internet as well.

TryllZ_0-1692394407588.png

TryllZ_2-1692394057283.png

TryllZ_1-1692394043349.png

It's the VMs that cannot reach the internet; the reply is coming from the T0 interface.

TryllZ_3-1692394127207.png

Traceroute returns the following.

TryllZ_1-1692394464844.png

Any thoughts on where the issue might be?


Accepted Solutions
DanielKrieger
Enthusiast

You have no default route in the routing table of your T0 SR. Therefore the traffic in the traceflow is dropped. You have to set default originate on your firewall so that the default route is passed on to your T0.

----------------------------------------------------------------------
My Blog: https://evoila.com/blog/author/danielkrieger/


44 Replies
DanielKrieger
Enthusiast

How have you set up route redistribution on your T0 and route advertisement on your T1?

TryllZ
Expert

On T0 Gateway

TryllZ_0-1692442512114.png

On T1 Gateway

TryllZ_1-1692442552649.png

 

DanielKrieger
Enthusiast

Have you created return routes on the firewall for your segments, or are you using a dynamic routing protocol? It looks like your firewall can't send the traffic back. It knows the edge nodes because they are in a network that is directly connected to your firewall.

TryllZ
Expert

Thanks,

I'm using dynamic routing with BGP; all routes are being advertised fine on the router, and on the edge nodes as well.

DanielKrieger
Enthusiast

Is your TEP network functional? Can you do a traceflow under Plan-Troubleshoot? Can you ping the TEP IP addresses from your ESX server? 

ping ++netstack=vxlan <dst IP> -s 1600 -d 
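
For context on the `-s 1600 -d` flags: `-s` sets the ICMP payload size and `-d` sets the don't-fragment bit, so the ping only succeeds if every hop on the TEP path can carry the whole packet unfragmented. A small sketch of the arithmetic, assuming plain IPv4 without options (the function names are made up for illustration):

```python
ICMP_HEADER = 8   # bytes, ICMP echo header
IPV4_HEADER = 20  # bytes, IPv4 header without options (assumption)


def min_mtu_for_ping(payload_size: int) -> int:
    """Smallest IP MTU that carries an unfragmented ICMP echo with this payload."""
    return payload_size + ICMP_HEADER + IPV4_HEADER


def max_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP payload that fits in one packet at the given IP MTU."""
    return mtu - ICMP_HEADER - IPV4_HEADER


print(min_mtu_for_ping(1600))     # 1628
print(max_payload_for_mtu(1600))  # 1572
```

So a don't-fragment ping with a 1600-byte payload only passes if the path MTU is at least 1628; on a path configured for exactly 1600, the largest payload that fits is 1572.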

Is your T0 Active/Active? If it is Active/Active, is URPF mode set to None?

Can you look at the routing table on the edge VM and check whether the segments are in the routing table of your T0 SR?

TryllZ
Expert

I'll do a traceflow once I'm on the system.

I had checked: the ESXi server can ping the TEP addresses of all ESXi hosts in the TEP network. I will still recheck.

No, the T0 is not Active/Active when checked in the edge CLI; it's Active and never Established (if I recall correctly). The firewall cannot ping the 2nd uplink interface IP addresses, 10.10.26.101 and 10.10.26.102. However, in the GUI the T0 HA mode is Active/Active.

I had checked the routing table on the Edge; it had all the networks, including the segments. I have all networks allowed in the prefix list.

Will share the results in some time.

Thanks @DanielKrieger 

TryllZ
Expert

Results..

ESXi pinging TEP addresses (the TEP IPs of the pinging ESXi host are 10.10.25.51 and 10.10.25.52):

[root@d-esx-srv-cn5:~] vmkping -I vmk11 -s 9000 -d -S vxlan 10.10.23.57
PING 10.10.23.57 (10.10.23.57): 9000 data bytes
9008 bytes from 10.10.23.57: icmp_seq=0 ttl=64 time=5.915 ms

--- 10.10.23.57 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 5.915/5.915/5.915 ms

[root@d-esx-srv-cn5:~] vmkping -I vmk10 -s 9000 -d -S vxlan 10.10.23.57
PING 10.10.23.57 (10.10.23.57): 9000 data bytes
9008 bytes from 10.10.23.57: icmp_seq=0 ttl=64 time=5.392 ms

--- 10.10.23.57 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 5.392/5.392/5.392 ms

Traceflow from Edge Node interface

TryllZ_0-1692454467305.png

Traceflow from VM to internet

TryllZ_1-1692454760475.png

BGP Summary from Edge Node (State is Active but Uptime/Downtime is Never)

edge2(tier0_sr[1])> get bgp neighbor summary
BFD States: NC - Not configured, DC - Disconnected
            AD - Admin down, DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv4Unicast
Router ID: 10.10.25.102  Local AS: 65000

Neighbor                            AS          State Up/DownTime  BFD InMsgs  OutMsgs InPfx  OutPfx

10.10.25.1                          65555       Estab 00:22:42     UP  40      28      11     4
10.10.26.1                          65555       Activ never        DC  0       0       0      0
Sat Aug 19 2023 UTC 14:21:35.584
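
A note on reading this summary: `Activ` is the BGP Active state, in which the router keeps trying to open the session, so it does not mean the peering is up; only `Estab` does. As a rough sketch (the helper name is hypothetical, and the two rows are copied from the output above), the neighbors that are not established can be picked out like this:

```python
import ipaddress
from typing import List

# Neighbor rows copied from the "get bgp neighbor summary" output above.
SUMMARY = """\
10.10.25.1                          65555       Estab 00:22:42     UP  40      28      11     4
10.10.26.1                          65555       Activ never        DC  0       0       0      0
"""


def not_established(summary: str) -> List[str]:
    """Return neighbor IPs whose BGP state column is anything other than Estab."""
    result = []
    for line in summary.splitlines():
        fields = line.split()
        if len(fields) != 9:
            continue  # not a neighbor row (header, timestamp, blank line)
        try:
            ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first column is not an IP address
        if fields[2] != "Estab":
            result.append(fields[0])
    return result


print(not_established(SUMMARY))  # ['10.10.26.1']
```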

Edge Node Logical Router, and Routing Table

edge2> get logical-router
Sat Aug 19 2023 UTC 14:21:15.633
Logical Router
UUID                                   VRF    LR-ID  Name                              Type                        Ports   Neighbors
736a80e3-23f6-5a2d-81d6-bbefb2786666   0      0                                        TUNNEL                      4       6/5000
77f0d5e7-e687-48b2-83df-147cec4de28c   1      2054   SR-T0-GW                          SERVICE_ROUTER_TIER0        7       2/50000
b4a49245-8bb3-4c63-b455-40c56631a04f   3      2049   DR-T0-GW                          DISTRIBUTED_ROUTER_TIER0    5       2/50000
f6a2a880-335e-4092-9684-e4eeba1c70f1   4      2052   DR-T1-GW                          DISTRIBUTED_ROUTER_TIER1    6       4/50000

edge2(tier0_sr[1])> get route

Flags: t0c - Tier0-Connected, t0s - Tier0-Static, b - BGP, o - OSPF
t0n - Tier0-NAT, t1s - Tier1-Static, t1c - Tier1-Connected,
t1n: Tier1-NAT, t1l: Tier1-LB VIP, t1ls: Tier1-LB SNAT,
t1d: Tier1-DNS FORWARDER, t1ipsec: Tier1-IPSec, isr: Inter-SR,
> - selected route, * - FIB route

Total number of routes: 20

b  > * 10.10.13.0/24 [20/1] via 10.10.25.1, uplink-271, 00:23:27
b  > * 10.10.15.0/24 [20/1] via 10.10.25.1, uplink-271, 00:23:27
b  > * 10.10.23.0/24 [20/1] via 10.10.25.1, uplink-271, 00:23:27
b  > * 10.10.24.0/24 [20/1] via 10.10.25.1, uplink-271, 00:23:27
t0c> * 10.10.25.0/24 is directly connected, uplink-271, 00:23:32
isr> * 10.10.25.101/32 [200/0] via 169.254.0.130, inter-sr-278, 00:23:19
t0c> * 10.10.26.0/24 is directly connected, uplink-277, 00:23:32
isr> * 10.10.26.101/32 [200/0] via 169.254.0.130, inter-sr-278, 00:23:19
t1c> * 10.10.100.0/24 [3/0] via 100.64.0.1, linked-275, 00:23:24 <--- Segment Connected to 1st VM
t1c> * 10.10.200.0/24 [3/0] via 100.64.0.1, linked-275, 00:23:24 <--- Segment Connected to 2nd VM
t0c> * 100.64.0.0/31 is directly connected, linked-275, 00:23:32
t0c> * 169.254.0.0/25 is directly connected, downlink-280, 00:23:31
isr> * 169.254.0.128/25 is directly connected, inter-sr-278, 00:23:32
b  > * 192.168.1.0/24 [20/0] via 10.10.25.1, uplink-271, 00:13:18
b  > * 192.168.9.0/24 [20/1] via 10.10.25.1, uplink-271, 00:23:27
b  > * 192.168.11.0/24 [20/0] via 10.10.25.1, uplink-271, 00:13:18
t0c> * fc64:1a87:8e1f:3400::/64 is directly connected, linked-275, 00:23:33
t0c> * fe80::/64 is directly connected, linked-275, 00:23:33
Sat Aug 19 2023 UTC 14:22:22.305

URPF mode is Strict.

TryllZ
Expert

Where it says Dropped for No Route found, is this for incoming or outgoing traffic from the Edge Node?

TryllZ
Expert

Hi @DanielKrieger 

I think I understand why it's not working.

I set a static default route, and the VMs can now reach the internet, which tells me there is no default route set on the router for BGP to advertise.

DanielKrieger
Enthusiast

You have no default route in the routing table of your T0 SR. Therefore the traffic in the traceflow is dropped. You have to set default originate on your firewall so that the default route is passed on to your T0.
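
To illustrate why the missing default route matters, here is a minimal longest-prefix-match sketch over the IPv4 prefixes from the T0 SR routing table posted earlier. The destination addresses 10.10.100.10 and 8.8.8.8 and the `lookup` helper are only assumptions for the example, and a real router lookup involves more than this:

```python
import ipaddress
from typing import List, Optional

# IPv4 prefixes taken from the "get route" output of the T0 SR.
ROUTES = [
    "10.10.13.0/24", "10.10.15.0/24", "10.10.23.0/24", "10.10.24.0/24",
    "10.10.25.0/24", "10.10.25.101/32", "10.10.26.0/24", "10.10.26.101/32",
    "10.10.100.0/24", "10.10.200.0/24", "100.64.0.0/31", "169.254.0.0/25",
    "169.254.0.128/25", "192.168.1.0/24", "192.168.9.0/24", "192.168.11.0/24",
]


def lookup(dest: str, routes: List[str]) -> Optional[str]:
    """Return the most specific prefix containing dest, or None if no route matches."""
    addr = ipaddress.ip_address(dest)
    matches = [ipaddress.ip_network(r) for r in routes]
    matches = [net for net in matches if addr in net]
    if not matches:
        return None
    return str(max(matches, key=lambda net: net.prefixlen))


print(lookup("10.10.100.10", ROUTES))           # 10.10.100.0/24
print(lookup("8.8.8.8", ROUTES))                # None
print(lookup("8.8.8.8", ROUTES + ["0.0.0.0/0"]))  # 0.0.0.0/0
```

Internal destinations match a specific prefix, but an internet address matches nothing until 0.0.0.0/0 is present, whether it is learned via BGP or set statically.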

DanielKrieger
Enthusiast

@TryllZ Yes, that explains it. Our posts just overlapped. You can specify in the BGP configuration that the default route is passed along.

TryllZ
Expert

Great, thanks, will do so and test again..

On a similar note, why is the 2nd Edge uplink on both nodes in the Active state and not Established, even though the HA mode is Active/Active on the T0? Any thoughts on that?

DanielKrieger
Enthusiast

Bildschirmfoto 2023-08-20 um 11.47.10.png

These are my FW settings (pfSense cluster) for my neighbor.

The problem with the 2nd edge uplink could have several causes.

1. I would check if you can ping your firewall over the 2nd IP address of your edge node and vice versa.
2. Check the BGP configuration; sometimes it's a simple number error in the IP or the update interface.
3. What does the NSX GUI show?
4. Is the BFD profile correct, and are the timers right?
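
For the "simple number error" check in step 2, a quick sanity test is whether the configured neighbor address actually falls inside the uplink's subnet. A hedged sketch, assuming the /24 mask shown for 10.10.26.0/24 in the routing table and using the uplink address 10.10.26.102 mentioned earlier (the helper name is hypothetical):

```python
import ipaddress


def same_subnet(local_interface: str, neighbor: str) -> bool:
    """True if the neighbor IP lies in the subnet of the local interface (CIDR form)."""
    iface = ipaddress.ip_interface(local_interface)
    return ipaddress.ip_address(neighbor) in iface.network


print(same_subnet("10.10.26.102/24", "10.10.26.1"))  # True
print(same_subnet("10.10.26.102/24", "10.10.25.1"))  # False
```

This only catches addressing typos; it says nothing about whether layer 2 between the edge uplink and the firewall actually works, which is what the ping in step 1 tests.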


PS: Kudos would be nice if I helped, because I still need them for my VMware Rewards profile 😄

TryllZ
Expert

1. I would check if you can ping your firewall over the 2nd IP address of your edge node and vice versa.

I have checked it already; the firewall cannot ping the 2nd interface on either edge node.

2. Check the BGP configuration; sometimes it's a simple number error in the IP or the update interface.

All configurations are the same on both uplinks; the only thing of note is that the 2 uplinks are connected to 2 interfaces on the same firewall.

3. what does the NSX GUI show?

For the 1st uplink it shows Success, I can see BGP exchange happening in both NSX and Firewall.

For the 2nd Uplink it shows Down.

4. is the BFD profile correct, are the timers right?

This is default and has been untouched.

DanielKrieger
Enthusiast

Okay, then we have a problem in your setup. Which firewall are you using? Single or cluster?
Is your lab nested?
How are your VLANs configured?
The edge IP must be pingable, even if no BGP neighbor relationship is established. As long as your layer 2 is not clean, BGP will not work.

TryllZ
Expert

The firewall is OPNsense, a single instance for now. I might go with HA or set up 2 firewalls; not sure yet.

Yes, this is a nested lab.

Sorry, I'm unsure how to answer "How are your VLANs configured?"; it's done with sub-interfaces on the firewall.

I'll add my network diagram in a while, that should make the picture clearer.

NSX VLANs are as follows: Host TEP (VLAN 23), Edge TEPs (VLAN 24), and Edge Uplinks (Uplink 1 VLAN 25, Uplink 2 VLAN 26). The Edge uplink portgroups in the Distributed Switch are carrying VLANs 25, 24 (Uplink 1), and 26, 24 (Uplink 2).

DanielKrieger
Enthusiast

What are your security settings on your uplink dvPG?
You need to allow promiscuous mode, MAC address changes, and forged transmits for it to work cleanly.

Bildschirmfoto 2023-08-20 um 12.14.53.png

TryllZ
Expert

I recall setting that to Allowed on the bare-metal host; I will need to double-check on the Edge uplinks.

TryllZ
Expert

Thanks a lot @DanielKrieger, I appreciate all the help.