VMware Networking Community
Czernobog
Expert

NSX 6.3.3 - VMs behind DLR not receiving DHCP Offer from Edge

I have the exact same issue as the one posted here: https://communities.vmware.com/thread/517614, but since that thread is 2 years old I've decided to create a new post.

The DLR is configured to relay DHCP requests to an Edge. The relay is configured as per the documentation: the Relay Server is the Edge IP, and the Relay Agent is the internal interface of the DLR.
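
For context, the relay step can be pictured with a minimal Python sketch modeled on generic RFC 2131/RFC 1542 relay-agent behavior, not on NSX internals: the relay agent stamps its own interface address into giaddr (only if no earlier relay did), increments hops and unicasts the request to the configured server on UDP port 67. The server later uses giaddr both to pick a pool and to address its reply. The BootRequest type, the relay_request() function, the hop limit and the MAC address are illustrative assumptions; the two IP addresses appear to be the DLR relay agent and ESG addresses mentioned later in this thread.

```python
from dataclasses import dataclass, replace
from ipaddress import IPv4Address
from typing import Optional, Tuple

DHCP_SERVER_PORT = 67  # relayed requests are sent to the server's BOOTP/DHCP port
MAX_HOPS = 16          # assumed threshold; RFC 1542 allows a lower configured value


@dataclass(frozen=True)
class BootRequest:
    chaddr: str            # client hardware (MAC) address
    giaddr: IPv4Address    # 0.0.0.0 until a relay agent fills it in
    hops: int = 0


def relay_request(
    msg: BootRequest, agent_ip: IPv4Address, server_ip: IPv4Address
) -> Optional[Tuple[BootRequest, Tuple[IPv4Address, int]]]:
    """Return (forwarded message, (dst IP, dst port)), or None if dropped."""
    if msg.hops > MAX_HOPS:
        return None  # too many relay hops: silently discard
    # Only the first relay sets giaddr; later relays leave it untouched.
    giaddr = agent_ip if msg.giaddr == IPv4Address("0.0.0.0") else msg.giaddr
    forwarded = replace(msg, giaddr=giaddr, hops=msg.hops + 1)
    return forwarded, (server_ip, DHCP_SERVER_PORT)


# Hypothetical client; addresses as they appear in this thread.
fwd, dst = relay_request(
    BootRequest("00:50:56:00:00:01", IPv4Address("0.0.0.0")),
    agent_ip=IPv4Address("10.95.80.1"),
    server_ip=IPv4Address("10.95.244.209"),
)
print(fwd.giaddr, fwd.hops, dst)
```

Because giaddr is the relay agent's address on the client segment, the DHCP server must be able to route back to that segment for the Offer to arrive.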

The NSX DFW as well as the Edge and DLR firewalls are configured to pass all communication.

When a VM is placed on the logical switch and powered on, the Edge receives a Discover and replies with an Offer; however, the Offer does not reach the VM:

[screenshots]

Is this configuration valid, or do I have to use an external (Windows or other) DHCP server? AFAIK the issue was addressed in NSX 6.2.

Message was edited by: Czernobog, corrected NSX version to 6.3.3

14 Replies
parmarr
VMware Employee

This issue was addressed in VMware NSX for vSphere 6.2.5. For more information, see DHCP relay agents do not function in NSX (2147322). As you are running 6.3.2, could you apply the workaround mentioned in this KB: configure a DHCP server closer (fewer than 10 hops) to the virtual machines requesting addresses and point the DHCP relay to this server. An Edge Services Gateway (ESG) can be configured for this purpose if required.

Sincerely, Rahul Parmar VMware Support Moderator
Czernobog
Expert

The DHCP Server (ESG) is only 2 hops away from the VM.

Like I mentioned above, the Discover does reach the DHCP server, but the answer does not seem to reach the VM.

grosas
Community Manager

Hi Czernobog

Is the DHCP relay's internal IP reachable from the ESG? When you run debug packet display on the ESG, does the DHCP Discover arrive on the same interface that the DHCP reply is transmitted on?

_____________________________________
Gabe Rosas (VMware HCX team at VMware)
Blog: hcx.design
LinkedIn: /in/gaberosas
Twitter: gabe_rosas
Czernobog
Expert

I've tried it again, here's the output on the ESG:

[screenshot]

Here's a simple diagram:

[screenshot]

The request is sent from the DLR internal interface to the internal ESG interface; the reply is sent back the same way.

The DHCP relay is configured on the DLR with the IP Address = ESG internal interface, and the DHCP Relay Agent is VXLAN_CLIENTS.

grosas
Community Manager

Have you looked at the debug packet display on the DLR? If the DHCP reply from the ESG is seen on the DLR uplink interface, then the relay service is malfunctioning and you should proceed with an SR.

As a last resort, first collect support logs, then try: toggle the relay service off/on --> test DHCP --> reboot --> test DHCP --> redeploy the DLR appliance --> test DHCP.

Czernobog
Expert

It's not possible to query the internal vNic of the DLR, but AFAIK this is as designed:

[screenshot]

There is no DHCP traffic on the uplink: when I run debug packet display interface vNic_2 dst_port_67, no output is displayed:

[screenshot]

I will also try the further steps you mentioned.

Czernobog
Expert

I've tried re-enabling the relay and also re-deployed the DLR, with no effect on the DHCP functionality. It still behaves the same. I've opened an SR and will see how it goes.

grosas
Community Manager

Hi @Czernobog, in the previous comment you mentioned not seeing DHCP packets on the DLR uplink. Did you play around with the filter (maybe just the DHCP/BOOTP port)? If the reply from the ESG is not reaching the DLR for whatever reason, then the underlying issue is not with DLR functionality.

Czernobog
Expert

I removed the filter on the debugger for the DLR uplink port and ran debug packet display vNic_2, but did not see any communication from the ESG. But since you said that receiving DHCP replies on the DLR uplink would be a sign of a fault, would this be a normal situation?

When trying to run the debugger on the DLR internal interface with debug packet display vNic_10, I get the error:

[screenshot]

grosas
Community Manager

You should definitely confirm end to end what is happening with the packets before you dig deeper and conclude that the feature is failing.

On the DLR, you can listen for BOOTP/DHCP packets on any interface using debug packet display interface any port_67
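
The filter argument used in these commands looks like a tcpdump expression with underscores in place of spaces ("port_67", "dst_port_67"). A minimal Python helper, assuming that underscore convention holds for the Edge CLI, that turns an ordinary filter expression into such a command line; the function name is hypothetical and it only builds the string, it does not talk to the Edge.

```python
def edge_debug_command(interface: str, bpf_filter: str = "") -> str:
    """Build an NSX Edge 'debug packet display' command line.

    Assumption: the Edge CLI accepts a tcpdump-style filter as a single token
    with underscores instead of spaces, as in 'port_67' and 'dst_port_67'.
    """
    cmd = f"debug packet display interface {interface}"
    token = "_".join(bpf_filter.split())  # "dst port 67" -> "dst_port_67"
    return f"{cmd} {token}" if token else cmd


if __name__ == "__main__":
    # Port 67 alone catches both directions of a relayed exchange: the relay
    # sends to the server on UDP 67, and the server replies to giaddr on UDP 67.
    print(edge_debug_command("any", "port 67"))
    print(edge_debug_command("vNic_2", "dst port 67"))
```

The first call reproduces the command suggested above. Filtering on port 67 rather than on port 68 is deliberate: per RFC 2131, a server answering a relayed request sends the reply to the relay agent's UDP port 67, so port 68 only appears on the client segment.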

On the ESG, you previously shared a screenshot that confirmed the ESG is getting the request and a reply is being sent, but I don't think we definitively confirmed where the reply is going.


Were the packets from 10.95.244.209 to 10.95.80.1 seen on the ESG's vNic_0 or on some other interface? If the ESG doesn't have a route back to the 10.95.80.0 segment, it would send them out towards its default route instead of symmetrically back to the DLR. If there is asymmetric delivery of the packet, you would need to relax RPF checks on the Edge interfaces.
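
The asymmetry and RPF point can be illustrated with a small Python sketch of generic longest-prefix routing plus unicast RPF (strict and loose modes, as in RFC 3704); it is not NSX Edge code. The routing table and the interface names "uplink" and "to-dlr" are hypothetical: an ESG whose only routes are a default route and its own transit subnet sends traffic for 10.95.80.1 out the default route, and a packet sourced from 10.95.80.1 that arrives on the DLR-facing interface fails a strict RPF check.

```python
from ipaddress import ip_address, ip_network
from typing import List, Optional, Tuple

Route = Tuple[str, str]  # (prefix, egress interface)


def lookup(routes: List[Route], dst: str) -> Optional[str]:
    """Longest-prefix match: return the egress interface for dst, or None."""
    addr = ip_address(dst)
    best = None
    for prefix, iface in routes:
        net = ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None


def strict_rpf_ok(routes: List[Route], src: str, in_iface: str) -> bool:
    """Strict uRPF: accept only if the route back to src uses the arrival interface."""
    return lookup(routes, src) == in_iface


def loose_rpf_ok(routes: List[Route], src: str) -> bool:
    """Loose uRPF: accept if any route back to src exists."""
    return lookup(routes, src) is not None


# Hypothetical ESG table with no route to the DLR's 10.95.80.0/24 segment.
esg = [("0.0.0.0/0", "uplink"), ("10.95.244.0/24", "to-dlr")]
print(lookup(esg, "10.95.80.1"))                   # 'uplink': replies follow the default route
print(strict_rpf_ok(esg, "10.95.80.1", "to-dlr"))  # False: strict RPF would drop this packet
```

Adding a 10.95.80.0/24 route via the DLR-facing interface fixes both the reply path and the strict RPF check; relaxing RPF only addresses the second.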

Unless the DLR has some filter display bug, you should see the BOOTP/DHCP packet from the ESG to the DLR. If that packet is not arriving, it's not really accurate to say the DHCP relay is malfunctioning. The first step should be to confirm end-to-end delivery; once you have confirmed it, you can rely on SR support to dive in and troubleshoot how the relay feature is behaving.

--

Gabe

Czernobog
Expert

I have checked the communication in both ways between the DLR, ESG and the client (with a static ip) and it is working.

The replies leave the ESG and are routed to the correct interface on the DLR.

I was in contact with Support, and the support engineer pulled some network traces on the ESX host; I'm waiting for the analysis now.

We also did a small test: we created a new ESG uplink in the client network, and there the client obtained an IP without issues.

mmalesa
Contributor

Hi Czernobog,

were you able to deal with this issue? We also use version 6.3.3 and have very similar problems.

I was able to configure a pool on an ESG and it was successfully passed through a DHCP relay on a DLR, but when I tried to do the same with a DHCP server on a Linux machine, the responses could not get through the DLR. I saw them on the ESG between the Linux box and the DLR, but they did not go through the DLR.

Best regards.

lukearntz
Contributor

Quoting grosas: "Were the packets from 10.95.244.209 to 10.95.80.1 seen on the ESG's vNic_0 or on some other interface? If the ESG doesn't have a route back to the 10.95.80.0 segment, it would send them out towards its default route, instead of symmetrically back to the DLR. If there is an asymmetric delivery of the packet, you would need to relax RPF checks on the Edge interfaces."

I was having the same issues. I could see the request and offer in the ESG, but the response wasn't reaching the requesting VM.

The problem was the ESG did not have a route back to the requesting VM network.

ESG Interface: 172.16.0.1/24

VM requesting a DHCP address from subnet 172.16.200.0/24 (this request was relayed via the DLR).

ESG did not have a route back to the 172.16.200.0/24 network. After resolving that issue DHCP worked flawlessly.
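
The effect of the missing route can be sketched in Python as a plain longest-prefix lookup on a hypothetical ESG routing table. Assumptions, not from the post: the default gateway 192.0.2.1, the DLR uplink address 172.16.0.2 used as the next hop, and 172.16.200.1 as the DLR interface address (giaddr) that the Offer is sent to.

```python
from ipaddress import ip_address, ip_network
from typing import Dict, Optional


def next_hop(table: Dict[str, str], dst: str) -> Optional[str]:
    """Longest-prefix match over {prefix: next hop}; 'connected' means on-link."""
    addr = ip_address(dst)
    best = None
    for prefix, hop in table.items():
        net = ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None


# Hypothetical ESG routing table: on-link 172.16.0.0/24 plus a default route.
esg_routes = {"0.0.0.0/0": "192.0.2.1", "172.16.0.0/24": "connected"}
giaddr = "172.16.200.1"  # assumed DLR interface address in 172.16.200.0/24

print(next_hop(esg_routes, giaddr))  # 192.0.2.1: the Offer follows the default route

# The fix described above: a route to 172.16.200.0/24 via the DLR uplink.
esg_routes["172.16.200.0/24"] = "172.16.0.2"  # assumed DLR uplink address
print(next_hop(esg_routes, giaddr))  # 172.16.0.2: the Offer goes back to the DLR
```

In practice the route could come from a static route on the ESG or from dynamic routing between the DLR and the ESG; the sketch only shows why either is needed.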

I have DHCP pools for multiple subnets (e.g., 172.16.100.0/24 and 172.16.200.0/24) on the ESG. The DLR relays from each interface to the ESG and the VMs get the result, but the ESG needs to have a route back to the offered subnet.
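
The reason one ESG can serve several pools through the same relay is that a standard RFC 2131 server chooses the pool from giaddr and sends the reply to giaddr on UDP port 67; since giaddr lies inside the offered subnet, the server needs a route to that subnet. A minimal Python sketch, assuming the ESG's DHCP service follows that standard behavior; the pool ranges and the 172.16.x.1 relay addresses are hypothetical.

```python
from ipaddress import ip_address, ip_network
from typing import Optional, Tuple

DHCP_SERVER_PORT = 67

# Hypothetical pools for the two subnets mentioned above; ranges are made up.
POOLS = {
    "172.16.100.0/24": ("172.16.100.10", "172.16.100.200"),
    "172.16.200.0/24": ("172.16.200.10", "172.16.200.200"),
}


def select_pool(giaddr: str) -> Optional[str]:
    """Pick the pool whose subnet contains the relay agent address (giaddr)."""
    addr = ip_address(giaddr)
    for subnet in POOLS:
        if addr in ip_network(subnet):
            return subnet
    return None  # no matching pool: the request goes unanswered


def reply_destination(giaddr: str) -> Tuple[str, int]:
    """Replies to relayed requests go to giaddr on the server port (RFC 2131, 4.1)."""
    return giaddr, DHCP_SERVER_PORT


if __name__ == "__main__":
    for relay in ("172.16.100.1", "172.16.200.1"):
        print(relay, select_pool(relay), reply_destination(relay))
```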

tanurkov
Enthusiast

It is better to capture on the switch port in both directions (pktcap-uw with --dir 1 and --dir 0), on the DLR and on the VM as well.

VMware Knowledge Base

Then check the TTL again; the only reason it would not be received is that it is being dropped by the filters.
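
A small Python helper that only prints the pktcap-uw command lines for a two-direction capture of one switch port, to be run on the ESXi host. The port ID, labels and output paths are hypothetical; the host's net-stats -l output lists the real port IDs. Both captures would be started together and compared afterwards: an Offer seen leaving the vSwitch toward the VM's port (--dir 1) was delivered, while one missing there was dropped earlier in the path.

```python
from typing import List


def pktcap_commands(port_id: int, label: str, outdir: str = "/tmp") -> List[str]:
    """Return pktcap-uw command lines capturing one switch port in both directions.

    --dir 0: traffic entering the vSwitch from the port (e.g. sent by the VM)
    --dir 1: traffic leaving the vSwitch toward the port (e.g. delivered to the VM)
    """
    return [
        f"pktcap-uw --switchport {port_id} --dir {d} -o {outdir}/{label}_dir{d}.pcap"
        for d in (0, 1)
    ]


if __name__ == "__main__":
    # Hypothetical port ID; look up the real one on the host with net-stats -l.
    for cmd in pktcap_commands(50331660, "client-vm"):
        print(cmd)
```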

regards Dmitri
