VMware Cloud Community
agregg
Contributor

DHCP not exiting ESXi to physical network

I've spent many days researching, troubleshooting, and tearing my network apart over this, so any help or insight would be greatly appreciated!

Brand new ESXi 8.0b install running on one of these new Lenovo ThinkStation P360 Ultras for my home lab.

I have really simplified my environment to test this issue. I'm down to two VMs and one physical host. No VLANs or anything like that. Default install of ESXi, AFAIK.

VM1-dhcp: Ubuntu 22.04 running isc-dhcp-server

VM2-watcher: Ubuntu 18.04, correctly acquiring a dynamic IP, running tcpdump

Physical: Windows 7, NOT acquiring a dynamic IP, running Wireshark

VM1 dhcp logs look like:
DHCPDISCOVER from PHYSMAC via ens160
DHCPOFFER on 10.0.0.198 to PHYSMAC via ens160

over and over.

VM2 tcpdump sees both the discover and the offer (broadcast). Note, VM2 correctly goes through the entire DHCP sequence and obtains an IP address.

The physical host (plugged directly into the LAN port on the server) never receives the DHCPOFFER packet.

I've gone so far as to use netcat (sudo nc -u -b 255.255.255.255 -p 67 68), sending a broadcast from source port 67 to destination port 68, and that also does not make it to the physical network. However, if I send to destination port 67 or 69 instead, the traffic shows up in Wireshark on the physical host. For a while I thought my switch was doing something weird, but I've since removed it and plugged the test PC directly into the server. So AFAICT, the packet is getting lost in ESXi in the transition from virtual to physical (or in the NIC??).
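For anyone who wants to repeat the netcat test in a more controlled way, here is a rough Python sketch of the same idea. It sends one tagged UDP broadcast per destination port from a fixed source port, so the capture on the physical host shows exactly which ports made it through. The names probe_payload and send_probes and the payload text are made up for illustration. It assumes root (binding source port 67 is privileged) and that the DHCP server is stopped first so the port is free.

```python
import socket
from typing import Iterable

BROADCAST_ADDR = "255.255.255.255"


def probe_payload(dst_port: int) -> bytes:
    """Tag each probe with its destination port so it is easy to spot in a capture."""
    return f"dhcp-path-probe dst={dst_port}".encode("ascii")


def send_probes(src_port: int, dst_ports: Iterable[int], bind_ip: str = "") -> None:
    """Send one broadcast datagram per destination port from a fixed source port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Allow rebinding the port and sending to the broadcast address.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.bind((bind_ip, src_port))
        for port in dst_ports:
            sock.sendto(probe_payload(port), (BROADCAST_ADDR, port))
    finally:
        sock.close()
```

Calling send_probes(67, [67, 68, 69]) as root on the DHCP VM mirrors the netcat test above: with the behavior described, the probes tagged dst=67 and dst=69 would show up in Wireshark on the physical host and the dst=68 one would not.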

I did not have this issue on an ESXi 6.5 that I was running for years.

5 Replies
ggggg_NZ
Contributor

Hi there, 

 

Just curious: are all devices on the same subnet?

I have had a similar issue before, and enabling Forged transmits and Promiscuous mode on the vSwitch fixed it for me, but my environment was nested. You could give that a go.

 

agregg
Contributor

Thanks for the reply. They are on the same subnet, and I have set both the vSwitch and the port group to allow Promiscuous mode and Forged transmits. I hadn't rebooted the VM (if that matters), so I did, but it didn't seem to change anything.

The VM dhcp server is on:

ens160: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.0.0.24 netmask 255.255.255.0 broadcast 10.0.0.255
inet6 fe80::20c:29ff:fe30:c972 prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:30:c9:72 txqueuelen 1000 (Ethernet)

tcpdump on the same VM says this:

02:46:44.445869 40:8d:5c:52:b0:89 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 342: (tos 0x0, ttl 128, id 31046, offset 0, flags [none], proto UDP (17), length 328)
0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 40:8d:5c:52:b0:89, length 300, xid 0x64525595, secs 7168, Flags [Broadcast] (0x8000)
Client-Ethernet-Address 40:8d:5c:52:b0:89
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message (53), length 1: Discover
Client-ID (61), length 7: ether 40:8d:5c:52:b0:89
Requested-IP (50), length 4: 10.0.0.198
Hostname (12), length 8: "PHYSHOST"
Vendor-Class (60), length 8: "MSFT 5.0"
Parameter-Request (55), length 12:
Subnet-Mask (1), Domain-Name (15), Default-Gateway (3), Domain-Name-Server (6)
Netbios-Name-Server (44), Netbios-Node (46), Netbios-Scope (47), Router-Discovery (31)
Static-Route (33), Classless-Static-Route (121), Classless-Static-Route-Microsoft (249), Vendor-Option (43)
---
02:46:44.446320 00:0c:29:30:c9:72 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 342: (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328)
10.0.0.24.67 > 255.255.255.255.68: [udp sum ok] BOOTP/DHCP, Reply, length 300, xid 0x64525595, secs 7168, Flags [Broadcast] (0x8000)
Your-IP 10.0.0.198
Server-IP 10.0.0.24
Client-Ethernet-Address 40:8d:5c:52:b0:89
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message (53), length 1: Offer
Server-ID (54), length 4: 10.0.0.24
Lease-Time (51), length 4: 14400
Subnet-Mask (1), length 4: 255.255.255.0
Domain-Name (15), length 20: "my.domain.com"
Default-Gateway (3), length 4: 10.0.0.1
Domain-Name-Server (6), length 4: 10.0.0.1
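In case it helps anyone compare captures from both sides, here is a minimal Python sketch that pulls the same key fields out of a raw BOOTP/DHCP payload (the UDP payload, without Ethernet/IP/UDP headers). The field offsets follow the fixed BOOTP header layout from RFC 2131. The function name summarize_dhcp and the returned dict keys are my own choices for illustration, not part of any tool mentioned here.

```python
import socket
import struct

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"
MESSAGE_TYPES = {1: "Discover", 2: "Offer", 3: "Request", 4: "Decline",
                 5: "ACK", 6: "NAK", 7: "Release", 8: "Inform"}


def summarize_dhcp(payload: bytes) -> dict:
    """Extract xid, broadcast flag, offered IP, client MAC and message type."""
    # Fixed BOOTP header is 236 bytes, followed by the 4-byte magic cookie.
    if len(payload) < 240 or payload[236:240] != DHCP_MAGIC_COOKIE:
        raise ValueError("not a DHCP payload")
    op, htype, hlen, hops, xid, secs, flags = struct.unpack("!BBBBIHH", payload[:12])
    your_ip = socket.inet_ntoa(payload[16:20])   # yiaddr
    mac = payload[28:28 + min(hlen, 16)]         # chaddr, hlen bytes used
    message_type = None
    i = 240
    while i < len(payload):
        code = payload[i]
        if code == 255:          # End option
            break
        if code == 0:            # Pad option has no length byte
            i += 1
            continue
        if i + 1 >= len(payload):
            break
        length = payload[i + 1]
        value = payload[i + 2:i + 2 + length]
        if code == 53 and len(value) == 1:
            message_type = MESSAGE_TYPES.get(value[0], f"unknown({value[0]})")
        i += 2 + length
    return {
        "xid": f"0x{xid:08x}",
        "broadcast": bool(flags & 0x8000),
        "your_ip": your_ip,
        "client_mac": ":".join(f"{b:02x}" for b in mac),
        "message_type": message_type,
    }
```

Fed the payload of the offer shown above, this should report the same xid, Your-IP, and client MAC that tcpdump prints, which makes it easy to confirm whether a packet seen on the physical side (if any) is the same transaction.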

 

 

agregg
Contributor

I added a network card to the host and configured a new vSwitch and port group to use it. Then I moved the DHCP VM there, and DHCP started working. So it seems the onboard NIC is somehow blocking DHCPOFFER packets. Sigh.

agregg
Contributor

FWIW, I installed Ubuntu on this P360 Ultra as the primary OS just to make sure ESXi wasn't somehow contributing to the issue, and the host still has the DHCP issue on the first port. The second port works fine.

This is what Lenovo says the NICs are:

Intel I219-LM (Integrated)
Intel Ethernet Connection I210-AT(Integrated)

This is what Ubuntu lspci says the NICs are:

6b:00.0 Ethernet controller: Intel Corporation Ethernet Controller I225-LM (rev 03)
6c:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)

This is a pretty annoying problem that I spent far too long troubleshooting. Opened a case with Lenovo. Their entire and only response was:

  • Date Created: 03/22/2023 12:42 AM
  • Date Last Updated: 03/22/2023 01:57 PM
  • Status: Solved- Fixed on first response

IDK if it's a problem with the NIC (or vPro/MEBx), or with the way they have integrated it into this system. I did try disabling all of the remote management stuff in the BIOS, but that did not change anything.

ggggg_NZ
Contributor

Thanks for keeping us updated. 

I'm stumped!
