Re: Guest OS not acquiring address from DHCP nor joining Windows Domain

gregdavisfromnj · 12-03-2008

I have a Windows Server 2003 R2 guest OS in an ESX server that is not able to acquire an IP address from DHCP, nor join a Windows Active Directory domain elsewhere on the network. Ironically, I have cloned the image a few times, and the other images are able to get IP addresses from DHCP. The domain controller hosts the DHCP server on a physical box next to, and on the same subnet as, the ESX server hardware. Oddly, the problematic image had a different driver for the network card (it was using the E1000 or similar), whereas the working images had an AMD-branded driver for the NIC, and their adapter was listed as "flexible" on the VMware Infrastructure Client summary page for the images.

I uninstalled and then reinstalled VMware Tools on the image, deleted the network card, and then added a new network card, all to no avail. The new network card came up as the AMD-branded one, and the Infrastructure Client summary page then listed it as flexible.

If I go with a static IP, then the networking all works. I don't want to use that as a solution as there are many servers (virtual and physical) in my lab, and I want to minimize the exceptions to convention as much as possible.

I must have fat-fingered something. Any ideas what the problem could be?

Many Thanks,

Greg

NTurnbull · 12-03-2008

Hi, welcome to the forums!

If you go into Device Manager, open the View menu and make sure that Show hidden devices is ticked. Do you see any 'ghost' NICs? If you do, remove them and reboot. I take it that you don't have this problem on other VMs?

Thanks,

Neil

Texiwill · 12-03-2008

Hello,

Moved to the Virtual Machine and Guest OS forum.

Investigate the system logs to determine if the network is even starting. Is it on a vSwitch with access to the proper network?

Best regards,

Edward L. Haletky

VMware Communities User Moderator

gregdavisfromnj · 12-03-2008

I was mistaken in stating that DHCP was working for the other images. It is not. The issue seems to be passing DHCP traffic through the virtual switch between the physical network and the virtual network. Any ideas on how to do that? Is DHCP relay functionality built into the ESX (3.5) server?

gregdavisfromnj · 12-03-2008

I didn't check the logs on the ESX host OS, but the service console (vswif0) gets a DHCP address when the host boots. Also, networking with static IPs on the guest OS images works. The ESX host is configured with a single vSwitch0 with 64 ports, on which two port groups exist: "Service Console" and "VM Network." Seven guest images use the "VM Network" group, not the "Service Console" group. From the sound of things, "Service Console" is for management purposes, like SSH access to the ESX host, or VI Client access from other machines on the physical network. From the VI Client's point of view on the host's network properties page, both port groups are connected to the physical NIC through the virtual switch.

I am starting to think the solution is either a dhcrelay daemon or DHCP forwarding through a firewall. I would expect the virtual switch to be configurable to support one or both of those. I can't find a best-practices document anywhere on bridging the physical NIC to many virtual images, as opposed to this switch approach, which implies that the images make up a separate network segment.

Still stuck.

mcowger · 12-03-2008

The vSwitch is a layer-2-only device, so it doesn't participate in any of the DHCP exchange. Your pSwitch will need to act as a DHCP relay agent for the subnets on which your port groups/VMs live.

--Matt

gregdavisfromnj · 12-04-2008

The pSwitch connecting all the physical machines is two daisy-chained, unmanaged 8-port switches; I cannot configure it for anything other than power-on or power-off. The goal is not to have the virtual machines in a separate network or subnet. I have roughly 12 physical machines, including the ESX server. One has a static IP and is a DHCP/DNS server and domain controller. All the others have dynamic IPs. So far, I have 7 virtual machines on the ESX server. The DHCP server has an IP address of X.X.25.5, and I want every other machine (virtual or physical) to be in the same network: X.X.25.10-100.

The Service Console port group has a vswif0 interface in it, which is able to acquire a DHCP address properly. Why does that one work? What is so special about a Service Console port group or a vswif interface? Is there something about ESX server that allows only one DHCP address ever to be leased to the physical box? Does every DHCP request coming from inside an ESX server carry the same MAC address, as seen by the physical world? Seems like time for Wireshark.

Thanks,

Greg

gregdavisfromnj · 12-04-2008

I used a packet sniffer and found that no DHCP broadcast traffic was leaving the virtual switch, at least not from the virtual machine port group. I did see traffic from other physical machines on the physical network, so the DHCP server is not broken. What is blocking broadcast traffic from the VM Network port group from leaving the ESX server? How can I unblock it?

Thanks,

Greg
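
To answer the questions above about which MAC address each DHCP request carries and whether it is a broadcast, the fields can be read straight from the captured DHCP payload. The following is a minimal Python sketch, assuming you already have the raw UDP payload of a port 67/68 packet (for example, exported from Wireshark). The helper names parse_dhcp and build_discover are hypothetical; the offsets follow the BOOTP/DHCP layout in RFC 2131, and the message type is read from option 53 as defined in RFC 2132.

```python
import struct

# Hypothetical helpers for inspecting captured DHCP payloads (not a VMware tool).
# DHCP message types carried in option 53 (RFC 2132).
DHCP_TYPES = {
    1: "DHCPDISCOVER", 2: "DHCPOFFER", 3: "DHCPREQUEST", 4: "DHCPDECLINE",
    5: "DHCPACK", 6: "DHCPNAK", 7: "DHCPRELEASE", 8: "DHCPINFORM",
}
MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of DHCP options at offset 236


def parse_dhcp(payload: bytes) -> dict:
    """Return transaction ID, client MAC, broadcast flag and message type."""
    if len(payload) < 240 or payload[236:240] != MAGIC_COOKIE:
        raise ValueError("not a DHCP message")
    hlen = payload[2]                                # hardware address length
    xid = struct.unpack("!I", payload[4:8])[0]       # transaction ID
    flags = struct.unpack("!H", payload[10:12])[0]
    chaddr = payload[28:28 + hlen]                   # client hardware (MAC) address
    msg_type = None
    i = 240
    while i < len(payload):
        code = payload[i]
        if code == 0:                                # pad option
            i += 1
            continue
        if code == 255 or i + 1 >= len(payload):     # end option or truncated data
            break
        length = payload[i + 1]
        if code == 53 and length == 1 and i + 2 < len(payload):
            msg_type = DHCP_TYPES.get(payload[i + 2], "unknown")
        i += 2 + length
    return {
        "xid": xid,
        "mac": ":".join(f"{b:02x}" for b in chaddr),
        "broadcast": bool(flags & 0x8000),           # client asked for a broadcast reply
        "type": msg_type,
    }


def build_discover(mac: bytes, xid: int) -> bytes:
    """Build a synthetic DHCPDISCOVER payload, only for trying out parse_dhcp."""
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1, 1, len(mac), 0,                           # op=BOOTREQUEST, htype=Ethernet, hlen, hops
        xid, 0, 0x8000,                              # xid, secs, flags (broadcast bit set)
        bytes(4), bytes(4), bytes(4), bytes(4),      # ciaddr, yiaddr, siaddr, giaddr
        mac, b"", b"",                               # chaddr, sname, file (zero-padded by struct)
    )
    return header + MAGIC_COOKIE + bytes([53, 1, 1, 255])
```

For example, parse_dhcp(build_discover(bytes.fromhex("000c29aabbcc"), 0x1234)) reports a DHCPDISCOVER from 00:0c:29:aa:bb:cc with the broadcast flag set (the MAC address is made up). Running the parser over payloads seen on the physical network shows whether requests from different VMs really carry different client MAC addresses.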

apatel1 · 12-04-2008

Hi Greg, based on your single broadcast domain, you should not need any DHCP relay agent/forwarding, as some of the others might have suggested. The DHCP server is on the same broadcast domain as all of the potential DHCP clients. A DHCP relay agent would only be required if, for example, you had a DHCP server on the 192.168.1.0/24 network and you wanted it to serve clients on both the 192.168.1.0/24 and the 192.168.2.0/24 networks. The device routing between these two segments would need to be configured to forward the DHCP traffic between them; otherwise, the DHCP server would only be able to serve clients on its own segment.
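
The relay rule comes down to a subnet-membership test. A minimal Python sketch using the standard ipaddress module, assuming one subnet per broadcast domain as in Greg's flat lab network; the function names are hypothetical and nothing here is provided by ESX or Windows.

```python
import ipaddress

# Hypothetical helpers; assumes each subnet maps to exactly one broadcast domain.


def same_subnet(server_ip: str, client_ip: str, prefix_len: int) -> bool:
    """True if client_ip lies in the subnet that server_ip belongs to."""
    server_net = ipaddress.ip_interface(f"{server_ip}/{prefix_len}").network
    return ipaddress.ip_address(client_ip) in server_net


def relay_needed(server_ip: str, client_ip: str, prefix_len: int) -> bool:
    """A relay agent is only needed when a router separates client and server."""
    return not same_subnet(server_ip, client_ip, prefix_len)
```

With the addresses from the example above, relay_needed("192.168.1.5", "192.168.1.50", 24) is False and relay_needed("192.168.1.5", "192.168.2.50", 24) is True. Assuming a /24 mask, every address in Greg's X.X.25.10-100 range falls in the same subnet as the server at X.X.25.5, so no relay is involved.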

Would you happen to have a VLAN ID number entered in the VM Network port group settings? If so, you might want to make sure that that matches the VLAN ID entered in the Service Console port group settings since we know that DHCP works properly for that port group. Based on your description of your setup, that field should be blank in any port group you create, since your switch doesn't allow you to create VLANs.
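
Both checks in the paragraph above can be written as a comparison over the port group settings. A small Python sketch, assuming the VLAN IDs have been read off each port group's properties in the VI Client and typed into a dict by hand; the dict layout and function names are hypothetical, and treating a blank field as None or 0 is an assumption of this sketch.

```python
# Hypothetical check over hand-copied port group settings: {name: VLAN ID}.
# A blank VLAN ID field is represented here as None or 0 (an assumption).


def tagged_port_groups(vlan_ids: dict) -> list:
    """Port groups that have any VLAN ID set (should be none on an unmanaged pSwitch)."""
    return sorted(name for name, vlan in vlan_ids.items() if vlan)


def mismatched_port_groups(vlan_ids: dict, reference: str = "Service Console") -> list:
    """Port groups whose VLAN ID differs from the reference port group's."""
    ref = vlan_ids[reference] or 0
    return sorted(name for name, vlan in vlan_ids.items() if (vlan or 0) != ref)
```

For instance, {"Service Console": None, "VM Network": 20} would flag "VM Network" in both checks, while a dict with both fields blank flags nothing.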

Hope that helps! Please help me out by marking my response as "helpful" or "correct" if you feel that it was useful!

-Amit

gregdavisfromnj · 12-05-2008

Good point. As for broadcast traffic not leaving the switch, I found one VM on the ESX server, in that same troublesome port group, that actually can get a DHCP address. I verified this with a packet sniffer. That VM is a domain controller itself, but does not run DHCP or DNS. It is meant to be a standalone instance of a SharePoint server.

The only other difference between that VM and the problematic ones is the (Server 2003) Advanced properties page for the network card driver. All VMs are using the VMware Accelerated AMD PCNet Adapter driver, dated 9/29/07, version 2.0.1.8. The vmxnet.sys driver files are all the same size. On the misbehaving VMs, the Advanced properties page shows only the following keys/values:

MTU=1500

Network Address=Not Present

TsoEnabled=1

However the properties page for the VM that can get a DHCP address has these additional keys/values:

External PHY=Autodetect

Full Duplex=Use Adapter Setting

IEEE 802.1p Tagging=Off

MP Mode=Off

TCP/IP Offload=Off

TP Mode=On

Seems suspicious.

I tried removing the NICs from the bad VMs and adding new ones, to see if that would change their setup. That didn't work. Tried updating VMware Tools on the VMs; no change. Someone mentioned deleting ghost or unused devices in the Device Manager. There are WAN Miniport drivers in the "hidden" list, but nothing that stands out as a dangling/malfunctioning/previous NIC.

apatel1 · 12-05-2008

It doesn't sound like you have any ghost devices, and those are usually only a real problem with static IPs, for example if you wanted to assign a static IP to a vNIC but that IP was previously assigned to a no-longer-existing NIC. Either way, to be sure there are none, you need to set the environment variable "devmgr_show_nonpresent_devices=1" before launching Device Manager and telling it to show hidden devices. You can do this in the command prompt and then launch Computer Management or Device Manager by running "compmgmt.msc" or "devmgmt.msc" in the same command prompt (it needs to be the same one so that it picks up the correct environment).
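
The same-prompt requirement exists because a child process inherits the environment of the process that starts it. A Python sketch of the same idea, assuming Python is installed on the guest; the function names are hypothetical, and the variable is set only for the launched Device Manager, not system-wide. You still need to tick View > Show hidden devices once it opens.

```python
import os
import subprocess
import sys

# Hypothetical launcher: equivalent to "set devmgr_show_nonpresent_devices=1"
# followed by "devmgmt.msc" in the same command prompt.


def device_manager_env(base=None) -> dict:
    """Copy of the environment with non-present devices enabled for Device Manager."""
    env = dict(os.environ if base is None else base)
    env["devmgr_show_nonpresent_devices"] = "1"
    return env


def launch_device_manager() -> subprocess.Popen:
    """Start Device Manager with the variable set in its inherited environment."""
    if sys.platform != "win32":
        raise OSError("Device Manager only exists on Windows")
    # shell=True lets cmd.exe open the .msc file through its file association;
    # cmd.exe and the MMC process it starts both inherit env.
    return subprocess.Popen("devmgmt.msc", shell=True, env=device_manager_env())
```

Building the environment as a copy leaves the caller's own environment untouched, which mirrors the command-prompt behaviour: other prompts opened afterwards do not see the variable.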

After that, it sounds like you've isolated the issue to broadcasts not leaving your pSwitch, correct? Therefore, when looking at the traces, you see the DHCPDISCOVER from the "problem" DHCP client VM reaching your DHCP server on the pSwitch, but then the DHCPOFFER it sends in response doesn't make it back to the VM?
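
The question above amounts to finding the first step of the DISCOVER/OFFER/REQUEST/ACK exchange that is missing from a capture. A minimal Python sketch, assuming you have already pulled the DHCP message types for a single client, in capture order, out of the trace; the function name and result labels are hypothetical. Where the sniffer sits matters: a capture on the physical network can show the server sending an OFFER without proving that the VM received it, so "no_request" is an inference, not proof.

```python
# Hypothetical result labels, keyed by the first missing step of the exchange.
FINDINGS = {
    "no_discover": "no DHCPDISCOVER seen: the request never reached the capture point",
    "no_offer": "DHCPDISCOVER seen but no DHCPOFFER: look at the DHCP server",
    "no_request": "DHCPOFFER sent but no DHCPREQUEST followed: the offer "
                  "probably did not reach the client",
    "nak": "the server answered the DHCPREQUEST with a DHCPNAK",
    "no_ack": "DHCPREQUEST seen but no DHCPACK",
    "ok": "complete DISCOVER/OFFER/REQUEST/ACK exchange",
}


def diagnose(trace: list) -> str:
    """trace: DHCP message type names for one client, in capture order."""

    def find(name: str, start: int) -> int:
        try:
            return trace.index(name, start)
        except ValueError:
            return -1

    discover = find("DHCPDISCOVER", 0)
    if discover < 0:
        return "no_discover"
    offer = find("DHCPOFFER", discover)
    if offer < 0:
        return "no_offer"
    request = find("DHCPREQUEST", offer)   # only a REQUEST after the OFFER counts
    if request < 0:
        return "no_request"
    if find("DHCPNAK", request) >= 0:
        return "nak"
    if find("DHCPACK", request) < 0:
        return "no_ack"
    return "ok"
```

Searching for each message only after the previous one means a client that first retries its old lease with a DHCPREQUEST, then sends a DISCOVER and receives an OFFER but never requests it, is reported as "no_request" rather than being mistaken for a completed exchange.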

gregdavisfromnj · 12-05-2008

At this point, I have isolated the problem to specific virtual machines, so the virtual switch is probably not the problem. With a packet sniffer on the physical network, I see either no DHCP traffic coming out of the vSwitch at all (where there should be some), or the handful of DHCP messages assigning an address to the one VM that, for some reason, works properly. The NIC driver for the good VM has more options on the Advanced properties page than the NIC drivers in the bad VMs, even though it is the same driver. The OSes are the same. In fact, I created the bad VMs by cloning the good VM a few times, renaming the clones, uninstalling the components I didn't need, and changing the machine names manually. Because I didn't have Sysprep and didn't know how to use it, I didn't use it in that process.

Thanks,

Greg

apatel1 · 12-05-2008

You could try setting TsoEnabled=0 on the bad VM (this setting controls TCP Segmentation Offload, which, I believe, is only available with the Enhanced vmxnet adapter) and see if that helps. Which edition of Windows Server 2003 R2 are these VMs: Standard, Enterprise, or Datacenter? 32-bit or 64-bit? And which vNIC is used in each of them (shown in the Edit VM Settings dialog)?

gregdavisfromnj · 12-08-2008

I tried the above settings in the VMs; they did not work. I found some "ghost" devices on the VMs, most likely left over from a P2V before my time; uninstalling them and rebooting did not help. I am going to try to make a better boilerplate VM and, next time, use Sysprep when cloning. Maybe that will make a difference. In summary, I give up.

Thanks,

Greg

kmayfield · 01-22-2009

All,

I am having a similar issue. I have two host machines, both Windows XP SP3, each running its own copy of VMware Workstation (NOT Server). On host 1, I have two Debian virtual machines. On host 2, I have two Red Hat virtual machines. I experienced this issue on the host running the two Debian machines. While running /etc/init.d/networking restart on either of the Debian virtual machines, I could monitor the request and response exchange on the Red Hat boxes (i.e. the virtual interfaces on those virtual machines saw the DHCPREQUEST, DHCPDISCOVER, and DHCPOFFER messages). So, I am certain the Debian hosts are requesting correctly, and the server is replying correctly.

After much consternation and configuring and re-configuring of the Debian hosts, I finally decided to set one of the VMs back to NAT addressing instead of bridged addressing to debug the interface between the VMs and the base host. The NAT'd VM obtained an IP address from the virtualized VMware DHCP server as expected. Then, without changing the second Debian VM's configuration (i.e. it was still bridged), I issued a network restart. The interface came up like a charm with an address from the remote DHCP server. So, I set the first VM back to bridged mode and issued a network restart. It too came up like a charm.

It seems that the vmnet0 virtual interface gets confused and doesn't forward the DHCPOFFER messages for some reason. Yesterday, I was switching the Debian hosts between a corporate network and a stand-alone LAN, so I wonder if the varying DHCP replies caused the vmnet0 interface to malfunction. Certainly, a reset to NAT then back to bridged could be a workaround (assuming it works repeatedly, which I have not checked), but I'd sure like to know why the virtual interface is getting confused ...

Any ideas?