alpatio
Contributor

PXE won't boot - reset needed

Hello,

I'm creating new VMs with UEFI boot. They are configured with VMXNET3 adapters and should boot via PXE to connect to a Microsoft WDS server.

Our setup usually works, but in some cases virtual machines won't boot from PXE. Apparently no request reaches the DHCP server; at least, the DHCP server logs remain empty.

The virtual machine remains in this state until the network boot times out.

pastedImage_2.png

I have seen that, if we want to continue with the installation, we can use one of these workarounds:

stop/start the VM

or

reset the VM

or

disconnect/reconnect the network

Any of these operations usually requires a couple of attempts. After that, the WDS installation starts, the boot image loads without trouble, and the installation runs to completion.

Our DHCP server is currently Microsoft, but I have also run some tests with Open DHCP Server, and the behaviour is the same in both cases.

Since nothing is logged anywhere, the problem appears to be related to the virtual machine's network card when it boots from the network.

Our PXE setup is configured for UEFI virtual machines only, as we don't have WDS for BIOS/legacy virtual machines. We are working with ESX 6.0u3, and new virtual machines report version 11.

Does VMXNET3 work well with UEFI PXE?

Regards

4 Replies
asajm
Expert

Hi alpatio

Check the thread "No PXE Boot from VM with EFI. Using Microsoft WDS".

If you think your queries have been answered,
mark this response as "Solution" or "Kudo".
ASAJM
alpatio
Contributor

Hi,

I checked all the suggestions in the post you referred to, and I believe our setup is correct, because it works most of the time and only fails occasionally. For example, if I start 4 virtual machines to boot from WDS, sometimes one of them fails.

I installed Wireshark on our DHCP server. When a virtual machine starts and gets stuck in this state, no broadcast request packet appears in Wireshark, so apparently the virtual machine or the VMXNET3 adapter is not working properly when it performs the first DHCP request.

Other checks were done as well.

When we check all logs/events, we can't find any clue:

- There is no request to our DHCP server

- WDS event log/debug logging was enabled, but it remains empty when this situation happens

After several restarts/resets/etc., the DHCP server receives the request, and the broadcast packet shows up in Wireshark.

After the resets, the whole process works fine.

The DHCP server receives the request, and we can see this kind of log entry:

10,09/03/19,12:24:15,Assign,10.254.0.73,,005056B43EF6,,1988612274,0,,,,0x505845436C69656E743A417263683A30303030373A3F3F3F3F3A3F3F3F3F3F3F,PXEClient:Arch:00007:????:??????,,,,0
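
For anyone who wants to double-check what such an entry says, here is a minimal Python sketch. It assumes the comma-separated layout of the single line shown above (the field positions are inferred from that line, not taken from a documented schema), and `parse_pxe_entry` is just an illustrative name. It decodes the hex vendor-class field, compares it with the ASCII field next to it, and extracts the PXE client architecture. Architecture type 7 is listed as "EFI BC" in RFC 4578 and is what x64 UEFI firmware commonly reports, and the 00:50:56 MAC prefix is a VMware OUI, so the entry is consistent with a UEFI VM.

```python
# Sample line copied from the DHCP server log quoted in this post.
LOG_LINE = (
    "10,09/03/19,12:24:15,Assign,10.254.0.73,,005056B43EF6,,1988612274,0,,,,"
    "0x505845436C69656E743A417263683A30303030373A3F3F3F3F3A3F3F3F3F3F3F,"
    "PXEClient:Arch:00007:????:??????,,,,0"
)

# Assumed 0-based field positions, inferred from the line above.
IDX_MAC = 6
IDX_VENDOR_HEX = 13
IDX_VENDOR_ASCII = 14


def parse_pxe_entry(line: str) -> dict:
    """Extract the MAC, decoded vendor class and PXE architecture from one log line."""
    fields = line.split(",")

    # The vendor class is logged as hex with a leading "0x".
    raw_hex = fields[IDX_VENDOR_HEX]
    if raw_hex.lower().startswith("0x"):
        raw_hex = raw_hex[2:]
    vendor = bytes.fromhex(raw_hex).decode("ascii")

    # A PXE vendor class looks like "PXEClient:Arch:00007:..."; the third part is the arch.
    parts = vendor.split(":")
    arch = int(parts[2]) if len(parts) > 2 and parts[0] == "PXEClient" else None

    mac = fields[IDX_MAC]
    return {
        "mac": ":".join(mac[i:i + 2] for i in range(0, len(mac), 2)),
        "vendor_class": vendor,
        "matches_ascii_field": vendor == fields[IDX_VENDOR_ASCII],
        "arch": arch,
    }


if __name__ == "__main__":
    print(parse_pxe_entry(LOG_LINE))
```

Running the same function over the whole log would show which MAC addresses ever got an `Assign` entry, which helps confirm that a stuck VM never reached the DHCP server at all.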

And the WDS event viewer is populated with deployment entries.

So far, the only ways to 'fix' a stuck VM have been:

  stop/start the virtual machine,

  disconnect/reconnect the network interface,

  reset the VM (from the VM console).

Sometimes these reset operations have to be performed twice.

typerlc
Contributor

I have the same problem, and I have found some interesting behaviour, and a possible "solution".

When this problem happens for me (vSphere 6.7u3), the PXE boot code says it is trying to obtain an IP address from a DHCP server.

pastedImage_0.png

But using tcpdump from the DHCP server, I can see no DHCP requests being broadcast.  Instead, there are lots of RARP requests being sent by the VM. e.g.

pastedImage_1.png

And the PXE boot DHCP client eventually times out and gives up.
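
To tell the two kinds of traffic apart when going through a capture, a rough Python sketch like the following may help. It assumes untagged Ethernet II frames as raw bytes (how you get them out of the capture is up to you), and `classify_frame` is an illustrative name, not part of any tool. It relies only on well-known constants: RARP uses EtherType 0x8035, and DHCP runs over UDP ports 67 and 68.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_RARP = 0x8035
IPPROTO_UDP = 17
DHCP_PORTS = {67, 68}  # server and client ports


def classify_frame(frame: bytes) -> str:
    """Return 'rarp', 'dhcp' or 'other' for a raw Ethernet II frame (no VLAN tag)."""
    if len(frame) < 14:
        return "other"
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype == ETHERTYPE_RARP:
        return "rarp"
    if ethertype != ETHERTYPE_IPV4 or len(frame) < 14 + 20:
        return "other"

    ihl = (frame[14] & 0x0F) * 4  # IPv4 header length in bytes
    if ihl < 20 or frame[14 + 9] != IPPROTO_UDP:
        return "other"

    udp = 14 + ihl  # offset of the UDP header
    if len(frame) < udp + 4:
        return "other"
    src_port, dst_port = struct.unpack("!HH", frame[udp:udp + 4])
    if src_port in DHCP_PORTS and dst_port in DHCP_PORTS:
        return "dhcp"
    return "other"


if __name__ == "__main__":
    # Hand-built broadcast RARP frame with a VMware-style source MAC, for illustration only.
    rarp_frame = (
        b"\xff" * 6
        + bytes.fromhex("005056b43ef6")
        + struct.pack("!H", ETHERTYPE_RARP)
        + bytes(28)
    )
    print(classify_frame(rarp_frame))  # rarp
```

Counting the results per source MAC would show quickly whether a stuck VM sent only RARP frames and never a DHCP request.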

If I migrate the VM so it is on the same ESXi host as the DHCP server (another VM), there is no problem, and it proceeds successfully.  For me, it only happened when the VM & DHCP server were on different ESXi hosts.

After some digging regarding the RARP requests from the VM, I found pages that linked this behaviour to the "Notify Switches" option in the vSwitch settings.  With Notify Switches set to Yes ... I see all of the RARP requests, and DHCP doesn't work as expected.  If I set Notify Switches to No ... then the RARP requests aren't sent by the VM, and DHCP starts working i.e. it immediately broadcasts a DHCP request, and gets the response.

Based on my reading ... it would be better to have Notify Switches set to Yes to improve speed of failover if a NIC dies.  But unfortunately, it seems to have this negative side-effect.

So, try turning "Notify Switches" to No, and see if that makes any difference to you.

TIMETRIAD
Contributor

It's an old post, but I'll add my recent solution for anyone using VMware Workstation 15 and up; not sure about 14 and below because I couldn't test those versions.

I'm running VMware Workstation 16.1.0 on a Windows 10 Pro (20H2) host.

I'm running a Windows Server 2019 Evaluation as a home lab for Windows deployment lab exercises. I have WDS, WSUS, SQL, and MDT running on it.

As for the PXE boot issue: it seems to be intermittent. I encountered it about a year ago when using Workstation 15, and I stumbled on a solution that worked for me. Fast forward to now, on Workstation 16, and I ran into the same problem again today, right after I had finished deploying a few images on the same VM I use for testing Windows 10 Pro deployments.

So, I remembered that I needed to check VMware's Virtual Network Editor and change the Bridged network from Automatic to my Gigabit LAN network card. This immediately solved my problem, and I was back in business.

In VMware Workstation, go to Edit > Virtual Network Editor > Change Settings button (lower right).

Be sure the Bridged network (VMnet0) is selected in the list, and then change the "Bridged to:" drop-down to your preferred physical network adapter. Click OK, and then test your PXE boot to see if this solved your problem.

You can always set it back to Automatic if needed...

Some people are using NAT instead of bridging, so it would be wise to try that one too. However, I tend to see more issues with NAT than with bridging, and since I like to remote desktop into some of my VMs, NAT becomes an issue for me.

 
