TimMann's Posts

Thanks to those who tried "workaround option 2". If the instructions didn't work for you, please try to give some details about what you did and what state you ended up in.

> When the installation is marked as invalid no one of these workarounds can be applied, right? How can we delete the mark to try to boot ESXi 6.7 U3 with the firmware boot mode of legacy BIOS?

I didn't explain how to do that because I was thinking it would be more straightforward and have fewer pitfalls to get your previous installation to boot again, then reapply the update. On the other hand, reapplying the update could reapply the 6.7u3 bootloader, so you might end up needing to do the workaround twice -- once to get your previous installation to boot, then again immediately after the 6.7u3 re-upgrade, just before the first boot into 6.7u3. Ugh.

For the adventurous: find the boot.cfg file of the bootbank containing the 6.7u3 installation and change the line that reads bootstate=2 or bootstate=3 to bootstate=1. (If you see bootstate=0, that bootbank is valid -- probably your older installation.) That gives the bootbank that was marked invalid another chance to try to boot.

> And will this issue be fixed with a new update ZIP file so that we can install it and boot with UEFI mode?

Of course this bug will be fixed in a future update. I'm just trying to be helpful for folks who are affected right now.
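If you'd rather script the bootstate edit than do it by hand, the change is just a one-line substitution. Here is a minimal sketch in Python (the helper name is mine; the location of boot.cfg varies -- on a booted ESXi system it is typically under /bootbank or /altbootbank, so adapt the path to your setup):

```python
import re

def reset_bootstate(cfg_text: str) -> str:
    """Rewrite bootstate=2 or bootstate=3 to bootstate=1 so the
    bootbank gets another chance to boot; bootstate=0 (a valid
    bootbank) is deliberately left untouched."""
    return re.sub(r"^bootstate=[23]$", "bootstate=1", cfg_text, flags=re.M)

# Example: a boot.cfg fragment whose bootbank was marked invalid.
before = "kernel=b.b00\nbootstate=2"
print(reset_bootstate(before))  # the bootstate=2 line becomes bootstate=1
```

Remember the caveat above: this only helps if you do it before ESXi recovery has already rolled back, or if you reapply it after re-upgrading and before the first boot into 6.7u3.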
Here is an alternative workaround you can try if switching to legacy BIOS mode doesn't work for you. It's not easy, but I tried to make the instructions detailed. If anyone tries this, please let me know how it went and whether there is anything I can clarify further in the instructions.

First, a note on rollback: any time you install or upgrade ESXi and the first attempted boot into the new installation fails, that installation is effectively marked as invalid and you do not get the chance to try to boot it again. ESXi recovery automatically rolls back to the previous installation, if any. So if you would like to apply a workaround to get 6.7u3 to boot (not just to go back to 6.7u2), be sure to apply it prior to the first boot into the new installation. If possible, install in legacy BIOS mode and switch to UEFI mode only after applying the workaround.

Workaround option 1: Switching the firmware boot mode from UEFI to legacy BIOS will allow most affected machines to boot. On some machines, the option to boot in legacy BIOS mode may be called CSM.

Workaround option 2: Some machines may not have the option to boot in legacy BIOS mode. On such a machine, you can manually copy the 6.7u2 bootloader into the system partition to replace the 6.7u3 bootloader.

1) Get a copy of the 6.7u2 bootloader. On an ESXi installer ISO image, the bootloader is located at /efi/boot/bootx64.efi in the ISO9660 filesystem. Copy this file to a USB drive, rename it to mboot64.efi, and plug the USB drive into the affected machine. (Note: do not use the EFI shell to copy \EFI\BOOT\BOOTx64.EFI directly from a CD or ISO image. That would give you the wrong file, taken from the El Torito boot image instead of the ISO9660 filesystem.)

2) Boot the affected machine into the EFI shell (not into ESXi). If your machine does not offer the EFI shell as a built-in boot option, try http://refit.sourceforge.net/ for a downloadable boot manager that includes an EFI shell.
3) Find the filesystem names that EFI has assigned to the system partition on the boot disk and to the USB drive containing your 6.7u2 bootloader. You can do that by requesting directory listings of each filesystem with the EFI "dir" command, working upward from fs0:, until you find the ones with the expected contents. In the example below, fs0: is the system partition and fs5: is the USB drive.

   Shell> dir fs0:
   Directory of: fs0:\
     08/05/19  02:38a <DIR>            512 EFI
     08/05/19  02:38a                   30 syslinux.cfg
     08/05/19  02:38a               61,288 safeboot.c32
     08/05/19  02:38a               93,672 mboot.c32
             3 File(s)     154,990 bytes
             1 Dir(s)

   Shell> dir fs0:\efi
   Directory of: fs0:\EFI
     08/05/19  02:38a <DIR>            512 .
     08/05/19  02:38a <DIR>              0 ..
     08/05/19  02:38a <DIR>            512 BOOT
     08/05/19  02:38a <DIR>            512 VMware
             0 File(s)           0 bytes
             4 Dir(s)

   Shell> dir fs0:\efi\vmware
   Directory of: fs0:\
     08/05/19  02:38a <DIR>            512 .
     08/05/19  02:38a <DIR>            512 ..
     08/05/19  02:38a              172,224 mboot64.efi
     08/05/19  02:38a               94,432 safebt64.efi
             2 File(s)     266,656 bytes
             2 Dir(s)

   Shell> dir fs5:\
   Directory of: fs5:\
     03/26/19  01:52p              171,400 mboot64.efi
             1 File(s)     171,400 bytes
             0 Dir(s)

4) Copy your 6.7u2 mboot64.efi file onto the system partition, replacing the one that's already there. Continuing the example above:

   Shell> copy fs5:\mboot64.efi fs0:\efi\vmware\mboot64.efi
   Overwrite fs0:\EFI\VMware\mboot64.efi? (Yes/No/All/Cancel):y
   copying fs5:\mboot64.efi -> fs0:\EFI\VMware\mboot64.efi
    - [ok]
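The warning in step 1 about the El Torito image is easy to trip over. As a quick sanity check, run on whatever machine you used to prepare the USB drive (a Python sketch; the helper name is mine), you can at least confirm the file you copied is a PE/COFF image, since every UEFI application begins with the "MZ" DOS-stub magic:

```python
def looks_like_efi_binary(path: str) -> bool:
    """Cheap sanity check that a file is a PE/COFF image, as all UEFI
    applications are: the first two bytes are the 'MZ' DOS-stub magic.
    This catches grabbing a text file or other junk, though not the
    wrong PE binary."""
    with open(path, "rb") as f:
        return f.read(2) == b"MZ"
```

This doesn't prove you have the 6.7u2 version; comparing the byte size against what your source ISO shows is another cheap cross-check.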
Thanks to everyone who reported this. We've found the bug and are testing a fix.

Booting in legacy BIOS mode is the easiest workaround. However, I realize some of you have systems that won't boot in that mode for one reason or another. You might consider reinstalling 6.7u2, using the option to preserve VMFS volumes on the disk. That's the next most straightforward solution for now, though obviously there is some pain in that...

I'll post again if I think of a better workaround or a way to make the fix available in advance of the next release.
1) > it opens each file 3 times

I finally got some time to look at your packet trace. It shows that every TFTP read request is getting transmitted two or three times. The retransmissions are coming from the same UDP source port, so if the server is actually receiving all of them, they should not appear as two or three separate opens, but as duplicate packets (retransmissions), which is harmless. It's odd that so many retransmissions are occurring, though; maybe the server is a bit slow to respond. In any case, mboot certainly is not "opening the file three times and closing it once". It has a matching close for every open. The retransmissions are happening at a lower layer, inside pxelinux's TFTP implementation.

2) > I really think having a patched mboot.c32 binary would be the best solution in this case.

Again, the issue here is in pxelinux, not mboot. It's stock pxelinux 3.86 that lacks the code to send a "close" packet when a file is closed before being fully read. mboot does call close, but stock pxelinux 3.86 treats that as a no-op. The only way to "fix" the original issue by changing only mboot would be to delete the mboot feature that gets the size of all files before downloading them so that it can generate a progress bar. I suppose the progress bar could assume all files are the same size, but that assumption is usually far off, so the bar would become quite jumpy.

You can find source for VMware's mboot in our open source disclosure ISO. There is also source on GitHub (GitHub - vmware/esx-boot: The ESXi bootloader) for the version of mboot used in 6.7u1.
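For the curious, the "close" packet at issue is simply a TFTP ERROR packet used as an abort: when a client abandons a transfer partway through, RFC 1350 expects it to send ERROR so the server can drop the connection. A minimal sketch of what that packet looks like on the wire (the helper name and default message are mine):

```python
import struct

TFTP_OP_ERROR = 5  # RFC 1350 opcode for ERROR

def tftp_abort_packet(code: int = 0, message: str = "transfer aborted") -> bytes:
    """Build the TFTP ERROR packet a client should send when it closes
    a file before reading it fully -- the packet stock pxelinux 3.86
    never sent.  Wire layout per RFC 1350: 2-byte opcode, 2-byte error
    code, NUL-terminated ASCII message, all in network byte order."""
    return struct.pack("!HH", TFTP_OP_ERROR, code) + message.encode("ascii") + b"\x00"
```

Without this packet, the server keeps retransmitting its last DATA block until it times out, which is why the half-read size-probe opens looked so noisy in the trace.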
p.s. Attached is the pxelinux.0 binary I tested and that we're using on our own internal network at VMware.
My patch is for the base pxelinux. It should work on upstream syslinux-3.86 sources, though it's true that I tested it on a very lightly modified syslinux-3.86 tree that we use internally. I believe the sources for that are in our open-source disclosure package ISO image, available from https://my.vmware.com/group/vmware/details?downloadGroup=ESXI670-OSS&productId=742 if you want to look.

In your earlier message you discovered that our mboot.c32 is not the same as the one in syslinux-3.86. That's because it's actually a very different program that confusingly has the same name. I didn't need to change our mboot to fix this issue. Our mboot itself calls "close" on all the files it opens to check their sizes; it is the base pxelinux that treats the close as mostly a no-op and fails to send an abort. My patch fixes that.

If you are interested, you can find recent sources for our mboot here: GitHub - vmware/esx-boot: The ESXi bootloader
In case you're still interested... VMware's bootloader opens all the boot modules to get their sizes, then later opens each one again to actually read it. In the bootloader code, we do close the file, but the pxelinux 3.86 back end does not send a TFTP ERROR packet when that happens; calling close() just makes pxelinux forget about the file.

I'm attaching a patch for pxelinux 3.86 that fixes this issue. It should apply fine to either the syslinux 3.86 source from our ODP ISO (see Download VMware vSphere) or the upstream 3.86.
Unlike Linux or Windows, ESXi has no such distinction as a UEFI install versus a legacy BIOS install. You can freely switch your host back and forth between UEFI and legacy BIOS after installation and ESXi will continue to boot.

Of course it still makes sense to ask whether the current boot used UEFI or legacy BIOS. I don't know offhand of an officially supported way to do that. However, from the ESXi shell you can use the following vsish invocation:

vsish -e get /hardware/firmwareType
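For comparison only: vsish is specific to ESXi, but general-purpose Linux offers an analogous check, since the kernel exposes /sys/firmware/efi only when it was started via UEFI. A small sketch (Python; the helper name is mine and this is not an ESXi interface):

```python
import os

def firmware_type() -> str:
    """Report how the current Linux host was booted.  The kernel
    creates /sys/firmware/efi only on a UEFI boot, so its absence
    implies a legacy BIOS (or CSM) boot."""
    return "UEFI" if os.path.isdir("/sys/firmware/efi") else "BIOS"

print(firmware_type())
```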
ESXi's mboot.c32 and mboot64.efi have similar names to a multiboot plugin that comes with syslinux/pxelinux, but they aren't meant to be general-purpose multiboot standard bootloaders that could boot a Linux kernel.  ESXi started off using the multiboot standard, but we've been slowly making additions (which should be compatible, but aren't tested with anything other than ESXi). So although the newest ESXi mboot should be able to boot old ESXi versions going back quite far (possibly all the way to 3.5, though I haven't personally tested back that far), I would be more surprised if it does boot Linux than if it doesn't.
I personally wrote the document that you linked above and personally developed and tested every scenario. We network boot in UEFI mode daily within VMware.

The one thing that does not work is chaining to our bootloader from GRUB, due to an issue with GRUB as previously discussed. Note that the document above does not describe any scenarios that involve chaining from GRUB, only booting mboot64 directly or chaining from iPXE to mboot64.

It would be good for us to document the incompatibility with chaining from GRUB. I wasn't aware of it at the time I was working on the document. Perhaps a KB article and/or a note in the next rev of the document.
Again, chainloading from EFI GRUB to VMware's bootloader fails because of bug(s) in GRUB, not because of a "flawed implementation" by VMware.

I did look into this further after my last post. I put a workaround into the VMware bootloader (mboot64.efi) that compensates for the GRUB bug in device path formation. With the workaround, it is possible to chain from GRUB to mboot and boot from local media. The workaround will be included in the upcoming major ESXi release that is in beta now. mboot64 is backward compatible, so it will also be possible to boot older versions of ESXi using the new mboot version.

Unfortunately, though, even with the workaround, chaining from GRUB to mboot64.efi still fails to network boot ESXi successfully. It appears that when chainloading, GRUB calls Stop on the EFI firmware's PXE base code protocol before handing off, so the protocol cannot be used to load the rest of ESXi over the network. I tested this only with GRUB2, so it's possible that it would work with GRUB Legacy, but I doubt it. I tried adding a workaround in mboot that calls Start again on the PXE base code protocol, but that doesn't work -- UEFI firmware apparently can't restart the protocol once it has been stopped.

If you only need to boot ESXi off of local media after chaining from GRUB to mboot, that will work with the new mboot version.
There should be a way to achieve what you want to do. The following is mostly from memory since I want to answer this quickly and then go home for the weekend.

First, see this new document: Installing VMware ESXi 6.0 Using PXE

I think mboot (which is "bootx64.efi" in the PXE case) should actually be looking for boot.cfg relative to where it was loaded from. So if the DHCP server tells the client that the filename to load is foo/bar/mboot64.efi, then I am pretty sure mboot will look for boot.cfg first in foo/bar/boot.cfg. If it's not found there, mboot will try /boot.cfg. It may make a difference whether the filename given out by the DHCP server starts with "/". If you have access to the logs from your TFTP server, watch to see what filenames mboot tries to access after being loaded.

Additionally, mboot does take a -c command line option saying where the boot.cfg file is. However, if you have your DHCP server telling the client that mboot is the filename to load, there is no way to pass a command line option to mboot. You could try using iPXE as the first file to load, then have iPXE chain into mboot. However, I (somewhat vaguely) remember there was an issue with the way iPXE passes options, and you may find that the mboot from current ESXi releases can't recognize them.

The boot.cfg files we ship on our installer CDs annoyingly give all the filenames as absolute pathnames, but you can edit the file to change that. There is syntax to set up a common path prefix, but you also have to delete all the leading "/" characters from the filenames to make it work. https://www.vmware.com/resources/techresources/10508 gives examples.
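To illustrate the prefix edit described above, here is a hedged sketch of a boot.cfg fragment (the prefix value and module names are illustrative, not a complete or verbatim file from any release):

```
# As shipped, with absolute paths:
#   kernel=/b.b00
#   modules=/jumpstrt.gz --- /useropts.gz --- ...
#
# Edited for PXE: add a common prefix directive and strip
# the leading "/" from every filename.
prefix=esxi/6.7
kernel=b.b00
modules=jumpstrt.gz --- useropts.gz --- ...
```

With a relative prefix like this, the paths resolve under your TFTP root, which is usually what you want when serving several ESXi versions from one server.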
We have gotten a couple of queries about this, so I spent some time looking into it. The problem stems from a GRUB bug. According to the UEFI specification, file path strings in device paths are supposed to be null-terminated (even though they also have a length field), but GRUB does not terminate them; it fills them with non-null characters right up to the end as given by the length field. Our bootloader has code that relies on the null terminator being there. When it's missing, the bootloader constructs an incorrect pathname for itself and chokes when trying to find the boot.cfg file in the same directory as itself.

I am looking at adding a workaround in a future release, but there may also be other problems chainloading our bootloader from GRUB.

For the USB stick you're trying to create, you might try using rEFInd to make the menus instead of GRUB. (See The rEFInd Boot Manager.) I've had good results with that. Alternatively, iPXE can be used to create menus and chainload too.
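The defensive fix amounts to honoring the length field and stopping early at an embedded terminator, whichever comes first. The real workaround lives in C inside mboot64.efi; this is just a sketch of the logic in Python (function name is mine), using the UCS-2 little-endian encoding that UEFI device-path file nodes carry:

```python
def device_path_to_str(raw: bytes, length: int) -> str:
    """Defensively decode a UCS-2 file-path node from an EFI device
    path.  The UEFI spec says the string is NUL-terminated, but GRUB
    fills the node right up to its declared length with no terminator,
    so stop at whichever comes first: an embedded 16-bit NUL or the
    length field."""
    data = raw[:length]
    # Scan 16-bit units for a NUL terminator; truncate there if found.
    for i in range(0, len(data) - 1, 2):
        if data[i] == 0 and data[i + 1] == 0:
            data = data[:i]
            break
    return data.decode("utf-16-le")

# A GRUB-style node: no terminator, the length covers every character.
raw = "\\EFI\\VMware\\mboot64.efi".encode("utf-16-le")
print(device_path_to_str(raw, len(raw)))
```

A decoder that instead reads until it finds a NUL, as spec-trusting code tends to, runs off the end of a GRUB-built node and produces the garbage pathname described above.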
It looks like you are inadvertently using the mboot.c32 plugin that comes with syslinux/pxelinux. That's the wrong plugin; you need the mboot.c32 that comes with ESXi. Get it from the latest version of ESXi that you're trying to boot -- newer versions of mboot are backward compatible with older versions of ESXi. You will find our mboot.c32 in the root directory of the ESXi install CD.

The reason our documentation says to use syslinux 3.86 is that ESXi's mboot.c32 plugin is built against the syslinux v3/v4 plugin API and specifically tested against 3.86. The newer plugin API from syslinux v5 or v6 is incompatible with plugins built against the older API. I am trying to get someone to work on porting our mboot to the newer syslinux API to make it easier for customers like you who are PXE booting multiple different OSes and want to use the latest syslinux. But no promises when/if that will happen.

Could you please comment in this thread if you have success with the correct mboot.c32?
Great info, thanks. I was helping with ESXi bringup on Mac Pro here, and the Mac I was using had firmware MP61.0116.B05. Probably all the Macs we were working with had that firmware, so we didn't know there was an issue with .B04. (There were some issues with .B05, but we put in workarounds for those.)
Tangentially to this thread: ESXi cannot function without ACPI. It's been years since ESX could run on a non-ACPI system. If you found a "disable ACPI" option on an HP server and ESXi was able to boot with that selected, it wasn't really disabling ACPI. Who knows what it was really doing.