Here is an alternative workaround you can try if switching to legacy BIOS mode doesn't work for you. It's not easy, but I tried to make the instructions detailed. If anyone tries this, please let me know how it went and if there is anything I can clarify further in the instructions.
First, a note on rollback: Any time you install or upgrade ESXi and the first attempted boot into the new installation fails, that installation is effectively marked as invalid and you do not get the chance to try to boot it again. ESXi recovery automatically rolls back to the previous installation, if any. So if you would like to apply a workaround to get 6.7u3 to boot (not just to go back to 6.7u2), be sure to apply it prior to the first boot into the new installation. If possible, install in legacy BIOS mode and switch to UEFI mode only after applying the workaround.
Workaround option 1:
Switching the firmware boot mode from UEFI to legacy BIOS will allow most affected machines to boot. On some machines, the option to boot in legacy BIOS mode may be called CSM.
Workaround option 2:
Some machines may not have the option to boot in legacy BIOS mode. On such a machine, you can manually copy the 6.7u2 bootloader into the system partition to replace the 6.7u3 bootloader.
1) Get a copy of the 6.7u2 bootloader. On an ESXi installer ISO image, the bootloader is located at /efi/boot/bootx64.efi of the ISO9660 filesystem. Copy this file to a USB drive. Rename it to mboot64.efi. Plug the USB drive into the affected machine.
(Note: do not use the EFI shell to copy \EFI\BOOT\BOOTx64.EFI directly from a CD or ISO image. That would give you the wrong file, taken from the El Torito boot image instead of the ISO9660 filesystem.)
2) Boot the affected machine into the EFI shell (not into ESXi). If your machine does not offer the EFI shell as a built-in boot option, try http://refit.sourceforge.net/ for a downloadable boot manager that includes an EFI shell.
3) Find the filesystem names that EFI has assigned to the system partition on the boot disk and the USB drive containing your 6.7u2 bootloader. You can do that by requesting directory listings of each filesystem with the EFI "dir" command, working upward from fs0:, until you find the ones with the expected contents. In the example below, fs0: is the system partition and fs5: is the USB drive.
Shell> dir fs0:
Directory of: fs0:\
08/05/19 02:38a <DIR> 512 EFI
08/05/19 02:38a 30 syslinux.cfg
08/05/19 02:38a 61,288 safeboot.c32
08/05/19 02:38a 93,672 mboot.c32
3 File(s) 154,990 bytes
Shell> dir fs0:\efi
Directory of: fs0:\EFI
08/05/19 02:38a <DIR> 512 .
08/05/19 02:38a <DIR> 0 ..
08/05/19 02:38a <DIR> 512 BOOT
08/05/19 02:38a <DIR> 512 VMware
0 File(s) 0 bytes
Shell> dir fs0:\efi\vmware
Directory of: fs0:\EFI\VMware
08/05/19 02:38a <DIR> 512 .
08/05/19 02:38a <DIR> 512 ..
08/05/19 02:38a 172,224 mboot64.efi
08/05/19 02:38a 94,432 safebt64.efi
2 File(s) 266,656 bytes
Shell> dir fs5:\
Directory of: fs5:\
03/26/19 01:52p 171,400 mboot64.efi
1 File(s) 171,400 bytes
4) Copy your 6.7u2 mboot64.efi file onto the system partition, replacing the one that's already there. Continuing the example above:
Shell> copy fs5:\mboot64.efi fs0:\efi\vmware\mboot64.efi
Overwrite fs0:\EFI\VMware\mboot64.efi? (Yes/No/All/Cancel):y
copying fs5:\mboot64.efi -> fs0:\EFI\VMware\mboot64.efi
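As an optional sanity check after step 4 (not part of the instructions above), you can run `dir` on both filesystems again and compare the mboot64.efi sizes. In the example listings the 6.7u3 file is 172,224 bytes and the 6.7u2 file is 171,400 bytes, though sizes may differ for other builds. If you capture that output as text, the minimal Python sketch below can do the comparison on another machine. The helper names are hypothetical, the regex assumes the listing format shown above, and matching sizes are only a weak hint that the copy took effect, not proof that the files are identical.

```python
import re

# One file entry in EFI shell "dir" output, as in the listings above, e.g.
#   08/05/19 02:38a 172,224 mboot64.efi
# <DIR> entries and "N File(s) ... bytes" summary lines do not match.
FILE_LINE = re.compile(r"^\d\d/\d\d/\d\d\s+\d\d:\d\d[ap]\s+([\d,]+)\s+(\S+)$")


def parse_dir_listing(text: str) -> dict:
    """Return {lower-cased file name: size in bytes} for the files in a listing."""
    files = {}
    for line in text.splitlines():
        match = FILE_LINE.match(line.strip())
        if match:
            size, name = match.groups()
            files[name.lower()] = int(size.replace(",", ""))
    return files


def sizes_match(system_listing: str, usb_listing: str, name: str = "mboot64.efi") -> bool:
    """True if `name` appears in both listings with the same size (a weak check)."""
    key = name.lower()
    on_system = parse_dir_listing(system_listing)
    on_usb = parse_dir_listing(usb_listing)
    return key in on_system and key in on_usb and on_system[key] == on_usb[key]


# Sizes taken from the example listings above; yours may differ.
usb = "03/26/19 01:52p 171,400 mboot64.efi"
before = "08/05/19 02:38a 172,224 mboot64.efi\n08/05/19 02:38a 94,432 safebt64.efi"
after = "08/05/19 02:38a 171,400 mboot64.efi\n08/05/19 02:38a 94,432 safebt64.efi"
print(sizes_match(before, usb))  # False: still the 6.7u3 bootloader
print(sizes_match(after, usb))   # True: size now matches the 6.7u2 copy
```

Comparing sizes keeps the check usable with nothing but `dir` output; if you can read both files directly, comparing hashes would be a stronger test.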
Thank you for providing us the workarounds. But what about those people who have already tried to update ESXi 6.7 unsuccessfully? When the installation is marked as invalid, none of these workarounds can be applied, right? How can we delete the mark so that we can try to boot ESXi 6.7 U3 in legacy BIOS firmware boot mode?
And will this issue be fixed with a new update ZIP file so that we can install it and boot with UEFI mode?
I updated from Build 13981272 to Build 14320388 through the shell and had the same problem. The EFI shell solution provided above didn't work for me. What did work was:
1. Made bootable USB with Update2 (Build 13006603)
2. Booted from it and chose upgrade
3. Selected the disk with the ESXi installation and chose upgrade with datastore preservation
4. After the installation and first boot, restarted and chose the Repair option on the boot screen (Shift+R)
5. There were 2 versions listed - Build 13006603 (Default) and Build 13981272
6. Pressed Y to change the default hypervisor to Build 13981272
7. Rebooted and all seems to be back the way it was before updating
Thanks to those who tried "workaround option 2". If the instructions didn't work for you, please try to give some details about what you did and what state you ended up in.
> When the installation is marked as invalid, none of these workarounds can be applied, right? How can we delete the mark so that we can try to boot ESXi 6.7 U3 in legacy BIOS firmware boot mode?
I didn't explain how to do that because I was thinking it would be more straightforward and have fewer pitfalls to get your previous installation to boot again, then reapply the update. On the other hand, reapplying the update could reapply the 6.7u3 bootloader, so you might end up needing to do the workaround twice -- once to get your previous installation to boot, then again immediately after the 6.7u3 re-upgrade, just before the first boot into 6.7u3. Ugh.
For the adventurous, you can find the boot.cfg file in the bootbank containing the 6.7u3 installation and change the line that reads bootstate=2 or bootstate=3 to bootstate=1. (If you see bootstate=0, the bootbank is valid; it probably holds your older installation.) That will give the bootbank that was marked invalid another chance to boot.
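For anyone who would rather script that edit than do it by hand, here is a minimal Python sketch of the bootstate change described above. It only rewrites text: locating the right bootbank, reading its boot.cfg, and writing the file back are left to you and are not shown. It assumes the bootstate line appears exactly as `bootstate=2` or `bootstate=3` on a line of its own, as described in this post; the function name is hypothetical and the sample file contents are illustrative only.

```python
import re

# A line reading exactly bootstate=2 or bootstate=3 (per this thread, the
# values found on a bootbank marked invalid). bootstate=0 (valid) and
# bootstate=1 are deliberately left alone.
INVALID_BOOTSTATE = re.compile(r"^bootstate=[23][ \t]*$", re.MULTILINE)


def give_another_chance(boot_cfg: str):
    """Return (new_text, lines_changed) with bootstate=2/3 rewritten to bootstate=1."""
    return INVALID_BOOTSTATE.subn("bootstate=1", boot_cfg)


# Illustrative boot.cfg fragment; a real file contains more lines.
sample = "bootstate=3\ntitle=Loading VMware ESXi\nkernel=b.b00\n"
new_text, changed = give_another_chance(sample)
print(changed)                   # 1
print(new_text.splitlines()[0])  # bootstate=1
```

If `changed` comes back as 0, nothing was rewritten: either the bootbank is already valid (bootstate=0) or the line looks different from what this sketch expects, so check the file by hand.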
> And will this issue be fixed with a new update ZIP file so that we can install it and boot with UEFI mode?
Of course this bug will be fixed in a future update. I'm just trying to be helpful for folks who are affected right now.
> I updated from Build 13981272 to Build 14320388 through the shell and had the same problem. The EFI shell solution provided above didn't work for me.
It would be helpful to know more details about what you did and where it failed, if you can provide them.
> What did work was:
> 1. Made bootable USB with Update2 (Build 13006603)
> 2. Booted from it and chose upgrade
> 3. Selected the disk with the ESXi installation and chose upgrade with datastore preservation
> 4. After the installation and first boot, restarted and chose the Repair option on the boot screen (Shift+R)
> 5. There were 2 versions listed - Build 13006603 (Default) and Build 13981272
> 6. Pressed Y to change the default hypervisor to Build 13981272
> 7. Rebooted and all seems to be back the way it was before updating
Yes, that should work to get your older installation functioning again. To clarify for others reading this, build 13981272 is ESXi 6.7 EP 10. The procedure above restored the old bootloader from 6.7u2, allowing 6.7 EP 10 to boot again. As a side effect, the procedure also overwrites the unsuccessful 6.7u3 installation with a copy of 6.7u2.
I extracted the bootloader from the 6.7u2 ISO with 7zip and used rEFInd (which was recommended since the tool you linked is no longer maintained) to start the EFI shell. In my case, fs0: was the USB drive and fs3: held the installation.
I don't remember exactly but IIRC it directly threw an error when I tried to boot.
Thanks a lot, Tim, Man 😉 Your (this) post saved my evening!
Simply replacing the EFI binary solved the problem, and I could successfully boot u3 after starting another VIB install. Thanks for the hint regarding the invalid bootbank and the automatic rollback.
And the El Torito hint! The instructions couldn't be clearer.
By the way, I also have a C220 chipset here (Intel S1200v3RP) and got the same error after the u3 update, which I confirmed is easily avoided with your workaround option #2.
While this workaround is all we need for the moment, I'd be highly interested in the real cause, i.e., what was changed in mboot64.efi that breaks C220 systems. The CPU microcode upload?
If so, do we have the latest microcode despite this helpful but ugly workaround? Guess yes, but better to ask 😉
Like mentioned, any technical details are always welcome as well 🙂
To answer what was changed to break this: It's rather obscure. To work around an issue in some UEFI firmware, we added a sanity check on which UEFI memory types can contain valid page tables. The check was too strict, causing failure if the firmware puts page tables in some memory types that are unexpected but not illegal for that usage.
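To make "too strict" concrete, here is a toy Python model of that kind of sanity check. The memory type numbers follow the EFI_MEMORY_TYPE enumeration in the UEFI specification, but both allowlists are invented for illustration: this post does not say which types the ESXi check accepted or what the fix changes, so do not read them as the actual ESXi logic.

```python
from enum import IntEnum


class EfiMemoryType(IntEnum):
    """EFI_MEMORY_TYPE values as numbered in the UEFI specification."""
    RESERVED = 0
    LOADER_CODE = 1
    LOADER_DATA = 2
    BOOT_SERVICES_CODE = 3
    BOOT_SERVICES_DATA = 4
    RUNTIME_SERVICES_CODE = 5
    RUNTIME_SERVICES_DATA = 6
    CONVENTIONAL = 7
    UNUSABLE = 8
    ACPI_RECLAIM = 9
    ACPI_NVS = 10
    MMIO = 11
    MMIO_PORT_SPACE = 12
    PAL_CODE = 13
    PERSISTENT = 14


# Hypothetical "too strict" rule: page tables may only live in the types
# the check's author happened to expect.
STRICT_ALLOWED = frozenset({EfiMemoryType.LOADER_DATA, EfiMemoryType.BOOT_SERVICES_DATA})

# Hypothetical relaxed rule: reject only types modelling memory with errors
# or device MMIO, accepting "unexpected but not illegal" placements.
RELAXED_ALLOWED = frozenset(EfiMemoryType) - {
    EfiMemoryType.UNUSABLE,
    EfiMemoryType.MMIO,
    EfiMemoryType.MMIO_PORT_SPACE,
}


def page_tables_ok(table_region_types, allowed) -> bool:
    """True if every memory region holding a page table has an allowed type."""
    return all(t in allowed for t in table_region_types)


# A firmware that (in this model) puts one page table in runtime services data:
firmware = [EfiMemoryType.BOOT_SERVICES_DATA, EfiMemoryType.RUNTIME_SERVICES_DATA]
print(page_tables_ok(firmware, STRICT_ALLOWED))   # False: boot would be refused
print(page_tables_ok(firmware, RELAXED_ALLOWED))  # True
```

The point of the model is only that a check keyed on an allowlist fails on any firmware whose choices fall outside the list, even when those choices are legitimate; that also fits the observation later in this thread that most machines never hit the error while a few hit it every time.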
When will you provide an official bugfix? This is my first-ever install of ESXi 6.7u3 on a Lenovo ST550 (with the custom Lenovo image), and now this.
Strangely it installed without any problems but then after about 5 reboots this error message appeared.
The firmware probably set up its page tables differently on this specific boot, as you described, so ESXi failed to load this time.
But wouldn't this also mean the odds are fifty-fifty that it will successfully boot with a cold restart?
> When will you provide an official bugfix
It ought to be fixed in the next regular patch release along the 6.7 line. I'm not allowed to give out details or dates for future releases. In the meantime we'll issue a KB article with the workarounds.
> But wouldn't this also mean the odds are fifty-fifty that it will successfully boot with a cold restart?
There's no reason it would be 50-50. From what we've seen so far, a few machines get the error every time, while most never do. Yours is the first machine I've heard of that gets the error sometimes but not always.
> It ought to be fixed in the next regular patch release along the 6.7 line.
Will a new update-from-esxi6.7-6.7_update03.zip file then be provided for getting to the Update 3 release?
> Will a new update-from-esxi6.7-6.7_update03.zip file then be provided for getting to the Update 3 release?
I don't think so. I believe it will work to apply 6.7u3 *and* the subsequent patch (when the patch is released) without rebooting in between. I will check to be sure, though.
> I don't think so. I believe it will work to apply 6.7u3 *and* the subsequent patch (when the patch is released) without rebooting in between. I will check to be sure, though.
Yes, I checked with someone who's more familiar with the patch application process, and he said:
Customers who are on pre-6.7 U3 builds:
- They can apply the 6.7 patch fix directly using the regular patching method (releases are cumulative, so they will get all the fixes in 6.7 U3 plus the latest fix in the patch).
We will definitely test this path during our patch testing.
Customers who are affected by the UEFI boot issue in 6.7 U3 environments:
- They need to use the workaround, and they can apply the patch directly via the regular deliverables.
We just went through this exact scenario ourselves. All our hosts (Cisco UCS B200-M4) were fine apart from one ESXi host, a SuperMicro X9DR3-F we use for our backup layer. We have gone back to the last bootable image of 6.7, build 13981272 (by booting from 6.7 Update 2 boot media and repairing as described earlier), and things are luckily working again. We are still UEFI booting as before.
Will notification of a re-released ISO image be posted on this forum or somewhere else?
I can imagine this to be a real pain for many shops using an assortment of current and old gen kit for virtualised backup/management hosts etc so it would be great if VMware can push an urgent fix.
If you need an urgent fix on supported hardware, please file an SR. Right now there is only a plan to get a fix into the next regular patch. Also there is not a plan to release an ISO with the next patch. Plans can be changed, but asking in the community forum isn't the most effective way to make that happen.
Note: I'm an engineer working on new development on our main branch. I'm not responsible for patch and update release plans. The most I do on those is to backport my fixes from the main branch when needed.
I managed to fix the boot issue by replacing the EFI files after running the update to U3. The update reported that it was successful, but after replacing the .EFI file, the version displayed to me is the previous one (13981272).
So is this a display issue only, or am I still not on the latest version? I used esxcli to update; this worked fine for all the previous patches I applied.
Thanks for all your notes and effort, Tim. However, I just want to add how very disappointing it is to find out that VMware, after knowing about this for a month, is still offering this buggy version to its customer base. Once the problem was discovered, the next steps were known, and the internal team was engaged to resolve it, this version should have been pulled from every list of available updates, saving customers the hassle and trouble. Because it wasn't, there are now hundreds of dead hosts needing manual recovery, the result of negligence and poor decision making. There is only so much an SR can do, and of course support is not going to touch each of our systems, so again, the ball is in our court to make it right. Our account rep will surely be notified, as things have escalated pretty high around here.
I'm having the same issue in my home lab. I tried Legacy Only boot and updated the BIOS while disabling UEFI. So far it boots from the fresh install and is working nicely. Maybe I'm going to update one ESXi host. Now it's working fine for me.