VMware Cloud Community
GabGo
Contributor

Bluescreen booting Windows VM with latest CU on AMD with VBS enabled

Hello Community!

We are currently migrating our virtual machines from our old Intel hosts to AMD hosts and ran into an issue with the latest Windows 10 21H2 Update.

We are using ESXi 7.0.3 Build 19898904 on HPE ProLiant servers in a vCenter managed environment. Our old hosts run Intel Xeon E5-2660 v3 CPUs and our new hosts run AMD EPYC 74F3 CPUs.

Our Windows 10 21H2 (x64) VMs are on VM version 19 and have VBS enabled (in the VM configuration and in Windows via group policy). These VMs run fine on the Intel hosts, but if we start them on the AMD hosts we immediately get a bluescreen with the following error message: “PNP DETECTED FATAL ERROR”.

We have already investigated further. The issue starts with the June 2022 preview update (KB5014023, Windows build 19044.1741) and is also present in the latest July 2022 update (KB5015807, Windows build 19044.1826). It does not matter whether VMware Tools are installed.
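
To triage many VMs against the builds named above, a comparison like the following may help. It is only a sketch: it assumes the build string has the form "19044.1741", treats every later 19044.x build as affected (no fixed Windows 10 build is known from this post), and the function names are hypothetical.

```python
from typing import Tuple

# First affected Windows 10 21H2 build according to the post (KB5014023).
FIRST_AFFECTED: Tuple[int, int] = (19044, 1741)


def parse_build(build: str) -> Tuple[int, int]:
    """Split a build string such as '19044.1826' into (build, revision)."""
    major, _, revision = build.partition(".")
    return int(major), int(revision)


def at_or_after_first_affected(build: str) -> bool:
    """True for 21H2 builds at or after the first build reported as affected.

    Assumption: no upper bound is applied, because the post names no fix.
    """
    major, revision = parse_build(build)
    return major == FIRST_AFFECTED[0] and revision >= FIRST_AFFECTED[1]
```

For example, at_or_after_first_affected("19044.1826") is true for the July 2022 build mentioned above.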

To reproduce the problem,

  • create a new Windows 10 (x64) VM with VBS enabled in the VM configuration,
  • install Windows and patch to an affected build,
  • enable VBS via group policy (cmd > gpedit.msc >> Computer Configuration > Administrative Templates > System > Device Guard > Turn On Virtualization Based Security >> just enable and leave all other options untouched).

The VM starts if we disable VBS in the VM options and disable I/O MMU in the VM CPU configuration (just leave EFI, Secure Boot and HW virtualization support for the guest OS enabled).
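
To check which of these options a VM actually has set, the .vmx file can be inspected. The sketch below parses VMX text and reports the relevant flags. The key names (vbs.enabled, vvtd.enable, vhv.enable, uefi.secureBoot.enabled) are assumptions that should be verified against your own .vmx file, and the function names are hypothetical.

```python
import re
from typing import Dict

# Assumed VMX key names for the options discussed above; verify them
# against a real .vmx file before relying on this.
KEYS = {
    "vbs.enabled": "VBS",
    "vvtd.enable": "I/O MMU",
    "vhv.enable": "HW virtualization for guest",
    "uefi.secureBoot.enabled": "Secure Boot",
}

# Matches lines of the form: key = "value"
_LINE = re.compile(r'^\s*([^#\s=]+)\s*=\s*"(.*)"\s*$')


def parse_vmx(text: str) -> Dict[str, str]:
    """Return all key/value pairs found in VMX text."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def feature_flags(text: str) -> Dict[str, bool]:
    """Map each feature label to True if its key is set to TRUE."""
    settings = parse_vmx(text)
    return {
        label: settings.get(key, "FALSE").upper() == "TRUE"
        for key, label in KEYS.items()
    }


def matches_workaround(text: str) -> bool:
    """True if VBS and I/O MMU are both off, as in the workaround above."""
    flags = feature_flags(text)
    return not flags["VBS"] and not flags["I/O MMU"]
```

Running matches_workaround on the contents of a VM's .vmx file shows whether the VM is already in the state described above.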

Is there anyone with similar hardware (Zen 3 architecture) who can also reproduce the problem? We are trying to figure out if this is an AMD, HPE, VMware issue or maybe just a Microsoft introduced bug.

Thank you!

EDIT 1: Just patched an affected host to ESXi 7.0.3 Build 20036589 but the problem persists.

62 Replies
PhSLU
Enthusiast


Are you using vTPMs in these VMs? This is - if I remember correctly - a requirement for using Secure Launch. You can check this with msinfo >> VBS services configured vs. running. We don't use vTPMs yet so we (luckily) already disabled Secure Launch a while ago.


You can enable Secure Launch without vTPM (which we are - unluckily) ...

I opened a case at Microsoft ... will keep you posted. 

Stellarier333
Contributor

We can confirm that Server 2019 VMs are crashing after installing KB5030214.

We are running 

  • Hypervisor:VMware ESXi, 7.0.3, 20036589
  • Model:ProLiant DL385 Gen10 Plus
  • Processor Type:AMD EPYC 7402 24-Core Processor

Our VMs are using the SCSI Controller "VMware Paravirtual".

As soon as we are running into the hanging Windows Logo screen, we are able to revert the MS update with these steps:

- Turn off VM
- Change SCSI Controller from "VMware Paravirtual" to "LSI Logic SAS"
- Deactivate VBS, I/O MMU and secure boot on VM virtual layer
- Boot into CMD (Windows Recovery environment)
- Sign in with local Administrator
- Mkdir C:\scratch
- dism /english /image:C:\ /Get-Packages /Format:Table
- dism /image:C:\ /scratchdir:C:\scratch /cleanup-image /revertpendingactions
- Power off VM
- Change SCSI Controller from "LSI Logic SAS" to "VMware Paravirtual"
- Activate VBS, I/O MMU and secure boot on VM virtual layer
- PowerOn VM
- VM boots and the screen shows "We couldn't complete the updates. Undoing changes. Don't turn off your computer"
- VM is running again
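
The DISM part of the steps above can be generated for a given drive letter, which matters because the Windows volume is not always mounted as C: inside the Windows Recovery environment. This is a sketch only: the commands are the ones from the list, while the function name and parameters are hypothetical.

```python
from typing import List


def revert_commands(image_drive: str = "C:", scratch: str = r"C:\scratch") -> List[str]:
    """Build the command lines from the steps above for an offline image.

    image_drive is the drive letter of the offline Windows volume as seen
    from the recovery environment (assumption: it may differ from C:).
    """
    # Normalise "C:" or "C:\" to the "C:\" form used with /image.
    image = image_drive.rstrip("\\") + "\\"
    return [
        f"mkdir {scratch}",
        f"dism /english /image:{image} /Get-Packages /Format:Table",
        f"dism /image:{image} /scratchdir:{scratch} /cleanup-image /revertpendingactions",
    ]


if __name__ == "__main__":
    for command in revert_commands():
        print(command)
```

The commands are only printed here; they are meant to be typed into the recovery command prompt by hand.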

Once the VM booted into the OS again without KB5030214 installed, we did the following:

Deactivated VBS, I/O MMU and Secure Boot on the VM layer and deactivated VBS via GPO. Deleted the Credential Guard EFI variables using bcdedit as Microsoft describes here: Disable Credential Guard with UEFI lock

We additionally set this registry value:

[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\DeviceGuard]

"ConfigureSystemGuardLaunch"=dword:00000000

After installing KB5030214, we are still facing the same result: "VM is hanging in Windows Logo screen".

Why did you migrate the VMs from the AMD to the Intel host? Was it because you needed to get into the Windows Recovery Environment? In our case we got into it by changing the SCSI Controller as described above.

After you re-enabled VBS and set Secure Launch Configuration to "Not configured", were you able to patch the VM with KB5030214 successfully?

mkaetm
Enthusiast

"Why did you migrate the VMs from the AMD to the Intel host? Was it because you needed to get into the Windows Recovery Environment? In our case we got into it by changing the SCSI Controller as described above."

I migrated the VMs to Intel, because I had some issues with disabling the VBS while Windows was running in safe mode. And on Intel they were able to boot and I could disable VBS then. 

"After you re-enabled VBS and set Secure Launch Configuration to "Not configured", were you able to patch the VM with KB5030214 successfully?"

I did not test this scenario. In our case the patch was installed before. 

 

GabGo
Contributor

"After you re-enabled VBS and set Secure Launch Configuration to "Not configured", were you able to patch the VM with KB5030214 successfully?"

I can confirm that it is not enough to set Secure Launch to "Not configured" when it was enabled before. In this case it stays enabled (configured). You can check the configured VBS services through the "System Information" application (msinfo). So you have to set Secure Launch to disabled explicitly and reboot at least once.

In our environment I uninstalled KB5030214, set Secure Launch to disabled in the corresponding GPO, rebooted the VM and verified through msinfo that secure launch was not configured and installed KB5030214 again after that.

Stellarier333
Contributor

In our last test we had misunderstood the correct GPO configuration needed to get the VMs running after installing KB5030214.

In the meantime, Microsoft has provided us with an official workaround for Windows Server 2019 VMs running on ESXi with AMD CPUs.

Official workaround for now

  1. Workaround 1 of 2: Disable the Secure Launch policy under HKLM\SOFTWARE\Policies\Microsoft\Windows\DeviceGuard\ConfigureSystemGuardLaunch

  OR

  2. Workaround 2 of 2: Use VMware ESXi VM hardware version 17 or below
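
For the first workaround, a .reg file can be generated instead of editing the value by hand. The sketch below assumes the value mapping 0 = not configured, 1 = enabled, 2 = disabled for ConfigureSystemGuardLaunch; that mapping is not stated in this thread and should be checked against Microsoft's documentation. The function name is hypothetical.

```python
KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\DeviceGuard"
VALUE_NAME = "ConfigureSystemGuardLaunch"

# Assumed mapping of policy states to DWORD values (verify before use).
SECURE_LAUNCH = {"not_configured": 0, "enabled": 1, "disabled": 2}


def secure_launch_reg(state: str = "disabled") -> str:
    """Return .reg file text that sets the Secure Launch policy value."""
    value = SECURE_LAUNCH[state]
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{KEY}]\n"
        f'"{VALUE_NAME}"=dword:{value:08x}\n'
    )


if __name__ == "__main__":
    print(secure_launch_reg("disabled"))
```

Under that assumption, "disabled" is a different value from the dword:00000000 shown earlier in the thread, which would match the observation that "Not configured" was not enough.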

We have tested it with two VMs, running on ESXi7.0.3m. The first workaround has worked for us for the affected VMs!

We activated VBS, I/O MMU and Secure Boot on the VM layer, and activated VBS via GPO with "Secure Launch Configuration = Disabled".

Result: After patching VM with MS September Updates, the VM is booting!!!

We hope that Microsoft will find and fix the bug so that VMs on ESXi with AMD CPUs also boot with Secure Launch Configuration enabled.

PhSLU
Enthusiast

Another Patch Tuesday nightmare for those of us who run ESXi on AMD and care about VBS.

This time Windows Server 2022 with VBS enabled breaks after installation of KB5031364.

Disabling Secure Launch is not enough. This time it seems to be deeper.

GabGo
Contributor

I can confirm the issue with KB5031364 when running on ESXi 7. On ESXi 8 with VM version 20 the VM boots fine.

This time it seems the BSOD occurs even without enabling VBS via Group Policy. It's enough to enable VBS at the VM configuration. 

EDIT: You have to enable VBS in Windows to get the BSOD. It's the same behavior as the Windows 10 boot BSOD. You can work around it by only disabling I/O MMU in the VM configuration (you have to uncheck VBS first) and still get at least (some) VBS features in Windows (check with msinfo).

steven_vanpraet
Contributor

We have the same issue after applying the October MS security patches for Windows Server 2022 on Dell AMD EPYC 3 servers. Disabling I/O MMU is a workaround; however, VBS is then no longer running. Moving the VM to Intel-based hardware works fine. We are running ESXi 7.0 U3o (22348816) with the latest Dell updates.

GabGo
Contributor

I forgot to mention that if you disable I/O MMU in the VM configuration you have to set the Windows VBS settings (GPO) from "Secure Boot and DMA Protection" to "Secure Boot" only.

We do not have DMA protection enabled in our environment because it was not enabled in the MS Security Compliance Toolkit Baseline we use.
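
The dependency described above (no I/O MMU in the VM means no DMA protection requirement in the GPO) can be written as a small check. This is a sketch under an assumption: the numeric levels 1 = "Secure Boot" and 3 = "Secure Boot and DMA Protection" correspond to the RequirePlatformSecurityFeatures policy value and are not taken from this thread. The function name is hypothetical.

```python
# Assumed values of the platform security level policy setting.
SECURE_BOOT_ONLY = 1
SECURE_BOOT_AND_DMA = 3


def platform_level_ok(iommu_exposed: bool, required_level: int) -> bool:
    """Check that the GPO platform security level fits the VM configuration.

    "Secure Boot and DMA Protection" is only considered consistent when
    the I/O MMU is exposed to the VM; "Secure Boot" only is always fine.
    Any other value is reported as not OK.
    """
    if required_level == SECURE_BOOT_AND_DMA:
        return iommu_exposed
    return required_level == SECURE_BOOT_ONLY
```

For example, platform_level_ok(False, SECURE_BOOT_AND_DMA) is false, which is the combination that has to be changed when I/O MMU is disabled.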

TVolke
Contributor

Setting the DVD/CD-ROM drive to IDE and removing the SATA controller solves the problem for us. You have to change the drive to IDE and save the settings before removing the SATA controller. It cannot be done in one operation.

GabGo
Contributor

I can confirm this works. This is definitely the better workaround until a fix is available.

RuneD
Contributor

I've had issues with Server 2019 since August 2023, Server 2016 since September 2023, and now Server 2022 since October 2023.
Most affected were two Server 2019 VMs that were set up in January 2020 on an i5 CPU, then ran on Intel Xeon Gold 5215s until this summer, when they were migrated to AMD EPYC 7313.

 

For anything installed on the i5, none of the listed workarounds work:
* Remove VBS
* Remove Secure Boot
* Remove IOMMU
* Remove CD/DVD + SATA Controller
* Upgrade ESXi to 8, which has caused new issues related to vSAN stretched cluster licensing since it's a 2-node cluster with a witness (which no longer seems to be allowed on a standard license...)

 

For anything installed directly on the Xeons, the IOMMU workaround is the fix. Anything earlier refuses to budge...

GabGo
Contributor

I assume you are also using VBS which causes your problems.

If you have installed KB5030214 for Server 2019 you have to disable Secure Launch in the group policy either by starting the VM on an unaffected host or by uninstalling the update first (which is a bit inconvenient if you are using paravirtualized hardware): https://communities.vmware.com/t5/VMware-vSphere-Discussions/Bluescreen-booting-Windows-VM-with-late...

Another reason I could imagine is the VM hardware version of these old Server 2019 VMs. I guess these were created on ESXi 6.7, did you upgrade the VM hardware after migrating to ESXi 7? Maybe there are some VM settings not set correctly.

We have a similar issue with our VMs migrated to ESXi 8 and VM hardware version 20 regarding the VM (VMX) setting "chipset.motherboardLayout" which has to be set to "acpi" to solve the boot BSODs.
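
The VMX setting mentioned above can be added or updated with a small helper while the VM is powered off. This is a minimal sketch: only the key "chipset.motherboardLayout" and the value "acpi" come from the post, the helper name is hypothetical, and it assumes the usual key = "value" line format of .vmx files.

```python
import re


def set_vmx_option(text: str, key: str, value: str) -> str:
    """Return VMX text with key set to value, replacing an existing line
    for that key or appending a new one if the key is absent."""
    new_line = f'{key} = "{value}"'
    # Match an existing line for this key (leading spaces or tabs allowed).
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    if pattern.search(text):
        # A function replacement avoids backslash handling in the value.
        return pattern.sub(lambda _match: new_line, text, count=1)
    separator = "" if (not text or text.endswith("\n")) else "\n"
    return f"{text}{separator}{new_line}\n"


if __name__ == "__main__":
    vmx = 'firmware = "efi"\n'
    print(set_vmx_option(vmx, "chipset.motherboardLayout", "acpi"))
```

Working on a copy of the .vmx file and keeping the original is a sensible precaution with any edit like this.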

RuneD
Contributor

Because of production issues and a lack of response from Dell, which provides direct support through the licenses, I've been forced to reinstall the servers with ESXi 7.0u3.

I tried a new restore from a (sadly) patched VM that has the updates installed. Before backing it up I made sure to disable the GPOs that enforce VBS/Secure Launch/Credential Guard, rebooted, and then took a new backup, to no avail. I disabled everything I listed above and stopped applying the MS Security Baseline hardening for the server, which enabled Secure Launch among other settings.

The VM is still unbootable. It was VM hardware for 6.7, now it's 7.0u2. There is no setting in the VMX that refers to "chipset.motherboardLayout"; adding it with "acpi" didn't help, it still BSODs.

GabGo
Contributor

I assume you are referring to your Server 2019 VMs.

Can you boot the VM on an unaffected host? If yes you have to apply a group policy which explicitly disables Secure Launch. Disabling the group policy itself is not enough.

If you are not able to boot the VM anymore you have to remove the patch first as described in the link in my previous post.

The motherboard layout switch is only relevant on ESXi 8 hosts.

RuneD
Contributor

Oh, I didn't read that correctly the first time around.

Applied a policy which actively disables Secure Launch, that did the trick. I can keep VBS/Secure Boot enabled on the VM as a big bonus - less reconfiguration.

So:
* Deployed a GPO that disables Secure Launch on the working VMs running on Intel processors.
* Made sure to run gpupdate /force since I was not going to wait <90 minutes for a refresh
* Backed up
* Restored to AMD processor, now it starts and doesn't BSOD. No other steps regarding VBS/IOMMU/Secure Boot were needed.

Massive thanks and if you ever come by Oslo I owe you a drink & pizza.

 

K1ngb0rA
Contributor

GabGo
Contributor

I can confirm that KB5032198 fixes the Server 2022 VBS issue.

It also seems that KB5032196 fixes the Server 2019 Secure Launch issue.

DJMcKinnon
Contributor

Any 2022 server (VM) that received the October CU will not install the November CU: KB5032198 - simply fails with 0x8024200B - event id 20.

Ran DISM health check and SFC, no corruption - has anyone else experienced this?

 

GabGo
Contributor

We skipped the October CU in production.

On a test VM with installed October CU (KB5031364) the November CU (KB5032198) installed fine. Are all your Server 2022 VMs affected?

You could also try to clear the Windows Update Software Distribution folder which solved some update issues for us in the past: https://www.thewindowsclub.com/software-distribution-folder-in-windows

 
