VMware Communities
Technogeezer
Immortal

VMs that used to work no longer boot on 22H2 TP

I've encountered 2 instances of Linux VMs with kernels that used to boot on the 21H2 version of the Tech Preview that no longer boot on the 22H2 version:

Fedora 36 (with frozen kernel version 5.16.11-200.fc35.aarch64), and Ubuntu 22.04 (with frozen kernel version 5.15.0-18).

Obviously something changed in the Tech Preview that broke these VMs. That means a complete reinstall of Fedora, as well as putting any plans to use Ubuntu on the back burner.

@Mikero Was this intentional or a regression?

- Paul (Technogeezer)
Editor of the Unofficial Fusion Companion Guides
22 Replies
treee
Enthusiast

You should read the testing guide: this is mentioned on page 4 (not explicitly, but you have to read between the lines) as well as on page 25 (it's specifically called out in the table as "Linux vm hangs at boot") 😁

I think the right order is to upgrade the kernel in your VM before you upgrade to the new Tech Preview. After installing the updates in your Linux VM, shut it down, because (a) due to the newer kernel it is not going to boot on the previous version of the Tech Preview, and (b) the newer Tech Preview seems unable to resume VMs suspended under the older version (you'll get a dialog saying the saved state is faulty, with the option to keep it or bin it).
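In shell terms, that order is roughly the following (a sketch assuming an Ubuntu guest with stock apt tooling; substitute dnf or zypper equivalents on other distros):

```shell
# Run inside the guest while it still boots on the older Tech Preview.
sudo apt update
sudo apt full-upgrade    # pulls in the newer kernel packages
sudo shutdown -h now     # shut down cleanly; don't suspend before upgrading Fusion
```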

Kernel requirements now seem to be at least 5.18 to get a properly working VM, and 5.19 if you need 3D graphics. 3D has another bug: enabling 3D and setting graphics memory to 8GB will crash VMware Fusion, at least with openSUSE Tumbleweed and Fedora Rawhide (whichever latest-greatest versions of them are available today, 29 Jul).

Mikero
Community Manager

It's expected but there's a more specific answer that I'm having the team help draft up.

There have been a ton of our-bug-no-its-your-bug-no-its-a-kernel-bug-wait-maybe-a-driver-bug sort of interactions so I just want to make sure I get the details right.

-
Michael Roy - Product Marketing Engineer: VCF
Mikero
Community Manager

On the crash with 3D enabled (I don't think the GPU memory size matters, but if you notice that it does, that's interesting and I'd like to know more and test!), that should probably go away "in a few days" with 5.19 updates.

Fedora has pulled in that fix (run dnf update --refresh, reboot, and it should work after that), but the patch came in late in the 5.19 schedule... It almost made it into 5.19-rc8, but I think it was just after the cutoff. Fedora picked the patch up manually (it's not on the ISO yet, hence the need to refresh), and it's expected to land in the 5.19 'GA' kernel for everyone.

My working kernel version after updating in Fedora Rawhide is 5.19.0-0.rc8.20220727git39c3c396f813.60.fc37.aarch64

Basically, the patch bypasses an area of code in the hypervisor that the kernel shouldn't have been going down.

We are probably going to double-patch so the vmx doesn't crash (we should never crash!), but it's expected that 5.19 handles things such that it doesn't go down this path at all, and instead 'just works' wrt 3D.

All roads lead to 5.19 😉

-
Michael Roy - Product Marketing Engineer: VCF
Mikero
Community Manager

Lastly I'll say this...

We actually built our 3D gfx drivers on Ubuntu 20.04 up to 20.04.3 using in-house-built 5.16, 5.17 and 5.18 kernels.

We manually patched and built around all the bugs, and upstreamed all those patches to the kernel, our drivers and to the Mesa libraries.

We just need to convince Ubuntu to hurry up and pull in those fixes.

-
Michael Roy - Product Marketing Engineer: VCF
Technogeezer
Immortal

If that's "reading between the lines", then the space between those lines must be pretty tiny indeed. These non-Ubuntu kernels I'm using booted fine on 21H1. It appears that VMware fixed something that wasn't broke, at least for older 5.14, 5.15, and 5.16 releases that didn't have the security fixes installed that referenced the CPU capability register.

The issues I'm having are not only with Ubuntu - Tumbleweed, Fedora and CentOS Stream all have the same issue. It would not have been my expectation that a VM whose kernel used to boot on the prior TP would fail to boot on this one.

It's a good thing I saved away a copy of the 21H1 TP, because I'm in the process of doing exactly what you said. Removing the holds on package updates and letting the distros update to their latest. We'll see how that works once I reinstall 22H2 TP.

Oh, and for Ubuntu, I installed the latest mainline 5.19-rc kernel on my currently working 22.04 LTS franken-VM and we'll see if that works once I upgrade. I hope @Mikero has luck getting Canonical to listen to him, because Ubuntu in general is a mess and right now my least favorite Linux distro.

I'm also going to check if Ubuntu has daily builds of their next release like they did for Jammy (22.04).

People that expect to run older ARM Linux releases are really going to be hosed with this Tech Preview version.

- Paul (Technogeezer)
Editor of the Unofficial Fusion Companion Guides
Technogeezer
Immortal

Success!

I upgraded the kernel on my Tumbleweed, CentOS 9 Stream, Fedora 36 and Ubuntu working VMs on the 21H1 TP before upgrading to the 22H2 version.

All now boot successfully. But what a PITA.

Here are the kernel versions that were installed after I removed all holds on the kernel updates:

Tumbleweed: 5.18.11

CentOS 9 Stream: 5.14.0-134

Fedora 36: 5.18.13-200

Ubuntu 22.04 - I downloaded the latest 5.19 mainline kernel from https://kernel.ubuntu.com/~kernel-ppa/mainline/ (that's 5.19-rc8) and manually installed it on my working Ubuntu 22.04 VM that I had frozen with an older 5.15.0-18 kernel as its default. It boots like a charm on the 22H2 TP, and I could remove the acpi=force kernel boot argument hack.
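A rough sketch of that manual install, assuming the v5.19-rc8 arm64 .debs have already been downloaded from the mainline page into the current directory (exact filenames vary per build, so none are hard-coded here):

```shell
# Install the downloaded mainline kernel .debs (image, modules, headers) together.
sudo dpkg -i ./linux-*.deb
sudo reboot    # verify it boots, then shut down before upgrading Fusion
```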


- Paul (Technogeezer)
Editor of the Unofficial Fusion Companion Guides
aliceyoung
Contributor

Fedora 35 does not boot/install either, though reverting to 22H1 still works. That seems to be the only option if you're invested in the RH

I was hoping that the 22H2 release would support Rocky 9, RHES 9, Centos 9, Fedora 36, etc. Just in server mode, no GUI. But they don't get far into the startup before freezing up, sometimes with CPU at (100 x cores)%, other times with low CPU.

So far, I'm only able to boot/run Fedora rawhide 37 on 22H2. It could take some time before a compatible kernel is in a RHES release.

I guess Linux was not the focus of 22H2.


aliceyoung
Contributor

I upgraded the kernel on my Tumbleweed, CentOS 9 Stream, Fedora 36 and Ubuntu working VMs on the 21H1 TP before upgrading to the 22H2 version.

Could you elaborate on the steps, starting from the .iso? Since these don't run on any ARM VMware version, how did you update the kernel?

treee
Enthusiast

I downloaded the Rawhide image (used the link from the testing guide), created a new vm with it, customised it to enable 3D, booted the vm and started the installer. Somewhere during the loading of the installer, Fusion notifies me it has crashed and asks whether to reload it or not (when you do, the game starts again).

If I use the image for openSUSE Tumbleweed instead, it does the same thing. When I leave the 3D option disabled it doesn't crash, and lets me install them in both cases. I haven't yet tested with a lower amount of graphics memory; it might be that it doesn't like the 8GB setting (I am doing this on an MBP with the M1 Max 32-core GPU and 64GB of memory, so that should be able to handle it). The other settings for the vm are 4 CPUs and 16GB of memory. Will look at this again tomorrow to see what the other GPU memory options do.

@aliceyoung for OpenSUSE Tumbleweed use the currently available iso file. I've used that one and successfully installed a vm with it on the 22H2 TP version.

Technogeezer
Immortal

@aliceyoung These were VMs that were built and running on the 21H1 TP. These were not new builds from an .iso.

I had used instructions from the distributions to put a hold on the kernel updates that I knew would not boot on the TP.

What I did was to undo the holds on the VMs (reversing the instructions to put a hold on the kernel packages). Then I ran the VMs through another cycle of package updates. That installed a kernel that would not boot on the 21H1 TP. When I upgraded to the 22H2 TP, these VMs then booted with the updated kernels.
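The exact "hold" mechanism differs per distro; these are illustrative commands for undoing it, not exact recipes:

```shell
sudo apt-mark unhold linux-image-generic   # Ubuntu/Debian
sudo dnf versionlock delete 'kernel*'      # Fedora/CentOS Stream (versionlock plugin)
sudo zypper removelock kernel-default      # openSUSE Tumbleweed
# ...then run a normal package update and shut the VM down cleanly.
```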

As indicated by @treee, for a new VM start from a current .iso installer from your distro of choice (at least for non-Ubuntu distributions). Don't even waste your time at present with Ubuntu. For some reason they don't think they need to bring forward fixes to kernel bugs that pretty much everyone else has picked up. Try Ubuntu when you hear someone here say they've had success with it out of the box or with minimal workarounds.

Fedora 35 installers likely contain a kernel that won't boot. Use a Fedora 36 ISO, boot it with generic graphics (the option is in the grub "Troubleshooting" menu), then install and let it update to a newer kernel that will work fine. I posted how to do this earlier.


- Paul (Technogeezer)
Editor of the Unofficial Fusion Companion Guides
todivefor
Enthusiast


@Technogeezer wrote:

Success!

Ubuntu 22.04 - I downloaded the latest 5.19 mainline kernel from https://kernel.ubuntu.com/~kernel-ppa/mainline/ (that's 5.19 rc8) and manually installed it on my working Ubuntu 22.04 VM that I had frozen with an older 5.15-18 kernel as its default. Boots like a charm on the 22H2 TP and could remove the acpi=force kernel boot argument hack.



@Technogeezer I'm stuck somewhere between 21H1 and 22H2. W11 works better on 22H2, but Ubuntu won't boot. Under 21H1, Ubuntu boots with the old kernel (5.15?), but W11 is dead because I upgraded. I would like to get this straight about Ubuntu: boot Ubuntu 22.04 under the 21H1 TP with the old kernel, update to the 5.19-rc8 kernel, shut down Ubuntu, then boot Ubuntu under the 22H2 TP. Are those the correct steps?


Macbook Air M1, Ventura 13.5, Fusion Player 2023 TP
todivefor
Enthusiast

@Technogeezer Just tried what I said. Worked perfectly!


Macbook Air M1, Ventura 13.5, Fusion Player 2023 TP
aliceyoung
Contributor

@Technogeezer  Thank you for the hints. I tried a few things but still did not get far. My preferred distro is Rocky 9. I have TP 21H1, 22H1 and 22H2 here. I got TP 21H1 from: https://customerconnect.vmware.com/downloads/get-download?downloadGroup=FUS-PUBTP-2021H1&download=tr...

My specific steps:

Removed prior VMware app and all "*vmware*" from ~/Library
Installed 21H1
Selected CentOS-Stream-9-latest-aarch64-dvd1.iso or Rocky-9.0-aarch64-minimal.iso
Chose operating system: Fedora 64-bit Arm (also tried "other Linux 5.x kernel")
On boot, troubleshooting -> install in text mode, or just Install
Four lines appear:

EFI stub: Booting Linux Kernel...
EFI stub: EFI_RNG_PROTOCOL unavailable
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services and installing virtual address map...

CPU use of VMware stays at 100%, makes no further progress.

Anything I'm missing? If I can get this installed, my understanding is that next, I should:
- upgrade all packages including the kernel but then shut down
- upgrade Vmware to TP 22H2
- do not upgrade VM

Edit:
This does work: install Fedora 35 Server (kernel 5.14) on VMware 21H1, dnf update everything, shut down, upgrade VMware to 22H2, start Fedora (kernel 5.18) and accept the VM upgrade.
During installation from .iso, the above 4 lines also appear, quickly followed by the rest of startup.

Technogeezer
Immortal

My bet is that both CentOS 9 Stream and Rocky Linux 9 ISOs have kernels that have been updated with the security fixes - especially if you have downloaded them recently. If so, they will not boot on TP versions before 22H2.

I just built a new VM for CentOS 9 Stream (downloaded the ISO on July 25) on the 22H2 TP. I configured it as a Fedora 64-bit (close enough to its RHEL/CentOS kin), gave it a 32GB disk, 4 vCPU and 4GB of memory. It boots fine into the graphical installer and installs just fine.

Unfortunately I have had no luck with Rocky Linux 9 or RHEL 9. My guess is that they have some of the kernel issues that @Mikero has alluded to in the Tech Preview Testing guide. As Fedora and CentOS Stream are upstream releases of RHEL 9 and Rocky 9, it doesn't look like the upstream changes that enable the Fedora and CentOS kernels to boot on the TP have made it into RHEL 9 and Rocky 9 yet. I would hope that eventually whatever the upstream releases did will hit the downstream releases.


- Paul (Technogeezer)
Editor of the Unofficial Fusion Companion Guides
aliceyoung
Contributor

I just built a new VM for CentOS 9 Stream (downloaded the ISO on July 25) on the 22H2 TP. ... It boots fine into the graphical installer and installs just fine.

You are correct! The CentOS 9 copy I downloaded, which failed to boot, was from July 7. I just downloaded it again and tried it, and yes, TP 22H2 installs and runs it with no problems so far. I have been avoiding CentOS since it went to "Stream" and Rocky became a drop-in replacement, but I guess now I will have to pay attention to CentOS again since it's the most relevant RHEL-derivative distro that works.


exiledpoacher
Contributor

Can't get Centos to connect to the network. How did you achieve this?

UPDATE: I should have selected Bridged Network, Auto Detect, WiFi. Seems OK now.

Technogeezer
Immortal


@aliceyoung wrote:

I have been avoiding CentOS since it went to "stream" and Rocky became a drop-in replacement, but I guess now I will have to pay attention to CentOS again since it's the most relevant RHEL-derivative distro that works.



Hold tight for a few on RHEL 9 / Rocky Linux 9. I may have found a workaround that allows these installers to boot and install. Keeping fingers crossed....

- Paul (Technogeezer)
Editor of the Unofficial Fusion Companion Guides
Technogeezer
Immortal

As the ancient Greeks would say: Eureka!

(and this might explain a bunch of other behaviors on older kernels that used to work as well....)

It appears that the latest Tech Preview has broken older versions of the ARM64 Linux vmwgfx drivers. For Rocky Linux 9 and RHEL 9 (and possibly other kernels that share the same vmwgfx drivers), the specific behavior seen is that at boot the console display blanks and shows a blinking cursor for a while, then the cursor stops blinking and no further progress is made.

There is a workaround to get the Rocky Linux 9 and RHEL 9 installers to work:

  • Boot normally, and highlight "Install Red Hat Enterprise Linux 9.0" (or "Rocky Linux 9.0"), but do not hit Enter.
  • Press 'e' to edit the selected item
  • On the line that starts "linux /images/pxeboot", add the following kernel argument: modprobe.blacklist=vmwgfx. The result should look something like this:
    (screenshot: the edited boot line, with modprobe.blacklist=vmwgfx appended)

  • Now press Ctrl-x to continue boot. The installer will boot, and the distribution will install.
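For reference, the edited line ends up looking roughly like this (the vmlinuz path and existing arguments come from the distro's own menu entry and will differ; only the final argument is the addition):

```text
linux /images/pxeboot/vmlinuz inst.stage2=hd:LABEL=... quiet modprobe.blacklist=vmwgfx
```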

The unfortunate side effect of this is that the blacklisting of the vmwgfx driver persists into the installed Linux VM. 

(For the more technically inclined, the file /etc/modprobe.d/anaconda-blacklist.conf that is found in the VM contains an entry that prevents loading the vmwgfx driver).
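If your kernel later gains a working vmwgfx, the fix is simply deleting that blacklist line (or the whole file) as root. A minimal simulation of the edit, done on a temp copy rather than the real /etc/modprobe.d path:

```shell
# Simulate removing the "blacklist vmwgfx" entry that Anaconda leaves behind.
# On a real VM you'd edit /etc/modprobe.d/anaconda-blacklist.conf as root instead.
conf=$(mktemp)
printf 'blacklist vmwgfx\n' > "$conf"
sed -i '/^blacklist vmwgfx$/d' "$conf"   # drop the blacklist line
wc -l < "$conf"                          # prints 0: no entries remain
rm -f "$conf"
```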

That means that the standard Linux frame buffer driver is in use, and the graphics resolution cannot be dynamically changed by the VM. However, there is a workaround that allows the frame buffer display to be larger. It's still not dynamically resizable, but it will give you more screen to work with until Red Hat (or Rocky Linux) provides a kernel with an updated vmwgfx driver.

  • Log into the VM and sudo to root.
  • Edit /etc/default/grub
  • Add a line to the top of the file that reads
    GRUB_GFXMODE=1920x1200
  • Change the line that reads
    GRUB_TERMINAL_OUTPUT="console" 
    to
    GRUB_TERMINAL_OUTPUT="gfxterm"
  • Save your changes and exit the editor.

The resulting /etc/default/grub should look something like this:

GRUB_GFXMODE=1920x1200
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
# GRUB_TERMINAL_OUTPUT="console"
GRUB_TERMINAL_OUTPUT="gfxterm"
GRUB_CMDLINE_LINUX="crashkernel=2G-:448M rd.lvm.lv=rl/root rd.lvm.lv=rl/swap"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true

Now replace your existing grub menus by executing the following:

# grub2-mkconfig --output /boot/grub2/grub.cfg 
Generating grub configuration file ...
Adding boot menu entry for UEFI Firmware Settings ...
done

Reboot and your VM will now have a 1920x1200 graphical display.

I'm going to go back and see if this workaround solves other ills of older installers, Ubuntu, and/or existing VMs... Stay tuned!

- Paul (Technogeezer)
Editor of the Unofficial Fusion Companion Guides
Technogeezer
Immortal

Another update:

Ubuntu is still a no-go unless you have an existing working VM that had been upgraded to a 5.19 mainline kernel. I'm not wasting any more time on it until they get their act together.

My first test with an older installer was Fedora 35. It exhibited the "hang on boot" behavior. I did find that it will boot and allow you to install by using the vmwgfx driver blacklist workaround. After installation to the hard drive and a reboot into the newly installed VM, a package update was available, which I applied via "dnf upgrade". It brought the kernel to 5.18.13-100, and I was able to remove the vmwgfx blacklist (like RHEL 9, the blacklist persists due to an entry in /etc/modprobe.d/anaconda-denylist.conf). After removing that entry and rebooting, the ability to resize the screen returned.

This is looking promising as a way to get those other non-Ubuntu existing VMs working: put vmwgfx on the blacklist via the grub boot line to boot the existing VM, run a package update so that newer kernels get installed, and then remove the blacklist once the new kernel has the updated driver.
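On Fedora/RHEL-family guests, that cycle can be scripted with grubby, which edits the kernel arguments for all installed kernels (a sketch; I haven't run this on every distro):

```shell
sudo grubby --update-kernel=ALL --args="modprobe.blacklist=vmwgfx"
sudo reboot                       # the existing VM should now boot
sudo dnf upgrade --refresh        # pull in a kernel with the fixed driver
sudo grubby --update-kernel=ALL --remove-args="modprobe.blacklist=vmwgfx"
```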


- Paul (Technogeezer)
Editor of the Unofficial Fusion Companion Guides