VMware Communities
Technogeezer
Immortal

Fedora 35 kernel 5.16.14 will not boot

Fedora 35 just pushed an update to kernel version 5.16.14 - this will not boot. Prior kernel 5.16.11 would boot ok. There's no response from the kernel after GRUB loads it.

@treee and I have both encountered this.

This issue also impacts OpenSUSE Tumbleweed as it's using the same kernel version.

So either something is broken in that kernel, or there's something that it doesn't like with the TP. Either way this is a potential showstopper for new Fedora or OpenSUSE users, and the community in general if it spreads to other distros as they pick up 5.16.14 kernels.

- Paul (Technogeezer)
Editor of the Unofficial Fusion Companion Guides
33 Replies
Technogeezer
Immortal


@k_ronny wrote:

 but I know that at least qemu (in an unreleased version) works well with all newer kernel versions.


@k_ronny, I'm curious: is QEMU running in virtualization or emulation mode?

- Paul (Technogeezer)

treee
Enthusiast

Most likely it is in virtualisation mode. Further up this thread there is a link to the kernel mailing list where it is explained that the issue isn't in Apple's hypervisor framework (which they refer to as HVF) but in how virtualisation software such as QEMU, Parallels Desktop and VMware Fusion/Workstation/etc. handles a certain register. QEMU has already addressed the issue, but both Parallels and VMware still need to. From what I read on the Parallels forum, they are already looking into this. No word from VMware yet, but that doesn't necessarily mean they aren't aware of the issue.

On that same kernel mailing list thread you will also find a reply from Greg K-H saying that it probably isn't a good idea to change the Linux kernel for this, as the register could be used for something else in the future. In other words: this really needs to be addressed in the virtualisation software.

fabienmagagnosc
Enthusiast

@Technogeezer and @k_ronny => the QEMU patch is not solving the issue.

The code inserted in QEMU:

+    case SYSREG_ID_AA64ISAR2_EL1:
+        /* We do not support any of the ISAR2 features yet */
+        val = 0;
+        break;

causes the CPU to sit at 100% and that's it... no progress at all.

The post on kernel.org is more interesting, as it compares bare metal with the virtualization engines:

This is because detection of the clearbhb instruction support requires accessing SYS_ID_AA64ISAR2_EL1.  Commenting out the two uses of supports_clearbhb in the kernel now yields a successful boot.
Qemu developers seem to have found this issue as well[1] when trying to boot 5.17 using HVF, the Apple Hypervisor Framework.  This seems to be some sort of platform quirk on M1, or at least in HVF on M1.  I’m not sure what the best workaround would be for this.  SYS_ID_AA64ISAR2_EL1 seems to be something added in ARMv8.7, so perhaps access to it could be gated on that.  
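
For illustration, here is a minimal C sketch of what a check like supports_clearbhb boils down to once the register value is in hand. The helper names are hypothetical, and the field position (bits [31:28] of ID_AA64ISAR2_EL1) is an assumption taken from the Arm register description, not from the kernel source quoted here. The real kernel obtains the value with an MRS instruction, and that read is the step that faults under the affected hypervisors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the CLRBHB field in ID_AA64ISAR2_EL1: bits [31:28]. */
#define ISAR2_CLRBHB_SHIFT 28u

/* Extract an unsigned 4-bit ID field that starts at bit `shift`. */
static inline unsigned id_field(uint64_t reg, unsigned shift)
{
    return (unsigned)((reg >> shift) & 0xfu);
}

/* Hypothetical helper: does this register value advertise clearbhb? */
static inline bool value_supports_clearbhb(uint64_t isar2)
{
    return id_field(isar2, ISAR2_CLRBHB_SHIFT) != 0;
}
```

A register that reads as all zeroes simply reports "feature not present", so the kernel can carry on without the instruction. That only works if the read returns a value at all instead of raising an undefined-instruction exception.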


First of all: QEMU found this issue with the Apple Hypervisor framework.

Bare metal and KVM work, while Parallels, like QEMU and Fusion, is affected:

This really is a Parallels bug. These kernels run fine on bare metal
M1 and in KVM. QEMU was affected as well, and that was fixed in their
HVF handling. HVF itself is fine.

So this should be punted back to the hypervisor vendor for not properly
implementing the architecture (no ID register is allowed to UNDEF).

Now, why does this kernel work on bare metal, and why does KVM work?

For bare metal, nothing special here: I suppose the kernel and the M1 are "in sync" and work properly.

As for KVM, it's probably because KVM IS A HYPERVISOR itself: it's probably running as the hypervisor under Linux on the M1, whereas Parallels, QEMU and Fusion are using the hypervisor provided by Apple.


If we try to visualize it:

  • Apple M1 <=> OS: Linux with kernel <=> hypervisor: KVM <= this is working
  • Apple M1 <=> OS: macOS <=> hypervisor: Apple framework <= this is not working

and we can see that QEMU, Parallels and Fusion are not working, as they all rely on the Apple hypervisor.

 

And the conclusion from the kernel.org team, which is NOT dedicated to the Apple M1 but to the ARM64 architecture (aarch64), is clear:

As a *very* short term solution, that's probably the right thing to do.

However, this register is bound to grow new uses over time, and disabling
these features in a distro kernel is going to impact all users, unless
your particular kernel build is strictly limited to M1.

Thanks,

fabienmagagnosc
Enthusiast

@treee => the QEMU team is actually "hiding the register", whereas the kernel team is pushing for a long-term solution, as the register is useful.

Note: thanks for summarizing. Sadly, my post is way too long with the detailed explanations (to be validated and corrected over time).

treee
Enthusiast

Yeah, just read the piece of code you posted. That looks more like something you'd do when in a panic, so I guess they are very much aware of how big this issue is. I think we'll have to wait and see over the coming days.

fabienmagagnosc
Enthusiast

@Mikero => it seems there is a blocking issue with the Apple Hypervisor Framework affecting Parallels, VMware Fusion (which is what we are discussing here) and QEMU.

Kernel 5.13 is the first to officially support the Apple M1 nicely. Since then, the kernel works on bare metal and with KVM (which is integrated with the kernel, so effectively bare metal here), but not with Apple HVF-based solutions (like Fusion or Parallels).


Sorry to bring you into this, but I suppose you have some idea? Maybe some news or information about it?

k_ronny
Enthusiast

@Technogeezer I use it in "virtualization mode", i.e. I use this command line in qemu:

qemu-system-aarch64 \
    -uuid ${UUID} \
    -name fedora-desktop \
    -machine type=virt \
    -accel accel=hvf \
    -cpu host \
...

And thanks to Akihiko Odaki (https://gist.github.com/akihikodaki/87df4149e7ca87f18dc56807ec5a1bc5) I have a fully functional Fedora / Debian GNOME desktop with hardware-accelerated graphics.

fabienmagagnosc
Enthusiast

@k_ronny => it's actually indicated that QEMU is "hiding" the register. I'd rank it like this: a bodge (the solution currently in place from the QEMU team) < a hack (trying to make it fully functional, but not with a direct solution) < a solution (the one we are talking about, and virtually perfect) < an implementation (the real implementation that makes it work perfectly, reproducibly, and long term, offering ALL the functionality!)

static int hvf_sysreg_read(CPUState *cpu, uint32_t reg, uint32_t rt)
     case SYSREG_OSDLR_EL1:
         /* Dummy register */
         break;
+    case SYSREG_ID_AA64ISAR2_EL1:
+        /* We do not support any of the ISAR2 features yet */
+        val = 0;
+        break;
     default:
         cpu_synchronize_state(cpu);
         trace_hvf_unhandled_sysreg_read(env->pc, reg,

 

and basically, as indicated by the kernel developer, this is not a solution but a quick and dirty way to make it work... it's a bodge.

And as mentioned in the kernel thread: the register is currently used by one function in 2 places, but it is planned to be used more in the future.

 

k_ronny
Enthusiast

@fabienmagagnosc You are right. But for now, it is a solution. The commit message of the current patch is:

Recent Linux versions added support to read ID_AA64ISAR2_EL1. On M1,
those reads trap into QEMU which handles them as faults.

However, AArch64 ID registers should always read as RES0. Let's
handle them accordingly.

This fixes booting Linux 5.17 guests.

I read from it that they are satisfied with it. And I think VMware could treat it that way too, at least until Apple has a better solution to offer.

fabienmagagnosc
Enthusiast

I do think they'll need to work on it: if I read the ARM documentation correctly, this register lets software read which CPU-specific features are implemented.

The documentation says it "Provides information about the features and instructions implemented in AArch64 state.", and this one (after ISAR0_EL1 (security) and ISAR1_EL1 (generic, it seems)) appears to cover memory enhancements (copy, pointer management? ...).

 

https://developer.arm.com/documentation/ddi0595/2021-12/AArch64-Registers/ID-AA64ISAR2-EL1--AArch64-...

 

Hopefully VMware, Parallels (and others based on Apple HVF) can get this done sooner rather than later! At least QEMU got it running (and I'm now looking at my Fedora running 5.13 with sadness).

 

k_ronny
Enthusiast

@fabienmagagnosc

After reading a little more, I see it like this:

The Apple M1 chip uses the ARMv8.4 (ARM) instruction set (https://en.wikichip.org/wiki/apple/mx/m1).

In https://developer.arm.com/documentation/ddi0595/2021-03/AArch64-Registers/ID-AA64ISAR2-EL1--AArch64-... we can find this statement:

Configuration
This register is present only from Armv8.7. Otherwise, direct accesses to ID_AA64ISAR2_EL1 are UNDEFINED.

And in https://developer.arm.com/documentation/ddi0595/2021-12/AArch64-Registers/ID-AA64ISAR2-EL1--AArch64-... there is this:

Configuration
Prior to the introduction of the features described by this register, this register was unnamed and reserved, RES0 from EL1, EL2, and EL3.

 So I think for the Apple M1 the QEMU solution is the right one.
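
The RES0 behaviour quoted above can be sketched as a trap handler in C. This is an illustration under stated assumptions, not QEMU's or VMware's actual code: `struct sysreg_enc` and both function names are hypothetical, the ID register space is assumed to be op0=3, op1=0, CRn=0, CRm=1..7, and ID_AA64ISAR2_EL1 is assumed to sit at op0=3, op1=0, CRn=0, CRm=6, op2=2. It also assumes that ID registers the hypervisor already handles never reach this function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded system-register encoding (not QEMU's real type). */
struct sysreg_enc {
    uint8_t op0, op1, crn, crm, op2;
};

/* Assumed ID register space: op0=3, op1=0, CRn=0, CRm=1..7, any op2. */
static bool is_id_space(struct sysreg_enc r)
{
    return r.op0 == 3 && r.op1 == 0 && r.crn == 0 &&
           r.crm >= 1 && r.crm <= 7;
}

/*
 * Hypothetical handler for a trapped guest MRS.
 * Returns true and stores the result in *val when the read is handled.
 * Returns false when the caller should inject an UNDEF into the guest.
 */
static bool handle_sysreg_read(struct sysreg_enc r, uint64_t *val)
{
    if (is_id_space(r)) {
        *val = 0; /* unknown or unsupported ID register: reads as RES0 */
        return true;
    }
    return false;
}
```

Compared with a single `case` for one register, handling the whole ID space this way would also cover ID registers that future kernels start reading, at the cost of reporting every feature in them as absent.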

Technogeezer
Immortal

I may be stirring up a hornet's nest, but this begs the question: if the register isn't there (unnamed and undefined in some ARMv8 architectures), why is the Linux kernel insisting on accessing it? Or isn't there enough information for the kernel to determine that it shouldn't try to access it?

- Paul (Technogeezer)

fabienmagagnosc
Enthusiast

Based on those register descriptions, it seems they provide information about the features the kernel can access and use, a bit like cpuinfo with all the silicon features on x86/amd64.

I suppose that in the future it'll be possible for the kernel to read the CPU information beforehand and, based on the result, decide which implementation (software or hardware) to use.

I would say the effort going into the ARM64 ecosystem is new. Even though we have seen some efforts from HP (Moonshot systems, if I recall correctly) and in OpenBSD/FreeBSD (I used to cross-compile on x86 for PPC and ARM), I suppose the Linux community will see massive investment (time and obviously $$$) now that larger players (like Apple, Qualcomm, Google… even Microsoft or Amazon for their clouds) are "in the game" (I mean for the desktop now).
