VMware Communities
acdingman
Contributor

[FIXED] vmwgfx module won't load in upgraded VMs

This idiot remembered to look for module.blacklist, but not `nomodeset`, which was there and causing the problem. Easily fixed with `sudo grubby --update-kernel=ALL --remove-args=nomodeset` and a reboot.

I had a couple of Fedora VMs from the previous version of the tech preview. It was easy enough to get them bootable again by using rescue mode from a Rawhide installer CD to install an updated kernel. However, nothing I do to the upgraded VMs makes it possible to load the `vmwgfx` kernel module. As a result, the VM tools integration can't adjust the screen resolution. A freshly installed Rawhide VM, by contrast, loads the module and works just fine.

`vmwgfx` is definitely not blacklisted. In fact, the string "vmwgfx" does not appear in any file under /etc or /boot on either VM, confirmed with `$ sudo grep -ERH vmwgfx /boot/ /etc/`, which returns nothing. Likewise it hasn't been blacklisted on the kernel command line, confirmed with `/proc/cmdline`. Not to mention that anything there should have shown up in a config file under /etc or /boot.
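As the fix at the top of the thread shows, searching for the module name alone is not enough, because `nomodeset` never mentions `vmwgfx`. A minimal sketch of a broader check of the kernel command line follows. The helper name `find_driver_blockers` is hypothetical, and the list of arguments is an assumption about what is worth flagging (`nomodeset`, plus the kernel, kmod, and dracut blacklist parameters), not an exhaustive one.

```shell
#!/bin/sh
# find_driver_blockers CMDLINE: print each argument in CMDLINE that can keep
# a graphics driver from loading. Hypothetical helper, not part of any tool.
find_driver_blockers() {
    set -f    # disable filename globbing while the string is word-split
    for arg in $1; do
        case "$arg" in
            nomodeset|module_blacklist=*|modprobe.blacklist=*|rd.driver.blacklist=*)
                printf '%s\n' "$arg" ;;
        esac
    done
    set +f
}

# Check the running kernel; prints nothing if no such argument is present.
find_driver_blockers "$(cat /proc/cmdline 2>/dev/null)"
```

On the broken VM described here, this would presumably have printed `nomodeset`.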

I do see a very slight difference in the presented virtual hardware. A simple `lspci` on the two VMs produces identical output, the relevant piece of which appears to be `00:0f.0 VGA compatible controller: VMware Device 0406`. However, there is a slight difference in the verbose output. On the functional VM, I see:

`00:0f.0 VGA compatible controller: VMware Device 0406 (prog-if 00 [VGA controller])
Subsystem: VMware Device 0406
Flags: bus master, medium devsel, latency 64, IRQ 49
Memory at 3d000000 (64-bit, non-prefetchable) [size=4M]
Memory at 70000000 (64-bit, prefetchable) [size=128M]
Capabilities: [40] PCI Advanced Features
Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [60] MSI-X: Enable+ Count=6 Masked-
Kernel driver in use: vmwgfx
Kernel modules: vmwgfx
`

On the VM that can't load the driver, I see:
`00:0f.0 VGA compatible controller: VMware Device 0406 (prog-if 00 [VGA controller])
Subsystem: VMware Device 0406
Flags: bus master, medium devsel, latency 64, IRQ 255
Memory at 3d000000 (64-bit, non-prefetchable) [size=4M]
Memory at 70000000 (64-bit, prefetchable) [size=128M]
Capabilities: [40] PCI Advanced Features
Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [60] MSI-X: Enable- Count=6 Masked-
Kernel modules: vmwgfx
`
The only differences appear to be that the non-functional device has MSI-X disabled (plain MSI is disabled on both) and, most likely as a result of not using message-signalled interrupts, is on a different interrupt number. Also, although `lspci` is able to identify `vmwgfx` as the correct driver, it isn't loaded. Attempting to load it manually produces:
`$ sudo modprobe vmwgfx
modprobe: ERROR: could not insert 'vmwgfx': No such device
`
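The "Kernel driver in use" line that `lspci -v` prints can also be checked directly in sysfs, which is handy for scripting the comparison between two guests. This is only a sketch: `bound_driver` is a hypothetical helper, and the device path assumes PCI domain `0000` in front of the `00:0f.0` address shown above.

```shell
#!/bin/sh
# bound_driver DEVDIR: print the driver bound to the PCI device whose sysfs
# directory is DEVDIR, or "none" if no driver is bound. Hypothetical helper.
bound_driver() {
    if [ -L "$1/driver" ]; then
        # The "driver" entry is a symlink to the driver's sysfs directory.
        basename "$(readlink "$1/driver")"
    else
        echo none
    fi
}

# Address taken from the lspci output above; domain 0000 is assumed.
bound_driver /sys/bus/pci/devices/0000:00:0f.0
```

Going by the `lspci` output above, the working VM should report `vmwgfx` and the broken one `none`.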

The corresponding .vmx files show little difference that I can identify as relevant. The non-functional VM does contain a number of vmotion.svga.* parameters that are not present in the functional VM. However, it doesn't help to remove them. Even when I carefully power down the VM, delete the entries, and then start it again using `vmrun` and the VMX file name, those entries get added back and the behavior remains the same. A full `diff -u` is attached. In the diff `rawhide-server` is the one that works right, and `Fedora` is the one that can't load the module.
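To narrow a full diff down to just the display-related entries, something along these lines might help. It is a rough sketch: the helper names and the `.vmx` file names in the usage comment are hypothetical, and matching on the substring `svga` is an assumption about which keys matter.

```shell
#!/bin/sh
# svga_keys FILE: print the svga-related lines of a .vmx file, sorted.
svga_keys() {
    grep -i 'svga' "$1" | sort
}

# only_in_first A B: print svga-related lines present in A but not in B.
only_in_first() {
    a=$(mktemp) && b=$(mktemp) || return 1
    svga_keys "$1" > "$a"
    svga_keys "$2" > "$b"
    comm -23 "$a" "$b"    # keep only the lines unique to the first file
    rm -f "$a" "$b"
}

# Example with hypothetical file names:
# only_in_first Fedora.vmx rawhide-server.vmx
```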

acdingman
Contributor

Also worth mentioning: all of the above was true both before and after I upgraded the installed distribution to Rawhide, across several kernel versions. Those same kernel versions worked on the machine that was originally installed with Rawhide and never had a Fedora 35 or 36 kernel.

Technogeezer
Immortal

Lots of questions:

  • What was the installed version of Fedora at the time that this happened?
  • What packages did you install to fix the problem at the time?
  • Could you post the system log from the time of boot?
  • What are the kernel packages (kernel.aarch64, kernel-core.aarch64, kernel-headers.aarch64, kernel-modules.aarch64, kernel-modules-extra.aarch64, etc...) that are installed on the machine that's exhibiting the problem? Also, is libdrm.aarch64 installed?

- Paul (Technogeezer)
Editor of the Unofficial Fusion Companion Guides

Technogeezer
Immortal

You are certainly not running the same kernel versions on the VM that's working and the one that isn't.

Have you tried upgrading the kernel on the non-working VM to the same version as the one that's working?

The differences in the "svga" entries of the vmx files look OK to me; they depend on whether you have 3D support turned on in the VM's Display settings.

I see that the VMX files indicate that Fedora server is running on the VM that isn't working? And if so, are you running a graphical interface on Fedora Server?

acdingman
Contributor

The VM that doesn't work was originally installed as F35 under the first tech preview, then upgraded to F36, which initially didn't boot under the 22H2 release. As of the time I gave up on my own troubleshooting and posted this, it has been upgraded to "branched" and then to "rawhide".

The packages I installed to correct the boot problem were whatever the newest kernel in the Fedora 36 repo was at the time. 5.19.something. I have not installed anything specifically to address the vmwgfx module issue.

Here's what's still installed for kernel options on the one that can't load the module:
$ sudo dnf list installed kernel*
Installed Packages
kernel.aarch64               5.19.3-300.fc37          @Fedora
kernel.aarch64               5.19.7-300.fc37          @updates-testing
kernel.aarch64               6.0.0-0.rc4.31.fc38      @Fedora
kernel-core.aarch64          5.19.3-300.fc37          @Fedora
kernel-core.aarch64          5.19.7-300.fc37          @updates-testing
kernel-core.aarch64          6.0.0-0.rc4.31.fc38      @Fedora
kernel-doc.noarch            6.0.0-0.rc4.31.fc38      @Fedora
kernel-headers.aarch64       6.0.0-0.rc4.git0.1.fc38  @Fedora
kernel-modules.aarch64       5.19.3-300.fc37          @Fedora
kernel-modules.aarch64       5.19.7-300.fc37          @updates-testing
kernel-modules.aarch64       6.0.0-0.rc4.31.fc38      @Fedora
kernel-srpm-macros.noarch    1.0-15.fc37              @Fedora

Every kernel release for F36 since I got the upgraded VM to boot under 22H2, as well as the two F37 kernels from "branched" and the one from "rawhide", have failed to load the vmwgfx module with the same message.

Yes, libdrm is installed. Currently at `libdrm-2.4.112-1.fc38.aarch64`

Which "system log" do you want? dmesg? A full journal dump?
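Either would probably do, since what matters is the kernel's messages from the current boot. Below is a minimal sketch of pulling just the graphics-related lines. The `gfx_lines` helper and its filter pattern are assumptions chosen for this problem. The pattern includes `nomodeset` because the kernel logs its command line early in boot.

```shell
#!/bin/sh
# gfx_lines: filter kernel log text on stdin down to graphics-related lines.
# Hypothetical helper; the pattern is a guess at what is relevant here.
gfx_lines() {
    grep -iE 'vmwgfx|drm|nomodeset'
}

# Kernel messages from the current boot; fall back to dmesg without journald.
{ journalctl -k -b --no-pager 2>/dev/null || dmesg 2>/dev/null; } | gfx_lines \
    || echo "no graphics-related kernel messages found"
```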

 

acdingman
Contributor

At the moment I copied the two VMX files, no, I was not quite running identical kernel packages on the two. But I have, with the same difference in outcome. The machine originally installed from a Rawhide boot image works, and the one that has been upgraded from Fedora 35 cannot load the module.

So yes, I have tried upgrading them to precisely identical kernels, and it didn't help. At the moment I'm posting this, I have both running `6.0.0-0.rc4.31.fc38.aarch64`.

The VM that is working is named 'rawhide-server', was installed with the 'server' variant, and yes, it has a GNOME on Wayland GUI installed -- using the same "Fedora Workstation" environment group that pulls it in when using the Workstation variant.

The VM that is not working is named 'Fedora' and was originally installed using the 'workstation' variant, which includes GNOME on Wayland by default. DNF tells me that adding the environment group for the 'server' variant would pull in network multipath support, the Cockpit web admin panel, and some dependencies for esoteric LUKS features that I have no need for in these VMs. Nothing at all related to video or guest tools.

acdingman
Contributor

Fixed. Effing thing still had `nomodeset` in the kernel arguments and I was blindly looking for module blacklists.

Technogeezer
Immortal

Yeah, it's sometimes difficult to keep up with which distro uses which blacklisting methods.

Not even consistent in the RHEL family. CentOS Stream and RHEL use /etc/modprobe.d blacklisting, while Fedora uses kernel arguments. 
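Whichever method a given distro favors, both places are cheap to check. The kernel arguments are in `/proc/cmdline`. For the `modprobe.d` side, a rough sketch like the following lists every module named in a `blacklist` line. The `list_blacklisted` helper is hypothetical, and the three directories are the common search locations rather than a guaranteed complete list.

```shell
#!/bin/sh
# list_blacklisted DIR...: print module names blacklisted in *.conf files
# under the given modprobe.d directories. Hypothetical helper name.
list_blacklisted() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        for f in "$dir"/*.conf; do
            [ -f "$f" ] || continue
            # Match "blacklist <module>" lines; commented-out lines are skipped.
            awk '$1 == "blacklist" && NF >= 2 { print $2 }' "$f"
        done
    done
}

# Common modprobe.d locations; directories that do not exist are skipped.
list_blacklisted /etc/modprobe.d /run/modprobe.d /usr/lib/modprobe.d | sort -u
```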

Glad you were able to find it. It's frustrating when the problem turns out to be the loose nut holding the wheel, right?

🙃
