VMware Horizon Community
Juhana1
Enthusiast

Linux Ubuntu 20.04 architecture issue on VMware Horizon Client session start - traps: llvmpipe-0[XX]

 
Hi,
 
Installing 64-bit Ubuntu Linux on the VMware compute resource described below succeeds, but we are unable to launch user sessions to the VM via VMware Horizon Client. When a session is launched, a black screen and the mouse cursor appear briefly, then the session window disappears and the client returns to the list of available pools. An SSH connection to the VM works, and entries like these are visible in dmesg (-T) after the launch attempt:
 
[Tue May 4 09:25:13 2021] rfkill: input handler disabled
[Tue May 4 09:25:15 2021] rfkill: input handler enabled
[Tue May 4 09:25:18 2021] rfkill: input handler disabled
[Tue May 4 09:25:21 2021] rfkill: input handler enabled
[Tue May 4 09:25:23 2021] rfkill: input handler disabled
[Tue May 4 09:25:26 2021] show_signal: 35 callbacks suppressed
[Tue May 4 09:25:26 2021] traps: llvmpipe-0[3150] trap invalid opcode ip:7f0c980b8087 sp:7f0cdb05a340 error:0
[Tue May 4 09:25:26 2021] rfkill: input handler enabled
[Tue May 4 09:25:28 2021] rfkill: input handler disabled
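
To separate the crash lines from the rfkill noise, a small filter over the kernel log may help. This is only an illustrative sketch, not part of the Horizon tooling: the function name count_invalid_opcode_traps is made up for this example, and it assumes the kernel log text arrives on standard input.

```shell
# Count kernel log lines that report an invalid-opcode trap.
# Hypothetical helper: reads log text from stdin and prints the number of matches.
count_invalid_opcode_traps() {
  # grep -c prints 0 but exits non-zero when nothing matches;
  # '|| true' keeps the function's exit status at 0 in that case.
  grep -c 'trap invalid opcode' || true
}

# Usage sketch (reading the kernel ring buffer may require root):
#   dmesg -T | count_invalid_opcode_traps
```

A non-zero count right after a failed launch attempt would match the symptom shown in the excerpt above.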
 
From what I have read, the "traps" line in particular may suggest an architecture incompatibility. Session launch attempts via VMware Horizon Client produce only these entries in viewagent-debug.log:
 
2021-05-05T19:22:37.677Z DEBUG <pool-2-thread-1> [DesktopHandler] [DesktopID: 7] Desktop was destroyed now.
2021-05-05T19:22:37.677Z DEBUG <pool-2-thread-1> [IpcConnectionMgr] Clean up magic for client 7, magic is 37145-***-d9595
2021-05-05T19:22:37.677Z DEBUG <pool-2-thread-1> [DesktopManager] [DesktopID: 7] The session of desktop is null, don't send the AGENT_ENDED event.
2021-05-05T19:22:37.677Z DEBUG <Script Runner> [LinuxUtilities] Running script: /usr/lib/vmware/viewagent/bin/CleanupLogFiles.sh
2021-05-05T19:22:37.678Z DEBUG <Script Runner> [LinuxUtilities] waiting for process to terminate, script: /usr/lib/vmware/viewagent/bin/CleanupLogFiles.sh
2021-05-05T19:22:37.685Z DEBUG <Script Stdout> [LinuxUtilities] Keep all logs
2021-05-05T19:22:37.686Z DEBUG <Script Stdout> [LinuxUtilities] finished
2021-05-05T19:22:37.686Z DEBUG <Script Runner> [LinuxUtilities] process terminated with rc 0, script: /usr/lib/vmware/viewagent/bin/CleanupLogFiles.sh
2021-05-05T19:22:37.686Z DEBUG <Script Runner> [LinuxUtilities] Script finished: /usr/lib/vmware/viewagent/bin/CleanupLogFiles.sh
2021-05-05T19:22:37.687Z DEBUG <Script Runner> [LinuxUtilities] thread finished..
 
Strangely, if we add an NVIDIA GRID vGPU to the VM and install NVIDIA's latest drivers, the VM works fine: the session launches via Horizon Client and there are no "traps" entries in dmesg. Any clues as to what is going on? Happy to share more info on our configuration if needed.
 
VM OS & kernel info:
 
# lsb_release -a
LSB Version: core-11.1.0ubuntu2-noarch:printing-11.1.0ubuntu2-noarch:security-11.1.0ubuntu2-noarch
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal
 
# uname -a
Linux vdi-cbd-logo1 5.8.0-50-generic #56~20.04.1-Ubuntu SMP Mon Apr 12 21:46:35 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
 
Compute resource info:
 
    Hypervisor: VMware ESXi, 7.0.2, 17630552
    Model: ProLiant DL380 Gen9
    Processor Type: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
    Logical Processors: 56
 
zhiminli
VMware Employee

Please let us know which version of the Linux agent is installed in the Ubuntu 20.04 VM.

Hangl
VMware Employee

Do you use multiple monitors?

If so, please check whether the total resolution exceeds 8192 (8K) in either direction, landscape or portrait. For details, please consult the section "Multiple Monitors" on the page https://docs-staging.vmware.com/en/VMware-Horizon/2006/linux-desktops-setup/GUID-67F7E8D6-E98C-4242-...

Juhana1
Enthusiast

 

Hi,

Thanks for the reply:

~# cd VMware-horizonagent-linux-x86_64-2103-8.2.0-17771892/
~/VMware-horizonagent-linux-x86_64-2103-8.2.0-17771892# cat Product.txt
VMware-horizonagent-linux-x86_64-2103-8.2.0-17771892

Juhana1
Enthusiast

 

Hi,

 

Our setup for the VM does support multiple monitors, but I made the launch attempt from a single monitor: my 14" laptop display, with the resolution set to 1920 x 1080 (the resolution recommended by Win10).

Hangl
VMware Employee

Hi,

Due to a limitation of llvmpipe, it panics if the resolution exceeds 8K. But your case seems to be different.

Could you provide the logs collected by running /usr/lib/vmware/viewagent/bin/dct-debug.sh?

Thanks,

Hang

Juhana1
Enthusiast

 

Hi Hang,

Thanks for your time. Can you provide me a secure way to deliver the logs? An email address or maybe open a new case for us on the closed Customer Support side? My organization has an account there.

Best,

Juhana

Hangl
VMware Employee

Hi Juhana,

If possible, please send the logs to me and our team at hangl(at)vmware(dot)com and linux-agent-bj-dev(at)vmware(dot)com.

I'll investigate and respond ASAP.

Thanks,

Hang

Juhana1
Enthusiast

 

Hi Hang,

Thanks, the logs are in the email now. The attachment is 28 MB, but I would much rather deliver them this way than post them here publicly, as they may contain sensitive information.

But thanks for helping us.

Best,

Juhana

Hangl
VMware Employee

Hi Juhana,

I didn't get the mail. Please double check.

Regards,

Hang

Juhana1
Enthusiast

 

Hi Hang,

 

Thanks for the notification. I received an auto-reply from vmware.com saying that the mail was too large to send. I will use a local delivery system for large files instead. You should receive a download link via email.

Hangl
VMware Employee

Hi Juhana,

In syslog, there are some suspicious errors:

 

 

May  4 09:35:24 vdi-*** gnome-shell[5096]: Getting invalid resource scale property
May  4 09:35:24 vdi-*** kernel: [  621.258117] traps: llvmpipe-0[5103] trap invalid opcode ip:7f741c022087 sp:7f7451e8d340 error:0

 

 

Did the customer ever set the scale factor?


 

Juhana1
Enthusiast

 

Hi Hang,

 

Thanks for the reply and your input, and sorry for the delay. I don't think it's possible that the customer has set the scale factor. The crash happens right after a session is started via VMware Horizon Client, and no desktop ever appears: just a black screen for 1-2 seconds, which then disappears.

I wonder if we can set and force a valid scale factor for the session from the command line somewhere?

Moreover, our setup for these VMs should load the MATE desktop by default, so I'm surprised to see gnome-shell entries in the log. I need to check that all settings for the default desktop are OK.

Hangl
VMware Employee

Hi Juhana,

Yes, your desktop environment is MATE. However, the greeter (for login) is still gnome-shell, and the issue occurred at that point.

It also seems that gnome-shell was updated from 3.36.4 to 3.36.7. So I suggest downgrading gnome-shell and reinstalling viewagent to see if the issue is still there.

Juhana1
Enthusiast

 

Hi Hang,

 

Thanks for the advice. I removed the vGPU from the instance and uninstalled the NVIDIA module (we had installed this between our correspondence here as a workaround for the customer). Then I downgraded gnome-shell:

~# apt install gnome-shell-common=3.36.4-1ubuntu1~20.04.2 gnome-shell=3.36.4-1ubuntu1~20.04.2

...

~# gnome-shell --version
GNOME Shell 3.36.4

Then I re-installed the viewagent and rebooted the machine. Unfortunately, the "traps" entries still appear in the logs and the behavior is the same: in VMware Horizon Client no session opens to this machine; a black screen briefly appears, then disappears, and the client returns to the list of available VM pools.

-Juhana

Hangl
VMware Employee

Hi Juhana,

Since the issue occurs in gnome-shell (the greeter), I strongly suspect it is related to it.

Please try the following two approaches separately.

1. Reinstall gnome-shell and reboot without reinstalling viewagent.

2. Disable all gnome-shell extensions.

gsettings set org.gnome.shell disable-user-extensions true

and move all sub-folders out of /usr/share/gnome-shell/extensions/
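
The folder-moving part of the second approach could be scripted roughly as follows. This is a sketch, not an official procedure: move_extensions_aside is a hypothetical helper and the backup location in the usage comment is an arbitrary choice. Moving the folders to a backup directory instead of deleting them keeps the step reversible.

```shell
# Move every sub-folder of an extensions directory into a backup directory.
# Both paths are arguments, so the function can be tried on a scratch copy first.
move_extensions_aside() {
  src=$1
  backup=$2
  mkdir -p "$backup" || return 1
  for dir in "$src"/*/; do
    # When there are no sub-folders the glob stays unexpanded; skip it.
    [ -d "$dir" ] || continue
    # Strip the trailing slash so mv behaves the same across implementations.
    mv "${dir%/}" "$backup"/ || return 1
  done
}

# Usage sketch (as root; the backup path is only an example):
#   move_extensions_aside /usr/share/gnome-shell/extensions /root/gnome-shell-extensions.bak
```

Regular files in the source directory are left in place; only sub-folders are moved.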

 

SCOlivier
Contributor

Has anyone tested this, or does anyone have a solution?

RickW80
Contributor

I had the same basic issue using Proxmox as the hypervisor instead of VMware. This forum post pointed me in the right direction.

I was able to solve the problem by removing /var/lib/gdm3/.cache/mesa_shader_cache, which was somehow causing the scaling error and the resulting llvm problems. My VM image had been created from a system using an NVIDIA GPU for display, in case that's helpful.

Juhana1
Enthusiast

 

Hi all,

This issue seems to be corrected in Ubuntu 20.04 with kernel 5.11.0-27-generic, running VMware Horizon view agent version 8.2.0-17771892.
 
Thanks to all participants,
 
-Juhana K
Perttu
Enthusiast

Hi, 

Apparently this is not a kernel issue but a rare condition tied to the image the VM is cloned from.

If the original image previously ran on a newer processor architecture, there might be precompiled caches at /var/lib/gdm3/.cache/mesa_shader_cache that use instructions not supported on the current hardware. Hence the invalid opcode.
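
A rough way to compare the machine the template was built on with the machine a clone runs on is to look at CPU feature flags on both. The following is a sketch under assumptions: has_cpu_flag is a hypothetical helper, it expects the x86 Linux /proc/cpuinfo format, and avx512f is only an example flag. The thread does not say which instruction actually triggered the trap.

```shell
# Succeed if a CPU feature flag is listed in cpuinfo-style text.
# $1: flag name, $2: file to read (defaults to /proc/cpuinfo).
has_cpu_flag() {
  flag=$1
  cpuinfo=${2:-/proc/cpuinfo}
  # Take the first "flags" line, put each token on its own line,
  # then look for an exact, literal match of the requested flag.
  grep -m1 '^flags' "$cpuinfo" | tr ' \t' '\n\n' | grep -qxF "$flag"
}

# Usage sketch, run on both the template host and the target host:
#   if has_cpu_flag avx512f; then echo present; else echo missing; fi
```

A flag that is present where the image was built but missing where the clone runs would be consistent with the explanation above, though it does not prove it.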

Apparently there is a fix on its way. The newest Red Hat release has it, and maybe Ubuntu will soon as well.

https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=1982746

Personally, I'm running the following snippet as part of the post-customization userscript (the 'log' there is just a call to a logging wrapper function):

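# Remove GDM's Mesa shader cache, which may have been built on different hardware.
# 'log' is a logging wrapper defined elsewhere in the userscript.
CACHEDIR=/var/lib/gdm3/.cache/mesa_shader_cache
if [ -d "$CACHEDIR" ]
then
  # Quote the path and end option parsing so the argument is passed to rm intact.
  rm -rf -- "$CACHEDIR"
  log "removed $CACHEDIR"
else
  log "$CACHEDIR not found"
fi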
