AlexWhiteraft
Contributor

NVIDIA Grid vGPU M10 performance issues with PCOIP protocol (high GPU utilization)


We have been working on implementing M10 vGPUs in our VMware environment and have been experiencing performance issues. We worked with NVIDIA to verify that the environment is set up correctly. Here is a quick bullet-point list of the environment:

  • vSphere 6.7
  • Hosts run VMware ESXi 6.7.0
    • PowerEdge R740
    • Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz
    • 768 GB RAM
    • 2x M10 GPUs
  • Horizon 7.4.0
  • Linked Clones
    • Windows 10 1803 (also tried 1809 & 1709 builds)
    • 4 core
    • 8 GB RAM
    • M10-1B vGPU profile
    • Teradici based zero clients (PCOIP)

After an initial small round of user testing with no issues, we began a larger test and noticed that once we hit about 15 users per M10 we started getting reports of performance problems. Users see lag in the interface: a right-click on the desktop can take 5-10 seconds for the context menu to appear, and the same happens with the Start menu. These issues occur only on vGPU VMs; non-vGPU VMs on the same host do not experience the slowdowns. In Task Manager we noticed that pcoip_server_win32.exe was consuming a lot of CPU and GPU time. We tried different versions of the VMware agent, direct connect, various revisions of the ESXi driver, and fresh standalone copies of Windows 10 at various build numbers. So far no combination has resolved the issue for vGPU machines once there are more than a few users per machine. The problem also appears if we use the Horizon software client set to PCoIP.

We tried different Horizon Agent versions (6, 7.0.2, 7.2.0, 7.4.0, 7.5.1, 7.6 & 7.7), and also tried direct connect, bypassing the Horizon Server.

We also tried running the VMs with the VMware Blast protocol, which did not show the high GPU usage, but unfortunately almost all of our thin clients only support PCoIP.

Attaching screenshot below: please note the GPU utilization of PCoIP Server (32bit) process.

teams.jpg

UPDATE 1:

After a long discussion with NVIDIA, they concluded that the issue is not on their side.

They pointed us at this KB: https://nvidia.custhelp.com/app/answers/detail/a_id/4156/~/nvidia-smi-shows-high-gpu-utilization-for...

It looks like this is a known issue with the Teradici PCoIP protocol that has not been fixed yet.

UPDATE 2:

I tried downloading and installing Teradici's own PCoIP agent (PCoIP_agent_release_installer_graphics.exe) directly from Teradici. Then I ran "NvFBCEnable.exe -disable", which disables NvFBC capture and falls back to CPU-based capture. It works great: no GPU spike when idle and much better performance overall.

However, when I try this with the Horizon Agent's Teradici protocol, NvFBC is only disabled briefly; as soon as I reconnect via PCoIP it is re-enabled. See this extract from the log:

  Svgadevtap: NvFBC Fixed capture by enabling NvFBC

Is there a way to permanently disable NvFBC on Horizon Agent's Teradici Protocol?

1 Solution

Accepted Solutions
AlexWhiteraft
Contributor

We were able to find a workaround for our problem.

We used a combination of a memory dump and Sysinternals Process Monitor to find the registry keys.

Below is a combination of settings we use to achieve satisfactory performance:

[HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware SVGA DevTap]
"Win32FrameRate"=dword:0000002d
"MaxAppFrameRate"=dword:0000002d
"ForceWin32Capture"=dword:00000001

[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Teradici\PCoIP\pcoip_admin_defaults]
"pcoip.audio_bandwidth_limit"=dword:000001c2
"pcoip.enable_build_to_lossless"=dword:00000000
"pcoip.enable_console_access"=dword:00000000
"pcoip.minimum_image_quality"=dword:00000028
"pcoip.maximum_initial_image_quality"=dword:00000050
"pcoip.maximum_frame_rate"=dword:0000002d
"pcoip.use_client_img_settings"=dword:00000000

The config caps the frame rate at 45 FPS (dword 2d hex), which is the maximum we can achieve with our current NVIDIA license.
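As a sanity check on those hex dwords, here is a quick Python sketch that decodes them to decimal. The value names mirror the registry entries; my reading of the units (image quality on Teradici's 0-100 scale, audio bandwidth limit presumably in kbit/s) is an assumption, not gospel:

```python
# Decode the hexadecimal registry dwords from the workaround to decimal.
settings_hex = {
    "Win32FrameRate": "2d",
    "MaxAppFrameRate": "2d",
    "pcoip.maximum_frame_rate": "2d",
    "pcoip.audio_bandwidth_limit": "1c2",
    "pcoip.minimum_image_quality": "28",
    "pcoip.maximum_initial_image_quality": "50",
}
settings = {name: int(value, 16) for name, value in settings_hex.items()}
for name, value in settings.items():
    print(f"{name} = {value}")
# All three frame-rate caps decode to 45, i.e. the 45 FPS mentioned above;
# the quality bounds come out as 40 (minimum) and 80 (initial maximum).
```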

We have already spoken with VMware and they confirmed that this is indeed a valid workaround for now, until they get Teradici to fix the PCoIP code.

PCOIP_with_workaround.png


9 Replies
AlexWhiteraft
Contributor

I have found this registry hack: create a new DWORD value "NoNvFBC" with data 1 under HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware SVGA DevTap.

Then open CMD, cd to "C:\Program Files\Common Files\VMware\Teradici PCoIP Server", run "NvFBCEnable.exe -disable", and reboot.
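Put together as console commands, the steps above look roughly like this (a sketch only: paths assume a default Horizon Agent install, the reg add syntax is standard Windows, and it needs an elevated prompt):

```
reg add "HKLM\SOFTWARE\VMware, Inc.\VMware SVGA DevTap" /v NoNvFBC /t REG_DWORD /d 1
cd /d "C:\Program Files\Common Files\VMware\Teradici PCoIP Server"
NvFBCEnable.exe -disable
shutdown /r /t 0
```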

It works fine with one monitor; however, when we add a second monitor to our zero client, the picture on both monitors becomes distorted...

AlexWhiteraft
Contributor

I spoke to one of Teradici's representatives, and they said:

"Unfortunately, Teradici cannot change the behavior within Horizon as it is a VMware product and they control any PCoIP changes that goes into Horizon."

So now we are hoping for a resolution from VMware team.

AlexWhiteraft
Contributor

We decided to give it another go and built everything fresh from scratch with brand new ESXi, drivers, 1803 image, etc.

The VM below isn't joined to the domain and doesn't have anything installed apart from Nvidia drivers & Horizon Agent 7.7.

See the screenshot below: the VM sits at 30% GPU utilization while idle.

screenshot.png


RHaerri
Contributor

Hi Alex

I currently have the same issue at a customer site.

You marked your thread as solved. Did your registry changes really help?

Do you have any experience with your changes on zero clients with two screens?

It would be great to benefit from your effort.

Thanks

Robin

0 Kudos
AlexWhiteraft
Contributor

Hi Robin,

Yes, the registry changes helped, and they work great with two monitors.

See before & after the reg change.

Before - with 30 VM sessions:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.68       Driver Version: 410.68       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M10           On   | 00000000:3D:00.0 Off |                  N/A |
| N/A   51C    P0    31W /  53W |   8142MiB /  8191MiB |     79%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M10           On   | 00000000:3E:00.0 Off |                  N/A |
| N/A   46C    P0    30W /  53W |   8142MiB /  8191MiB |     77%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M10           On   | 00000000:3F:00.0 Off |                  N/A |
| N/A   28C    P0    17W /  53W |   8142MiB /  8191MiB |     29%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M10           On   | 00000000:40:00.0 Off |                  N/A |
| N/A   40C    P0    30W /  53W |   8142MiB /  8191MiB |     73%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla M10           On   | 00000000:DA:00.0 Off |                  N/A |
| N/A   52C    P0    33W /  53W |   8142MiB /  8191MiB |     94%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla M10           On   | 00000000:DB:00.0 Off |                  N/A |
| N/A   44C    P0    19W /  53W |   5094MiB /  8191MiB |     41%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla M10           On   | 00000000:DC:00.0 Off |                  N/A |
| N/A   41C    P0    35W /  53W |   6110MiB /  8191MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla M10           On   | 00000000:DD:00.0 Off |                  N/A |
| N/A   43C    P0    25W /  53W |   6110MiB /  8191MiB |     57%      Default |
+-------------------------------+----------------------+----------------------+

After - with 50 VM sessions:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.91       Driver Version: 410.91       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M10           On   | 00000000:3D:00.0 Off |                  N/A |
| N/A   29C    P8    10W /  53W |   8142MiB /  8191MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M10           On   | 00000000:3E:00.0 Off |                  N/A |
| N/A   30C    P8    10W /  53W |   8142MiB /  8191MiB |     18%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M10           On   | 00000000:3F:00.0 Off |                  N/A |
| N/A   27C    P0    20W /  53W |   8142MiB /  8191MiB |      5%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M10           On   | 00000000:40:00.0 Off |                  N/A |
| N/A   27C    P8    11W /  53W |   8142MiB /  8191MiB |     34%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla M10           On   | 00000000:DA:00.0 Off |                  N/A |
| N/A   29C    P8    10W /  53W |   8142MiB /  8191MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla M10           On   | 00000000:DB:00.0 Off |                  N/A |
| N/A   35C    P0    19W /  53W |   7126MiB /  8191MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla M10           On   | 00000000:DC:00.0 Off |                  N/A |
| N/A   31C    P0    19W /  53W |   7126MiB /  8191MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla M10           On   | 00000000:DD:00.0 Off |                  N/A |
| N/A   29C    P8    10W /  53W |   7126MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
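For anyone making the same before/after comparison, here is a minimal Python sketch (my own helper, not part of any VMware or NVIDIA tooling) that pulls the GPU-Util column out of saved nvidia-smi output:

```python
import re

def gpu_utils(smi_output: str):
    """Extract the GPU-Util percentages from nvidia-smi table output."""
    # Each per-GPU stats row ends like "|     79%      Default |"
    return [int(m) for m in re.findall(r"\|\s*(\d+)%\s+Default", smi_output)]

def average_util(smi_output: str) -> float:
    """Average utilization across all GPUs found in the output."""
    utils = gpu_utils(smi_output)
    return sum(utils) / len(utils) if utils else 0.0

# Two sample rows in the same format as the tables above.
sample = """
| N/A   51C    P0    31W /  53W |   8142MiB /  8191MiB |     79%      Default |
| N/A   46C    P0    30W /  53W |   8142MiB /  8191MiB |     77%      Default |
"""
print(gpu_utils(sample))     # [79, 77]
print(average_util(sample))  # 78.0
```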

LukaszDziwisz
Hot Shot

Hello @Alex

It appears that we are hitting exactly the same problem on our end with PCoIP; however, I do not seem to have the key

[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Teradici\PCoIP\pcoip_admin_defaults]

The only key we have is [HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Teradici\PCoIP]. Did you end up creating the pcoip_admin_defaults key and then adding all of the DWORD values?

SchwarzC
Enthusiast

Dear Alex,

Thank you for your fix; it resolved our idle GPU issue as well. However, our PCoIP server process now constantly uses 20-30% CPU, regardless of what is on the monitor.

Any idea what we could do?

Best regards

sWORDs
VMware Employee

This has been fixed by Teradici and the Horizon team in Horizon 7.12, which we released yesterday. Internal testing results:

GPU usage for NVIDIA is drastically reduced: between 2-3x lower for a single HD display (5% before -> 2% after) and up to 10x lower for 4x UHD displays (50% before -> 5% after).

CPU usage is equal to the old path (slightly less, but within testing variance).
