VMware Horizon Community
terranaut
Contributor

VMware View 5.3 vDGA: slow fps, disappointing performance

I recently tested vDGA and was really disappointed. The frame rate is far below what our engineers require (<15 fps on one monitor). I did not expect this at all, and it does not match VMware's promotional videos (one of them shows 50-60 fps in the upper right corner while running a game).

The environment:

HP ProLiant DL380p Gen8 with dual Xeon E5-2650 CPUs (2 GHz), 128 GB RAM, 2x 146 GB 15k SAS disks, and Gigabit networking.

ESXi 5.5, View 5.3. An NVIDIA Quadro K2000 is passed through to one VM via PCI passthrough (no other VMs are installed on this host).

All required BIOS settings on the server have been applied. The VM runs Windows 7 x64 Enterprise with a recent NVIDIA driver, the View Agent, and Feature Pack 5.3.

The VM has been placed in a View pool. Inside the VM, the SVGA adapter has been disabled in Windows Device Manager. The connection to the VM from a zero client (Wyse P25) works. DxDiag shows the Quadro K2000 as the primary GPU and PassMark recognizes it; every test from DirectX 9 to DirectX 11 runs successfully, with results as expected.

Conclusion: vDGA has been successfully established.

But: the frame rate averages below 15 fps (peak 19 fps) while playing a video or moving windows around. The pcoip_server_win32 process averages about 15% CPU with a peak of 40% (on 2 vCPUs). I think this process is responsible for encoding the PCoIP packets. I raised its priority from "High" to "Realtime" without any performance improvement.

I applied the PCoIP GPO and set a higher maximum frame rate (120 instead of 30), changed the PCoIP bandwidth settings, reduced the PCoIP initial image quality, added vCPUs (from 2 to 4), and applied every other setting suggested in the VMware and Teradici documents. None of it gave any significant improvement in video performance. With 2 monitors connected, the average drops to 8 fps.

Is it possible to speed up PCoIP encoding within the Windows session? I think this is the bottleneck. In Citrix you can set a registry key to accelerate software encoding.

Or is there another workaround or known setting which helps to accelerate the fps? Thank you in advance for your help.

Below you see ZeroClient Statistics on Wyse/Dell P25 while running a FullSized YouTube Video...

40 Replies
jg159357infigen
Contributor

Thanks for both of those, Linjo, but I've been doing more testing and it never seems like the application is limited on FPS (I was able to get 120 FPS on the fuzzy cube test, but could only get 3-7 FPS at the zero client).

For me, with a test lab of X5550 or L5520 processors, the single-threaded performance of pcoip_server.exe inside the VM seems to be the limit. I can throw more vCPUs at the VM, since this is currently the only guest on that host with dual X5550s, but I just can't get it to encode data for the zero client any faster than about 30 Mpps. The added problem is that it acts like it's choking trying to keep up and lowers the frame rate even further. If I stick to 1366x768 (~31 Mpps @ 30 FPS) I consistently see good smooth motion at the zero client, with peaks up to 35-40 FPS while running the Fuzzy Cube test.
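
To make the Mpps arithmetic above concrete, here is a rough Python sketch. It assumes the worst case where every pixel changes on every frame, and it treats the ~30 Mpps software-encode limit observed above as a fixed ceiling. The function names are made up for illustration and are not part of any PCoIP tooling.

```python
def pixel_rate_mpps(width: int, height: int, fps: float) -> float:
    """Megapixels per second if every pixel changes on every frame."""
    return width * height * fps / 1_000_000


def fps_ceiling(width: int, height: int, encoder_mpps: float) -> float:
    """Highest full-frame rate an encoder with this throughput could sustain."""
    return encoder_mpps * 1_000_000 / (width * height)


if __name__ == "__main__":
    # 30 Mpps is the software-encode limit observed in this post (an
    # observation from one test lab, not a documented constant).
    observed_limit_mpps = 30
    for w, h in [(1366, 768), (1920, 1080), (2560, 1600)]:
        needed = pixel_rate_mpps(w, h, 30)
        ceiling = fps_ceiling(w, h, observed_limit_mpps)
        print(f"{w}x{h}: {needed:.1f} Mpps for 30 fps, "
              f"about {ceiling:.1f} fps at {observed_limit_mpps} Mpps")
```

Under these assumptions 1920x1080 tops out at roughly 14.5 fps for a 30 Mpps encoder, which is in the same range as the sub-15 fps figures reported at the start of the thread. Real sessions only encode regions that change, so this is a worst-case estimate rather than a prediction.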

I'm beginning to wonder if the only way to get 30 FPS (or greater) is to stick with low resolutions or buy an APEX card. Even if I had to dedicate one of the X5550 procs in the system to doing nothing but PCoIP encoding, that would be an acceptable loss in our small environment, which probably won't have more than 5 active sessions at a time.

-JG

ksvman
Contributor

>>I also wanted to add the APEX 2800 lists an encoding rate of 300 Mpps:

Yeah, I also think the problem must be in one of two places: PCoIP encoding or decoding. Since a zero client is doing the decoding, it looks like adding a Teradici hardware encoding chip should help.

IT_Vision
Contributor

Hello terranaut,

I would like to know where you can view the stats in the image you posted. I need to confirm which model of Wyse/Dell thin client we have, but it would be great to see this info as well. Also, do you know of a tool that shows similar stats for a Horizon View session initiated from the View Client rather than a thin client?

We are running a combination of vDGA and vSGA with NVIDIA GRID K1 and K2 cards across multiple hosts. We're hoping to find better tools to display fps from the connected client session and even the PCoIP traffic.

Cheers

gqslb
Contributor

Hi,

I have an update on this with my own setup.

I've set up a small HP server and a VM with the following configuration:

4 cores

16gb memory

40gb disk

view 5.3

esxi 5.5u1

I attached a K2000 with vDGA and set all the recommended performance configuration parameters. I also put in an APEX 2800 card, which I can confirm is now offloading the encoding.

Very little improvement. I'm using a P45 zero client and I get 17 fps on average with a lot of window drag delay.

Not sure what else can be done. I'm waiting on a new server which will arrive next week; I'll be able to put my GRID K2 cards in there and start testing vSGA.

I'm hoping to god that vSGA gives us the performance we are looking for, or I'll be extremely disappointed in the whole solution.

Let me know if anyone has any further updates.

cheers,

Jamie

TomMar
Contributor

I've found that vSGA performance with the GRID cards can actually be very good. Make sure you have the correct BIOS settings; they make a huge difference. We're using Dell R720s with GRID K1 cards in our hosts. I believe the big one was a memory refresh setting that had to be set to 2x. On those same servers, however, vDGA is horrible: we're seeing the same low-FPS issues. Luckily vSGA is fine for our workloads and we've actually been pretty happy with it.

IT_Vision
Contributor

We run a cluster with K1 and K2 cards. We are fairly pleased with the performance.

We have started to use vDGA in our environment because vSGA only allows around 512 MB of video memory per VM (although dxdiag reports ~600 MB). Some of the modeling that we do needs a lot more than this to be usable, and vDGA gives the VM around 3.7 GB of video memory. We also have APEX 2800 cards in our servers, but it is hard to tell if they are really helping with performance. We see the offloading taking place on the host, but I don't think end users notice it. We see anywhere from 25-30 fps using the View Client.

ReinerHeinz
Contributor

>>>I also wanted to add the APEX 2800 lists an encoding rate of 300 Mpps:

>>Yeah, i also think that problem must be at one of the two parts: encoding or decoding of PCoIP, and cause there is Zero Client using for decoding it looks like adding hardware teradici encoding chip should help.


Very interesting about the Apex 2800 card.  I saw some recently on ebay at a good price.  I might grab one for my lab to do some testing!

gqslb
Contributor

Interesting that you're getting decent performance just using the view client.

I'm going to test out the View Client today on a cleaner setup with the K2000 passed through. I'm hearing that vDGA gives decent results with K1/K2 passthrough but not so much with other devices that don't also support vSGA.

Is it at all possible that vSGA is being used for non OpenGL rendering even when vDGA is enabled? And then vDGA handles all OpenGL & Video rendering?

I'll continue testing today.

Linjo
Leadership

gqslb wrote:

Is it at all possible that vSGA is being used for non OpenGL rendering even when vDGA is enabled? And then vDGA handles all OpenGL & Video rendering?

No, that is not how it works. With vDGA, the rendering is handled by the physical GPU.

// JL

Best regards, Linjo Please follow me on twitter: @viewgeek If you find this information useful, please award points for "correct" or "helpful".

gqslb
Contributor

After playing with some settings I finally found two registry settings that made my window dragging and general FPS a LOT better. I found them posted on this forum under a different topic about window dragging issues.

HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Teradici\PCoIP\pcoip_admin

pcoip.enable_new_color_coding 0

pcoip.enable_temporal_image_caching 0

Both of these resulted in much better performance on my P45 zero client, very close to what I'd expect from a desktop machine. In addition, I'm able to work with my 3d Models at a higher frame rate with less lag.

I'd recommend you try this, no matter what your existing configuration is.
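
For anyone who wants to apply the two values above to several VMs, here is a minimal Python sketch that renders them as .reg file text. It assumes both values are DWORDs (the post does not state the type), and the helper name `build_reg_file` is made up for illustration. Check the generated text before importing it.

```python
from __future__ import annotations

# Key and value names are copied from the post above.
KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Teradici\PCoIP\pcoip_admin"

SETTINGS = {
    "pcoip.enable_new_color_coding": 0,
    "pcoip.enable_temporal_image_caching": 0,
}


def build_reg_file(key: str, values: dict[str, int]) -> str:
    """Render a registry key and its values as .reg file text.

    Assumption: every value is a REG_DWORD. The original post only
    lists names and the value 0, not the value type.
    """
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in values.items():
        # .reg files store DWORDs as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_reg_file(KEY, SETTINGS))
```

Generating text rather than writing to the registry directly keeps the change reviewable, and the same file can be imported on each test VM.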

Cheers,

Jamie

jg159357infigen
Contributor

Well, I was trying to make a nice long post about how helpful the APEX 2800 card was in getting this working, but it seems I can't. Long story short: 1080p60 uses about 20% of an APEX 2800's capacity.

JG

jg159357infigen
Contributor

Just to follow up for anyone else reading this thread, with the caveat that I've only retested vSGA so far after getting an APEX card:

We purchased a used APEX 2800LP off eBay for testing. I upgraded it to the latest firmware (some PCIe and BIOS issues still persist that require rebooting the host several times, or updating the BIOS if an update is available, before the device becomes available). I built a brand new Win7 VM and patched it, then installed Tools, the View Agent, the VM OS Optimizer, and the APEX 2800 driver. I rebooted between each step and let the OS Optimizer fling apply all its recommended settings. I also changed the registry value [HKLM]\SOFTWARE\VMware, Inc.\VMware SVGA DevTap\MaxAppFrameRate (DWORD) to 0x3c (60 Hz).

For testing I have a personal GTX 470 Modded to appear as a Quadro 5000.  In VMware ESXi I have the latest driver: NVIDIA-VMware_ESXi_5.5_Host_Driver  319.65-1OEM.550.0.0.1331820           NVIDIA    VMwareAccepted    2014-06-30.

With all of this in place (The GPU wasn't needed for the YouTube test) I was able to do the following:
1) With 8 cores assigned (the host in this test was a 2x E5506 machine) and 2 GB RAM, I was able to play one 60 fps YouTube video (using HTML5 at 2x speed) while also watching an additional 30 fps video. This was across 3 monitors on my desktop using the PCoIP 2.3.3 - 1745122 client. The third monitor showed Task Manager with the CPU 80+% utilized.

jg159357infigen
Contributor

2) With 3 cores (single-CPU host, i5-2500K) and 6 GB RAM. This machine has the GPU set to hardware 3D acceleration. I am able to play CS:Source with a slight input lag, but no noticeable FPS lag. I'm able to view the fuzzy cube demo from GPU Caps Viewer at ~60 FPS.

We're still doing more internal tests, but the APEX card is 100% instrumental in doing anything over about 15 fps at 1080p. Below is the output of the card's info while streaming a 1080p/60 fps YouTube video (which also used 80-100% of the 3 cores of the i5-2500K). You can see a single 1080p screen can use about 1/5 of the APEX card, but it makes the VM indistinguishable from a physical PC.

/opt/teradici/pcoip-ctrl -I

jg159357infigen
Contributor

APEX2800 Driver Information:

   - SVN revision (33571), Built May 27 2014 : 23:20:42

   - Display Manager is (ENABLED)

   - Display Portrait Mode is (DISABLED)

   - Maximum Resolution Supported: 1920 x 1200

   - Number of displays supported: 64

APEX2800 Device Summary:

   (1) APEX2800 device present

     -- APEX2800-LP PCIe (Bus 4) (IN_SERVICE)

        ++ Serial Number (#######)

        ++ Firmware SVN revision (33571), Built May 27 2014 : 23:17:47

        ++ CPU Temperature (54c), Ambient/Board Temperature (48c)

        ++ Device Util (20), Image Pipeline (60708) Kpps

Virtual Machine Summary:

  -- (1) Virtual Machine Found

  -- (1) PCoIP Session Found

  -- (1) PCoIP Display Offloaded
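
As a cross-check of the "about 1/5 of the card" figure, here is a small Python sketch that pulls the utilization line out of the output above and compares the reported pixel rate with the 300 Mpps rating quoted earlier in this thread. The regular expression is based only on this one sample of `pcoip-ctrl -I` output, so treat the line format, and the reading of "Device Util" as a percentage, as assumptions.

```python
import re

# Encoding rate quoted for the APEX 2800 earlier in this thread.
RATED_MPPS = 300

# Pattern inferred from the single sample line in this post (assumption).
UTIL_LINE = re.compile(r"Device Util \((\d+)\), Image Pipeline \((\d+)\) Kpps")


def parse_device_load(text: str):
    """Return (reported_util, pipeline_mpps), or None if the line is absent."""
    match = UTIL_LINE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)) / 1000


if __name__ == "__main__":
    sample = "        ++ Device Util (20), Image Pipeline (60708) Kpps"
    util, mpps = parse_device_load(sample)
    share = 100 * mpps / RATED_MPPS
    print(f"reported util {util}, pipeline {mpps:.1f} Mpps, "
          f"{share:.0f}% of the rated {RATED_MPPS} Mpps")
```

The reported pipeline rate of about 60.7 Mpps is close to 20% of 300 Mpps, which lines up with the utilization value the card prints.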

I'm not sure how long I'll have this test setup available to play with, but if you have a specific request I will try to accommodate any tests.

JG

projectserve
Enthusiast

I've also tried vDGA, with a Quadro K2000 and a passed-through GRID K1 (1 GPU).

We are still experiencing a lot of lag within Windows.

We are using an APEX 2800, and also updated the drivers etc. on this machine.

We noticed that pcoip_server_win32.exe constantly uses 20-25% CPU of a quad core; this seems to be the culprit for the lag within Windows. (With vSGA the CPU load is a normal 0-5%.)

We tried all the options: max frame rate, and as suggested, MontereyEnable.exe -enable, etc.

Currently on ESXi 5.5.0 2068190 (U2?)

NVIDIA-SMI 340.32     Driver Version: 340.32
APEX2800 Driver Information: SVN revision (35302), Built Jul 15 2014 : 12:28:01

Using VMware View 5.3 (latest update), with Windows 7 x64 Pro as the client.

Any ideas how to fix this? We are running out of options.

Linjo
Leadership

The K2000 is not a supported GPU for vDGA and should not be used for any performance testing.

If you have not looked at the power settings on the host, please do; this is a common reason for a bad user experience.

Also upgrade to Horizon View 6.0.1.

// Linjo

projectserve
Enthusiast

Linjo,

Thanks for the answer. We primarily use the GRID K1 for testing; we only tried the K2000 after getting bad results.

We have zero problems with vSGA; the system runs very, very smoothly on our zero clients. Only vDGA is extremely laggy (pcoip_server_win32.exe shows constant CPU load).

We have currently set power management to High Performance and also disabled ACPI P- and C-states and Turbo Core in the BIOS; same results.

We are going to try VMware View 6.0.1 next week.

jamchal
Contributor

Projectserve,

I also experienced issues with vDGA getting sub-20 fps. The following gave some improvement:

1. Enabling image caching on the zero client. This improved overall window dragging and 3D performance and reduced load on the PCoIP server process. This was the main performance improvement for me.

2. Using an APEX 2800 offload card with the latest drivers and firmware. The updated drivers improved overall performance by about 8-10%. Also, set it to use 2560x1600 resolutions; it then reserves more Mpps for each session.

3. Upgrading from View 5.3.1 to View 6.0.1. This again improved vDGA performance.

4. Capping frame rates at 30 fps to reduce the lag experienced when the rate dropped.

5. Increasing compression through the image quality settings. You can see this visually, but PCoIP's build-to-lossless negates the image quality effects quickly, especially on high-bandwidth connections.

6. Using 10GbE from the server to the switch, especially if you're running more than one VM through the interface. The PCoIP session seems to notice the large amount of available bandwidth and increases the Mpps (megapixels per second) transmission rate.

7. Reducing your expectations. If you're comparing against a 1:1 workstation host card, expect a solid >60 fps there, because a host card can do 300 Mpps for each session, whereas the APEX offload card shares its 300 Mpps between all of the available sessions.
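
To put rough numbers on the shared 300 Mpps budget in point 7, here is a small Python sketch. It assumes the card's throughput is split evenly between sessions and that every pixel changes on every frame. Both are simplifications (point 2 suggests the card reserves capacity by resolution), and the function name is made up for illustration.

```python
# Figure quoted in this thread for the APEX 2800.
APEX_TOTAL_MPPS = 300


def per_session_fps_ceiling(width, height, sessions, total_mpps=APEX_TOTAL_MPPS):
    """Full-frame fps ceiling per session under an even split (assumption)."""
    if sessions < 1:
        raise ValueError("sessions must be at least 1")
    pixels_per_second = total_mpps * 1_000_000 / sessions
    return pixels_per_second / (width * height)


if __name__ == "__main__":
    for width, height in [(1920, 1080), (2560, 1600)]:
        for sessions in (1, 2, 4):
            fps = per_session_fps_ceiling(width, height, sessions)
            print(f"{width}x{height}, {sessions} session(s): about {fps:.0f} fps")
```

With four sessions sharing the card, the worst-case ceiling at 1920x1080 comes out in the mid-30s, which fits the 23-30 fps range described as good performance.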

I'd say good performance is 23-30 fps output from PCoIP (even though your K1/K2 cards can render 60 fps).

It's all about tweaking to find the right balance, though. I use Wyse P45 zero clients with all my VMs and the performance is great; you will get away with using P25s and should see the same level of performance.

From what you're saying about the PCoIP process, you should expect around 8% utilisation if your APEX card is offloading properly; otherwise it'll use around 1 full virtual core.

Let me know if you have any questions.

EDIT:

My configuration is:

Dell R720 server, 20 cores @ 2.8 GHz, 256 GB memory, dual GRID K2 cards, Fusion-io ioScale 1.6 TB for VMs, 10GbE, APEX 2800 offload card

VMs are 4:1 per server, specced at:

6 cores each, 48 GB memory, 1x K2 core passthrough, 200 GB disk, 10GbE (two VMs per 10GbE port), NVIDIA 340 drivers.

Jamie

IT_Vision
Contributor

Greetings projectserve,

What type of workload are you running on the VMs? Standard office users or 3D modeling/CAD app users?

I would be interested to know what your PCoIP GPO settings are for these VMs.

Min Image Quality

Max Initial Image Quality

Max Frame Rate

Session Bandwidth Floor

Build-to-Lossless

We run a mix of office and CADD users, both local and across the WAN, and have a similar config with both vSGA and vDGA. Our 3D CADD users see an improvement in model performance on the vDGA VMs, but we still see lag when dragging windows across the screen. I have been struggling to get this sorted out myself, as it is noticeable in both vSGA and vDGA.

I would be interested to see what others have set for the PCoIP session GPO on their VMs.

Cheers

projectserve
Enthusiast

These are my settings...

(GPO Overridable)

Min Image Quality (50)

Max Initial Image Quality (75)

Max Frame Rate (24 > Use image settings from zero client if available.)

Session Bandwidth Floor (Not configured)

Build-to-Lossless (Disabled > Enabled turn-off build....)

Registry (HKLM\Software\VMware, Inc.\VMware SVGA DevTap):

MaxAppFrameRate = 0

Win32FrameRate = 0

Zero client settings:

MTU size (1200)

Min Image Quality (60)

Max Initial Image Quality (80)

Max Frame Rate (120)

Disable Build to Lossless (True)


Currently vSGA with a GRID K1 and APEX 2800 is running smoothly with 2 vCPUs / 4 GB memory.

The server has a 10GbE connection to the NFS server and to the View Connection Server/vCenter servers.

vDGA stays laggy and jerky when dragging windows etc. PCoIP CPU utilization is very high compared to the vSGA setup.

Still trying to find some time to upgrade to Horizon 6.0.1 :)
