VMware Horizon Community
DJLO
Enthusiast

Screens Cutting In And Out / Non Stop APEX 2800 Issues

Hi everyone,

I'm here as a last resort, literally. Here is my problem: the monitors cut in and out periodically throughout the day for all my users, sometimes once an hour, sometimes every 10 minutes, and sometimes not at all for a day or two. A user will be working and both screens go black for a second or two, then the desktop comes back.

We have a View 5.3 instance running (we've been using it since 5.1) in a 50-person office with a full gigabit network that can easily handle 200 users' worth of traffic. It was overbuilt on purpose. I have moved 20 users to View. Our goal from day one was to get as close to a real desktop experience for my users as possible.

I purchased a 4-node Supermicro with 2 x 6-core Intel Xeons, 128GB RAM per node, 2 x 1Gb NICs per node and 8 Intel 520 series SSDs in RAID 10 (2.8TB) per node. The system is a beast; we hammered it to hell and back before we installed View on it.

We purchased HP T310 Zero Clients.

We are using GPOs to manage the settings (all per recommendations from VMware and Teradici).

Base Image (and desktops)

Windows 7 x64

2 Cores

6GB RAM

60GB HDD (not using persona or persistent disks)

Linked Clone dedicated pools

Enable 3D rendering

Set Video RAM to 512MB

Force PCoIP as connection type

Aero Enabled Desktops

Office 2013 primary set of apps used

Initially the pilot went very well, but I quickly started getting complaints that it was slow to move windows around and to scroll in IE and Excel, and management asked me to fix this.

So we bought some APEX 2800 cards from Teradici, and so began my nightmare. As soon as we put these cards in and got the drivers installed (2.3.1 at the time, now up to 2.3.3), users started complaining that the monitors were flickering and cutting in and out periodically. View was upgraded to 5.2 and APEX came out with 2.3.2, but the problems persisted. In January we upgraded to View 5.3 and APEX 2.3.3, and moved our infrastructure from vSphere 5.1 to 5.5, including upgrading all the ESXi hosts. The problem persists.

So I opened a case with Teradici in the first week of January. After a bit of runaround and not getting called back, I escalated the case and got a call. I was asked to enable full debug logging and start submitting logs. I sent two weeks' worth of logs to Teradici and initially got this reply back:

"I went through the logs you submitted and it's the same issue in each log. There is an "Imaging Timer Expiry" at each of the times you indicated the issue was occurring. This entry signifies that imaging on the VM has not received a response from the client in the last 5 seconds, and imaging tries to restart the imaging session (causing screen flicker)."

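For anyone who wants to check their own logs for the same entry before opening a case, here is a minimal Python sketch. It assumes the phrase "Imaging Timer Expiry" appears verbatim in the pcoip_server log, as quoted in Teradici's reply; the function names and the sample lines are made up for illustration, and the real log layout may well differ.

```python
import re
from pathlib import Path

# Phrase quoted in Teradici's reply. Matching is case-insensitive in case
# the real log capitalizes it differently (an assumption).
PATTERN = re.compile(r"imaging timer expiry", re.IGNORECASE)


def find_timer_expiries(lines):
    """Return (line_number, text) for every line mentioning the expiry."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log_file(path):
    """Scan one log file on disk; undecodable bytes are replaced."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return find_timer_expiries(handle)


if __name__ == "__main__":
    # Hypothetical sample lines, not copied from a real pcoip_server log.
    sample = [
        "14:28:57 session running normally",
        "14:29:02 Imaging Timer Expiry, restarting imaging session",
        "14:29:04 session running normally",
    ]
    for number, text in find_timer_expiries(sample):
        print(f"line {number}: {text}")
```

Matching the hits against the times users report a flicker is a quick way to see whether you are looking at the same symptom.
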
Things I've done on my own while waiting for Teradici:

Rebuilt View from scratch

Rebuilt the base image from scratch

Optimized per VMware and Teradici

Moved the VLAN my users are on to the same switch as the ESXi server

Recomposed all pools and desktops

Firmware upgrade on all Cisco switches

Firmware upgrade to 4.2.0 on all Zero Clients

Latest Teradici APEX ESXi and VM drivers

Nothing has helped.  Management is really putting pressure on me to fix this.

We've googled every possible combination of this error and I find nothing: nothing here on these forums, on Teradici's, or anywhere on Google. Yet they tell me others are having the issue and engineering is looking into it.

I'm at a total and complete loss.  Can anyone offer any suggestions, has this happened to any of you?

Attached are some of the logs I've sent to Teradici (one of the hundreds).

The flicker in this set of logs happened at 2:29pm. I don't know what else to do. I've set Aero to best performance, let Windows manage it, etc. I can't crack this nut.

I cannot disable Aero, as that was one of the requirements management wouldn't flex on: the end-user experience had to be the same as on their laptops. For the most part it is, except for this bloody screen flickering that's driving my staff nuts, upsetting management and making me want to give up and try something else.

My Cisco guy has had traces running on the ports of some affected users, and we cannot find anything wrong with the network. All priorities for PCoIP are set, and he has assured me the traffic for the 20 users is not even coming close to a gigabit, so we can't be saturating the link.

EDIT: Would adding some Nvidia cards and returning these APEX cards to Teradici solve this problem? We're prepared to get a GRID card for eval, even though no one really uses any 3D apps. I really need to get this resolved once and for all.

39 Replies
nzorn
Expert

Same problem here, I just asked one of my users (they've never reported it to me though).

Dell R720 servers w/ dual E5-2665 and 256GB RAM

View 5.3

ESXi 5.5 - 1331820

vCSA 5.5.0.5201 Build 1476389

APEX 2800 LP Cards using the 2.3.3 driver

Also thinking the APEX card could be related to this issue as well: Re: View 5.3 - Another task is already in progress

Rorus
Enthusiast

We also recently purchased some APEX cards, and are running 2.3.3 on View 5.3.

A few of our users have similar complaints, however instead of going black they get the 'Network Disconnected' overlay briefly.

I would be very keen to hear your resolution on this. If you get further responses from Teradici, would you mind posting it here?

Thanks,

Rorus.

DJLO
Enthusiast

Absolutely, I will share any info I get. Currently my case has been escalated to engineering at Teradici. Stu has been in contact with me a few times. As it sits, Teradici requested vmkernel logs to be attached as well as the PCoIP and Zero Client logs. I've started sending them bundles this week with all 3 logs, each time I get a report from a user.

Hopefully they'll be able to identify the root cause soon enough. However, since this is a tough problem and not everyone is experiencing it, I recommend opening a case of your own with Teradici (it's free support) and referencing my case: 15134-20223.

Perhaps it will help them get to a root cause sooner if they get more logs from someone other than just me.


Paolo

gmtx
Hot Shot

Just had one of my users report the same "black screen" problem. Similar environment - ESX 5.5 (1474528), View 5.3, Apex 2.3.3. We just installed these cards about two weeks ago and have never had a complaint about screens going black prior to this, so it sounds like I need to open a ticket with Teradici as well.

Geoff

DJLO
Enthusiast

Teradici had me turn on an undocumented registry setting on the affected VMs on Friday. While it did not stop the issue, the frequency has dropped dramatically. A few of my users were getting this flicker anywhere from 3 to 10 times in an 8-hour shift. Since Friday, each user has only had it happen once.

I think they're moving in the right direction with this one. I feel the issue may be resolved sooner rather than later.

Crossing my fingers here...

If you do open a ticket, ask them if the temporal_cache registry setting would be applicable for you. Here is what I've done for now.

"

Ensure you have the necessary backups in place and are familiar with changing the registry before proceeding, and that the notes below have been considered.

  1. Click Start.
  2. Type regedit in "Search programs and files".
  3. Browse to HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Teradici\PCoIP\pcoip_admin. (If you do not see the 'Teradici' folder and subsequent sub-folders, you will have to create them.)
  4. Add new DWORD Value and name it pcoip.enable_temporal_image_caching.
  5. Give this a Value data: of 00000000 and click OK.
  6. Disconnect/Reconnect your PCoIP session for the change to take effect.

Along the top of the new 'pcoip_server' log, a new line advising the caching is disabled (set to 0) appears.

LVL:0 RC: 0 MGMT_ENV :cTERA_MGMT_CFG::Registry setting parameter pcoip.enable_temporal_image_caching = 0 "
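The registry change in the steps above could also be captured as a .reg file, so it can be imported on several VMs instead of typed by hand each time. The following is only a sketch: the key path and value name are taken from the steps quoted above, while the function name is made up for illustration. Whether the setting is appropriate for your environment is something to confirm with Teradici first.

```python
def build_reg_file(key_path, value_name, dword_value):
    """Return the text of a .reg file that sets a single DWORD value."""
    if not 0 <= dword_value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{key_path}]\n"
        # .reg files store DWORDs as eight hexadecimal digits.
        f'"{value_name}"=dword:{dword_value:08x}\n'
    )


# Key and value name as given in the quoted steps.
KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Teradici\PCoIP\pcoip_admin"
VALUE = "pcoip.enable_temporal_image_caching"

if __name__ == "__main__":
    # 0 disables temporal image caching, matching step 5.
    print(build_reg_file(KEY, VALUE, 0), end="")
```

Redirect the output to a file with a .reg extension and import it on the VM. The session still has to be disconnected and reconnected afterwards, as in step 6.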


gmtx
Hot Shot

Interesting... I thought that key looked familiar, and it turns out I already had it in place to resolve a performance issue with Tera2 zero clients and View 5.2. Wonder if that explains why I'm getting so few complaints?

Geoff

DJLO
Enthusiast

Teradici got back to me yesterday with a beta fix they asked me to try.  They've provided me a new pcoip_server_32.exe file to replace the existing one for my problem VMs.  We pushed it out to 4 users this morning.  I need to monitor the situation for at least 4-5 days since it did not happen all the time to these people.

Sounds like they're getting close to knocking this one out.  I'll post back in a few days once we get enough data for a baseline

Paolo

TPCooney
Contributor

I have been experiencing this issue as well for a few months. It all started when we installed APEX 2800 offload cards in two of the three hosts in our VDI cluster. I submitted a support ticket to Teradici and have been working with them over the past month on this issue.

The one thing I've noticed is that it appears to only affect users with dual displays; users with single displays have not complained of this issue. And like some of you mentioned, it's random and doesn't seem to affect all users.

What I did initially was set up several user desktops with the Teradici drivers, install the offload cards in two of our three hosts, and create affinity rules in vCenter to keep the desktops with Teradici drivers on the hosts with the offload cards installed. When I first ran into this problem, I would migrate the problematic desktop to the host without the offload card installed and set up an affinity rule to keep it there, and the problem seemed to go away. So I believe it definitely has to do with the offload cards or the drivers installed in the desktops.

As of today, Teradici's latest response was to try the registry setting that DJLO mentioned in a previous post. I did that and am now waiting to see if the problem recurs. It would be interesting to see what others are using in their environment in terms of software versions. Below is what we are currently using in our environment.

ESXi 5.1.0 build 1612806

Horizon View 5.2.0 build 987719

Unidesk (desktop layering software): 2.5.5.410

Teradici: 2.3.2

Tim

DJLO
Enthusiast

Hey everyone

I'm very happy to announce that the beta pcoip_server_32.exe that was given to me by Teradici has resolved my issue 100%. It's been about 3 weeks and not a single user has had the flicker.

I was told by Teradici it would be rolled up into the next VMware View patch. If any of you are facing the issue and can't wait, open a case with Teradici and reference my case 15134-20223. Perhaps they can release this fix for you as well. I have not rolled it out across my entire deployment. I've only patched the users who were the most vocal (and a pain in my side) for now.

TPCooney
Contributor

For those of you experiencing the problem described by DJLO in this post: as of yesterday, Teradici has released updated drivers which fix this issue. See http://techsupport.teradici.com/ics/support/kbanswer.asp?deptID=15164&task=knowledge&questionID=1988 for details. You will be required to log on to download the updated drivers.

DJLO
Enthusiast

I was just going to post that. Just got an email from Teradici: the 2.3.4 drivers are out for APEX 2800 and vSphere 5.5, and my flickering is totally gone. Have to give Teradici credit.

It took a few upset emails, unfortunately, to kick-start this discovery, but once engineering came on board, they worked tirelessly and kept me in the loop at each step. We're a 50-person shop, small potatoes in the grand scheme of Teradici's business model, yet they took the time and dedicated resources to find the root cause and produce a fix for this issue.

I am amazed at their service.  Truly a wonderful company to deal with... And all the support is free.. You can't beat that

TechMassey
Hot Shot

That is great to hear, we have been having the same issue but less intermittent. I second that the support from Teradici has been fairly solid.

I would recommend that anyone who has the Apex 2800 create a support account at Teradici. There are not just drivers and manuals, but video walkthroughs too. I have also found that getting very familiar with the PCOIP-CTRL command for the Apex 2800 is a huge help.


nzorn
Expert

I installed the new driver on my View Template (apex2800-2.3.4-rel-31865.exe) and the problem is still here after my recompose last night.  I did not change any registry keys or change the ESXi driver.

Is there a way to verify I'm using the new driver?  The C:\Program Files\Common Files\VMware\Teradici PCoIP Server\pcoip_server_win32.exe file has a timestamp of 4/8/2014 @ 6:24pm.

TPCooney
Contributor

To confirm your APEX Windows driver is version 2.3.4 (Teradici's latest), check the properties of the tera2800_accel.dll file in the following location:

C:\Program Files\Common Files\VMware\Teradici PCoIP Server

Under the details tab of the tera2800_accel.dll file the file version is listed.

nzorn
Expert

The tera2800_accel.dll file shows version 2.3.3.28364 and has a timestamp of 4/8/2014 @ 6:24pm.

So this should be showing 2.3.4??

TPCooney
Contributor

The file version you list is correct.  Also check the version on the pcoip_server_win32.exe file.

nzorn
Expert

The pcoip_server_win32.exe shows version 3.12.3.31781

TPCooney
Contributor

After installing the virtual machine driver, did you power the virtual machine off?  Teradici states the machine must be powered off and not just rebooted.

TPCooney
Contributor

The pcoip_server_win32.exe file version should be 3.12.0.31782
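To keep track of the version numbers traded back and forth in this thread, a small comparison helper might be useful. This is a sketch only: the expected values are simply the ones quoted in the posts above (not independently confirmed for any particular build), the observed values are the ones nzorn reported, and the function names are made up for illustration. Versions are typed in by hand from the Details tab rather than read from the files.

```python
def parse_version(text):
    """Turn a dotted version such as '3.12.0.31782' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def find_mismatches(expected, observed):
    """Return {file_name: (expected, observed)} for versions that differ."""
    mismatches = {}
    for name, want in expected.items():
        have = observed.get(name)
        if have is None or parse_version(have) != parse_version(want):
            mismatches[name] = (want, have)
    return mismatches


# Values quoted in this thread as the ones to expect after the update.
EXPECTED = {
    "tera2800_accel.dll": "2.3.3.28364",
    "pcoip_server_win32.exe": "3.12.0.31782",
}

# Values reported from the recomposed desktop.
OBSERVED = {
    "tera2800_accel.dll": "2.3.3.28364",
    "pcoip_server_win32.exe": "3.12.3.31781",
}

if __name__ == "__main__":
    for name, (want, have) in find_mismatches(EXPECTED, OBSERVED).items():
        print(f"{name}: expected {want}, found {have}")
```

With the numbers above, only pcoip_server_win32.exe is flagged, which matches where the thread left off.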
