Hi guys,
Quick info on the environment
ESXi 6.5U1
vCenter 6.5b Appliance
Horizon View 7.2 (Connection, Composer, Security)
All Flash SSD SAN Storage
Teradici Apex 2800
Nvidia GRID K2
Basically, as the title says, since our upgrade to the latest versions we've had heaps of issues: weird lock files when recomposing our images, and Composer errors stating that vmx files could not be found during recompose or refresh. This latest one is more severe, though. Our vGPU environment is simply unusable at this time, at least for any pool running the latest agents.
Once a user logs in, we get a black screen for approximately 15-20 seconds and are then kicked back to the desktop (zero client and View Client give the same result). When we try again, we get the following message from the client: "The View agent reports that you have an existing desktop session request that is currently being processed. Please wait for this to complete before trying again".
After another 20-30 seconds we can try again and it will bring us to the desktop, but 50% of the time it is unusable: the taskbar is either incomplete or unclickable. So my only recourse for now is to downgrade all the images to View Agent 6.2.2, which seems to work OK, but I would prefer to find the actual issue, if anyone else has experienced this.
This isn't present on our vSGA cluster, only on the vGPU portion. Both Windows 7 and Windows 10 show the symptoms. I even did a full clean install of Win10 with only the agents installed and no other software, with the exact same result.
I will also do a full power shutdown and clean startup of the environment, simply for my sanity, but I don't think it'll get rid of the black screen issue.
Any input or thoughts would be greatly appreciated.
Cheers,
Ben
Hey Parmarr,
Just wanted to take the time to update this. I got an answer from support stating that this is a known issue with version 7 and that it is slated to be fixed in 7.4. For now I have reverted to 6.2.4 and don't seem to have the issue, but the fact that it would take over 4 releases to fix such an enormous issue for vGPU is appalling.
View support has truly gone down the drain.
This issue can have many causes, but the common ones can be checked against the KB tag. One internal KB, 2147294, suggests a setting on the NVIDIA GRID side. This is a known issue with NVIDIA drivers, noticed with driver versions later than 367.43. NVIDIA reference bug number: 200130864
With reference to https://docs.nvidia.com/deploy/pdf/XID_Errors.pdf, file a help ticket with the NVIDIA support team.
When this event is logged, NVIDIA recommends the following:
- Run the application in cuda-gdb or cuda-memcheck, or
- Run the application with CUDA_DEVICE_WAITS_ON_EXCEPTION=1 and then attach later with cuda-gdb, or
- File a bug if the previous two come back inconclusive, to eliminate a potential NVIDIA driver or hardware bug.
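For the second option above, a minimal sketch of launching a process with CUDA_DEVICE_WAITS_ON_EXCEPTION=1 set in its environment might look like this. The helper name run_waiting_on_exception is hypothetical, and the command here is only a stand-in that echoes the variable back; in practice you would pass the command line of the actual CUDA application.

```python
import os
import subprocess
import sys


def run_waiting_on_exception(command):
    """Run `command` with CUDA_DEVICE_WAITS_ON_EXCEPTION=1 added to its environment.

    Hypothetical helper for illustration; only the variable name comes from
    the NVIDIA guidance quoted above.
    """
    env = dict(os.environ, CUDA_DEVICE_WAITS_ON_EXCEPTION="1")
    return subprocess.run(command, env=env, capture_output=True, text=True)


# Stand-in command: a child Python process that prints the variable it received.
# Replace it with the real application's command line.
result = run_waiting_on_exception(
    [sys.executable, "-c",
     "import os; print(os.environ['CUDA_DEVICE_WAITS_ON_EXCEPTION'])"]
)
print(result.stdout.strip())
```

Once the real application is running this way, cuda-gdb can be attached to it later, as the guidance describes.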
My current environment:
ESXi 6.5U1
vCenter 6.5b Appliance
Horizon View 7.1
All Flash SSD SAN Storage
Teradici Apex 2800
Nvidia GRID K1
I too had the black screen with the GRID cards. I found the solution was to connect to the virtual golden image with Horizon direct connect after you have attached the GRID card and before you take the snapshot for your pool. I believe what is happening is that the virtual machine thinks you are connecting through vCenter, so it tries to use the VMware SVGA driver, which gives you the black screen, and after 20-30 seconds it disconnects you to switch to the NVIDIA driver. Connecting with Horizon direct connect establishes the GRID driver, so when you build a pool from that golden image there is no driver flip-flop.
For the errors with recomposing:
Use PuTTY to connect to every ESXi host with a GRID card and open /etc/vmware/hostd/config.xml
Note: before making the changes below, please take a backup of the config.xml file.
- Under
<plugins>
<statssvc> (locate this existing section)
<collectGpuStats> false </collectGpuStats> (add this line within section and before </statssvc>)
After adding this line, restart the hostd service.
This came from VMware after many months of troubleshooting. They believe the GPU stats collection is causing the issues; after doing this, it corrected 98% of the lock errors.
The other 2% seem to have been corrected by disabling Storage Accelerator in a linked clone pool.
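To make the placement of the new element concrete, here is a rough Python sketch that adds collectGpuStats under plugins/statssvc. It runs against a small made-up XML string, not the real config.xml: the <config> root and the <enabled> child are assumptions for illustration, and the function name disable_gpu_stats is hypothetical. ElementTree drops comments when it rewrites XML, so on a real host I would still make the change by hand after taking the backup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for /etc/vmware/hostd/config.xml.
# The real file has many more sections; only plugins/statssvc matters here.
SAMPLE_CONFIG = """<config>
  <plugins>
    <statssvc>
      <enabled>true</enabled>
    </statssvc>
  </plugins>
</config>"""


def disable_gpu_stats(xml_text):
    """Return xml_text with <collectGpuStats>false</collectGpuStats> inside <statssvc>."""
    root = ET.fromstring(xml_text)
    statssvc = root.find("./plugins/statssvc")
    if statssvc is None:
        raise ValueError("no <plugins>/<statssvc> section found")
    node = statssvc.find("collectGpuStats")
    if node is None:
        # Appended as the last child, so it lands just before </statssvc>.
        node = ET.SubElement(statssvc, "collectGpuStats")
    node.text = "false"
    return ET.tostring(root, encoding="unicode")


updated = disable_gpu_stats(SAMPLE_CONFIG)
print(updated)
```

Running the function a second time on its own output leaves a single collectGpuStats element in place, so it is safe to repeat.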
Hope this helps
Kelly
Have you had a chance to update Horizon and GRID? Did it resolve your issues?