VMware Horizon Community
TheTer
Contributor

Desktops hung in console mode - View Client users switch from the PCoIP display protocol to the console display protocol on the fly

1] Applied Update 1 to the ESX servers

2] Installed the new VMware Tools

3] Hit the blank screen issue (the KB article about upgrading to the new agent addresses this)

4] The new agent fixed the blank screen issue, but now desktops switch from the PCoIP protocol to the console protocol on the fly. This happens to about 1 out of every 10-15 desktops. Sometimes View Admin shows Console as the protocol for the active session (Active Sessions tab). Other times, you have to manually check every desktop in vCenter to see whether the user is stuck on a console session.

5] This leaves desktops occupied on the back end even though the user is disconnected. Once enough of these pile up, users have no desktops left to access.

6] The workaround is to monitor the desktops in vCenter and manually restart any that are stuck on the console.
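Part of step 6 could be automated. Below is a minimal sketch, assuming you can get the Active Sessions list into CSV form (by export or by hand) with columns named "User", "Desktop" and "Protocol". Those column names are hypothetical, so adjust them to whatever your list actually contains. It only flags desktops whose session is reported as Console, so it will miss the cases from step 4 where View Admin doesn't show Console and you still have to check in vCenter.

```python
from __future__ import annotations

import csv
import io

# Hypothetical column names; change these to match your actual CSV.
DESKTOP_COL = "Desktop"
PROTOCOL_COL = "Protocol"


def console_desktops(csv_text: str) -> list[str]:
    """Return the desktops whose session protocol is reported as Console."""
    reader = csv.DictReader(io.StringIO(csv_text))
    stuck = []
    for row in reader:
        # Compare case-insensitively; missing values count as "not console".
        protocol = (row.get(PROTOCOL_COL) or "").strip().upper()
        if protocol == "CONSOLE":
            stuck.append(row[DESKTOP_COL].strip())
    return stuck


if __name__ == "__main__":
    sample = (
        "User,Desktop,Protocol\n"
        "WC\\user1,pool-01,PCOIP\n"
        "WC\\user2,pool-02,Console\n"
    )
    print(console_desktops(sample))  # ['pool-02']
```

The restart itself is deliberately left out; that still happens from vCenter or View Admin as described above.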

w00005414
Enthusiast

To update my last post: the executable that was throwing an error was vmware-svi-ga.exe, although I haven't seen that error lately. It is listed in the Services console as "VMware View Composer Guest Agent Server".

BrianGraham
Contributor

I was actually just about to call VMware about this tomorrow morning. I am seeing this too. We are running View 4.5 and getting ready to upgrade to 4.6. The only way I have found to fix the issue is to force PCoIP in the pool settings. The only problem with that is that users are then unable to use the View Client from home with RDP.

The problem popped up out of the blue one day for an entire pool (luckily a small pool), but if this starts happening to my main pool I'm in big trouble.

w00005414
Enthusiast

We may have found the issue. We had some old (obsolete) entries in DNS: one pointed to a domain controller we no longer have, and the other pointed to a private IP address for a second NIC on our initial domain controller, which we have since removed. I removed all of these entries.

Forward Lookup Zones -> _msdcs.wheatonma.edu -> gc

(same as parent folder)        Host (A)    155.47.64.175        10/12/2006 1:00:00 PM
(same as parent folder)        Host (A)    10.1.1.4        11/3/2005 1:00:00 PM


Forward Lookup Zones -> wheatonma.edu -> domaindnszones

(same as parent folder)        Host (A)    155.47.64.175        10/12/2006 1:00:00 PM
(same as parent folder)        Host (A)    10.1.1.4        7/8/2005 2:00:00 PM

Forward Lookup Zones -> wheatonma.edu -> forestdnszones

(same as parent folder)        Host (A)    155.47.64.175        10/12/2006 1:00:00 PM
(same as parent folder)        Host (A)    10.1.1.4        7/8/2005 2:00:00 PM

What pointed me to these potential problems was this comment on the following page:

http://eventid.net/display.asp?eventid=1030&eventno=1542&source=Userenv&phase=1

------------------------------------------------
I haven't given it quite enough time yet, but I may have resolved this by removing some invalid "(same as parent folder)" DNS A records in dnsmgmt under domain, domain\_msdcs\gc, domain\DomainDNSZones, and domain\ForestDNSZones. I tried accessing the GPO in the error by going to \\domain\sysvol\domain\policies\{GPO} and was unable to access the first time although I could access \\server\sysvol\domain\policies for each of the servers. Running nslookup showed me the extra invalid record for the domain. No other solutions in here had worked yet. This event is also appearing with error 1058 from Userenv. Intermittently, gpupdate was working and logging event 1704 from SceCli showing it was working.
------------------------------------------------
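The check from the quoted comment (nslookup showing an extra invalid record for the domain) can be scripted. Below is a minimal sketch, assuming a single-domain forest as in this post, so gc._msdcs, DomainDnsZones and ForestDnsZones all sit under the same domain name, and assuming you supply the IPs of the domain controllers you still run. It goes through the local resolver via socket.getaddrinfo, which can also return cached or hosts-file answers, so treat a hit as something to confirm in DNS Manager rather than proof. The function names are illustrative only.

```python
from __future__ import annotations

import socket


def ad_dns_names(domain: str) -> list[str]:
    """Names whose "(same as parent folder)" A records were cleaned up above."""
    return [
        domain,
        f"gc._msdcs.{domain}",
        f"DomainDnsZones.{domain}",
        f"ForestDnsZones.{domain}",
    ]


def resolve_ipv4(name: str) -> set[str]:
    """Return the IPv4 addresses the local resolver currently gives for name."""
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    except OSError:  # socket.gaierror is a subclass of OSError
        return set()
    # Each entry is (family, type, proto, canonname, (address, port)).
    return {info[4][0] for info in infos}


def stale_addresses(resolved: set[str], current_dcs: set[str]) -> set[str]:
    """Addresses handed out by DNS that do not belong to any current DC."""
    return resolved - current_dcs
```

To use it, loop over ad_dns_names() for your domain, pass each name through resolve_ipv4() and stale_addresses() with your current DC addresses, and print any non-empty result.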

BrianGraham
Contributor

I just took a look at our DNS, and we actually had some stale entries as well. After making the DNS changes, did you just have to reboot the VMs to get the users out of console mode?

Nice find!

w00005414
Enthusiast

Right now we are only seeing people connected over PCoIP, not in console mode (although I saw one earlier this morning). To make sure none of the virtual desktops in our pool are caching the bad DNS entries, we are going through all the desktops under the "pool" -> "Inventory" area and selecting "Remove" -> "Delete" 2 or 3 at a time so they get nuked and rebuilt. We'll keep an eye on them from here on and see if it helps.

w00005414
Enthusiast

Unfortunately, removing the bad DNS entries in AD didn't fix our issue. We recomposed the pool yesterday after making those changes, and when we came in this morning there were 2 desktops in the "Configuration Error" folder within VMware View. One of them was using 100% CPU, and at the console there was just a black screen. We have seen the CPU issue happen before, and when it does, the desktop either shows a black screen, is stuck at the "Applying personal settings" screen after the person logs into Windows, or is at the screen where it is creating their Windows profile (setting up Outlook Express settings, etc.).

The other desktop in the "Configuration Error" folder was at the Windows login screen with a user's login name filled in. This is odd because we have the pool set to delete and rebuild a desktop after someone logs out, shuts down, restarts or disconnects from it. I was able to log in with my Windows credentials just fine using the vSphere console. What makes no sense is that the "Monitoring" -> "Events" area in View Admin shows the person's session expiring after 15 minutes (even though our group policy is set to log users out after 1 hour of inactivity). Here is a screenshot of the Event entries for that desktop (see attached rtf file).

What is wild is that the user WC\w00097828 gets expired at 4:40 pm (15 minutes after they logged in), and one minute later the user WC\w00158793 gets allocated to that same desktop, but about 40 minutes later it says the original user WC\w00097828 has logged off. When I got to the console of the desktop this morning, it was at the Windows XP login screen with w00097828 in the username field.

There were some strange entries in the Application log in Event Viewer. Here are some of them (some were not warnings or errors, just informational). For background, the last boot from QuickPrep happened at 3:32 pm:

3:32:15 pm - EventID 1 - VMware View Composer Guest Agent service started 2.5.0 build-291081

3:32:16 pm - EventID 0 (TPVCGateway) - The description for Event ID ( 0 ) in Source ( TPVCGateway ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: TP VC Gateway Service started.

3:32:16 pm - EventID 105 (VMware Tools) - The service was started

3:32:19 pm - EventID 0 (TPAutoConnSvc) - The description for Event ID ( 0 ) in Source ( TPAutoConnSvc ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: TPAutoConnect Service started.

3:32:19 pm - EventID 105 (VMUpgradeHelper) - The service was started

3:32:19 pm - EventID 256 - The VmUpgradeHelper service has started

3:32:19 pm - EventID 258 (VMUpgradeHelper) - Restoring network configuration

3:32:21 pm - EventID 271 (VMUpgradeHelper) - Restored network configuration

3:32:22 pm - EventID 1704 (SceCli) - Security policy in the Group policy objects has been applied successfully.

4:40:09 pm - EventID 102 (VMware View) - Pending portal logon timed out for user WC\w00097828, wssm may have failed to start correctly, or the user was not able to connect and log in. Pending count left=0

4:40:09 pm - EventID 102 (VMware View) - Closed PCoIP connection doesn't match global value

4:40:10 pm - EventID 102 (VMware View) - Partial logon for session 0 has expired

4:40:41 pm - EventID 102 (VMware View) - Unable to locate route for response queue [[73877a64-dc4e-4643-ac26-c3f05262edde]MessageFrameWork_1.

4:43:21 pm - EventID 102 (VMware View) - PCoIP connection request failed: 14

5:22:44 pm - EventID 1517 (Userenv) - Windows saved user WC\w00097828 registry while an application or service was still using the registry during log off. The memory used by the user's registry has not been freed. The registry will be unloaded when it is no longer in use. This is often caused by services running as a user account, try configuring the services to run in either the LocalService or NetworkService account.

5:22:47 pm - EventID 4614 (EventSystem) - The COM+ Event System detected an inconsistency in its internal state. The assertion "GetLastError() == 122L" failed at line 162 of d:\comxp_sp3\com\com1x\src\events\shared\sectools.cpp. Please contact Microsoft Product Support Services to report this error.
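To line up entries like these across several problem desktops, a small parser may help. This is a rough sketch, assuming the lines are pasted in the "time - EventID N (Source) message" shape shown above; the regex tolerates the small separator differences between pasted entries. The function names are hypothetical. It pulls out the user named in each "Pending portal logon timed out" entry so you can compare that with who View later allocated to the same desktop.

```python
from __future__ import annotations

import re
from typing import NamedTuple, Optional

# Accepts e.g. "4:40:09 pm - EventID 102 (VMware View) ..." as well as
# "3:32:15 pm Event ID 1  - ..." (optional dashes, optional "(Source)").
LINE_RE = re.compile(
    r"^(?P<time>\d{1,2}:\d{2}:\d{2} [ap]m)\s*-?\s*"
    r"Event ?ID (?P<id>\d+)\s*-?\s*"
    r"(?:\((?P<source>[^)]*)\))?\s*-?\s*"
    r"(?P<message>.*)$",
    re.IGNORECASE,
)

# The user name runs up to the first comma or whitespace.
TIMEOUT_RE = re.compile(r"logon timed out for user (?P<user>[^,\s]+)", re.IGNORECASE)


class Event(NamedTuple):
    time: str
    event_id: int
    source: Optional[str]
    message: str


def parse_event(line: str) -> Optional[Event]:
    """Parse one pasted log line, or return None if it doesn't look like one."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Event(m["time"], int(m["id"]), m["source"], m["message"])


def logon_timeouts(lines: list[str]) -> list[tuple[str, str]]:
    """Return (time, user) for each "Pending portal logon timed out" entry."""
    hits = []
    for line in lines:
        event = parse_event(line)
        if event is None:
            continue
        m = TIMEOUT_RE.search(event.message)
        if m:
            hits.append((event.time, m["user"]))
    return hits
```

Matching on the message text is fragile if the agent wording changes between versions, so the regexes are worth rechecking against your own logs.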

w00005414
Enthusiast

We may have figured out this issue in our environment. We are also using Wyse terminals, but ours are P20s. We found there were some issues with these devices, applied the latest firmware update (v3.3.0), and since then (almost a week) we haven't seen any of these weird issues.

We found there had been about 3 firmware updates for these devices in the last 2 months, so my guess is it's a bit of a moving target.

Hope that helps.

BrianGraham
Contributor

Funny you brought this up. I was digging around yesterday and am starting to come to the same conclusion.

We are using the Samsung all-in-one zero clients. I think something is happening on the zero client side that is pushing the session into RDP.

I am curious, however: is everyone in this thread also running Teradici zero clients, or does this cover a broader range of devices?

morrisosu
Contributor

I also seem to remember reading that this issue was addressed in View 4.6, but I could be wrong.

Shane

BrianGraham
Contributor

I think it's fixed in 4.6 because you can force PCoIP and disallow RDP, and then the clients connect correctly. I have found a way to fix it in 4.5, but it seems to be somewhat disruptive to users. You can edit the pool and disallow users from choosing RDP; when the clients reconnect, the problem is magically fixed. The problem is that users who use the View Client remotely can't get in, because the pool no longer supports RDP.

I had a very small pool that I fixed with this method.

I forced PCoIP, let everyone reconnect, then re-enabled "allow user to choose between RDP & PCOIP". The main problem I saw was that when users tried to connect again in the morning, they got a weird desktop timeout error, but after resetting their VM everything was fine.

I don't know the implications of doing this to a large pool.

Chhavi
Contributor

Even forcing PCoIP and disabling the option for users to change the protocol didn't help.

Whenever I connect to these VDI desktops through View, the session gets stuck, and I have to log it off from the View Admin site just to get console access from vSphere.

For users, RDP is the only option so far.

Does anyone have a resolution for this yet, or has anyone checked whether View Agent 4.6 resolves this issue?

Regards,

Chhavi

bdelongpcsd
Contributor

4.6 still has this issue; however, it seems to occur far less often.

Chhavi
Contributor

View 4.6 didn't resolve the issue for me. Does anyone have any more suggestions?

mobcdi
Enthusiast

I'm having the same problem with View 4.6 build-366101, but it's compounded by the fact that my VMs don't get assigned an IP address via DHCP when I provision them. They show a limited connectivity message and an IP of 0.0.0.0 in the network connection details, and can't ping the default gateway or DNS servers. I've changed the NIC on the VM and compared its network setup to another desktop VM running on the same host and connected to the same port group, but I still can't get it to join, and stay joined to, the network long enough to complete provisioning.

Did anyone else run into basic networking problems, such as being unable to ping the default gateway, when they hit this problem?

Better still, did anyone find a way to solve it?

Additional Details:

The source VM for the pool is already a member of the network and AD, and runs on the same ESXi host, before I snapshot it and provision the pool from that snapshot.

Update 1:

I created a new pool using the newest snapshot, but also configured the pool to use only PCoIP and not let users choose. Provisioning completed successfully, and the test pool is available for allocation.

Update 2:

I edited my original problem pool to use only PCoIP and not allow a choice, then kicked off provisioning again. This time the desktops were able to complete QuickPrep, take a snapshot, join the AD domain and become available, so it looks like removing the display protocol choice from users of the pool worked as a temporary workaround for me. (Other pools composed with a user-selectable display protocol without a problem, while pools with the option disabled failed to provision properly.)

JayArr
Contributor

We're struggling with this issue as well. It has brought a large thin client/View deployment to a grinding halt while we work through these unexpected issues, which didn't appear in our PoC.

It seems to happen most often with our kiosks, where the thin client uses kiosk authentication (in View 4.6) and then presents the user with the Windows XP login. When the user logs off, the session drops off the thin client and gets stuck in console mode.

I'm looking into a scripting solution to clean up this mess on a regular basis, but it really shouldn't come down to that.
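For a regular cleanup script like the one described above, one design choice worth considering is to act only on desktops that stay in console mode across several consecutive checks, so a session that is briefly reported as Console is not restarted out from under a user. Below is a minimal sketch of that bookkeeping, assuming some other step (a vCenter or View Admin query, not shown here) supplies the set of console-mode desktop names on each poll. The class name and the default threshold are hypothetical.

```python
from __future__ import annotations


class StuckDesktopTracker:
    """Count consecutive polls in which each desktop was seen in console mode."""

    def __init__(self, threshold: int = 3) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.streaks: dict[str, int] = {}

    def observe(self, console_desktops: set[str]) -> list[str]:
        """Record one poll and return desktops stuck for threshold polls or more.

        Desktops missing from this poll are dropped, which resets their streak.
        """
        self.streaks = {d: self.streaks.get(d, 0) + 1 for d in console_desktops}
        return sorted(d for d, n in self.streaks.items() if n >= self.threshold)


if __name__ == "__main__":
    tracker = StuckDesktopTracker(threshold=2)
    print(tracker.observe({"pool-01", "pool-02"}))  # []
    print(tracker.observe({"pool-01"}))             # ['pool-01']
```

The threshold trades speed for safety: a higher value waits longer before freeing a stuck desktop but is less likely to hit a user who is legitimately on the console.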

morrisosu
Contributor

Have you looked at upgrading to View 5? Since we upgraded to 5, we haven't seen this issue rear its ugly head.

Keeping our fingers crossed!

-Shane
