VMware Horizon Community
DanVM99
Enthusiast
Enthusiast

Strange issue with AppStacks failing to attach

For a few weeks now (and prior to our recent upgrade to 2.16) our users have been experiencing random issues with missing AppStacks when they login to their Horizon 7.8 non persistent clone machines. We currently have around 120 users of the system and we tend to see around 1-3 people each day experiencing this issue.

There doesn't appear to be any pattern to this behavior and it is affecting different users with varying AppStack assignments and seemingly no commonality. The manager side log doesn't appear to reveal any connection issues but the client side svservice.log file does indicate the below issue at a fairly early stage during the user logon process.

[2019-06-03 07:48:00.409 UTC] [svservice:P1124:T4616] HttpUserLogin: skipping notifying manager of user login (computer-startup failed)

The non persistent VM seems perfectly healthy.

The user who's log this belongs to didn't receive any of their three assigned AppStacks. Advising the user to logoff (which refreshes the VM) and log back on to a new VM resolved the issue and their AppStacks were then attached. We're seeing this behaviour with multiple users but asking them to log off and back on to resolve it is not a feasible long term solution.

Logged this with VMware Support a few weeks ago and I'm still yet to hear anything back from them after supplying multiple log file examples etc.

Does anyone have any ideas? Ray?? Are you out there!?

Reply
0 Kudos
10 Replies
Ray_handels
Virtuoso
Virtuoso

I'm always out there somewhere.. It just depends on where I am Smiley Happy Smiley Happy..

We did have these issues from time to time as well. It seems that when a lot of computers do a startup all at once Appvolumes agent cannot connect them correctly, this could be due to a lot of things, not just the Appvolumes manager. During startup Appvolumes agent connects to the manager as well to check and see if there are appstacks assigned to the machine account. We saw the same thing happening in the log files. Computer could not connect to manager so when a user logs in appstack attachment failes. Only option is to log off and on again I'm afraid

In regards to why this is happening. Do you happen to have more than 1 Appvolumes server? If not, I would suggest load balancing the managers. Also make sure that you have enough threads to process the bootstorm. 120 users is not that much to be honest.

Other 2 more heavy options are to restart the Appvolumes servers at night (make sure to have more than 1 manager and do not reboot at the same time) or to have some sort of PS script that checks the svservice.log file on this error and restart the machine if his error does show up.

jmatz135
Hot Shot
Hot Shot

This happens to us as well.  Probably about the same rate as you.  As for Ray's response in our environments (we have two) this should most definitely not be a congestion issue with lots of VMs trying to check in.  We have about 100 total users in one environment and have two app volumes managers load balanced.  If two servers can't handle 100 machines there is a major issue with the platform.  Regardless, it isn't actually even that many VMs as we use instant clones that spin up when necessary so we only have at most about 8 machines spinning up in a given 30 second time frame.  So basically 2 app volumes managers can't handle 8 machines starting up at in 30 seconds. 

Here is the interesting thing.  We set up one appstack to apply to the computers themselves and not at login.  The appstack that is applied to the computer will actually attach to the VM even though the agent itself says that it failed contacting manager at computer startup.  How this is even possible is anyone's guess.  We actually applied the appstacks to the computer instead of at login just so we might be able to see what machines didn't contact the manager.  This clearly didn't work though as the appstack still attaches at computer startup even though the agent failed to connect to the manager.

One other thing we did was set the MaxDelayTimeoutS registry variable to 600.  This seems to help a bit. This is because the agent at startup while it says it is checking every 5 seconds over 120 seconds to reconnect to the manager if you actually check the network traffic it doesn't actually do that.  It only checks once.  The timeout is then hit at 120 seconds and it says it fails.  Oddly if you set the timeout to above 305 seconds it will actually check a second time at 300 seconds plus or minus 5 seconds.  So at least you get the agent to do 2 whole checks to see if the manager is there and 2 is twice the opportunity for the agent to actually work.

Reply
0 Kudos
jmatz135
Hot Shot
Hot Shot

Here is an example of the default 120 timeout fail (even though the agent logs log it as a timeout of 300 seconds, not sure what is up with that)

[2018-01-30 16:46:09.331 UTC] [svservice:P880:T976] WinHttpSendRequestWithSSLCertValidation: WinHttpSendRequest succeeded.

[2018-01-30 16:46:09.331 UTC] [svservice:P880:T976] WinHttpReceiveResponse timed out waiting for response

[2018-01-30 16:46:09.331 UTC] [svservice:P880:T976] Retrying in 5 seconds (waited 300 seconds out of 120 max)

[2018-01-30 16:46:14.331 UTC] [svservice:P880:T976] Aborting HTTP request after exceeding time limit (120 seconds)

[2018-01-30 16:46:14.331 UTC] [svservice:P880:T976] HttpComputerStartupThread Pre-startup over HTTP failed: error 0

[2018-01-30 16:46:14.331 UTC] [svservice:P880:T976] HttpComputerStartupThread: failed (computer startup)

As you can see it says it is retrying every 5 seconds for 300 seconds, but it doesn't.  It doesn't try again at all unless you set the timeout to above 305 like I said at which point it does try again at 300 seconds.

Reply
0 Kudos
jmatz135
Hot Shot
Hot Shot

When it works after 300 seconds it does something like this:

[2018-04-09 19:58:08.501 UTC] [svservice:P912:T952] svdriver is running

[2018-04-09 19:58:08.501 UTC] [svservice:P912:T952] HttpUserLogin: waiting for computer-startup to complete (0 of 300)

[2018-04-09 19:58:09.501 UTC] [svservice:P912:T952] HttpUserLogin: waiting for computer-startup to complete (1 of 300)

[2018-04-09 19:58:10.501 UTC] [svservice:P912:T952] HttpUserLogin: waiting for computer-startup to complete (2 of 300)

[2018-04-09 19:58:11.501 UTC] [svservice:P912:T952] HttpUserLogin: waiting for computer-startup to complete (3 of 300)

[2018-04-09 19:58:12.501 UTC] [svservice:P912:T952] HttpUserLogin: waiting for computer-startup to complete (4 of 300)

[2018-04-09 19:58:13.501 UTC] [svservice:P912:T952] HttpUserLogin: waiting for computer-startup to complete (5 of 300)

[2018-04-09 19:58:14.501 UTC] [svservice:P912:T952] HttpUserLogin: waiting for computer-startup to complete (6 of 300)

[2018-04-09 19:58:15.501 UTC] [svservice:P912:T952] HttpUserLogin: waiting for computer-startup to complete (7 of 300)

[2018-04-09 19:58:16.501 UTC] [svservice:P912:T952] HttpUserLogin: waiting for computer-startup to complete (8 of 300)

[2018-04-09 19:58:17.501 UTC] [svservice:P912:T952] HttpUserLogin: waiting for computer-startup to complete (9 of 300)

.

.

.

[2018-04-09 20:02:08.498 UTC] [svservice:P912:T952] HttpUserLogin: waiting for computer-startup to complete (300 of 300)

[2018-04-09 20:02:08.623 UTC] [svservice:P912:T1012] WinHttpSendRequestWithSSLCertValidation: WinHttpSendRequest succeeded.

[2018-04-09 20:02:08.623 UTC] [svservice:P912:T1012] WinHttpReceiveResponse timed out waiting for response

[2018-04-09 20:02:08.623 UTC] [svservice:P912:T1012] Retrying in 5 seconds (waited 300 seconds out of 600 max)

[2018-04-09 20:02:13.623 UTC] [svservice:P912:T1012] [0] Connecting to appserver:443 using HTTPS (attempt 2)

[2018-04-09 20:02:13.623 UTC] [svservice:P912:T1012] WinHttpSendRequestWithSSLCertValidation: SSL certificate validation is disabled.

[2018-04-09 20:02:13.623 UTC] [svservice:P912:T1012] WinHttpSendRequestWithSSLCertValidation: WinHttpSetOption(WINHTTP_OPTION_SECURITY_FLAGS) succeeded.

[2018-04-09 20:02:13.841 UTC] [svservice:P912:T1012] WinHttpSendRequestWithSSLCertValidation: WinHttpSendRequest succeeded.

The funny thing is the log straight up says that the agent succeeds on attempt 2...after 300 seconds.

Reply
0 Kudos
jmatz135
Hot Shot
Hot Shot

The previous log may have been from when someone logged in when it was waiting that 300 seconds.  The following log is better:

[2019-06-15 00:31:15.496 UTC] [svservice:P1648:T2072] Computer UUID (Microsoft format): DDB11842-1241-5AEE-539E-E8A94DB300A0

[2019-06-15 00:31:15.496 UTC] [svservice:P1648:T2072] Computer UUID (VMware format): 4218b1dd-4112-ee5a-539e-e8a94db300a0

[2019-06-15 00:31:15.496 UTC] [svservice:P1648:T2072] Initializing HTTP session

[2019-06-15 00:31:15.496 UTC] [svservice:P1648:T2072] MaxDelayTimeOutS configured as: 600

[2019-06-15 00:31:15.496 UTC] [svservice:P1648:T2072] Initializing HTTP connection

[2019-06-15 00:31:15.496 UTC] [svservice:P1648:T2072] Choosing a manager out of 1 configured manager(s)

[2019-06-15 00:31:15.496 UTC] [svservice:P1648:T2072] IsDomainJoinedComputer: NetGetJoinInformation() success, domain name DEVNCSECU and type is 3

[2019-06-15 00:31:15.512 UTC] [svservice:P1648:T2072] Pre-startup URL: Machine Info stuff removed

[2019-06-15 00:31:15.512 UTC] [svservice:P1648:T2072] [0] Connecting to appvolumesserver:443 using HTTPS (attempt 1)

[2019-06-15 00:31:15.512 UTC] [svservice:P1648:T2072] WinHttpSendRequestWithSSLCertValidation: SSL certificate validation is disabled.

[2019-06-15 00:31:15.512 UTC] [svservice:P1648:T2072] WinHttpSendRequestWithSSLCertValidation: WinHttpSetOption(WINHTTP_OPTION_SECURITY_FLAGS) succeeded.

[2019-06-15 00:36:19.521 UTC] [svservice:P1648:T2072] WinHttpSendRequestWithSSLCertValidation: WinHttpSendRequest succeeded.

[2019-06-15 00:36:19.521 UTC] [svservice:P1648:T2072] WinHttpReceiveResponse timed out waiting for response

[2019-06-15 00:36:19.521 UTC] [svservice:P1648:T2072] Retrying in 5 seconds (waited 304 seconds out of 600 max)

[2019-06-15 00:36:24.537 UTC] [svservice:P1648:T2072] [0] Connecting to appvolumesserver:443 using HTTPS (attempt 2)

As you can see the agent waits 304 seconds until it makes its second attempt to connect to the manager at which point it succeeds.  Like I said giving it a whole second try is 100% better, but you have to wait 5 minutes for it to do that.  Unfortunately users can actually log into the VM in that time as Horizon view doesn't wait to make the machine available if app volumes is borked.

Reply
0 Kudos
cdickerson75
Enthusiast
Enthusiast

Have customer that was having a issue talking to App Volumes server.  Turns out they had a web filtering client (zscaler) that would block all http and https leaving the machine until authentication to backend was complete.  So might check if some sort of web filtering client isn't breaking the communication.

-Craig

Reply
0 Kudos
DanVM99
Enthusiast
Enthusiast

Thanks for your input but I think our issue differs from yours in so much as our VMs seem to be connecting to the manager rather than re-attempting connection like yours. They just don't get the AppStacks for some reason. I've included a lager chunck of our example fail log in case anyone notices any commonality with their own. It begins from the point a user first logs on to a VM. I've not included the prior section where the VM is originally provisioned and connects successfully to the AppVols Manager.

[2019-06-03 07:41:10.961 UTC] [svservice:P1124:T1128] *** Started

[2019-06-03 07:41:10.961 UTC] [svservice:P1124:T1128] Running from: C:\Program Files\CloudVolumes\Agent\svservice.exe (release build)

[2019-06-03 07:41:10.961 UTC] [svservice:P1124:T1208] Checking agent version from "C:\Program Files\CloudVolumes\Agent\VERSION32.txt"

[2019-06-03 07:41:10.961 UTC] [svservice:P1124:T1208] Build: "Release-Agent-Build-32-2_16_0" (letter U)

[2019-06-03 07:41:10.961 UTC] [svservice:P1124:T1208] Build version: "2.16.0.23U"

[2019-06-03 07:41:10.961 UTC] [svservice:P1124:T1208] CheckOfflineVHDMode: dwResetWritableDays was set to (0) days, turn off offline mode!

[2019-06-03 07:41:10.961 UTC] [svservice:P1124:T1208] VHD offline mode was off!

[2019-06-03 07:41:10.961 UTC] [svservice:P1124:T1208] ServiceInit starting

[2019-06-03 07:41:10.976 UTC] [svservice:P1124:T1208] Running on Windows 6.1 build 7601 (service pack 1.0)

[2019-06-03 07:41:10.976 UTC] [svservice:P1124:T1208] OS is a workstation

[2019-06-03 07:41:10.976 UTC] [svservice:P1124:T1208] Architecture: x86 (2 processors)

[2019-06-03 07:41:10.976 UTC] [svservice:P1124:T1208] Running as: DOMAIN\WINDOWS7-099$ (NameSamCompatible)

[2019-06-03 07:41:10.976 UTC] [svservice:P1124:T1208] Setting status to SERVICE_START_PENDING

[2019-06-03 07:41:10.976 UTC] [svservice:P1124:T1208] MachineSID is "S-1-5-21-3842748544-3961479084-860730781"

[2019-06-03 07:41:10.976 UTC] [svservice:P1124:T1208] Hypervisor configured as: vcenter

[2019-06-03 07:41:10.976 UTC] [svservice:P1124:T1208] Successfully created RunKey event

[2019-06-03 07:41:10.976 UTC] [svservice:P1124:T1208] CleanUpSystemDrive: Flag CleanSystemWritable was configured, auto cleanup the system writable directories...

[2019-06-03 07:41:10.976 UTC] [svservice:P1124:T1208] Moving C:\SnapVolumesTemp to C:\SnapVolumesTemp.old

[2019-06-03 07:41:10.976 UTC] [svservice:P1124:T1208] CleanUpSystemDrive: deleted "C:\SnapVolumesTemp"

[2019-06-03 07:41:10.992 UTC] [svservice:P1124:T1208] Processing of Scheduled Tasks is disabled

[2019-06-03 07:41:10.992 UTC] [svservice:P1124:T1208] Using 8 worker threads to communicate with driver

[2019-06-03 07:41:10.992 UTC] [svservice:P1124:T1208] InitializeWmi: called

[2019-06-03 07:41:10.992 UTC] [svservice:P1124:T1208] UpdateInteractiveSessionCount: SessionId 0 Name(Services) State(4)

[2019-06-03 07:41:10.992 UTC] [svservice:P1124:T1208] UpdateInteractiveSessionCount: SessionId 1 Name(Console) State(2)

[2019-06-03 07:41:10.992 UTC] [svservice:P1124:T1208] UpdateInteractiveSessionCount: Found 0 active / 2 total user session(s)

[2019-06-03 07:41:10.992 UTC] [svservice:P1124:T1208] Setting the following data in Run key for svservice : "C:\Program Files\CloudVolumes\Agent\svservice.exe"fromrunkey

[2019-06-03 07:41:10.992 UTC] [svservice:P1124:T1208] HttpComputerStartup: called 0 logged in (computer startup)

[2019-06-03 07:41:10.992 UTC] [svservice:P1124:T1208] svdriver is running

[2019-06-03 07:41:10.992 UTC] [svservice:P1124:T1208] Becoming trusted installer

[2019-06-03 07:41:10.992 UTC] [svservice:P1124:T1272] HandleNGVC: NGVC not present, error 2

[2019-06-03 07:41:11.039 UTC] [svservice:P1124:T1208] Enter : BuildSecurityDescriptorForIPC

[2019-06-03 07:41:11.039 UTC] [svservice:P1124:T1208] Exit : BuildSecurityDescriptorForIPC

[2019-06-03 07:41:11.039 UTC] [svservice:P1124:T1208] IPC server initialize success!

[2019-06-03 07:41:11.039 UTC] [svservice:P1124:T1208] ServiceInit completed successfully

[2019-06-03 07:41:11.039 UTC] [svservice:P1124:T1208] ServiceMain now running

[2019-06-03 07:41:11.039 UTC] [svservice:P1124:T1208] Setting status to SERVICE_RUNNING

[2019-06-03 07:41:11.101 UTC] [svservice:P1124:T1196] OnCreateSession called (Session ID 1, Handle 001511E0, Params 015FF5EC, Context 00000000)

[2019-06-03 07:43:13.300 UTC] [svservice:P1124:T1272] GetComputerUuid: unable to locate UUID

[2019-06-03 07:43:13.300 UTC] [svservice:P1124:T1272] HttpComputerStartupThread: failed (computer startup)

[2019-06-03 07:44:22.573 UTC] [svservice:P1124:T1128] Received SERVICE_CONTROL_INTERROGATE

[2019-06-03 07:44:22.573 UTC] [svservice:P1124:T1128] Current status being sent to SCM is 4

[2019-06-03 07:48:00.362 UTC] [svservice:P1124:T4616] OnLogon called (Session ID 1, Handle 001511E0, Params 0157F2D4, Context 00000000)

[2019-06-03 07:48:00.362 UTC] [svservice:P1124:T4616] OnLogon: DOMAIN\USER (NameSamCompatible)

[2019-06-03 07:48:00.393 UTC] [svservice:P1124:T4616] OnLogon: CN=USER'S NAME,OU=Users,OU=DEPARTMENT,OU=Departments,DC=DOMAIN,DC=DOMAIN,DC=local (NameFullyQualifiedDN)

[2019-06-03 07:48:00.393 UTC] [svservice:P1124:T4616] OnLogon: USER@EXTDOMAIN (NameUserPrincipal)

[2019-06-03 07:48:00.393 UTC] [svservice:P1124:T4616] NotifyLoginStarted failed: error 0x80070002

[2019-06-03 07:48:00.393 UTC] [svservice:P1124:T4616] IsDomainJoinedComputer: NetGetJoinInformation() success, domain name DOMAIN and type is 3

[2019-06-03 07:48:00.393 UTC] [svservice:P1124:T4616] GetUserComputerInfo: user:"USER" computer:"windows7-099" userdomain:"DOMAIN" computerdomain: "DOMAIN"

[2019-06-03 07:48:00.409 UTC] [svservice:P1124:T4616] GetUserProfileDirectoryW failed: error code 2

[2019-06-03 07:48:00.409 UTC] [svservice:P1124:T4616] Logged in user is USER

[2019-06-03 07:48:00.409 UTC] [svservice:P1124:T4616] OnLogon: skipping scripts because filtering is inactive

[2019-06-03 07:48:00.409 UTC] [svservice:P1124:T4616] GetViewClientIdValue: Unable to open key Volatile Environment\1: error 2

[2019-06-03 07:48:00.409 UTC] [svservice:P1124:T4616] GetViewClientIdValue: Pool ID value in key Software\VMware, Inc.\VMware VDM\SessionData\1 is Windows7

[2019-06-03 07:48:00.409 UTC] [svservice:P1124:T4616] HttpUserLogin: called 0 logged in (user login)

[2019-06-03 07:48:00.409 UTC] [svservice:P1124:T4616] svdriver is running

[2019-06-03 07:48:00.409 UTC] [svservice:P1124:T4616] HttpUserLogin: skipping notifying manager of user login (computer-startup failed)

[2019-06-03 07:48:00.409 UTC] [svservice:P1124:T4616] OnLogon : succeeded

[2019-06-03 07:48:06.848 UTC] [svservice:P1124:T1128] Received SERVICE_CONTROL_INTERROGATE

[2019-06-03 07:48:06.848 UTC] [svservice:P1124:T1128] Current status being sent to SCM is 4

[2019-06-03 07:49:14.360 UTC] [svservice:P1124:T6016] OnStartShell called (Session ID 1, Handle 001511E0, Params 02B5F688, Context 00000000)

[2019-06-03 07:49:14.360 UTC] [svservice:P1124:T6016] OnStartShell: DOMAIN\USER (NameSamCompatible)

[2019-06-03 07:49:14.360 UTC] [svservice:P1124:T6016] NotifyLoginCompleted failed: error 0x80070002

[2019-06-03 07:49:14.360 UTC] [svservice:P1124:T6016] HttpFileShareRequest WinHttp over SSL is disabled. Log collection to file share not supported.

[2019-06-03 07:49:14.360 UTC] [svservice:P1124:T6016] handleFileShareStr: File share info received from manager is empty.

[2019-06-03 07:49:14.360 UTC] [svservice:P1124:T6016] OnStartShell: Error Failed to Start DCT Logger

[2019-06-03 07:49:14.360 UTC] [svservice:P1124:T2920] Waiting 0 second(s) for a new volume

[2019-06-03 07:49:14.360 UTC] [svservice:P1124:T2920] Activate filtering (called by DelayActivateWorker)

[2019-06-03 07:49:14.360 UTC] [svservice:P1124:T1256] MeasureTime::RecordCenter: Start recording  GUID:{58fffd46-2b8f-11e9-af91-806e6f6e6963} Type:0

[2019-06-03 07:49:14.360 UTC] [svservice:P1124:T1256] Preload volume event (startup): "\Device\HarddiskVolume2" GUID {58fffd46-2b8f-11e9-af91-806e6f6e6963} Hive  (1 logged in, SystemVolume 1, VolumeType 0)

[2019-06-03 07:49:14.360 UTC] [svservice:P1124:T1256] Sending reply to SVCMD_ID_NEW_VOLUME_PRE (Message 13, Size 0)

[2019-06-03 07:49:23.157 UTC] [svservice:P5988:T4296] *** Started

[2019-06-03 07:49:23.157 UTC] [svservice:P5988:T4296] Entered fromrunkey case

[2019-06-03 07:49:23.157 UTC] [svservice:P5988:T4296] Successfully opened runkey event.

[2019-06-03 07:51:16.428 UTC] [svservice:P1124:T1128] Received SERVICE_CONTROL_INTERROGATE

[2019-06-03 07:51:16.428 UTC] [svservice:P1124:T1128] Current status being sent to SCM is 4

[2019-06-03 08:19:27.858 UTC] [svservice:P1124:T6016] OnLock called (Session ID 1, Handle 001511E0, Params 02B5F688, Context 00000000)

[2019-06-03 08:49:27.910 UTC] [svservice:P1124:T6016] OnUnlock called (Session ID 1, Handle 001511E0, Params 02B5F688, Context 00000000)

[2019-06-03 10:14:39.333 UTC] [svservice:P1124:T6016] OnLock called (Session ID 1, Handle 001511E0, Params 02B5F688, Context 00000000)

[2019-06-03 10:17:40.131 UTC] [svservice:P1124:T6468] OnUnlock called (Session ID 1, Handle 001511E0, Params 015FF008, Context 00000000)

[2019-06-03 11:24:42.440 UTC] [svservice:P1124:T6468] OnLock called (Session ID 1, Handle 001511E0, Params 015FF008, Context 00000000)

[2019-06-03 11:44:14.937 UTC] [svservice:P1124:T6468] OnUnlock called (Session ID 1, Handle 001511E0, Params 015FF008, Context 00000000)

Reply
0 Kudos
jmatz135
Hot Shot
Hot Shot

Your problem looks to be this line:

GetComputerUuid: unable to locate UUID

Not sure why it can't get the UUID but I don't think the manager will accept it without that.

jmatz135
Hot Shot
Hot Shot

No it isn't a webfilter.  The AppVolumes agent just doesn't act in the way the developers thinks it does.

Reply
0 Kudos
mellicott
Enthusiast
Enthusiast

I've been seeing this issue recently, and have this line in my log which appears to be relevant to me: -

[2021-05-18 13:12:50.503 UTC] [svservice:P2220:T2664] HandleNGVC: NGVC not present, error 2

On a working session, the line looks like this: -

[2021-07-06 15:07:30.293 UTC] [svservice:P1552:T2332] HandleNGVC: NGVC is running in a forked VM
[2021-07-06 15:07:30.293 UTC] [svservice:P1552:T2332] HandleNGVC: This is not DaaS environment, no need to check Manager endpoints from NGVC 2

Anyone know what NGVC is, and what error 2 might be?

 

Horizon Client 5.4.3 Appvol Manager 2.18 Appvol Client 2.18.6 UEM 9.10
Reply
0 Kudos