VMware Horizon Community
alsmk2
Hot Shot

AV 2.13.1: Errors when assigning app stacks to computer OU or AD object

We've recently updated to 2.13.1 to make use of the ability to mix computer and user object assignments, but we're consistently seeing two different errors that crop up randomly when View tries to refresh a machine after user logoff, and during recompose:

1. Invalid configuration for device 1 (or 2/ 3)

2. Cannot power on VM on Host in Datacenter. Failed to lock file.

When both of these occur, I can see that App Volumes has not released the app stacks assigned via computer assignment - they remain attached to the VM.

We'd normally run the viewdbchk script on a connection server to repair the desktops, but this does not work with any desktop displaying this issue. The only way to get around it is to manually delete the machine, and then run the viewdbchk script.

The only reference to this issue I have seen goes back to AV 2.6, where the solution was to ensure "Allow non-domain entities" was unchecked in the admin GUI. Unfortunately, this has never been checked in this particular environment. Has anyone else come across this with the latest version before I log a call with support?

20 Replies
Gaurav_Baghla
VMware Employee

If I understand correctly, your main problem is that AppStacks are not being detached promptly from computer-assigned machines.

Ideally, a refresh cycle includes the following steps:

1) View puts the machine into the Maintenance state.

2) The machine is reverted to the old snapshot.

3) The machine returns to Provisioned / Available, while maintaining domain connectivity and the machine password.

So ideally the AppStack should detach immediately after step 1, before the machine moves to the revert stage. Breaking this down further, a power-off is involved, and during power-off the AV Agent should tell the Manager that it is going offline so that the AppStack is removed. This is where we can check the App Volumes logs to see whether the machine reports going offline.

How large is your environment, how many AV Managers do you have, and which load balancer are you using (LTM, for example)? I am also interested in the LB policy and in more detail on the storage you use. I am assuming this happens in roughly 1 out of 8-16 concurrent refresh operations, and not every time.

[Isolation step] You should also be able to reproduce this by restarting 20 machines from vSphere. Monitor Recent Tasks and see whether the detach works; it is worth a try to isolate the problem. Then we can look at the AV Agent logs as well.

Regards Gaurav Baghla Opinions are my own and not the views of my employer. https://twitter.com/garry_14
Sravan_k
Expert

Hi,

I think I am familiar with this issue, but let me check the symptoms with you.

When a user logs off from the VDI desktop, does View show the machine going to refresh or going to disconnected? [Please check in View Admin and let me know.]

If that is the case, I will give you a solution/workaround that has to be applied on the parent image.

Thank you,

Vkmr. 

alsmk2
Hot Shot

The environment is as follows:

LB: F5 LTM (Least Connections (member))

Machine Managers: 4

Storage: vSAN for desktops, VPLEX VNX LUNs for App Stacks

For the View side of things, there are two PODs / Sites using CPA - this only happens on machines at one POD / Site. All pools are set to refresh desktops at logoff.

I've managed to pull the logs off of a desktop displaying the issue and can see the following errors, which I have not come across prior to this:

[2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] ServiceStartShutdown: running computer shutdown scripts (before stopping services)

[2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] SetupDiGetClassDevs failed for GUID_DEVCLASS_SCSIADAPTER: error 13

[2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] ProcessScsiAdapterClass: The data is invalid.

[2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] SetupDiGetClassDevs failed for GUID_DEVCLASS_DISKDRIVE: error 13

[2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] ProcessDiskClass: The data is invalid.

[2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] ProcessDiskInterface: SetupDiGetClassDevs failed: error 13

[2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] ProcessDiskInterface: The data is invalid.

[2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] SetupDiGetClassDevs failed for GUID_DEVCLASS_VOLUME: error 13

[2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] ProcessVolumeClass: The data is invalid.

[2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] ProcessVolumeInterface: SetupDiGetClassDevs failed: error 13

[2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] ProcessVolumeInterface: The data is invalid.

[2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] ServiceStartShutdown: running computer shutdown scripts

[2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] SetupDiGetClassDevs failed for GUID_DEVCLASS_SCSIADAPTER: error 13

[2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] ProcessScsiAdapterClass: The data is invalid.

[2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] SetupDiGetClassDevs failed for GUID_DEVCLASS_DISKDRIVE: error 13

[2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] ProcessDiskClass: The data is invalid.

[2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] ProcessDiskInterface: SetupDiGetClassDevs failed: error 13

[2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] ProcessDiskInterface: The data is invalid.

[2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] SetupDiGetClassDevs failed for GUID_DEVCLASS_VOLUME: error 13

[2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] ProcessVolumeClass: The data is invalid.

[2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] ProcessVolumeInterface: SetupDiGetClassDevs failed: error 13

[2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] ProcessVolumeInterface: The data is invalid.

With that in mind, and seeing as we have one site without the issue, I'm in the process of cloning the gold image from that site with a view to reprovisioning the pools later today. That won't tell me the root cause, but will hopefully stop it happening.
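For anyone triaging the same agent log, here is a minimal Python sketch that tallies the SetupDiGetClassDevs failures per device class. It assumes the line layout shown in the excerpt above (`[timestamp] [svservice:Pnnn:Tnnn] message`); the function name `summarize_failures` is made up for illustration and is not part of App Volumes. Windows error 13 is ERROR_INVALID_DATA, which matches the "The data is invalid." text in the log.

```python
import re
from collections import Counter

# Layout assumed from the excerpt above:
# [2017-12-25 07:03:05.836 UTC] [svservice:P1288:T1292] <message>
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+\[svservice:P\d+:T\d+\]\s+(?P<msg>.*)$"
)
# Matches both "... failed for GUID_DEVCLASS_X: error N" and "... failed: error N".
FAIL_RE = re.compile(
    r"SetupDiGetClassDevs failed(?: for (?P<cls>GUID_DEVCLASS_\w+))?: error (?P<code>\d+)"
)


def summarize_failures(lines):
    """Count SetupDiGetClassDevs failures, keyed by (source, error code).

    Hypothetical helper: 'source' is the device class GUID name when the
    log line gives one, otherwise the function prefix before the first colon.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not an svservice log line in the assumed layout
        msg = m.group("msg")
        f = FAIL_RE.search(msg)
        if not f:
            continue  # e.g. the follow-up "The data is invalid." lines
        source = f.group("cls") or msg.split(":", 1)[0]
        counts[(source, int(f.group("code")))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py svservice.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for (source, code), n in sorted(summarize_failures(fh).items()):
            print(f"{source}: error {code} x{n}")
```

Comparing the counts from a desktop at the failing site with one at the healthy site would show whether these shutdown-time failures are specific to the bad image.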

alsmk2
Hot Shot

We're still struggling with this issue - it's causing major pain for us at the moment. Both errors can be traced back to App Volumes.

  1. Invalid backing for device 2 - This device is always the first app stack / vmdk that AV attempts to assign. It's set to independent mode, which I know is necessary, but we see this error on VMs where App Volumes tries to assign the stacks before provisioning has fully completed. Once we see this, we know the desktop is no longer viable.

  2. File lock - As above, AV appears to attempt assignment and then does not let go. The workaround at the moment is to either delete the desktop and run viewdbchk, or temporarily remove all computer assignments so AV releases the files.

  3. No error in View, but App Volumes fails to attach the app stacks. The following error is displayed on the machine managers:

Skipping assignment lookup for Computer because computer has no VM (machine identifier).

We've noticed that if we re-provision a pool all at once, the number of these errors is much higher (as though App Volumes has crapped itself). If we remove the computer assignments completely, we're able to re-provision the pools 100% successfully and then re-add the assignments successfully. However, as the desktops get refreshed later in the day, we start to see the errors again as App Volumes goes mental.

The root cause of this seems fairly simple to me - App Volumes triggers too soon during any provisioning or refresh operations. Solution to stop this.... no idea.

Every single major or minor update to App Volumes seems to have silly bugs like this. We're pushing on with the computer assignments as the user experience when they get a good desktop is much better (apps already ready to go rather than waiting 30 seconds after logon). However, it's absolutely infuriating at the moment as we now have a firefight in the morning during peak logon times to spot the unusable desktops before a user logs on to them, and then at the end of the day to tidy up any desktops that are broken after an auto machine refresh.

jhebertupo
Contributor

Hello,

I have exactly the same problems. Did you find a solution?
This is becoming very problematic.

Thank you

alsmk2
Hot Shot

Nope, unfortunately not. We're now on the latest version and still getting the same issue.

I've just logged a call with support about this, as in all honesty, we were able to manage it so I've been complacent with it. I will update this thread once support have had a chance to investigate.

jhebertupo
Contributor

Hello and thank you,

I also have an open ticket with VMware; it's in progress.
Here are the leads VMware has given me, without success so far.

On your desktop, can you navigate to the following registry location using regedit: HKLM\SYSTEM\CurrentControlSet\services\svservice\Parameters

Can you then change the value of "RebootAfterDetach" to 1.

This registry value does no harm to your environment. It ensures that every logoff is a clean logoff and makes the next login operation successful by flushing the registry and volume cache entries. This is definitely a requirement from App Volumes for successful real-time application delivery to a user.

I want to see whether this ensures the AppStacks are detached from the machine after logoff.

To try to resolve this issue, let us disable the App Volumes Service on the parent virtual machine and create a script that starts the service back up again.

1) Open a command prompt as the administrator and run these commands:

- sc config "svservice" start= disabled

- net stop "App Volumes Service"

- ipconfig /release

2) Create a script or batch file as below to set the service to automatic and start the service.

sc config "svservice" start= auto

net start "App Volumes Service"

3) Copy the script to a directory on the parent virtual machine. For example: C:\scripts\script.bat

4) Shut down the virtual machine and take a snapshot.

5) In View Administration portal, reference your post-synchronization script:

- Open up View Administration Portal

- Go to Catalog - Desktop Pools - Select your pool

- Click Edit

- Select Guest Customization Tab

- Enter the file path for script in post-synchronization script name: C:\scripts\script.bat

6) Recompose the pool.

These steps have been taken from a KB titled "Reprovisioning fails with AppStacks set to computer based assignments", so I believe it is worth testing.

alsmk2
Hot Shot

Fantastic - I'll get this into a test environment ASAP!

jhebertupo
Contributor

Keep me informed, please

I may be missing something.

alsmk2
Hot Shot

Just gone through the process of pushing this out and unfortunately it isn't an option for us as it assumes the pool is using QuickPrep, which we can't use because it doesn't change the local SID (Anti Virus doesn't like it).

There's no similar option when using a vSphere customisation specification - the only option there is to run it when a user logs on for the first time, which is already too late for computer assignments :(

jhebertupo
Contributor

Which antivirus are you using?

What type of security do your DCs use in App Volumes: LDAP or LDAPS?

Enable Secure Communication Between App Volumes Manager and Active Directory

Do you have any errors at this level?

alsmk2
Hot Shot

McAfee Enterprise, and standard LDAP - there are no errors at that level that I have found.

I decided to test the suggestions VMware made to you that you shared previously. It didn't end well - the provisioning process went absolutely fine... not one error, but that didn't carry through to the desktops. Some desktops came up fine with the computer app stacks attached, others were absolutely destroyed - the service had been changed to automatic by the post-provisioning script, but then could not start. View could no longer handle those particular desktops, which had to be manually deleted. That's not the end of it though... the desktops which did look fine then absolutely bombed after test users logged off. Multiple errors in vSphere put all remaining desktops in an unusable state - Composer couldn't delete them without a helping hand. This behaviour was the same on two separate test pools in different PODs.

I suspect the desktops that initially looked good failed because of the RebootAfterDetach registry entry that was modified - purely because I could see View trying and failing to power down the machine at logoff, only for it to already be offline, I'm assuming because App Volumes had already issued a reboot command. I'm probably wide of the mark, but who knows with App Volumes? I've not known a stable release since... well, never... bug after bug after bug.

I'm going to try again with another test pool, but so far it looks like another half-baked VMware suggestion for fixing this application. :(

jhebertupo
Contributor

Hello,

Have you heard back from VMware?
We had a long talk with support yesterday afternoon.
They keep going over the logs again and again...

I still cannot recompose or refresh without errors when AppStacks are attached.

:smileyangry:

alsmk2
Hot Shot

Nothing as yet, though I've only just mailed them again this morning.

I'm going to have another go at the recommended workaround you posted, but I'm not holding my breath.

The frustrating thing is that, as far as I am concerned, we shouldn't have to provision a new snapshot where we manually disable the service, and then rely on a batch file and QuickPrep to stop App Volumes going mental during provisioning. This is the sort of basic issue that should have been quickly picked up by VMware test & dev, with some simple logic coded into the App Volumes agent that makes it wait for the provisioning / refresh process to fully complete.

I'm tired of feeling like a beta tester for this product, and so are my clients who have to suffer endless pain with every.single.release of App Volumes. I want a stable release that I can stick with for a year without bugs that creep in over time (fragmented DB, conflicted VMs, having to recreate app stacks with each release... I'm looking at you), and where VMware's go to "fix" isn't just to upgrade to the latest version... which ALWAYS introduces new bugs. It's a merry-go-round of pain.

alsmk2
Hot Shot

I've changed tack with the testing this afternoon and moved the test pools over to Instant Clones rather than LCs - touch wood, it's behaving at the moment. No errors during VM creation, App Volumes behaving itself. Time will tell with a few days of testing though.

I've not implemented any silly workarounds involving disabling the AV service / including batch files.

jhebertupo
Contributor

The concern I ran into with Instant Clones is applying the UEM GPO; VMware does not recommend using GPO loopback processing. Did you manage to apply this GPO, and if so, how?

alsmk2
Hot Shot

I've not made any changes to the UEM configuration and it does seem to be applying fine on the Instant Clones (we also use Loopback processing: Replace within that policy). UEM has been solid as a rock for quite some time now.

The proof is in the pudding - shall do some more testing and see if anything comes to light over the next few days.

UEM v9.2.

View 7.4

AV 2.13.2

jhebertupo
Contributor

I do not understand why my UEM GPO with loopback processing does not work in my environment. Can you help me with this?

What do you have in the scope of this GPO?

sjesse
Leadership

Careful with instant clones and computer OU assignments. I have yet to open an SR, but if the cp-template file has an AppStack attached, all instant clones will fail to provision. It's because the snapshot relies on an AppStack vmdk which is missing. If you specifically limit the attachments to the naming prefix used, you are fine; the problem arises because the cp-template has an AD object as well, but it's named it113412341234 or something similar.
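To illustrate the prefix point, here is a small Python sketch for auditing a list of computer names exported from the OU before assigning AppStacks to it. It is not App Volumes functionality: the helper name, the pool prefix "VDI-WIN10-" and the desktop names are made up for illustration, and the internal template name is only the approximate example given above.

```python
def matches_pool_prefix(computer_name, pool_prefix):
    """Return True if the AD computer name starts with the pool naming prefix.

    Hypothetical helper; the comparison ignores case.
    """
    return computer_name.lower().startswith(pool_prefix.lower())


def split_by_prefix(computer_names, pool_prefix):
    """Split names into (desktops matching the prefix, everything else)."""
    desktops, others = [], []
    for name in computer_names:
        target = desktops if matches_pool_prefix(name, pool_prefix) else others
        target.append(name)
    return desktops, others


if __name__ == "__main__":
    # Example names only; replace with an export from your own OU.
    names = ["VDI-WIN10-001", "vdi-win10-002", "it113412341234"]
    desktops, others = split_by_prefix(names, "VDI-WIN10-")
    print("Would match a prefix-limited assignment:", desktops)
    print("Would only be caught by an OU-wide assignment:", others)
```

Anything that lands in the second list is an object that an OU-wide assignment would pick up but a prefix-limited one would leave alone.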
