VMware Horizon Community
Sabian0309
Enthusiast

UEM hanging/delaying randomly.

Running Horizon 7.6 with UEM 9.5 and App Volumes 2.14. The parent is Windows 10 1607 LTSB, floating instant clones. AppData is not redirected; Documents/Favorites/Desktop/etc. are.

Occasionally I am seeing login delays while UEM is processing. On average UEM processing takes 5-6 seconds, but for some users it occasionally takes 25+ seconds, with UEM not reporting why.

Example from logs:

2018-12-12 07:57:37.331 [INFO ] Config file '\\jfv-vm-fs2\UEM Configuration\general\Applications\Acrobat Reader.INI' added to DirectFlex cache

2018-12-12 07:57:56.269 [INFO ] Config file '\\jfv-vm-fs2\UEM Configuration\general\Applications\Adobe Acrobat.INI' added to DirectFlex cache

2018-12-12 08:39:13.444 [DEBUG] ImportRegistry::Import: Calling '"C:\Windows\REGEDIT.EXE" /S "C:\Users\test\AppData\Local\Temp\FLX6AB5.tmp"' (RPAL: l=0 (D/E), r=0)

2018-12-12 08:39:13.490 [DEBUG] Read 3 entries from profile archive (size: 151258; compressed: 31466; took 64 ms; largest file: 83810 bytes; slowest import took 0 ms)

2018-12-12 08:39:32.428 [DEBUG] Conditions: Condition set 'Microsoft Office 2013.xml' was previously evaluated to true

2018-12-12 14:02:29.110 [DEBUG] Read 25 entries from profile archive (size: 63987712; compressed: 7889959; took 427 ms; largest file: 26738688 bytes; slowest import took 154 ms)

2018-12-12 14:02:48.063 [DEBUG] Conditions: Check for endpoint name = false ('C81MK6V1' is not equal to 'LJ2359H2')
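
For reference, here's how I spot the gaps; a minimal PowerShell sketch, assuming the standard 'yyyy-MM-dd HH:mm:ss.fff' timestamp prefix shown above (the log path is a placeholder for wherever your FlexEngine.log lives):

# Flag gaps of more than 5 seconds between consecutive FlexEngine log lines.
# The log path is a placeholder; point it at your actual FlexEngine.log.
$prev = $null
Get-Content 'C:\Temp\FlexEngine.log' | ForEach-Object {
    if ($_ -match '^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})') {
        $t = [datetime]::ParseExact($Matches[1], 'yyyy-MM-dd HH:mm:ss.fff', $null)
        if ($prev -and ($t - $prev).TotalSeconds -gt 5) {
            '{0,6:N1}s gap before: {1}' -f ($t - $prev).TotalSeconds, $_
        }
        $prev = $t
    }
}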

I can reproduce this by logging in over and over. Sometimes it happens and sometimes it does not, but it occurs frequently enough to be easily reproducible. It does not appear to be related to any specific pool or UEM setting, and it isn't tied to specific hosts, users, client types (thin client or laptop), or Horizon versions.

The UEM data is stored on a Windows 2008 R2 file share, a VM running off an all-flash datastore on a Dell Compellent. There are no indications that the Compellent is struggling to serve the data, nor that the file server is. This is as likely to happen during the morning login rush as when people leave for the day.

This appears to have been going on for a long while; the environment is only now recovering enough from the previous admin that smaller issues like this can be tracked down. But my google-fu is failing to find any possible solutions.

Any help would be appreciated.

Thank you,

Billy

ijdemes
Expert

My Management Console is running on a different machine (a management server), but I do not experience any difficulties there.


\\ Ivan
---
Twitter: @ivandemes
Blog: https://www.ivandemes.com
DEMdev
VMware Employee

I was wondering whether the Management Console was maybe causing the locks, but I just did some experiments myself, running it on another machine, without causing delays for the agent.

ElroyS
Contributor

Any news on the UEM locks? Since yesterday I have been collecting logon data (based on UEM logging, VMware Logon Monitor, and App Volumes).

The results are shown in the following report. We have already disabled virus scanning on the UEM share, without any positive result.

Logon times are randomly very slow, and it does not matter whether 1 or 10 AppStacks are mounted.

(The slow logon times are visible in the FlexEngine logging, but they cannot be tied to any specific action.)

[Screenshot: Logon_times2.png]

[Screenshot: Logon_times.png]

Any help would be very appreciated.

Kind regards,

Elroy

ijdemes
Expert

Preliminary results are that it may have to do with SMB leasing in Win10 v1803/v1809. But UEMdev and I are still investigating.
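
If you want to check this on your own file server, a minimal sketch (assumes the built-in SmbShare PowerShell module on Server 2012 or later; note that disabling leasing applies server-wide, so test outside production first):

# Check whether SMB2/3 leasing (and legacy SMB1 oplocks) are enabled on the file server
Get-SmbServerConfiguration | Select-Object EnableLeasing, EnableOplocks

# Disable leasing for a test run (server-wide; assumption: acceptable in your environment)
Set-SmbServerConfiguration -EnableLeasing $false -Force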


\\ Ivan
---
Twitter: @ivandemes
Blog: https://www.ivandemes.com
JohnTwilley
Hot Shot

I figured since @ijdemes mentioned SMB leasing, I'd toss in another idea...

Windows 2012 R2 servers had a known issue with SMB that caused a lot of headaches when under load...and it had an easy fix.

Here's an excerpt from the very long thread referenced below.

Ref: Server 2012 R2 File Server Stops Responding to SMB Connections

MS recommended two components to this fix; however, with the VMware driver fix applied I was still experiencing the issue. It wasn't until I made the change to srv2.sys that the fix became permanent.

VMware driver: MS believe that this driver (vsepflt) could have been conflicting with the srv2.sys driver. To disable it, follow this article: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=203449...

Srv2.sys: In my opinion this is the actual change that fixed the issue. The srv2.sys driver controls SMB 2 traffic at the kernel level. In operating systems before Server 2012 R2 the driver is set to auto start; Microsoft changed this to 'start on demand' for Server 2012 R2, and it seems not to start gracefully when a request comes in over SMB 2 or above.

To change srv2 to auto start, open cmd and run 'sc config srv2 start= auto' (note the space after 'start='). A reboot will be required after running this command.
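
Spelled out (a sketch; run from an elevated PowerShell prompt, and note that sc.exe needs the space after 'start='):

# Show the current start type of the srv2 service
sc.exe qc srv2

# Set srv2 to auto start; sc.exe requires the space after 'start='
sc.exe config srv2 start= auto

# A reboot is required for the change to take effect
Restart-Computer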

When I talked to the lead network tech about this fix, referenced him to this article, how many people it was affecting, etc., he advised that in all cases experiencing this issue, changing srv2 to auto start has worked 100% of the time. Why haven't MS released an official patch, I asked? Because there have not been enough cases to warrant an official fix.

I'm very interested in whether this fixes your issues; please mark this as an answer if I bring you success! Best of luck!

John

DEMdev
VMware Employee

Hi ElroyS,

For those (extremely) slow UEM runs, do you see large "gaps" between subsequent log lines, i.e. big delays for single log entries?

ijdemes
Expert

I don't see this filter driver (vsepflt) on the Win2016 server system. Same goes for the srv2 service. Things have probably changed between 2012 and 2016.

But your suggestion is highly appreciated!


\\ Ivan
---
Twitter: @ivandemes
Blog: https://www.ivandemes.com
ElroyS
Contributor

Hi UEMdev,

This is one of the log files containing a gap.

[Screenshot: example_gap.png]

Our production environment uses HNAS storage for the UEM share. There is a big difference compared with the results from our test environment, which uses a Windows 2012 R2 share for UEM.

Logon times there are much faster and more stable.

[Screenshot: result_windows_share.png]

Still testing and collecting logon data.

DEMdev
VMware Employee

Hi ElroyS,

Thank you for the additional info. Yes, that 22-second delay is what I'm talking about. But some of the other lines in that log fragment (150-200ms for skipping a shortcut) also feel a bit slow to me; does that go faster on your 2012r2 share as well?

I take it you're the colleague that Ray_handels mentioned earlier in this thread?

ElroyS
Contributor

Hi UEMdev,

Correct, I am the colleague Ray was talking about.

Just reviewed the UEM logging; in production, the times for skipping a shortcut are:

Production HNAS: 0.152 s, 0.182 s, 0.144 s, 0.272 s

Test (Windows SMB share): 0.004 s, 0.081 s, 0.142 s, 0.153 s, 0.005 s
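
For a quick comparison outside of UEM itself, a minimal sketch; both UNC paths are placeholders for your HNAS and Windows shares, and the INI name is borrowed from the log excerpt earlier in this thread:

# Time how long it takes to read one small UEM config file from each share.
# Both UNC paths are hypothetical; substitute your own.
foreach ($share in '\\hnas\UEMConfig', '\\win2012r2\UEMConfig') {
    $sw = [System.Diagnostics.Stopwatch]::StartNew()
    Get-Content (Join-Path $share 'general\Applications\Adobe Acrobat.INI') | Out-Null
    $sw.Stop()
    '{0}: {1} ms' -f $share, $sw.ElapsedMilliseconds
}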

DEMdev
VMware Employee

Hi ElroyS,

Correct, I am the colleague Ray was talking about.

Thanks, just wanted to make sure that this was not yet another report of the "delay issue"...

Looking at the different timings between HNAS and Windows, it looks like the latter is performing a bit faster, but the sample size is probably a bit too small for sweeping statements. Anyway, our investigation continues...

Ray_handels
Virtuoso

Hey all,

Any news on this issue? We are still seeing the delays from time to time. I have raised an official SR so that we can add information into that as well.

SR is 19202347805. Is there anything that we can test?

DEMdev
VMware Employee

Hi Ray,

No real progress, unfortunately. We've opened a case with Microsoft, but without any traction yet.

If you can reproduce the delay while ProcMon is running, please upload that ProcMon log to your SR and send me a DM.
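
For an unattended capture around a logon, something along these lines works (a sketch; the Procmon path is a placeholder, flags per the Sysinternals documentation):

# Start a quiet, minimized Process Monitor capture to a backing file
& 'C:\Tools\Procmon.exe' /AcceptEula /Quiet /Minimized /BackingFile 'C:\Temp\uem-logon.pml'

# ...reproduce the slow logon, then stop the capture and collect the .pml
& 'C:\Tools\Procmon.exe' /Terminate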

JohnTwilley
Hot Shot

So I started getting calls at 9:30pm about nurses and physicians having problems "logging in to VDI".

This is the screenshot provided, and it's the same as the other 30 tickets...all high priority (of course).

[Screenshot: Flex Hang.jpg]

If you assumed the UNC path for the UEM Flex config/profiles was acting up...bingo. That's what I thought.

SAN Team member joins and says the Isilon is GREEN.  All Good.

Open ticket with VMware Support.

The VMware Support engineer sees an entry in the local VM's Event Log saying the 'GPAGENT' process is taking a long time.

" You've got a Group Policy Issue...call Microsoft".  (see where this is going)

Microsoft engaged.   30 hours later I'm not having a good day.

The problem ended up being one or two of the Isilon nodes being saturated, causing a process that normally takes 0:01 to take 2:11.

Trust, but verify. I wrote a script to cycle through the 38 Isilon nodes, performing a robocopy task against each. Nice performance statistics.
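
Roughly what that script looks like (a sketch; the node naming scheme and share name are hypothetical, and /R:0 /W:0 keeps an unresponsive node from stalling the loop):

# Cycle through all 38 Isilon nodes and time a robocopy pull from each one.
# Node naming scheme and share name are hypothetical; substitute your own.
$nodes = 1..38 | ForEach-Object { 'isilon-node{0:D2}' -f $_ }
foreach ($node in $nodes) {
    $dest = Join-Path $env:TEMP "uemtest-$node"
    $sw = [System.Diagnostics.Stopwatch]::StartNew()
    robocopy "\\$node\UEMConfig" $dest /E /R:0 /W:0 /NJH /NJS /NFL /NDL | Out-Null
    $sw.Stop()
    '{0}: {1:N1} s' -f $node, $sw.Elapsed.TotalSeconds
}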

Shared Storage = BAD.    Say it with me.

If anyone knows of a better way to host a clustered SMB share (or better, an OVA for a highly resilient NAS SMB share) for UEM...please share it with me. I do not want to move this to a Windows DFS environment...I know better than to subject myself and others to that. VMware is adding a virtual SMB share to vSAN later this summer...but that's fresh out of beta.

Ray_handels
Virtuoso

We also did some test runs on the storage (we are using an HNAS), and looking at the statistics it all seems to work well.

The strange thing is that we don't see operations that should take just a few milliseconds taking slightly longer; it seems as if it just randomly hangs and does nothing for a while. You can see a file action on a config file taking up to 30 seconds for no apparent reason (at least none apparent to us).

Still looking into it..

DEMdev
VMware Employee

Hi John,

Open ticket with VMware Support.

The VMware Support engineer sees an entry in the local VM's Event Log saying the 'GPAGENT' process is taking a long time.

" You've got a Group Policy Issue...call Microsoft".  (see where this is going)

If you're running the UEM agent as a Group Policy client-side extension, the time UEM takes at logon is counted towards the overall Group Policy processing time. If you have a UEM log that shows that its logon processing took inordinately long, please use that to, umm, "educate" VMware support.

You mentioned previously that turning off oplocks on your Isilon nodes resolved the delay issue. Has anything changed since then (as that post is from a year and a half ago)?

DEMdev
VMware Employee

Hi Ray,

Are these delays occurring on config files (on the read-only configuration share), or on profile archives? Not that I have any advice for either scenario at this time... Just out of curiosity.

JohnTwilley
Hot Shot

Ref:
You mentioned previously that turning off oplocks on your Isilon nodes resolved the delay issue. Has anything changed since then (as that post is from a year and a half ago)?

Nothing changed on the Isilon side, other than EMC updates.

I'd heard that "leasing", as they call it now, is still an issue, and that disabling it per share is not as effective as it once was...many just turn it off completely across their clusters.

The only change on our side is load. I'm competing against other business processes on a shared NAS...and VDI login processes are sensitive.

Looks like a shared environment was a bad idea after all.

Ray_handels
Virtuoso

Are these delays occurring on config files (on the read-only configuration share), or on profile archives? Not that I have any advice for either scenario at this time... Just out of curiosity.

As far as I can see it is mostly the read-only configuration share, which is the strange thing. These files are normally no larger than a few KB.

DEMdev
VMware Employee

Thanks, Ray, that matches what we're seeing in most of the reported cases.

Quick (non-)update since my remark from a week ago: our case with Microsoft saw a little bit of activity over the weekend (nothing to report from that yet, though), but it's been quiet since then. :(
