VMware Horizon Community
Sabian0309
Enthusiast

UEM hanging/delaying randomly.

Running Horizon 7.6 with UEM 9.5 and App Volumes 2.14. The parent is Windows 10 1607 LTSB, floating instant clones. AppData is not redirected; Documents/Favorites/Desktop/etc. are.

Occasionally I am seeing login delays while UEM is processing. On average UEM processing takes 5-6 seconds, but for some users it occasionally takes 25+ seconds, with UEM not reporting why.

Example from logs:

2018-12-12 07:57:37.331 [INFO ] Config file '\\jfv-vm-fs2\UEM Configuration\general\Applications\Acrobat Reader.INI' added to DirectFlex cache

2018-12-12 07:57:56.269 [INFO ] Config file '\\jfv-vm-fs2\UEM Configuration\general\Applications\Adobe Acrobat.INI' added to DirectFlex cache

2018-12-12 08:39:13.444 [DEBUG] ImportRegistry::Import: Calling '"C:\Windows\REGEDIT.EXE" /S "C:\Users\test\AppData\Local\Temp\FLX6AB5.tmp"' (RPAL: l=0 (D/E), r=0)

2018-12-12 08:39:13.490 [DEBUG] Read 3 entries from profile archive (size: 151258; compressed: 31466; took 64 ms; largest file: 83810 bytes; slowest import took 0 ms)

2018-12-12 08:39:32.428 [DEBUG] Conditions: Condition set 'Microsoft Office 2013.xml' was previously evaluated to true

2018-12-12 14:02:29.110 [DEBUG] Read 25 entries from profile archive (size: 63987712; compressed: 7889959; took 427 ms; largest file: 26738688 bytes; slowest import took 154 ms)

2018-12-12 14:02:48.063 [DEBUG] Conditions: Check for endpoint name = false ('C81MK6V1' is not equal to 'LJ2359H2')
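
For reference, here's how I spot the gaps; a minimal PowerShell sketch, assuming the standard 'yyyy-MM-dd HH:mm:ss.fff' timestamp prefix shown above (the log path is a placeholder for wherever your FlexEngine.log lives):

# Flag gaps of more than 5 seconds between consecutive FlexEngine log lines.
# The log path is a placeholder; point it at your actual FlexEngine.log.
$prev = $null
Get-Content 'C:\Temp\FlexEngine.log' | ForEach-Object {
    if ($_ -match '^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})') {
        $t = [datetime]::ParseExact($Matches[1], 'yyyy-MM-dd HH:mm:ss.fff', $null)
        if ($prev -and ($t - $prev).TotalSeconds -gt 5) {
            '{0,6:N1}s gap before: {1}' -f ($t - $prev).TotalSeconds, $_
        }
        $prev = $t
    }
}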

I can reproduce this by logging in over and over. Sometimes it happens and sometimes it does not, but it occurs frequently enough to be easily reproducible. It does not appear to be related to any specific pool or UEM setting, and it isn't tied to specific hosts, users, client types (thin client or laptop), or Horizon versions.

The UEM data is stored on a Windows 2008 R2 file share, a VM running off an all-flash datastore on a Dell Compellent. There are no indications that the Compellent is struggling to serve the data, nor that the file server is. This is as likely to happen during the morning login rush as when people leave for the day.

This appears to have been going on for a long while; the environment is only now recovering enough from the previous admin that smaller issues like this can be tracked down. But my google-fu is failing to find any possible solutions.

Any help would be appreciated.

Thank you,

Billy

ijdemes
Expert

My Management Console is running on a different machine (a management server), but I do not experience any difficulties there.


\\ Ivan
---
Twitter: @ivandemes
Blog: https://www.ivandemes.com
DEMdev
VMware Employee

I was wondering whether the Management Console was maybe causing the locks, but I just did some experiments myself, running it on another machine, without causing delays for the agent.

ElroyS
Contributor

Any news on the UEM locks? Since yesterday I have been collecting logon data (based on UEM logging, VMware Logon Monitor, and App Volumes).

The results are shown in the following report. We have already disabled virus scanning on the UEM share, without any positive result.

Logon times are randomly very slow, and it does not matter whether 1 or 10 AppStacks are mounted.

(The slow logon times are visible in the FlexEngine logging, but they cannot be tied to any specific action.)

[Screenshot: Logon_times2.png]

[Screenshot: Logon_times.png]

Any help would be very appreciated.

Kind regards,

Elroy

ijdemes
Expert

Preliminary results are that it may have to do with SMB leasing in Win10 v1803/v1809. But UEMdev and I are still investigating.
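
If you want to check this on your own file server, a minimal sketch (assumes the built-in SmbShare PowerShell module on Server 2012 or later; note that disabling leasing applies server-wide, so test outside production first):

# Check whether SMB2/3 leasing (and legacy SMB1 oplocks) are enabled on the file server
Get-SmbServerConfiguration | Select-Object EnableLeasing, EnableOplocks

# Disable leasing for a test run (server-wide; assumption: acceptable in your environment)
Set-SmbServerConfiguration -EnableLeasing $false -Force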


\\ Ivan
---
Twitter: @ivandemes
Blog: https://www.ivandemes.com
JohnTwilley
Hot Shot

I figured since @ijdemes mentioned SMB leasing, I'd toss in another idea...

Windows 2012 R2 servers had a known issue with SMB that caused a lot of headaches when under load...and it had an easy fix.

Here's an excerpt from the very long thread referenced below.

Ref: Server 2012 R2 File Server Stops Responding to SMB Connections

MS recommended two components to this fix; however, with the VMware driver fix applied I was still experiencing the issue. It wasn't until I made the change to srv2.sys that the fix became permanent.

VMware driver: MS believe that this driver (vsepflt) could have been conflicting with the srv2.sys driver. To disable it, follow this article: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=203449...

Srv2.sys: In my opinion this is the actual change that fixed the issue. The srv2.sys driver controls SMB 2 traffic at the kernel level. In operating systems before Server 2012 R2 the driver is set to auto start; Microsoft changed this to 'start on demand' for Server 2012 R2, and it seems not to start gracefully when a request comes in over SMB 2 or above.

To change srv2 to auto start, open cmd and run 'sc config srv2 start= auto' (note the space after 'start='). A reboot will be required after running this command.
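
Spelled out (a sketch; run from an elevated PowerShell prompt, and note that sc.exe needs the space after 'start='):

# Show the current start type of the srv2 service
sc.exe qc srv2

# Set srv2 to auto start; sc.exe requires the space after 'start='
sc.exe config srv2 start= auto

# A reboot is required for the change to take effect
Restart-Computer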

When I talked to the lead network tech about this fix, referenced him to this article, how many people it was affecting, etc., he advised that in all cases experiencing this issue, changing srv2 to auto start has worked 100% of the time. Why haven't MS released an official patch, I asked? Because there have not been enough cases to warrant an official fix.

I'm very interested in whether this fixes your issues; please mark this as an answer if I bring you success! Best of luck!

John

DEMdev
VMware Employee

Hi ElroyS,

For those (extremely) slow UEM runs, do you see large "gaps" between subsequent log lines, i.e. big delays for single log entries?

ijdemes
Expert

I don't see this filter driver (vsepflt) on the Win2016 server system. Same goes for the srv2 service. Things have probably changed between 2012 and 2016.

But your suggestion is highly appreciated!


\\ Ivan
---
Twitter: @ivandemes
Blog: https://www.ivandemes.com
ElroyS
Contributor

Hi UEMdev,

This is one of the log files containing a gap.

[Screenshot: example_gap.png]

Our production environment uses HNAS storage for the UEM share. There is a big difference compared with the results from our test environment, which uses a Windows 2012 R2 share for UEM.

Logon times there are much faster and more stable.

[Screenshot: result_windows_share.png]

Still testing and collecting logon data.

DEMdev
VMware Employee

Hi ElroyS,

Thank you for the additional info. Yes, that 22-second delay is what I'm talking about. But some of the other lines in that log fragment (150-200ms for skipping a shortcut) also feel a bit slow to me; does that go faster on your 2012r2 share as well?

I take it you're the colleague that Ray_handels mentioned earlier in this thread?

ElroyS
Contributor

Hi UEMdev,

Correct, I am the colleague Ray was talking about.

Just reviewed the UEM logging; in production, the times for skipping a shortcut are:

Production HNAS: 0.152 s, 0.182 s, 0.144 s, 0.272 s

Test (Windows SMB share): 0.004 s, 0.081 s, 0.142 s, 0.153 s, 0.005 s
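
For a quick comparison outside of UEM itself, a minimal sketch; both UNC paths are placeholders for your HNAS and Windows shares, and the INI name is borrowed from the log excerpt earlier in this thread:

# Time how long it takes to read one small UEM config file from each share.
# Both UNC paths are hypothetical; substitute your own.
foreach ($share in '\\hnas\UEMConfig', '\\win2012r2\UEMConfig') {
    $sw = [System.Diagnostics.Stopwatch]::StartNew()
    Get-Content (Join-Path $share 'general\Applications\Adobe Acrobat.INI') | Out-Null
    $sw.Stop()
    '{0}: {1} ms' -f $share, $sw.ElapsedMilliseconds
}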

DEMdev
VMware Employee

Hi ElroyS,

Correct, I am the colleague Ray was talking about.

Thanks, just wanted to make sure that this was not yet another report of the "delay issue"...

Looking at the different timings between HNAS and Windows, it looks like the latter is performing a bit faster, but the sample size is probably a bit too small for sweeping statements. Anyway, our investigation continues...

Ray_handels
Virtuoso

Hey all,

Any news on this issue? We are still seeing the delays from time to time. I have raised an official SR so that we can add information into that as well.

SR is 19202347805. Is there anything that we can test?

DEMdev
VMware Employee

Hi Ray,

No real progress, unfortunately. We've opened a case with Microsoft, but without any traction yet.

If you can reproduce the delay while ProcMon is running, please upload that ProcMon log to your SR and send me a DM.
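
For an unattended capture around a logon, something along these lines works (a sketch; the Procmon path is a placeholder, flags per the Sysinternals documentation):

# Start a quiet, minimized Process Monitor capture to a backing file
& 'C:\Tools\Procmon.exe' /AcceptEula /Quiet /Minimized /BackingFile 'C:\Temp\uem-logon.pml'

# ...reproduce the slow logon, then stop the capture and collect the .pml
& 'C:\Tools\Procmon.exe' /Terminate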

JohnTwilley
Hot Shot

So I started getting calls at 9:30pm about nurses and physicians having problems "logging in to VDI".

This is the screenshot provided, and it's the same as the other 30 tickets...all high priority (of course).

[Screenshot: Flex Hang.jpg]

If you assumed the UNC path for the UEM Flex config/profiles was acting up...bingo. That's what I thought.

SAN Team member joins and says the Isilon is GREEN.  All Good.

Open ticket with VMware Support.

The VMware Support engineer sees an entry in the local VM's Event Log saying the 'GPAGENT' process is taking a long time.

" You've got a Group Policy Issue...call Microsoft".  (see where this is going)

Microsoft engaged.   30 hours later I'm not having a good day.

The problem ended up being one or two of the Isilon nodes being saturated, causing a process that normally takes 0:01 to take 2:11.

Trust, but verify. I wrote a script to cycle through the 38 Isilon nodes, performing a robocopy task against each. Nice performance statistics.
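
Roughly what that script looks like (a sketch; the node naming scheme and share name are hypothetical, and /R:0 /W:0 keeps an unresponsive node from stalling the loop):

# Cycle through all 38 Isilon nodes and time a robocopy pull from each one.
# Node naming scheme and share name are hypothetical; substitute your own.
$nodes = 1..38 | ForEach-Object { 'isilon-node{0:D2}' -f $_ }
foreach ($node in $nodes) {
    $dest = Join-Path $env:TEMP "uemtest-$node"
    $sw = [System.Diagnostics.Stopwatch]::StartNew()
    robocopy "\\$node\UEMConfig" $dest /E /R:0 /W:0 /NJH /NJS /NFL /NDL | Out-Null
    $sw.Stop()
    '{0}: {1:N1} s' -f $node, $sw.Elapsed.TotalSeconds
}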

Shared Storage = BAD.    Say it with me.

If anyone knows of a better way to host a clustered SMB share (or better, an OVA for a highly resilient NAS SMB share) for UEM...please share it with me. I do not want to move this to a Windows DFS environment...I know better than to subject myself and others to that. VMware is adding a virtual SMB share to vSAN later this summer...but that's fresh out of beta.

Ray_handels
Virtuoso

We also did some test runs on the storage (we are using an HNAS), and looking at the statistics it all seems to work well.

The strange thing is that we don't see operations that should take just a few milliseconds taking slightly longer; it seems as if it just randomly hangs and does nothing for a while. You can see a file action on a config file taking up to 30 seconds for no apparent reason (at least none apparent to us).

Still looking into it..

DEMdev
VMware Employee

Hi John,

Open ticket with VMware Support.

The VMware Support engineer sees an entry in the local VM's Event Log saying the 'GPAGENT' process is taking a long time.

" You've got a Group Policy Issue...call Microsoft".  (see where this is going)

If you're running the UEM agent as a Group Policy client-side extension, the time UEM takes at logon is counted towards the overall Group Policy processing time. If you have a UEM log that shows that its logon processing took inordinately long, please use that to, umm, "educate" VMware support.

You mentioned previously that turning off oplocks on your Isilon nodes resolved the delay issue. Has anything changed since then (as that post is from a year and a half ago)?

DEMdev
VMware Employee

Hi Ray,

Are these delays occurring on config files (on the read-only configuration share), or on profile archives? Not that I have any advice for either scenario at this time... Just out of curiosity.

JohnTwilley
Hot Shot

Ref:
You mentioned previously that turning off oplocks on your Isilon nodes resolved the delay issue. Has anything changed since then (as that post is from a year and a half ago)?

Nothing changed on the Isilon side, other than EMC updates.

I'd heard that "leasing", as they call it now, is still an issue, and that disabling it per share is not as effective as it once was...many just turn it off completely across their clusters.

The only change on our side is load. I'm competing against other business processes on a shared NAS...and VDI login processes are sensitive.

Looks like a shared environment was a bad idea after all.

Ray_handels
Virtuoso

Are these delays occurring on config files (on the read-only configuration share), or on profile archives? Not that I have any advice for either scenario at this time... Just out of curiosity.

As far as I can see it is mostly the read-only configuration share, which is the strange thing. These files are normally no larger than a few KB.

DEMdev
VMware Employee

Thanks, Ray, that matches what we're seeing in most of the reported cases.

Quick (non-)update since my remark from a week ago: our case with Microsoft saw a little bit of activity over the weekend (nothing to report from that yet, though), but it's been quiet since then. :(
