VMware Horizon Community
Sabian0309
Enthusiast

UEM hanging/delaying randomly.

Running Horizon 7.6 with UEM 9.5 and App Volumes 2.14. The parent is Windows 10 1607 LTSB, with floating instant clones. AppData is not redirected; Documents, Favorites, Desktop, etc. are.

Occasionally I see login delays while UEM is processing. On average, UEM processing takes 5-6 seconds. However, for some users it occasionally takes 25+ seconds, and UEM does not report why.

Examples from the logs (note the gap of roughly 19 seconds between consecutive entries):

2018-12-12 07:57:37.331 [INFO ] Config file '\\jfv-vm-fs2\UEM Configuration\general\Applications\Acrobat Reader.INI' added to DirectFlex cache

2018-12-12 07:57:56.269 [INFO ] Config file '\\jfv-vm-fs2\UEM Configuration\general\Applications\Adobe Acrobat.INI' added to DirectFlex cache

2018-12-12 08:39:13.444 [DEBUG] ImportRegistry::Import: Calling '"C:\Windows\REGEDIT.EXE" /S "C:\Users\test\AppData\Local\Temp\FLX6AB5.tmp"' (RPAL: l=0 (D/E), r=0)

2018-12-12 08:39:13.490 [DEBUG] Read 3 entries from profile archive (size: 151258; compressed: 31466; took 64 ms; largest file: 83810 bytes; slowest import took 0 ms)

2018-12-12 08:39:32.428 [DEBUG] Conditions: Condition set 'Microsoft Office 2013.xml' was previously evaluated to true

2018-12-12 14:02:29.110 [DEBUG] Read 25 entries from profile archive (size: 63987712; compressed: 7889959; took 427 ms; largest file: 26738688 bytes; slowest import took 154 ms)

2018-12-12 14:02:48.063 [DEBUG] Conditions: Check for endpoint name = false ('C81MK6V1' is not equal to 'LJ2359H2')

I can reproduce this by logging in over and over. It does not happen every time, but it happens often enough to be easily reproducible. It does not appear to be related to any specific pool or UEM setting. It also isn't tied to any specific hosts, users, type of client (thin client or laptop), or version of Horizon.

The UEM data is stored on a file share on a Windows Server 2008 R2 VM, which runs off an all-flash datastore on a Dell Compellent. There are no indications that either the Compellent or the file server is struggling to serve the data. This is as likely to happen during the morning login as it is when people leave for the day.

This appears to have been going on for a long while; the environment is only now getting healthy enough, after the previous admin, that smaller issues like this can be tracked down. But my Google-fu is failing to find any possible solutions for this.

Any help would be appreciated.

Thank you,

Billy

131 Replies
ijdemes
Expert

I created a procmon trace that looks similar to the one provided earlier.

pastedImage_0.png

<CLICK TO SEE FULL PICTURE>

As I explained earlier, running FlexEngine.exe -r again within the same session has shown no delay so far. So only the first run, during logon, is affected.


\\ Ivan
---
Twitter: @ivandemes
Blog: https://www.ivandemes.com
DEMdev
VMware Employee

Thanks, Ivan.

Is your file server a Windows VM? Would it be possible to run ProcMon there as well, to see if anything interesting pops up?

Ray_handels
Virtuoso

I tried to reproduce the issue but see the same results as Ivan. I ran flexengine.exe -r almost 40 times and the times varied between 24 and 29 seconds, so that seems to be all right; no delay there.

We are looking into it with the storage manager, hoping something shows up there. Did anyone try this with a physical machine?

DEMdev
VMware Employee

Thanks, Ray.

Does the delay occur consistently at logon? Would you be able to collect a ProcMon trace of a delayed logon?

As for physical machines: the cases that we've looked at in detail have all been in Horizon-based virtual setups.

ijdemes
Expert

I ran ProcMon on the file server VM (Windows Server 2016), but nothing spectacular pops up. However, I did make a WPR trace and sent you the link by PM. I had a look at it with WPA, but as I don't have much experience with it, it's hard to find what you think you're looking for ;). Maybe you can find something interesting.


\\ Ivan
---
Twitter: @ivandemes
Blog: https://www.ivandemes.com
ijdemes
Expert

BTW, the WPR trace is from the VDI (FlexEngine) side.


\\ Ivan
---
Twitter: @ivandemes
Blog: https://www.ivandemes.com
DEMdev
VMware Employee

Thanks, Ivan! Downloading it as we speak – I'll have something to do during the Easter long weekend 🙂

Ray_handels
Virtuoso

"Does the delay occur consistently at logon? Would you be able to collect a ProcMon trace of a delayed logon?"

No, unfortunately it does not. Also, my colleague who does most of the work on this has a week off. We are trying to create a script that checks all log files to see who has the gaps and at what time they happen. We hope to get some more info from that, but it will be next week before we get it going.

Getting a ProcMon trace of a delayed logon is also quite hard. Because we cannot reproduce the delay every time, we would need ProcMon running at every logon, which stalls the logon, and we are using linked clones, so it isn't that easy to let ProcMon run at logon.
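A minimal sketch of such a log check, in Python, assuming every FlexEngine log line starts with a timestamp in the `YYYY-MM-DD HH:MM:SS.mmm` format shown in the excerpts in this thread. The function name `find_gaps` and both thresholds are hypothetical choices for illustration, not part of UEM: gaps longer than `threshold_seconds` are reported, and gaps longer than `max_seconds` are assumed to be the pause between two separate FlexEngine runs (for example logon and logoff) and are ignored.

```python
import re
import sys
from datetime import datetime

# Timestamp at the start of a log line, e.g. "2018-12-12 07:57:37.331 [INFO ] ..."
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ")


def find_gaps(lines, threshold_seconds=10.0, max_seconds=120.0):
    """Return (gap_seconds, previous_line, current_line) tuples for consecutive
    timestamped lines that lie between threshold_seconds and max_seconds apart."""
    gaps = []
    previous_time = None
    previous_line = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # blank lines and lines without a leading timestamp
        current_time = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if previous_time is not None:
            gap = (current_time - previous_time).total_seconds()
            # Assumption: anything above max_seconds is a pause between two
            # separate runs rather than a delay within one run.
            if threshold_seconds < gap <= max_seconds:
                gaps.append((gap, previous_line, line.rstrip()))
        previous_time, previous_line = current_time, line.rstrip()
    return gaps


if __name__ == "__main__":
    # Usage: python find_gaps.py <log file> [<log file> ...]
    for path in sys.argv[1:]:
        # The log encoding is an assumption; undecodable bytes are replaced.
        with open(path, encoding="utf-8", errors="replace") as handle:
            for gap, before, after in find_gaps(handle):
                print(f"{path}: {gap:.3f} s gap")
                print(f"  {before}")
                print(f"  {after}")
```

Run against a folder of per-user logs, the output gives the time of each gap and the two log entries around it, which is the "who and when" information needed to look for a pattern.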

DEMdev
VMware Employee

Hi Ray_handels,

Well, it's good that your users aren't consistently confronted with slow logons, I guess 🙂 I fully understand the difficulty of catching random delays "in the act" – hope your log analysis can find some pattern after the fact.

In the meantime, I'm keeping busy with ijdemes's WPR trace. Looking very similar to the one that Sabian0309 provided.

Sabian0309
Enthusiast

I'm still lurking on this :). I just sent this update on the support case I have open for this issue:

I'm open to suggestions; it does appear to be oplock related.

To recap, things that were done to the file server:

Set-SmbServerConfiguration -EnableLeasing $false

Set-SmbServerConfiguration -EnableOplocks $false

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters

Name : OplockBreakWait

Type : REG_DWORD

Value : 10

However, I see this on the file server when the issue happens:

pastedImage_8.png

Then 19 seconds later:

pastedImage_1.png

This matches the delay in the logs:

2019-04-23 09:37:06.491 [DEBUG] Read 13 entries from profile archive (size: 66625536; compressed: 8536205; took 493 ms; largest file: 31457280 bytes; slowest import took 185 ms)

2019-04-23 09:37:25.428 [INFO ] Importing profile archive 'Internet Explorer.zip' (\\jfv-vm-fs2\UEM Profiles\XXXX\archives\Windows Settings\Internet Explorer.zip)

Now a normal session where this delay is not experienced (no oplock break request):

pastedImage_0.png

Sabian0309
Enthusiast

UEMdev,

Does UEM do anything funny with exports where it would lock the configuration files while it's processing? As seen in the previous post, the INI file has an oplock on which a break is requested. I captured earlier in the process and see an FSCTL_REQUEST_BATCH_OPLOCK, issued from another session, on the file that causes the delay. I'm guessing it comes from another user's export, as shortly afterwards I see a similarly named .tmp and then .zip being written:

pastedImage_0.png

pastedImage_1.png

Reading up on the batch oplock, it appears this is requested by the application.

DEMdev
VMware Employee

Hi Sabian0309,

I was hoping you were still lurking on this thread, which is why I @'ed you in my previous post 🙂

Do you see any change in those 18.9s delays if you tweak that OplockBreakWait registry value?

That OPLOCK BREAK in the file server's ProcMon trace is interesting, as is the subsequent FSCTL_REQUEST_BATCH_OPLOCK (as UEM sure does not send that...). ijdemes, do you see that in your lab, too?

Sabian0309
Enthusiast

This last weekend I had an outage window in which I tweaked the OplockBreakWait setting (and attempted to disable oplocks via PowerShell); however, it did not alter the behavior. I know that with SMB2+ many of the settings under LanmanServer\Parameters are no longer valid, so that may be the case here.

I'll still poke around some more and see what I can figure out :).

DEMdev
VMware Employee

Hi Sabian0309,

It looks like I missed your "batch oplock request during previous export" post yesterday... Fortunately, you'd also reported that to VMware support, who notified me 🙂

UEM does not request any oplocks, but I'll try to reproduce this behavior.

ijdemes
Expert

Hi UEMdev,

I will rerun the test with ProcMon running on the file server side and check for oplocks.


\\ Ivan
---
Twitter: @ivandemes
Blog: https://www.ivandemes.com
JohnTwilley
Hot Shot

I had the Storage Team disable OpLocks on our Isilon SMB shares for UEM.  (Both User Profiles & Configuration)

So far, I really think it has helped!  It is quick and easy from the Isilon management console.  I'd share the commands...but I don't have them.

You can give them this from EMC Support:

For SMB-related inquiries, please provide the output of the following commands:

isi smb shares view YourShareNameHere | grep -i oplock

isi_gconfig registry.Services.lwio.Parameters.Drivers.srv.smb2.EnableLeases

Side Note:

You can test with the following Client registry setting to disable Opportunistic Locks from a client PC.

Turn Off Client requests for OpLocks for SMB

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\mrxsmb\Parameters]

"OplocksDisabled"=dword:00000001

This is not what I’d like to use going forward, as it will turn off OpLocks for All SMB shares, not just the Isilon ones.

ijdemes
Expert

Hi Sabian0309,

I see the same happening in my trace on the file server (didn't see it the first time, as I forgot to enable advanced logging :$).

pastedImage_0.png

pastedImage_1.png

<click pictures for the full lines>

I did try some oplock settings on both the file server and the client (VDI) side, but without any positive result so far.


\\ Ivan
---
Twitter: @ivandemes
Blog: https://www.ivandemes.com
ijdemes
Expert

Hi JohnTwilley,

Thanks for your suggestion. I'm not sure if I checked the client setting you mentioned, as I tried a lot of options. But I will double check again and respond here about the result.


\\ Ivan
---
Twitter: @ivandemes
Blog: https://www.ivandemes.com
ijdemes
Expert

I changed the settings you mentioned: both the OplocksDisabled registry value on the client and the settings on the Windows file server, like OplocksDisabled and leases. But with either change the trace output is the same; I still see oplock messages. :-/


\\ Ivan
---
Twitter: @ivandemes
Blog: https://www.ivandemes.com
DEMdev
VMware Employee

Hi ijdemes and Sabian0309,

I'm setting up a new environment to run some more tests, and I just noticed the following:
pastedImage_3.png

I'm not an IT Pro but just a lowly developer, so I don't know whether it's relevant, but I was wondering how your shares are set up w.r.t. these caching/offline settings.
