VMware Horizon Community
Sabian0309
Enthusiast

UEM hanging/delaying randomly.

Running Horizon 7.6 with UEM 9.5 and App Volumes 2.14. The parent is Windows 10 1607 LTSB with floating instant clones. AppData is not redirected; Documents, Favorites, Desktop, etc. are.

Occasionally I am seeing login delays when processing UEM. On average, UEM processing takes 5-6 seconds. However, for some users it occasionally takes 25+ seconds, with UEM not reporting why.

Example from logs:

2018-12-12 07:57:37.331 [INFO ] Config file '\\jfv-vm-fs2\UEM Configuration\general\Applications\Acrobat Reader.INI' added to DirectFlex cache

2018-12-12 07:57:56.269 [INFO ] Config file '\\jfv-vm-fs2\UEM Configuration\general\Applications\Adobe Acrobat.INI' added to DirectFlex cache

2018-12-12 08:39:13.444 [DEBUG] ImportRegistry::Import: Calling '"C:\Windows\REGEDIT.EXE" /S "C:\Users\test\AppData\Local\Temp\FLX6AB5.tmp"' (RPAL: l=0 (D/E), r=0)

2018-12-12 08:39:13.490 [DEBUG] Read 3 entries from profile archive (size: 151258; compressed: 31466; took 64 ms; largest file: 83810 bytes; slowest import took 0 ms)

2018-12-12 08:39:32.428 [DEBUG] Conditions: Condition set 'Microsoft Office 2013.xml' was previously evaluated to true

2018-12-12 14:02:29.110 [DEBUG] Read 25 entries from profile archive (size: 63987712; compressed: 7889959; took 427 ms; largest file: 26738688 bytes; slowest import took 154 ms)

2018-12-12 14:02:48.063 [DEBUG] Conditions: Check for endpoint name = false ('C81MK6V1' is not equal to 'LJ2359H2')
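
A minimal Python sketch for spotting stalls like these in a FlexEngine log, assuming every entry starts with a timestamp in the format shown in the excerpts above. The function name `find_gaps` and the 5-second threshold are illustrative choices, not part of UEM.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the log excerpts above.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ")


def find_gaps(lines, threshold_seconds=5.0):
    """Return (gap_seconds, previous_line, current_line) for every pair of
    consecutive timestamped lines that are more than threshold_seconds apart."""
    gaps = []
    previous_time = None
    previous_line = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip blank lines and anything without a timestamp
        current_time = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if previous_time is not None:
            delta = (current_time - previous_time).total_seconds()
            if delta > threshold_seconds:
                gaps.append((delta, previous_line, line))
        previous_time, previous_line = current_time, line
    return gaps


# The first pair of lines from the excerpt above.
sample = [
    r"2018-12-12 07:57:37.331 [INFO ] Config file '\\jfv-vm-fs2\UEM Configuration\general\Applications\Acrobat Reader.INI' added to DirectFlex cache",
    r"2018-12-12 07:57:56.269 [INFO ] Config file '\\jfv-vm-fs2\UEM Configuration\general\Applications\Adobe Acrobat.INI' added to DirectFlex cache",
]

for gap, before, after in find_gaps(sample):
    print(f"{gap:.3f} s stall before: {after[:60]}")  # 18.938 s for this pair
```

To run it on a complete log, pass the open file instead of `sample`. Feed it one session at a time, since the time between two separate logins would also be reported as a gap.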

I can reproduce this by logging in over and over. Sometimes it happens and sometimes it does not, but it occurs frequently enough to be easily reproducible. It does not appear to be related to any specific pool or UEM setting. It also isn't tied to any specific hosts, users, type of client (thin or laptop), or version of Horizon.

The UEM data is stored on a Windows 2008 R2 file server share; the server is a VM running off an all-flash datastore on a Dell Compellent. There are no indications that the Compellent or the file server is struggling to serve the data. This is as likely to happen during the morning login as it is when people leave for the day.

This appears to have been going on for a long while; this environment is just now getting healthy enough from the previous admin that smaller issues like this can be tracked down. But my Google-fu is failing to find any possible solutions for this.

Any help would be appreciated.

Thank you,

Billy

131 Replies
ijdemes
Expert

Hmm good point. I will try that later on in my lab and report back.


\\ Ivan
---
Twitter: @ivandemes
Blog: https://www.ivandemes.com
Sabian0309
Enthusiast

I'm going to guess that is the default setting, as that is how my share is set up as well. I've altered it to not allow any caching and will see if this has any effect.

Thank you 🙂

ijdemes
Expert

I changed both the UEM_Profiles and UEM_Config shares to "No files or programs from the shared folder are available offline", but unfortunately nothing changed. I am still seeing the "OPLOCK BREAK IN PROGRESS" that causes an 18-second delay.


\\ Ivan
---
Twitter: @ivandemes
Blog: https://www.ivandemes.com
DEMdev
VMware Employee

Thanks, Ivan.

I'm currently running continuous logons and logoffs for two concurrent users with all 200+ Flex config files from the VMware Marketplace. The delay between logon and logoff differs for the two users, so the session "overlap" changes slightly for each run. So far I have not seen any delays after 100+ iterations.

Have done the same with both users concurrently performing a DirectFlex refresh in a tight loop; no delays seen yet after 400+ iterations.

My VMs are Windows 10 Version 1809, my file server is 2012R2, and I verified that SMB 3.02 is being used. I did not tweak any SMB settings, and left the caching configuration for both config and profile archive shares at the default "Only the files and programs that users specify are available offline".

ProcMon on the file server is showing lots of oplock-related entries, but no "OPLOCK BREAK IN PROGRESS". Just oplocks being requested, granted, not granted, closed, successful FSCTL_OPLOCK_BREAK_NOTIFY, etc. None of them seemingly causing a delay.

The repro attempt continues... 🙂
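
For anyone who wants to probe their own share in a similar spirit, here is a rough Python sketch that opens and reads one file in a tight loop and records any slow iterations. It is not what FlexEngine does during a DirectFlex refresh and may not trigger the same oplock behavior; the function name `time_opens`, the 5-second threshold, and the UNC path in the usage note are all hypothetical.

```python
import time


def time_opens(path, iterations=100, slow_seconds=5.0):
    """Open and fully read `path` repeatedly.

    Returns a list of (iteration, elapsed_seconds) for every iteration
    that took at least slow_seconds.
    """
    slow = []
    for i in range(iterations):
        start = time.perf_counter()
        with open(path, "rb") as handle:
            handle.read()
        elapsed = time.perf_counter() - start
        if elapsed >= slow_seconds:
            slow.append((i, elapsed))
    return slow
```

A call would look like `time_opens(r"\\fileserver\UEM_Config\general\Applications\Example.INI", iterations=400)`, with the path replaced by a real config file on the share. Running it from two VMs at the same time comes closer to the concurrent-session scenario described above.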

sjesse
Leadership

I've been watching this and I haven't seen this in my environment, but I was wondering in general: is Isilon storage used where it's occurring? I found a post a while back about the same thing happening. I saw one person mention they were using it, so I just wanted to make sure that was something being looked at.

DEMdev
VMware Employee

Hi sjesse,

There have been some reports with Isilon, but both Sabian0309 and ijdemes are using standard Windows file servers.

The previous Isilon reports have never been analyzed in detail, as far as I'm aware – oplocks have frequently come up as a potential culprit, but apart from JohnTwilley's recent post in this thread, I haven't seen anything conclusive...

JohnTwilley
Hot Shot

You can read the old thread from 2017 regarding my Isilon File locking issues:  https://communities.vmware.com/thread/566801

My company's Isilon slowness with UEM was resolved by disabling OpLocks on the shares.  We had EMC Support working the case, and they mentioned it was a common issue.  So much so, that the ticket was only open for like 1 day.

sjesse
Leadership

OK, figured I'd check. I've been watching this because I've seen delays like this in earlier Windows versions. I've been running ProcMon traces and haven't seen what you are looking into on my Windows 2012 R2 server.

ijdemes
Expert

Hi UEMdev​,

Did you use the Win10 v1809 SAC release without OSOT? So just next-next-finish? I am running v1809 LTSC with OSOT. Will try without OSOT as well, just to be sure.

BTW, I also have a VM running (v1809 LTSC with OSOT) in VMware Workstation (no Horizon) and experience the same issues there. The traces are also the same as the ones from VDI.


\\ Ivan
---
Twitter: @ivandemes
Blog: https://www.ivandemes.com
DEMdev
VMware Employee

Hi ijdemes,

Yes, Windows 10 Version 1809 Enterprise, next-next-finish installs. No OSOT, no Horizon.

ijdemes
Expert

All right, I will check with the same version/setup and see if I get a different result compared with my original version/setup.


\\ Ivan
---
Twitter: @ivandemes
Blog: https://www.ivandemes.com
ijdemes
Expert

Just tested with v1809 Enterprise, no OSOT, no Horizon, but still the same result. Thinking of what to test/rule out next... 😕


\\ Ivan
---
Twitter: @ivandemes
Blog: https://www.ivandemes.com
DEMdev
VMware Employee

Hi ijdemes,

Thank you for your additional testing. I ran another 10000 DirectFlex refreshes on two VMs in parallel last night – no delays...

"Thinking of what to test/rule out next..."

Same here...

DEMdev
VMware Employee

Hi ijdemes,

Responding to an older message of yours:

"I am running vSphere 6.7 U1, VMware Tools 10338, vmxnet3 driver version 1.8.3.1."

I'm on 6.5U2, VMware Tools 10.2.1, and vmxnet3 1.7.3.7. That is, where vmxnet3 is used, it's 1.7.3.7 – I just noticed that my file server and one of the VMs are using E1000e adapters, and hence Intel drivers. Can't imagine that affecting SMB behavior as that's a few OSI layers up, but I'll change the config and will upgrade VMware Tools. I'll stick with 6.5U2, though 🙂

DEMdev
VMware Employee

Hi ijdemes,

No change in test results after switching to vmxnet3 on all NICs and updating to VMware Tools 10.3.10...

ijdemes
Expert

Bummer! I will also test with Server 2012 as a file server. Just trying to rule out another component/version.


\\ Ivan
---
Twitter: @ivandemes
Blog: https://www.ivandemes.com
ijdemes
Expert

Did a test with Server 2012 as a file server, but still the same result. 😕


\\ Ivan
---
Twitter: @ivandemes
Blog: https://www.ivandemes.com
ijdemes
Expert

I created a (Win2016) file server that was not hosted on vSphere and VDI desktops that were not hosted on vSphere. Still experiencing the same result. I think I have (sort of) ruled out the underlying hypervisor (vSphere) for the moment (OK, maybe I just don't know what to look at next for this component 😉). Focusing on the client/Windows (VDI) part now.


\\ Ivan
---
Twitter: @ivandemes
Blog: https://www.ivandemes.com
DEMdev
VMware Employee

Thanks, ijdemes! I've been switching between NoAD, running FlexEngine as a Group Policy client-side extension, and running FlexEngine from a logon script, without any change in behavior...

What Group Policy settings do you have configured? My setup is very empty, with computer policies for detailed status messages, disabling the logon script delay, always wait for network, and running logon script synchronously.

User policy contains logon and logoff scripts for UEM (where appropriate), and the relevant UEM configuration.

Where do you run the Management Console? Mine runs on the file server itself, so probably does not use SMB.

My VMs have 2 vCPUs; overall performance of my environment is not great (pretty old ESX box that runs quite a few VMs.)

Pretty much at a loss what to try next...

ijdemes
Expert

I've got the same policy settings in place. Nothing fancy 😉

In the meantime I did a Wireshark trace on the file server, which shows a Create Response Error: STATUS_PENDING message.

Eventually the connection is reset. This takes about 19 seconds, which corresponds to the delay in the FlexEngine.log file.

Not seeing yet why this happens, though...


\\ Ivan
---
Twitter: @ivandemes
Blog: https://www.ivandemes.com