epa80
Hot Shot

GPOs Not Running Upon VM Creation/Refreshing


This initially started as a thread in the UEM forum. After some work, though, I thought it might be a better fit here.

Just this morning I logged into a pool that we believe is having an intermittent issue running GPOs on startup. Some further testing showed another pool having the issue. What these 2 pools have in common is that they exist in the same VDI cluster of hosts, so now we're looking at the cluster level. In another pool we tested, in a separate cluster, we couldn't get the issue to happen. This testing probably needs to be vetted more, but it's what we have now.

So, on the VM it happened on yesterday, below is a screenshot of the error in Event Viewer, and the text from the Details tab is below that. You'll see that the error happened at 8:28AM. I didn't log in through View onto the VM until about 8:55AM. Our working theory right now goes something like this:

  • VMs in this cluster get refreshed.
  • Some of the VMs for some reason are not binding to the network fast enough, and thus their GPOs do not run initially.
  • If a user logs in before the GPO update interval occurs, the user will have the GPO not running issue. It's noticed mostly through UEM, because we deliver a ton of icons with it.
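
To make the "not binding to the network fast enough" theory concrete, here is a minimal Python sketch of a bounded wait for network readiness. It is only an illustration of the idea, not how Windows or View actually sequences startup. The function name `wait_until_ready`, the timeout values, and the probe in the comment are all assumptions made up for this example.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 120.0,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns True or `timeout_s` elapses.

    Returns True if the probe succeeded in time, False otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Hypothetical usage on a desktop (the share path is a placeholder):
#   import os
#   ready = wait_until_ready(lambda: os.path.exists(r"\\domain\sysvol"))
```

If the probe never succeeds inside the window, the caller gets `False` back, which matches the failure mode in the theory: policy processing runs once, the network is not there yet, and nothing retries until the next update interval.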

Later in the morning I did this test in our problem pool:

  • I refreshed the 5 VMs in the pool.
  • VMs 2 and 5 had the issue. 1, 3, 4 did not.
  • I noted which hosts VMs 2 and 5 were on. They were indeed unique.
  • I expanded the pool by 15 VMs.
  • VMs 6 and 7 were on the same hosts as VMs 2 and 5 that had the issue. Unfortunately, they DID NOT experience the problem, which seems to eliminate a host problem.
  • I refreshed 2 and 5 and immediately tested them. They were now ok.

So we're at a bit of a loss. There seems to be nothing consistent. Any input is appreciated.

[Screenshot: Event Viewer error, snip_20170913090015_edited.png]

+ System
  - Provider
   [ Name]  Microsoft-Windows-GroupPolicy
   [ Guid]  {AEA1B4FA-97D1-45F2-A64C-4D69FFFD92C9}
   EventID 1096
   Version 0
   Level 2
   Task 0
   Opcode 1
   Keywords 0x8000000000000000
  - TimeCreated
   [ SystemTime]  2017-09-13T12:28:34.256250000Z
   EventRecordID 45808
  - Correlation
   [ ActivityID]  {8ECF79A7-CAD3-4787-BF63-A1AFA9B125C4}
  - Execution
   [ ProcessID]  112
   [ ThreadID]  1320
   Channel System
   Computer REDACTED
  - Security
   [ UserID]  S-1-5-18
- EventData
  SupportInfo1 2
  SupportInfo2 1254
  ProcessingMode 1
  ProcessingTimeInMilliseconds 3188
  ErrorCode 64
  ErrorDescription The specified network name is no longer available.
  DCName \\domain.controller.fqdn
  GPOCNName LDAP://CN=Machine,cn={DE16CA21-9FDB-4B20-8FED-DC8297247855},cn=policies,cn=system,DC=rdacted,DC=redacted,DC=redacted
  FilePath \\domain\sysvol\domain\Policies\{DE16CA21-9FDB-4B20-8FED-DC8297247855}\Machine\registry.pol
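
The SystemTime in the event is UTC, while the 8:28AM and 8:55AM times above are local. A small Python sketch of the conversion follows. The UTC-4 offset is an assumption inferred from the two times lining up (12:28Z versus 8:28AM), and `parse_event_time` is a name made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time is UTC-4, inferred from 12:28Z matching "8:28AM".
LOCAL_TZ = timezone(timedelta(hours=-4))


def parse_event_time(system_time: str) -> datetime:
    """Parse an Event Viewer SystemTime string into an aware UTC datetime.

    The event carries 9 fractional digits; datetime stores at most 6
    (microseconds), so the fraction is padded or truncated to 6 digits.
    """
    stamp = system_time.rstrip("Z")
    whole, _, frac = stamp.partition(".")
    frac = (frac + "000000")[:6]
    parsed = datetime.strptime(f"{whole}.{frac}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


error_utc = parse_event_time("2017-09-13T12:28:34.256250000Z")
error_local = error_utc.astimezone(LOCAL_TZ)

# Login time from the post, "about 8:55AM" local on the same day.
login_local = error_local.replace(hour=8, minute=55, second=0, microsecond=0)
gap = login_local - error_local

print(error_local.strftime("%H:%M"), gap)
```

The gap comes out at a little under half an hour, so the failure was logged well before anyone touched the desktop, which fits the theory that it happens at machine startup rather than at user logon.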

1 Solution

Accepted Solutions
epa80
Hot Shot

If by chance anyone sees this issue again, we found the culprit. Can't explain why, but, we found it.

It turns out we needed to exclude the file C:\Windows\ntbtlog.txt from being scanned by Deep Security. If we do that, then on refresh the VMs no longer seem to get the random network blip that caused the initial pull-down of GPOs to fail. I can't find a single KB/best practice guide stating anything about this file at all, let alone for exclusion reasons.

If someone is aware of what kind of relationship ntbtlog.txt has to the Refresh process, I'd love to hear about it. Again, on a full recompose/deployment of a pool, the issue never happened. Upon refresh, about 1 in 4 or 5 VMs would experience it. Once we excluded the file from being scanned, we were able to refresh over and over without the issue.
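
For anyone wanting to check their own desktops after a refresh, here is a rough Python sketch that scans exported System log XML for the same failure (Microsoft-Windows-GroupPolicy event 1096 with ErrorCode 64). It assumes the events were exported as XML under a single root element, for example with `wevtutil qe System /f:xml /e:Events`. The sample XML is hand-built from the values in the post rather than a real export, and `failed_gpo_events` is a name made up here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

EVENT_NS = "http://schemas.microsoft.com/win/2004/08/events/event"
NS = {"e": EVENT_NS}


def failed_gpo_events(xml_text: str) -> List[Dict[str, str]]:
    """Return one dict per event 1096 whose ErrorCode is 64."""
    root = ET.fromstring(xml_text)
    hits = []
    for event in root.iter(f"{{{EVENT_NS}}}Event"):
        event_id = event.findtext("e:System/e:EventID", namespaces=NS)
        # EventData holds <Data Name="...">value</Data> pairs.
        data = {
            d.get("Name"): (d.text or "")
            for d in event.findall("e:EventData/e:Data", NS)
        }
        if event_id == "1096" and data.get("ErrorCode") == "64":
            hits.append({
                "Computer": event.findtext("e:System/e:Computer", default="", namespaces=NS),
                "ErrorDescription": data.get("ErrorDescription", ""),
            })
    return hits


# Hand-built sample: one matching event, one unrelated event.
SAMPLE = """<Events>
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System><EventID>1096</EventID><Computer>REDACTED</Computer></System>
  <EventData>
    <Data Name="ErrorCode">64</Data>
    <Data Name="ErrorDescription">The specified network name is no longer available.</Data>
  </EventData>
</Event>
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System><EventID>7026</EventID><Computer>REDACTED</Computer></System>
  <EventData><Data Name="param1">vnetflt</Data></EventData>
</Event>
</Events>"""

hits = failed_gpo_events(SAMPLE)
print(len(hits), hits)
```

Run against each refreshed desktop's export, an empty result after several refreshes would be the same signal described above: the exclusion is holding.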

Thanks.


6 Replies
epa80
Hot Shot

And I just came across this. It sounds similar/promising. I would love to know why it only seems to happen with 1 cluster, though.

https://www.reddit.com/r/vmware/comments/5fypps/linkedclones_and_group_policy_gpo_not_applying_at/

epa80
Hot Shot

We implemented the 2-minute wait but didn't have any luck. The issue remained. We've now opened a ticket. One other failure we're seeing in the event logs is this:

Error | 9/14/2017 11:50 | Service Control Manager | 7026 | None
The following boot-start or system-start driver(s) failed to load:
ftsjail
vnetflt

I believe vnetflt has to do with the NSX introspection via VMware Tools. We are indeed an NSX shop with Deep Security 10. Interestingly, in our other VDI environment that isn't having the issue, we are still using vShield and DS 9.6. I'm wondering if something related to this upgrade/migration to NSX/DS10 is the culprit.

epa80
Hot Shot

Looks like it may be NSX related. When we disabled the NSX deployment on a cluster where the issue exists, I was able to refresh 50 VMs 3 times in a row. 0 of the VMs had the issue. I deployed NSX guest introspection back to the cluster, 1 refresh brought back 7 bad VMs, another refresh brought back 11 bad ones. I've updated my ticket with support, waiting to hear back.
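
As a rough sanity check on whether 150 clean VM refreshes could just be luck, here is a back-of-the-envelope calculation in Python. It assumes each VM refresh fails independently at the rate seen in the two bad runs (7 and 11 out of 50), which is a simplification and not something established in testing.

```python
def chance_all_clean(per_vm_failure_rate: float, vm_refreshes: int) -> float:
    """Probability that every refresh is clean, assuming independent failures."""
    return (1.0 - per_vm_failure_rate) ** vm_refreshes


# Failure rate observed with guest introspection deployed: 7 and 11 bad VMs
# out of 50 in two refreshes.
observed_rate = (7 + 11) / (50 + 50)

# 50 VMs refreshed 3 times with NSX disabled, 0 failures.
p_luck = chance_all_clean(observed_rate, 50 * 3)

print(f"observed failure rate: {observed_rate:.2f}")
print(f"chance of 150 clean refreshes by luck: {p_luck:.2e}")
```

Under that assumption the chance of a clean sweep by luck is vanishingly small, so the difference between the two configurations is very unlikely to be noise.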

epa80
Hot Shot

aaaaaaaaaaand I may have spoken too soon. It seems the issue might be with our anti-malware product, Deep Security 10. With NSX deployed, but the Deep Security appliances disabled on our test cluster, the issue doesn't happen. I deploy and refresh the pool over and over, and nope, no issue. We upgraded our VMware Tools to 10.1.10 and then brought back NSX. When we didn't see the issue, we said, ah, that's it: 10.1.10 needs to be there to fix NSX. But then we activated Deep Security.......

Once I activated the Deep Security appliances, the issue reared its head. What's odd is the way it comes back. Upon an initial deployment, all goes well. The 50 VMs get spit out, and they activate fine. It's when I refresh them, though, that the desktops randomly experience the issue. And it's never the same ones each refresh. Some have the issue, some don't. Some on this host, some on that.

I have to think it's how DS handles the refreshing. Upon desktop deployment, when a new VM is created, DS is told to do nothing until 10 minutes pass, then activate the VM for protection. Upon refresh, they're not considered newly created VMs. They were there already, just refreshed. I assumed DS would treat it as if the VM was rebooted. It doesn't know the refresh happened, it just sees a VM there that was off and now it's back. But I guess something is going on when they first come back that causes them to have a network blip and not do their GPO pull.

Anyway, I have a ticket into Trend Micro now as well. Figured between them and View/NSX, we should be able to nail this. Maybe someone else will have experienced something similar and reply, so I can stop talking to myself. :)

epa80
Hot Shot

Little bit more testing confirmed my previous post.

  1. Removed NSX and Deep Security outright from my test cluster. Refreshed the entire pool 3 times in a row, the issue never happened.
  2. Implemented NSX only. The NSX Guest Introspection agent. No Deep Security appliances whatsoever. Refreshed the entire pool 3 times in a row, the issue never happened. Seems that NSX on its own is ok. 3 for 3 no issues.
  3. Deployed and activated the Deep Security Appliances. This would mimic the real world fully protected. Refreshed the entire pool 3 times in a row, and all 3 times random VMs experienced the issue. Some network blip caused them to miss their GPO update interval on startup.
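
The three test configurations above can be written down as a small matrix to see which component's presence lines up with the issue. This Python sketch only encodes the three combinations that were actually tested; the `Run` type and the function name are made up for the illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Run:
    nsx: bool            # NSX Guest Introspection deployed
    deep_security: bool  # Deep Security appliances deployed and activated
    issue_seen: bool     # any VM missed its GPOs across the 3 refreshes


# The three configurations from the list above, in order.
RUNS = [
    Run(nsx=False, deep_security=False, issue_seen=False),
    Run(nsx=True, deep_security=False, issue_seen=False),
    Run(nsx=True, deep_security=True, issue_seen=True),
]


def components_tracking_issue(runs: Sequence[Run]) -> List[str]:
    """Names of components whose on/off state matches issue_seen in every run."""
    names = ("nsx", "deep_security")
    return [
        name for name in names
        if all(getattr(run, name) == run.issue_seen for run in runs)
    ]


print(components_tracking_issue(RUNS))
```

NSX drops out because it was present in a clean run, which leaves Deep Security as the only component whose state matches the issue in all three configurations.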

If I restart these problem desktops (NOT refresh, just a normal restart), they come up fine. It's something about refreshing and Deep Security.

So, it looks like Trend Micro will be where I start, with some VMware assistance as well.
