VMware Horizon Community
mach170
Contributor

Agent Needs Reboot on one pool

I've recently been having an issue with one of my desktop pools getting an 'Agent needs reboot' error on random VMs. My setup has 3 non-persistent linked-clone pools, all based on the same Golden Master image. The Horizon Standard install is version 7.8. The agent on the golden master is version 7.8.0-12599301.

Two of the pools (115 VMs) are refreshing and provisioning just fine without issues. The third pool (21 VMs), which uses the same settings as the other two apart from its name, constantly has 1-3 VMs showing 'Agent needs reboot' when the VMs are refreshed or recomposed.

I've even deleted and recreated the pools, and it is still the same pool that gets the errors. Rebooting the VM fixes the issue most of the time.

I'm at a loss as to why it is this one pool that gets the error.

Any advice on what to look for in the logs and which logs?

6 Replies
ashsevenuk80
Enthusiast

Hi,

I would uninstall, reboot, and reinstall the View agent on the master/gold image. Also ensure it's in line with the version of Horizon you're running, so agent 7.9 with Horizon View version 7.9, and so on.

Hope that helps

Please give a thumbs up if the above is helpful, or tag it as correct if the above fixes your issue.

Regards

Arshad

mach170
Contributor

Arshad,

Thanks for the reply. I have already tried what you suggested and I still get the 'Agent needs reboot' message on only the one pool. I ran the service.bat file to pull the logs from a VM in each pool. I'm in the process of comparing the logs, but there are so many. It's going to take a while.
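
For anyone comparing bundles by hand like this, here is a rough Python sketch of one way to narrow things down: strip out the parts of each line that change every time (timestamps, thread IDs, GUIDs, numbers) and list only the messages that show up in the bad pool's log but never in a good pool's log. The function names and the assumed line layout (an ISO timestamp followed by a level and a `(xxxx-xxxx)` thread ID, as in the Connection Server excerpt later in this thread) are my own assumptions, not anything taken from the Horizon documentation.

```python
import re
from collections import Counter

# Assumed layout: "2021-07-06T14:36:02.031-07:00 DEBUG (1214-1110) message..."
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\S+\s+")
# Parts that vary from line to line: thread IDs, GUIDs, plain numbers.
VARIABLE = re.compile(
    r"\([0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}\)"
    r"|\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b"
    r"|\d+"
)

def normalize(line):
    """Drop the timestamp and replace variable fields with '#'."""
    line = TIMESTAMP.sub("", line.strip())
    return VARIABLE.sub("#", line)

def only_in_bad(good_lines, bad_lines):
    """Return (message, count) pairs seen in the bad log but never in the good log."""
    good = {normalize(l) for l in good_lines if l.strip()}
    bad = Counter(normalize(l) for l in bad_lines if l.strip())
    return [(msg, n) for msg, n in bad.most_common() if msg not in good]
```

Feed it the lines of the same log file from a working VM and a failing VM (for example `open(path, errors="replace").readlines()` on each). Whatever it prints still has to be read by a person; it only cuts down the pile.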

Thanks,

Brent

RBeaber
Contributor

Did you ever find a cause for this?  I'm having a very similar issue.

Thanks,

-Rick

TonySpeight
Contributor

I don't know if this will help anyone, but I have just had this issue and managed to resolve it. I had upgraded the App Volumes agent from 2.16 to 2.18.4. I have several pools that assign AppStacks based on their computer name. When these were powering up, 3 of the 4 pools provisioned fine but 1 would always say 'Agent needs reboot'.

I tried various things in the image with no joy. I then moved the machines into a different OU to eliminate AppStacks, and all of a sudden the machines started provisioning again. I moved them back to the original OU and again got 'Agent needs reboot'.

I downgraded the App Volumes agent to 2.18.0 and so far so good: all machines are provisioning OK. I think it will be a task of updating all AppStacks with the 2.18.0 template before moving to 2.18.4 and above.

TonySpeight
Contributor

Update to my previous message.

It was a bit of a coincidence that everything fixed itself when we went to a different version of App Volumes. We noticed the issue happen again and found that settings we had applied to the machine, such as EnableFirewallProcessing 0, had been removed in the process of upgrading our agents. When we upgraded, this registry value was removed, and because we have machine-based AppStacks the firewall wasn't processing correctly and wasn't allowing the Horizon agent to communicate back to the View server.

We have made these changes in a GPO now, so if they are ever removed they are automatically added back in.
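
If you want to catch this kind of drift on the golden image before recomposing, a small check along these lines might help. It is only a sketch: the helper names are made up, and the registry key path is deliberately left as a parameter because it is not given in this thread, so you would have to look it up in your own environment. The only value name taken from the post is EnableFirewallProcessing.

```python
def drifted(expected, actual):
    """Return {name: (wanted, found)} for values that are missing or different."""
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }

def read_hklm_values(subkey, names):
    """Read the named values under HKLM\\<subkey>; missing values are skipped."""
    import winreg  # standard library, Windows only

    found = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey) as key:
        for name in names:
            try:
                found[name], _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                pass  # value was removed, e.g. by an agent upgrade
    return found

# The setting mentioned above; add any others you rely on.
EXPECTED = {"EnableFirewallProcessing": 0}
```

On the image you would call `drifted(EXPECTED, read_hklm_values(your_key_path, EXPECTED))` after each agent upgrade; an empty result means nothing was removed or changed. The GPO approach is still the better long-term fix, since it puts the values back rather than just reporting them.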

adamabel
Enthusiast

I recently ran into this on my Horizon 8 agent pool. I had 6 or 7 RDS hosts in this state. After digging through both the server and agent logs, I only found the following on the server (log location: ProgramData\VMware\VDM\logs), which was no real help.

2021-07-06T14:36:02.031-07:00 DEBUG (1214-1110) <ajp-nio-127.0.0.1-8009-exec-6> [FarmImp] (SESSION:6212241f) Existing session app history checks for app: LP-Office-Pool enabled for userDn: CN=S-1-5-21-982881632-1095161353-1905309208-,CN=ForeignSecurityPrincipals,DC=vdi,DC=vmware,DC=int
2021-07-06T14:36:02.031-07:00 DEBUG (1214-1110) <ajp-nio-127.0.0.1-8009-exec-6> [FarmImp] (SESSION:6212241f) Creating new session for launch item cn=lp-office-pool,ou=applications,dc=vdi,dc=vmware,dc=int from pool cn=lp-office-farm,ou=server groups,dc=vdi,dc=vmware,dc=int for userDn: CN=S-1-5-21-982881632-1095161353-1905309208-,CN=ForeignSecurityPrincipals,DC=vdi,DC=vmware,DC=int
2021-07-06T14:36:02.031-07:00 DEBUG (1214-1110) <ajp-nio-127.0.0.1-8009-exec-6> [FarmImp] (SESSION:6212241f) Selecting server based on load balance preference for userDn: CN=S-1-5-21-982881632-1095161353-1905309208-,CN=ForeignSecurityPrincipals,DC=vdi,DC=vmware,DC=int
2021-07-06T14:36:02.031-07:00 DEBUG (1214-1110) <ajp-nio-127.0.0.1-8009-exec-6> [FarmImp] (SESSION:6212241f) cn=cb52aef8-eeda-4514-a8b1-303348c80676,ou=servers,dc=vdi,dc=vmware,dc=int has no suitable protocols, excluding server
2021-07-06T14:36:02.031-07:00 DEBUG (1214-1110) <ajp-nio-127.0.0.1-8009-exec-6> [FarmImp] (SESSION:6212241f) total servers considered: 1, servers excluded: 1
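
If you have a lot of hosts in a farm, a quick way to see which ones the Connection Server is skipping is to pull the "has no suitable protocols, excluding server" lines out of the debug log and count them per server DN. This is a minimal Python sketch that assumes the message wording is exactly as in the excerpt above; the function name is my own.

```python
import re

# Matches the server DN that precedes the exclusion message in the excerpt above.
EXCLUDED = re.compile(r"(cn=\S+)\s+has no suitable protocols, excluding server")

def excluded_servers(lines):
    """Return {server_dn: number of times it was excluded}."""
    counts = {}
    for line in lines:
        match = EXCLUDED.search(line)
        if match:
            dn = match.group(1)
            counts[dn] = counts.get(dn, 0) + 1
    return counts
```

Run it over the lines of the Connection Server debug log. The DNs it returns are the hosts to check first; it does not tell you why they have no usable protocol.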

 

A colleague of mine noticed that there were VMware VMCI Host Devices reporting a "Windows couldn't load drivers" error in Device Manager --> System devices.

We just uninstalled them, as there were already working VMware VMCI Host Devices in the list, then rebooted, and the problem cleared. Unfortunately we don't have a root cause. Hope this helps anyone stuck where just a reboot or reinstall didn't fix the problem.

 
