VMware Horizon Community
Mkoechel
Contributor

Instant-Clone Pool Patching Restrictions

Hi All!

I've inherited an environment that has 30+ instant-clone desktop pools (Horizon 8, 2306). The max number of VMs in each of the pools ranges between 5 and 20. All 30+ pools use the same parent image and we deliver unique LOB applications using AppVols. 

When I joined the team, I was informed that during our monthly recompose we should update a maximum of 4 pools per hour (information was rumored to have been provided by VMware), regardless of how many VMs reside in those pools. This means I end up spending around 8 hours pushing the same image to 30+ pools, 4 pools at a time. 

This seems suspect/incorrect to me, but I don't see any information in the docs or on The Google RE: this restriction. Can anyone confirm or dispel this restriction? 

Thanks in advance for any info, thoughts or direction!  

BenTrojahn
Enthusiast

I have never seen such a restriction, and I strongly suspect the recommendation is just a practice that worked reliably for the previous admins. I'm sure someone will correct me if I'm wrong. IMO that's far too slow, but there may be valid~ish "reasons" even if those reasons are out of date or just misunderstood.

They may simply be providing time to avoid committing to a bad image and running out of good pools before you discover you need to roll back, only to find your previous image was fully unpublished. A poorly designed or configured DHCP scope might be another reason to go slow, so you don't run out of addresses. Going "slow" might be band-aiding another problem that needs fixing, such as allowing time for GPOs to apply or reducing load on a marginal storage system.

Once the new image is "seeded" on the target cluster/datastore, deploy time is limited by your storage speed and these settings found in the HZ console: 

Max concurrent maintenance operations
Max concurrent Instant Clone Engine provisioning operations

Even at the defaults, it's pretty fast ONCE the CPs (the internal cp-template/cp-replica VMs) have been created on your target cluster/datastore. These don't normally need tweaking, and from your description you have procedural issues causing delays, or problems that need addressing.
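A rough back-of-the-envelope way to see how those caps bound total deploy time (just a sketch; the cap of 20 and ~30 s per clone below are assumptions for illustration, not values from any real environment):

```python
import math

def estimated_deploy_minutes(num_vms: int, max_concurrent: int, secs_per_clone: float) -> float:
    """Rough lower bound: clones are created in waves of at most max_concurrent."""
    waves = math.ceil(num_vms / max_concurrent)
    return waves * secs_per_clone / 60

# Assumed values for illustration only: 280 clones, a cap of 20 concurrent
# provisioning ops, ~30 s per clone once the CPs are already seeded.
print(f"{estimated_deploy_minutes(280, 20, 30):.1f} min")  # -> 7.0 min
```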

Mkoechel
Contributor

Thanks for sharing your thoughts @BenTrojahn! I agree the approach is an inherited practice - I'd love to have quantitative info to lean on when I approach the team to revamp the process. 

Given we update all 30+ pools in a single after-hours window, I don't think the slow roll is there to catch image problems; we have a test group vet the updated image's functionality before we push it org-wide. I took a look at the DHCP config: all VMs across all pools get an IP from the same scope, which sits on a /21 subnet. I did the math for the default number of machines provisioned during the recompose, and it works out to around 280 VMs, well under the ~2,000 usable addresses on a /21.
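For anyone checking that math, the /21 capacity works out like this (standard subnet arithmetic, nothing Horizon-specific):

```python
# Usable host addresses in a /21.
prefix = 21
usable = 2 ** (32 - prefix) - 2   # minus the network and broadcast addresses
print(usable)                     # -> 2046
print(usable - 280)               # -> 1766 addresses of headroom
```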

The compute resources we have in place are pretty robust - during peak use we're under 50% CPU utilization and around 30% memory utilization, and there's plenty of storage. That said, I suppose the VM deletion and creation process could be more resource intensive, but that's where the Max concurrent settings you referenced would come into play.

As far as the GPO issue, this is the only one that gives me pause. If I were to recompose all pools at once, as mentioned above I'd be creating around 280 VMs. I wouldn't presume that to be more policy queries than our DCs could handle, especially since the provisioning wouldn't be instantaneous thanks to the Max concurrent settings.
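If the team still wants some throttle while compressing the window, the "4 pools per hour" rule could at least become a tunable parameter rather than a ritual. A minimal sketch of that idea, where push_image_to_pool() and wait_for_pool_ready() are hypothetical placeholders for whatever console step or API call actually triggers the publish in our environment:

```python
import time

POOLS = [f"pool-{i:02d}" for i in range(1, 31)]  # 30 pools, illustrative names
BATCH_SIZE = 10                                  # tunable; replaces "4 per hour"

def push_image_to_pool(pool: str) -> None:
    """Hypothetical placeholder for the actual publish step (console or API)."""
    print(f"pushing image to {pool}")

def wait_for_pool_ready(pool: str) -> None:
    """Hypothetical placeholder: poll until the pool's clones report ready."""
    time.sleep(1)  # stand-in for a real readiness check

for i in range(0, len(POOLS), BATCH_SIZE):
    batch = POOLS[i:i + BATCH_SIZE]
    for pool in batch:
        push_image_to_pool(pool)
    for pool in batch:
        wait_for_pool_ready(pool)  # gate the next batch on the current one
```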

Thanks again for your thoughts! 

BenTrojahn
Enthusiast

Yeah, I was just spitballing a bit, trying to come up with any reason for what could possibly be wrong. The GPO comment wasn't about an AD performance issue per se - that's generally fine unless something's really messed up; it can be the cause, just not likely. I was alluding to the fact that policy is inherited on your CP template from the first deployment in that pod, and without a customization script that reboots the clone (my preference), the clones won't pick up the OU-specific policy.

Speaking of AD, replication problems or a poor AD Sites configuration can cause trouble with both policy and account creation. A poorly replicating DC in your site wrecks VM deployment regardless.

Max concurrent is generally an issue with IOPS; you should generally be good with the defaults.

For reference, I just now deployed to a farm of 110 RDSH hosts (no sessions) in 6m39s, from the time I submitted the maintenance until the agents showed the new version in the console, and that includes the aforementioned additional reboot. As mentioned earlier, this image already had CPs local. That definitely seems good enough.
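That works out to a handy baseline (just the arithmetic on the numbers above):

```python
# 110 RDSH hosts updated in 6m39s = 399 s, end to end.
elapsed_s, hosts = 6 * 60 + 39, 110
print(f"{elapsed_s / hosts:.1f} s per host")           # -> 3.6 s per host
print(f"{hosts / (elapsed_s / 60):.0f} hosts per min")  # -> 17 hosts per min
```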

