So I have the most dastardly issue that has me stumped (I hate debug logs with a passion now...) and I have a Sev 1 open. To give y'all a general run-through of my environment:
VCSA 6.7 with embedded DB and PSC
ESXi 6.7 on 8 hosts (64 cores, 768GB RAM, AKA hardware for days), HPE SimpliVity solution
VMs are a mix of Win 10 and Win 7, all patched through the August patch set. Average size 50GB of data
22TB SSD NFS3 Datastore
6TB SSD VMFS5 Datastore for my old hardware datastores.
The behavior: when my environment is under 500 concurrent sessions, it can spit out a machine in 5 minutes flat for any of my pools. Once I go above that number, my machines get stuck at "Creating Refresh Checkpoint" in the Customization stage of Provisioning. When I say stuck, I mean for HOURS. This has forced me to over-provision like crazy to make sure there are available machines, since my environment is highly dynamic (college: one class logs in and BAM! 50 machines in use). The kicker is that if I log into the stuck machines via vCenter and reboot them through the guest OS, it's like MAGIC! They become available straight away.
Recently I increased the CPU and RAM on the main connection brokers to address another issue; they are now sitting at 16 vCPU and 16GB of RAM. Our Composer is configured with 4 vCPU and 8GB of RAM and is barely using the resources available; even during heavy provisioning it rarely uses more than half of them. The vCenter in the environment is an embedded-database VCSA, configured for a Large environment with 16 vCPU and 32GB of RAM. Lastly, the database server for the View environment is configured with 8 vCPU and 32GB of RAM and has minimal usage; it was oversized for the just-in-case moment.
I've looked at each piece of the management stack for the environment, and there are no storage constraints or resource constraints that I can see.
HELP!! I don't think the concrete pillar I've been banging my head against will survive much longer.