About a week ago we started getting a LOT of provisioning errors in our on-prem Horizon 7.5 environment. The only change is that the June Windows update was applied to the connection brokers and the Active Directory controllers. While I would love to blame the problem on that update, I've been told we aren't removing it.
So current environment:
I've already tried the famous power cycle: Turn off all connection brokers, reboot vCenter, power on connection brokers one at a time.
I've already deleted the problem pool and created a new one, thinking it might be an issue in the ADAM database.
The provisioning errors are not the same every time, but they seem to be related to either deleting an old machine or trying to reuse an old name, as if the machine didn't get removed from vCenter/AD/DNS in time before Horizon tries to use the name again.
Here are some example errors taken from the SQL Event Log:
I've talked with VMware and they don't know what is going on. I hope someone here has some insight and can point me in the right direction.
Did you try running viewdbchk?
Resolving Database Inconsistencies with the ViewDbChk Command (it says 6 but it works in 7)
I would think deleting the pool would work. I'd also upgrade the agent. There was a problem in 7.4 that caused provisioning errors, and a workaround was to upgrade the agent until you could upgrade the connection servers. I just ran into this on 7.4, and running the 7.5.1 agent resolved it.
I've seen similar issues recently. Here are a couple of things to try:
For the "Cannot complete the operation because the file or folder /vmfs/volumes/a69c6ffe-991bcc8a/dce-41812/dce-41812-000001.vmdk already exists" issue, check your datastore(s) for folders that have a "_1" appended to them. We had accumulated a few instances where vCenter/View did not delete the folder when machines were torn down as part of the provisioning process. In our most recent case, the xxxx_1 folder was current and in-use and the xxxx folder could be deleted manually. I also ran into issues during the cleanup of some of the folders with lock files preventing the deletion.
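If there are a lot of folders to go through, the check can be scripted against a folder listing. This is only a rough sketch: it assumes you have already exported the top-level folder names from the datastore into a list, the function name `find_duplicate_folders` is made up for illustration, and matching any numeric suffix (not just "_1") is my assumption. It only reports candidate pairs; which folder of each pair is stale still has to be checked by hand, since in our case the xxxx_1 folder was the live one.

```python
import re

def find_duplicate_folders(folder_names):
    """Return (base, suffixed) pairs where both 'xxxx' and 'xxxx_N' exist.

    folder_names: top-level folder names exported from a datastore
    (hypothetical input; how you obtain the listing is up to you).
    """
    names = set(folder_names)
    pairs = []
    for name in sorted(names):
        # Assumption: the leftover copies carry a numeric suffix such as "_1".
        match = re.fullmatch(r"(.+)_(\d+)", name)
        if match and match.group(1) in names:
            pairs.append((match.group(1), name))
    return pairs

if __name__ == "__main__":
    listing = ["dce-41812", "dce-41812_1", "dce-41800"]
    for base, suffixed in find_duplicate_folders(listing):
        print(f"check manually: {base} vs {suffixed}")
```

Nothing here deletes anything, which is deliberate given the lock-file issues mentioned above.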
For the "The name 'dce-41791' already exists." issue, I've found that these VMs still exist in my vCenter but are powered off. My workaround has been to manually delete them from vCenter, and then View will happily reprovision a new machine. Attempts to remove them from View (manually or by running viewdbchk) have not been successful, since View does not recognize these VMs at all. I don't have a good explanation for how or why this happens, but it seems to occur after patching, so I think a few machines get "lost in the shuffle" during connection broker and/or vCenter reboots.
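To spot these leftovers without clicking through the inventory, the two lists can be compared offline. Again just a sketch under assumptions: `vcenter_vms` is a hypothetical export of (name, power state) pairs from vCenter, `view_machines` is a hypothetical export of the machine names View knows about, and the "dce-" prefix is taken from the error messages above. The function name is invented for illustration.

```python
def find_orphan_candidates(vcenter_vms, view_machines, prefix):
    """List powered-off vCenter VMs with the pool prefix that View does not know.

    vcenter_vms: iterable of (name, power_state) tuples (hypothetical export;
    the power state is assumed to use the string "poweredOff").
    view_machines: iterable of machine names known to View (hypothetical export).
    """
    known = {name.lower() for name in view_machines}
    wanted_prefix = prefix.lower()
    return sorted(
        name
        for name, power_state in vcenter_vms
        if name.lower().startswith(wanted_prefix)
        and power_state == "poweredOff"
        and name.lower() not in known
    )

if __name__ == "__main__":
    vms = [("dce-41791", "poweredOff"), ("dce-41792", "poweredOn")]
    print(find_orphan_candidates(vms, ["dce-41792"], "dce-"))
```

The output is a list of candidates to review, not a delete list; I would still confirm each VM by hand before removing it from vCenter.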
