VMware Horizon Community
CraigTompkins1
Contributor

Provisioning issues

About a week ago we started getting a LOT of provisioning errors in our on-prem Horizon 7.5 environment. The only thing that has changed is that the June Windows update was applied to the connection brokers and the Active Directory domain controllers. While I would love to blame that update for the problem, I've been told we aren't removing it.

So, our current environment:

  • Horizon 7.5, UEM 9.4, Agent 7.4, Windows 10 1607 (not patched since Nov 2018; Office Click-to-Run updated in early June)
  • 2 connection brokers behind a load balancer. I've disabled one and then the other on the LB to try to isolate the issue to a single broker, but it happens on both
  • Instant Clones
  • ~2000 desktops in 1 pool backed by 40 Nutanix nodes

I've already tried the famous power cycle: Turn off all connection brokers, reboot vCenter, power on connection brokers one at a time.

I've also deleted the problem pool and created a new one, thinking it might be an issue in the ADAM database.

The provisioning errors are not the same every time, but they seem to be related either to deleting an old machine or to reusing an old name, as if the old machine hadn't been removed from vCenter/AD/DNS (or somewhere) before Horizon tried to use the name again.

Here are some example errors taken from the SQL Event Log:

  • Cloning of VM dce-41791 has failed: Fault type is VC_FAULT_FATAL - The name 'dce-41791' already exists.
  • Cloning of VM dce-41790 has failed: Fault type is VC_FAULT_FATAL - The name 'dce-41790' already exists.
  • Cloning of VM dce-41791 has failed: Fault type is VC_FAULT_FATAL - The name 'dce-41791' already exists.
  • Cloning of VM dce-41812 has failed: Fault type is VC_FAULT_FATAL - Cannot complete the operation because the file or folder /vmfs/volumes/a69c6ffe-991bcc8a/dce-41812/dce-41812-000001.vmdk already exists
  • Cloning of VM dce-40604 has failed: Fault type is VC_FAULT_FATAL - The specified key, name, or identifier 'dce-40604' already exists.
  • Cloning of VM dce-41419 has failed. Timed out waiting for operation to complete. Total time waited 30 mins
  • Cloning of VM dce-41436 has failed. Timed out waiting for operation to complete. Total time waited 30 mins
  • Cloning of VM dce-40341 has failed: Fault type is VC_FAULT_FATAL - No host is compatible with the virtual machine.
  • Cloning of VM dce-40641 has failed: Fault type is VC_FAULT_FATAL - Operation timed out.
  • Cloning of VM dce-40065 has failed: Fault type is VC_FAULT_FATAL - Operation timed out.
  • Cloning of VM dce-41097 has failed: Fault type is VC_FAULT_FATAL - Operation timed out.
  • Cloning of VM dce-41110 has failed: Fault type is VC_FAULT_FATAL - Operation timed out.
  • Cloning of VM dce-41174 has failed: Fault type is VC_FAULT_FATAL - Operation timed out.
  • Cloning of VM dce-40576 has failed: Fault type is VC_FAULT_FATAL - Operation timed out.
  • Cloning of VM dce-40245 has failed: Fault type is VC_FAULT_FATAL - Operation timed out.
  • Cloning of VM dce-40132 has failed: Fault type is VC_FAULT_FATAL - Operation timed out.
  • Cloning of VM dce-40182 has failed. ClonePrep Service is tearing down or is shutting down vc.
  • Cloning of VM dce-40181 has failed: Fault type is VC_FAULT_FATAL - Operation timed out.
  • Cloning of VM dce-40051 has failed: Fault type is VC_FAULT_FATAL - Operation timed out.
  • Cloning of VM dce-40068 has failed: Fault type is VC_FAULT_FATAL - Operation timed out.
  • Cloning of VM dce20986 has failed. Timed out waiting for operation to complete. Total time waited 30 mins
  • Cloning of VM dce20986 has failed: Fault type is VC_FAULT_FATAL - The name 'dce20986' already exists.
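With this many errors of several different kinds, it can help to bucket them by cause before chasing any single one. Below is a minimal Python sketch of a hypothetical helper (`classify` and the category names are made up for illustration) that groups pasted event messages by their failure text and collects the VM names in each bucket. The regular expressions are based only on the messages quoted above, so treat it as a starting point, not a complete parser of Horizon events.

```python
import re
from collections import Counter, defaultdict

# Hypothetical helper: the patterns are derived only from the
# "Cloning of VM ... has failed" messages quoted in this thread.
VM_NAME = re.compile(r"Cloning of VM (\S+) has failed")
CATEGORIES = [
    # "The name 'x' already exists" and "key, name, or identifier 'x' already exists"
    ("name_exists", re.compile(r"(name|identifier) '[^']+' already exists")),
    # leftover .vmdk files or folders on the datastore
    ("file_exists", re.compile(r"file or folder \S+ already exists")),
    # "Operation timed out" and "Timed out waiting for operation to complete"
    ("timeout", re.compile(r"timed out", re.IGNORECASE)),
    ("no_compatible_host", re.compile(r"No host is compatible")),
    ("cloneprep_teardown", re.compile(r"ClonePrep Service is tearing down")),
]


def classify(lines):
    """Return (count per category, set of VM names per category)."""
    counts = Counter()
    vms = defaultdict(set)
    for line in lines:
        match = VM_NAME.search(line)
        if not match:
            continue  # not a clone-failure message; skip it
        # first matching category wins; unmatched messages go to "other"
        category = next(
            (name for name, pattern in CATEGORIES if pattern.search(line)),
            "other",
        )
        counts[category] += 1
        vms[category].add(match.group(1))
    return counts, vms
```

If most failures land in `name_exists` and `file_exists`, that is at least consistent with the theory that old machines aren't fully cleaned up before their names are reused; a mostly-timeout picture would suggest looking elsewhere. It is a reading aid, not a diagnosis.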

I've talked with VMware, and they don't know what is going on. I hope someone here has some insight and can point me in the right direction.

sjesse
Leadership

Did you try running viewdbchk?

Resolving Database Inconsistencies with the ViewDbChk Command (the article says Horizon 6, but it works in 7)

I would have thought deleting the pool would work. I'd also upgrade the agent: there was a problem in 7.4 that caused provisioning errors, and the workaround was to upgrade the agent until you could upgrade the connection servers. I just ran into this on 7.4, and running the 7.5.1 agent resolved it.

MrCheesecake
Enthusiast

I've seen similar issues recently. Here are a couple of things to try:

For the "Cannot complete the operation because the file or folder /vmfs/volumes/a69c6ffe-991bcc8a/dce-41812/dce-41812-000001.vmdk already exists" issue, check your datastore(s) for folders that have "_1" appended to their names. We had accumulated a few cases where vCenter/View did not delete a machine's folder when the machine was torn down as part of the provisioning process. In our most recent case, the xxxx_1 folder was the current, in-use one, and the xxxx folder could be deleted manually. During cleanup I also ran into lock files that prevented some folders from being deleted.
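Once you have a datastore's folder names (for example from the datastore browser, or by listing `/vmfs/volumes/<datastore>` from an ESXi shell), spotting these leftovers is just a name comparison. A minimal sketch in Python; `find_suffixed_folders` is a hypothetical name for illustration, not a VMware tool, and the input list is assumed to be gathered by you beforehand:

```python
def find_suffixed_folders(folder_names, suffix="_1"):
    """Map each folder ending in `suffix` to whether its unsuffixed twin
    (e.g. "dce-41812" for "dce-41812_1") also exists on the datastore."""
    names = set(folder_names)
    return {
        name: name[: -len(suffix)] in names
        for name in sorted(names)
        # ignore a folder literally named "_1"
        if name.endswith(suffix) and len(name) > len(suffix)
    }


if __name__ == "__main__":
    folders = ["dce-41812", "dce-41812_1", "dce-40001_1", "dce-40002"]
    print(find_suffixed_folders(folders))
```

A `True` value only means both folders exist. As described above, the `_1` folder can be the one actually in use, so check which folder the registered VM points to before deleting anything.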

For the "The name 'dce-41791' already exists." issue, I've found these VMs still present in vCenter, powered off. My workaround has been to delete them manually from vCenter, after which View happily provisions a replacement machine. Attempts to remove them from View (manually or by running viewdbchk) have not been successful, since View really does not recognize these VMs. I don't have a good explanation for how or why this happens, but it seems to occur after patching, so I think a few machines get "lost in the shuffle" during connection broker and/or vCenter reboots.
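Finding these stragglers comes down to comparing two lists: the VMs vCenter has and the machines Horizon knows about. The Python sketch below assumes you have already exported both lists (the export step is not shown), and that power states use the vSphere API spelling `poweredOff`; adjust if your export uses different casing. The `dce-?\d+` naming pattern is a guess based on the machine names in this thread, and `VcVm` / `find_orphan_candidates` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class VcVm:
    """One row of an (assumed) vCenter inventory export."""
    name: str
    power_state: str  # assumed vSphere API spelling, e.g. "poweredOn" / "poweredOff"


def find_orphan_candidates(vc_vms, horizon_names, name_pattern=r"dce-?\d+"):
    """Powered-off vCenter VMs that match the pool naming pattern but that
    Horizon does not list. These are candidates only, not a delete list."""
    pattern = re.compile(name_pattern)  # hypothetical pool naming pattern
    known = set(horizon_names)
    return sorted(
        vm.name
        for vm in vc_vms
        if vm.power_state == "poweredOff"
        and pattern.fullmatch(vm.name)
        and vm.name not in known
    )
```

The output is a list of candidates: confirm each VM is really absent from the pool in Horizon before removing it from vCenter.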
