VMware Horizon Community
vmuser2344
Contributor

Is anyone experiencing Windows 10 hanging on reboot with linked-clone VMs?

We are experiencing issues with linked-clone VMs that get stuck during reboot. It happens to random VMs. Anyone else?

13 Replies
techguy129
Expert

I had a problem with VMs not restarting properly because a service failed to stop; I had to reinstall the program. I also had something similar when UEM was not configured to run from a logoff script, as noted in the documentation: Configure FlexEngine to Run From a Logoff Script

I would suggest checking whether you have any logoff scripts that are running and fail to stop or end (for example, stuck in a loop or waiting on something).

nschlip
Enthusiast

We too are experiencing a similar issue - the gold (or master) Win10 1809 image works fine - I can reboot it and it completes the reboot in less than 30 seconds (about 26 seconds). However, any linked clones we create take upwards of 30 minutes for a reboot to complete! I've tried:

- Linked clones using EFI

- Linked clones using BIOS

- Linked clones with EFI & Secure Boot

The other interesting thing is that if I use vSphere to console into a linked-clone VM that no one has logged into and reboot it, the reboot takes about 2 minutes 30 seconds. However, once someone logs into the VM and reboots, it takes anywhere from 15-30 minutes for the reboot to complete. It doesn't matter whether it's using BIOS or EFI (Secure Boot is turned off for EFI).

I have a support case open with VMWare to try and figure this out - if I find a solution I'll definitely let you know.

EDIT: we're also running ESXi 6.7 U1 w/Horizon 7.5.1 (both composer and connection servers) w/latest VMWare tools from the hosts.

nettech1
Expert

@nschlip

What version of VMware Tools is installed on the parent? I'm assuming you are on the 7.5.1 agent since your Horizon View is 7.5.1.

If I am not mistaken, 10.3.2 is included with ESXi 6.7 U1, and I don't see that version listed on the VMware Product Interoperability Matrices for 7.5.1 at all. Version 10.3.0 is not supported either.

VMware ESXi 6.7 Update 1 Release Notes

VMware Product Interoperability Matrices

Thanks

EricNichols
Hot Shot

Try turning off storage acceleration in View Administrator to see if that fixes it.

We found for linked clones that storage acceleration wasn't worth the delay.

http://myvirtualcloud.net/view-vsa-recompose-time-issue-workaround/

nschlip
Enthusiast

So, in the vSphere console it shows the VM running tools 10338 - if I look directly on the VM in Windows in Apps & Features it shows tools 10.3.2.9925305. Comparing against the VMWare Product Interoperability Matrices, it doesn't have an option to select tools 10.3.2. However, it does show 10.3.5 as compatible with horizon 7.5.1, so I'll manually download that and give it a try.
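The two numbers above (10338 in vSphere, 10.3.2 in Windows) look like the same version in two encodings. A minimal sketch of the conversion, assuming the packed number is laid out as major * 1024 + minor * 32 + patch; that layout is an assumption that happens to fit the values in this thread, and the function names are made up for illustration:

```python
def decode_tools_version(number: int) -> str:
    """Decode a packed VMware Tools version number into major.minor.patch.

    Assumed layout (not confirmed in this thread):
    major * 1024 + minor * 32 + patch.
    """
    major = number >> 10           # everything above the low 10 bits
    minor = (number >> 5) & 0x1F   # next 5 bits
    patch = number & 0x1F          # low 5 bits
    return f"{major}.{minor}.{patch}"


def encode_tools_version(version: str) -> int:
    """Pack 'major.minor.patch[.build]' into the numeric form; the build is ignored."""
    major, minor, patch = (int(part) for part in version.split(".")[:3])
    return (major << 10) | (minor << 5) | patch


print(decode_tools_version(10338))              # 10.3.2
print(encode_tools_version("10.3.2.9925305"))   # 10338
```

Under the same assumption, the 10.3.5 release mentioned above would show up in vSphere as 10341.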

nschlip
Enthusiast

Turned off Storage Accelerator on one of my Win10 test pools, and unfortunately that didn't seem to help - the VM is still at the Win10 circular dots going around in circles as I type this....going on 20 minutes now. Sigh.

BenFB
Virtuoso

What are you running for anti-virus? We've seen similar issues with anti-virus causing both Windows 7 and Windows 10 boot issues or reports of corruption.

Regarding the VMware product interoperability matrix: if a version is not listed (e.g. VMware Tools 10.3.2), then it's not considered a tested and supported version.

nschlip
Enthusiast

For AV we're running System Center Endpoint Protection. Interesting point on the interoperability matrix: when I select Horizon 7.5.1 and the Windows 10 agent, absolutely nothing comes up. Even if I select Horizon 7.5.1, 7.6, and 7.7, it displays nothing. Is that saying Win10 is not supported with any of those View agents?

BenFB
Virtuoso

The interoperability matrix just covers Horizon Connection Server and Horizon Agent compatibility. You can find Windows 10 compatibility on Supported versions of Windows 10 on Horizon Agent Including All VDI Clones (Full Clones, Instant Clo....

I would try removing SEP temporarily and see if that resolves your issue. We had to work with our anti-virus vendor to get them to play nice with our linked clones. I know others have had trouble with SEP.

nettech1
Expert

Do you refresh on log off?

I'm wondering if you can put one of your VMs in maintenance mode and initiate a reboot. Once the VM has rebooted, you can examine the system and application logs for errors and warnings logged after the reboot was initiated.

nschlip
Enthusiast

Hello everyone,

***Note***

This is posted in another thread in the vmware communities as well, but thought I'd post to this one as well.

I have an update to share on this - it may be fairly "wordy" (I know that's not a word), but here goes....

Issue

Linked-clone Windows 10 VMs take a considerable amount of time (5-30 minutes, depending) to complete a guest OS restart cycle.

Troubleshooting

To make a long story short, I eventually noticed that disk IO was ZERO during a Windows 10 VM restart. It stays at zero until the VM gets to the Windows 10 logo with the spinning circles, then suddenly spikes once it reaches the login screen. From this, I began to look at disk configuration settings directly on the VM (vSphere -> right-click VM -> edit). I noticed the disk was using LSI SAS. I reviewed the event logs and noticed (on several Win10 VMs I tested with) a 10+ minute span of time with LSI_SAS event warnings. From there, I created a brand new Windows 10 master/gold VM using Paravirtual SCSI instead of LSI SAS. I then sanitized the master VM and created a new pool - the Win10 linked-clone VMs are now down to 5 minute restarts. Much better, but still not great by any means.

We then noticed that the .vmdk on the VM (all the linked-clone VMs, in fact) was using a "vmname.checkpoint.vmdk" - this "checkpoint" in the .vmdk indicates it's using a snapshot from the master VM. After many hours of testing, we found a sort of fix/workaround....
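For anyone wanting to check their own clones for the same warning window, here is a rough sketch of measuring it from exported event records. The tuple format `(timestamp, source, level)` and the sample timestamps are invented for illustration; how the records are exported from the Windows event log is left out and would need adapting.

```python
from datetime import datetime

def warning_span(events, source="LSI_SAS"):
    """Return the time between the first and last warning from `source`.

    `events` is an iterable of (timestamp, source, level) tuples;
    this record shape is an assumption, not an actual export format.
    Returns None when there are no matching warnings.
    """
    times = sorted(
        ts for ts, src, level in events
        if src == source and level == "Warning"
    )
    if not times:
        return None
    return times[-1] - times[0]


# Made-up sample records, only to show the call.
sample_events = [
    (datetime(2019, 1, 15, 10, 2, 10), "LSI_SAS", "Warning"),
    (datetime(2019, 1, 15, 10, 7, 45), "LSI_SAS", "Warning"),
    (datetime(2019, 1, 15, 10, 8, 0), "Kernel-General", "Information"),
    (datetime(2019, 1, 15, 10, 13, 40), "LSI_SAS", "Warning"),
]

print(warning_span(sample_events))  # 0:11:30
```

A span of many minutes that lines up with the restart would point at the same storage-controller symptom described above.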

Fix

If you:

1. Shut down the VM and migrate the storage (storage only)

2. Select configure per disk

3. Change the storage datastore for each disk on the VM, click Next, let it complete (may take anywhere from a few minutes to several, depending on disk size)

4. You should then have an alert on the VM indicating a consolidation is needed

5. Right-click the VM -> Snapshots -> Consolidate

6. Once the consolidation is finished, right click the VM - notice that you can now change the disk size (if needed)

7. Power up the VM, log in, and restart

The restart time is now down to under 30 seconds (I've seen it as low as 7-8 seconds). I tested this using thick and thin disk provisioning, and neither seemed to make a difference. With 100% certainty (at least in my case) it has to do with the VM running off a "checkpoint.vmdk" disk - once the VM's disk is pointed back to its own self-named .vmdk, the restart issue is resolved.
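To find which clones are still running off a checkpoint disk before and after the migration, a small check along these lines might help. It is only a sketch: it assumes the disk paths have already been collected as strings in the usual "[datastore] folder/file.vmdk" form, and the function name and example paths are hypothetical.

```python
def uses_checkpoint_disk(disk_path: str) -> bool:
    """Return True if the disk file name looks like a checkpoint .vmdk.

    Accepts either a bare file name or a "[datastore] folder/file.vmdk"
    style path. Matching on the word "checkpoint" is a heuristic based
    on the file naming described in this thread.
    """
    path = disk_path.strip()
    # Drop a leading "[datastore]" prefix if present.
    if path.startswith("[") and "]" in path:
        path = path.split("]", 1)[1].strip()
    file_name = path.rsplit("/", 1)[-1].lower()
    return file_name.endswith(".vmdk") and "checkpoint" in file_name


# Hypothetical paths, for illustration only.
disks = [
    "[datastore1] win10-01/win10-01.checkpoint.vmdk",
    "[datastore2] win10-02/win10-02.vmdk",
]
for disk in disks:
    status = "checkpoint disk" if uses_checkpoint_disk(disk) else "own disk"
    print(f"{disk}: {status}")
```

After the storage migration and consolidation, the same check on the VM's disk should report "own disk".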

Now to provide some additional context to this:

In vSphere, I create a new VM, set configuration options, point to an .iso, power up VM and proceed through our imaging process. Once this is complete, I log into the VM using a domain admin account, configure the OS, sanitize (prep it for cloning), and take a single snapshot.

I then create (or edit) a View pool (for Win10 testing they have all been persistent/dedicated pools) which provisions VMs off the snapshot from that new gold VM. I noticed something interesting during this composing process earlier today: while the linked clone (we unfortunately do not have licensing for instant clones) is being provisioned, the disk being used is its own named "vmname.vmdk" - great! However, once provisioning is complete, the disk changes to "vmname.checkpoint.vmdk"

Anyone have any insight into why that occurs? Why wouldn't the cloned VM just continue to use its own named .vmdk? Any helpful answer on that would be much appreciated!

At any rate, we're able to fix/work around the Win10 restart slowness by, again, migrating the VM's storage only to a new datastore, consolidating under Snapshots, powering up the VM, and BAM - fixed.

I hope all of this makes sense - if anyone is still experiencing Win10 restart slowness and is able to try this storage migration/consolidate process, please give it a try and let me know the result. So far it's been consistent for me, but I'd like to hear what others' experience is.

This almost feels like a possible bug in vCenter or View somewhere, but I can't say for sure - anyone with VMWare have any thoughts to share?

Also, fwiw, we're running a Pure storage array (not vSAN) with plenty of space available and excellent dedupe. Also, we're using EFI (though I've tried EFI and BIOS, neither seems to make a difference, as far as restart times are concerned).

-Nathan

yymryy
Contributor

Hi,

Is there any news regarding this issue? I have the same problem: the first restart can take almost one hour or hang.

vCenter Appliance 6.7 U1b

ESXi 6.7 EP 06

Horizon View Manager 7.8

Tested: Windows 10 1803/1809

dpeterson
Contributor

Take a look at this post: https://communities.vmware.com/thread/598763

There are a lot of people out there having this issue. VMware says a fix is coming, but not until the end of Q2.
