7 Replies Latest reply on Jun 21, 2019 3:23 PM by LukaszDziwisz

    Instant clones slow provisioning

    LukaszDziwisz Enthusiast

      Hello,

       

      I was wondering whether anyone has come across a similar issue and could offer advice on how to troubleshoot and resolve it. We are using Windows 1809 LTSC, Horizon 7.7, AppVolumes 2.16, UEM 9.7 and vCenter 6.7 U2, so pretty much the newest versions of everything. Our storage is all-flash Pure; however, an instant clone takes roughly 5 minutes before it becomes available. It is in the provisioning state for about 1 min 30 sec, then in the customization state for 3 min 30 sec, and then it becomes available.

       

      I opened a ticket with VMware on it and here is what we are seeing from the log:

       

      2019-05-22 16:07:14 (3444) info CSvmgaService - (svgaservice.cpp, 397 ) shutting down the service

       

      2019-05-22 16:11:54 (4268) debug Svmga:: setservicestatus - (svmga.cpp , 522) service stats set to 2

       

      As you can see, there is a gap of more than 4 minutes (16:07:14 to 16:11:54), which would account for the long time before the machine becomes available.

      The next update from the VMware engineer is as follows:
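
      A gap like the one between the two entries above can be found with a short script. This is only a minimal sketch: it assumes every relevant log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, as in the two quoted entries, and the function names and the 60-second threshold are illustrative choices, not part of any VMware tooling.

```python
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed from the two quoted log entries


def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    try:
        return datetime.strptime(line.strip()[:19], TIMESTAMP_FORMAT)
    except ValueError:
        return None


def find_gaps(lines, threshold=timedelta(seconds=60)):
    """Yield (previous_line, next_line, gap) for consecutive timestamped
    lines that are further apart than the threshold (hypothetical default)."""
    previous = None
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is None:
            continue  # skip blank lines and lines without a timestamp
        if previous is not None and stamp - previous[0] > threshold:
            yield previous[1], line, stamp - previous[0]
        previous = (stamp, line)


# The two entries quoted in this post.
sample = [
    "2019-05-22 16:07:14 (3444) info CSvmgaService - (svgaservice.cpp, 397 ) shutting down the service",
    "2019-05-22 16:11:54 (4268) debug Svmga:: setservicestatus - (svmga.cpp , 522) service stats set to 2",
]

gaps = list(find_gaps(sample))
for before, after, gap in gaps:
    print(f"{gap} between:\n  {before}\n  {after}")
```

      For the two entries quoted here, the reported gap is 4 minutes 40 seconds. Running the same loop over a full agent log would show whether this is the only long pause during customization.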

       

      Furthermore, I discussed the case with my technical lead and he mentioned that it could be either some script or some program on the VM itself which is causing it to reboot. As we are using ClonePrep, the machine should not reboot; however, in your case it's doing exactly the opposite.

       

      The process C:\Program Files (x86)\Common Files\VMware\View Composer Guest Agent\vmware-svi-ga.exe (TESTIMRW5) has initiated the shutdown of computer TESTIMRW5 on behalf of user NT AUTHORITY\SYSTEM for the following reason: Operating System: Recovery (Planned)

       

      Reason Code: 0x80020002

       

      So the next suggestion was to build a brand-new image with plain Windows, install only VMware Tools and the Horizon Agent, and provision a pool. I did that and, bam, the instant clone takes only 30 seconds before it becomes available.

       

      I don't really want to redo my images, as it would be very time-consuming and I'm pretty sure it would break other things, so I was wondering if anyone has any idea how to resolve this.

        • 1. Re: Instant clones slow provisioning
          BenFB Expert

          As painful as it is, building a new image is the best way to identify the problematic software. We've had to do this on a few occasions.

           

          This would be a great time to, at a minimum, document and slim down the image. If possible, look at using automation so you can produce a new image on demand within minutes.

          • 2. Re: Instant clones slow provisioning
            LukaszDziwisz Enthusiast

            Yeah, I understand, but unfortunately packaging applications through AppVolumes is not as reliable as putting them on the image, so we are trying to pack as much as we can onto the image and then just supplement it with AppStacks.

             

            Just curious, what do you use for automation, and how would this work? Do you mean creating a template at some point in time and then adding applications?

             

            Also, back to the original issue: I was able to pinpoint the problem, and it appears to be adding a shared PCI device to the image for NVIDIA GRID. No agents seem to break anything until I get to the point of adding the “Shared PCI device” for NVIDIA GRID. As soon as I do that, provisioning jumps from 30 seconds to 5 minutes. I removed the Shared PCI device in the Master Image settings but left the drivers installed on the image, and I’m back to 30 seconds again. I’m not sure whether this is NVIDIA-specific, a problem with adding a Shared PCI device, or a problem with how Horizon deals with Shared PCI devices in general.

            • 3. Re: Instant clones slow provisioning
              MrCheesecake Novice

              Has this always happened, or is it a recent issue with a pool that's been operating for a while?

               

              Do you have Windows Update disabled on your base image?  I'm wondering if there's a "rogue" update being pulled down during the provisioning process which could cause the reboot.

               

              Along the same lines, I assume you assign your App Volumes to users rather than machines?  If you're assigning them to machines, could one of the apps be kicking off a reboot?

              • 4. Re: Instant clones slow provisioning
                BenFB Expert

                While AppVolumes, FlexApp, etc. are options, I was thinking about automation with something like Ansible from Red Hat. You can get to the point where you deploy a new VM every time using the latest Microsoft WIM and script all of the application installations. This should give you a reproducible parent VM that can be created in under an hour.

                • 5. Re: Instant clones slow provisioning
                  cdickerson75 Novice

                  We are seeing the exact same problem. It appears to be an issue with Windows 10 1809 and NVIDIA with instant clones. If we switch to 1709, there is no issue. We opened a ticket with NVIDIA, but they aren't responding.

                   

                  -Craig

                  • 6. Re: Instant clones slow provisioning
                    LukaszDziwisz Enthusiast

                    We also opened a ticket with NVIDIA, and so far they have told me to upgrade to version 425.31 (GRID 8), as this is apparently the first version that supports Windows 1809. I will give it a try this weekend and post the results. We also have a ticket open with VMware to see whether it's a Shared PCI device issue in general.

                    • 7. Re: Instant clones slow provisioning
                      LukaszDziwisz Enthusiast

                      It looks like it has been going on for a while. At first we had issues with VMFS6 and snapshots, which are apparently going to be fixed in future releases of vSphere, and then we ran into other problems with it. Now we are finally getting to a stable state, and this seems to be the biggest remaining issue. I tried a brand-new Master Image with no updates, customizations, etc., and it happens as well. As for AppVolumes, we are attaching per user, not per machine. Updates and Module Installer are disabled as well.