8 Replies Latest reply on Jul 16, 2019 12:02 AM by Linjo

    Snapshot stun issue

    itorder Lurker

      Hello,

       

      I am seeing long stuns on a very busy VM during backups, at the snapshot consolidation step. The stun lasts about 40 s, which brings down the jobs running on the VM.

       

      I have seen that "snapshot.maxConsolidateTime" can be tuned on v3.5 to v5.5, but I am on version 6.5. What can be done?

       

      Regards,

        • 1. Re: Snapshot stun issue
          pragg12 Enthusiast
          vExpert

          Hi,

           

          Welcome to VMTN. :-)

           

          Refer to the VMware KB below on this particular setting.

          How to increase the time limit on snapshot consolidation (2146270)

           

          I haven't used this setting before, so if you want to try it, do so on a test VM first.

           

          I need some more information about the affected VM.

          1. What troubleshooting steps have you performed so far?

          2. Total size of VM ?

          3. Application(s) hosted on VM ?

          4. Any active or manual snapshot on VM before backup starts ?

          5. Is VM doing any IO intensive operation during backup time ?

          6. Have you tried modifying backup schedule to see if you face long stun times again ?

          • 2. Re: Snapshot stun issue
            itorder Lurker

            Hi,

             

            Thanks for your reply. I've seen this link before, but it only refers to 6.0, not 6.5.

            We will give it a try next week.

             

             

            2. Total size of VM ?

                 2 x 100 GB

            3. Application(s) hosted on VM ?

                 SAP, plus Orchestrator on some VMs, or custom apps.

            4. Any active or manual snapshot on VM before backup starts ?

                 No

            5. Is VM doing any IO intensive operation during backup time ?

                 Yes, this is precisely when the VM is stunned for 40 s. When the I/O rate is "normal", the stuns last about 1 s.

            6. Have you tried modifying backup schedule to see if you face long stun times again ?

                 We can't because of internal policies.

            • 3. Re: Snapshot stun issue
              pragg12 Enthusiast
              vExpert

              Normally, snapshot tasks on a 200 GB VM don't take 40 seconds. However, since you mentioned SAP and high I/O in response to questions 3 and 5 respectively, I have a few more questions.

               

              7. What's the underlying storage type for this VM ? HDD or SSD ?

              8. How many other VMs on the same datastore as this SAP VM ?

              9. Are the VMs for ques 8 on same ESXi host as SAP VM ?

              10. Does the backup schedule of the VMs for ques 8 run at same time as the SAP VM ?

              • 4. Re: Snapshot stun issue
                itorder Lurker

                7. What's the underlying storage type for this VM ? HDD or SSD ?

                A mix of SSD and HDD. The VMs are stored on Nutanix, and we use Rubrik to back up the VMs.

                 

                8. How many other VMs on the same datastore as this SAP VM ?

                They are all located on different ESXi hosts, but on the same datastore.

                We already tried migrating one VM to a different datastore, but we saw the same stun time, 40 s.

                 

                9. Are the VMs for ques 8 on same ESXi host as SAP VM ?

                All SAP VMs are on the same ESXi host. We already tried migrating one VM to a different datastore, but we saw the same stun time, 40 s.

                 

                10. Does the backup schedule of the VMs for ques 8 run at same time as the SAP VM ?

                The backup window is from midnight to 8 AM. We can choose the precise backup time within this window.

                • 5. Re: Snapshot stun issue
                  pragg12 Enthusiast
                  vExpert

                  Hi,

                   

                  I see you have marked my previous response as Correct. Did you find something which resolved your issue ?

                  • 6. Re: Snapshot stun issue
                    itorder Lurker

                    Hello,

                    I marked it as correct by mistake; I haven't found a fix yet.

                     

                    We tried changing MaxConsolidateTime from 6 to 30, but this setting is documented for v6.0, and the VM was still stunned for 41 s.

                     

                    Regards,
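
For readers following along, the change described above (raising the value from 6 to 30) can be sketched as a text edit on a VM's advanced configuration. This is only an illustration: the helper name `set_vmx_option` and the sample VM name are hypothetical, the sketch works on a string rather than a real datastore file, and the thread does not confirm that the setting is honored on 6.5.

```python
import re


def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Return vmx_text with `key = "value"` set.

    Hypothetical helper: replaces an existing entry for `key` if one is
    present, otherwise appends a new line at the end of the text.
    """
    line = f'{key} = "{value}"'
    # Match an existing `key = ...` entry on its own line (keys compared case-insensitively).
    pattern = re.compile(
        rf'^[ \t]*{re.escape(key)}[ \t]*=.*$',
        re.IGNORECASE | re.MULTILINE,
    )
    if pattern.search(vmx_text):
        # A callable replacement avoids backslash interpretation in `line`.
        return pattern.sub(lambda _match: line, vmx_text)
    separator = "" if vmx_text.endswith("\n") or not vmx_text else "\n"
    return f"{vmx_text}{separator}{line}\n"


# Sample content is made up for illustration; only the option name and the
# values 6 and 30 come from the thread.
sample = 'displayName = "sap01"\nsnapshot.maxConsolidateTime = "6"\n'
print(set_vmx_option(sample, "snapshot.maxConsolidateTime", "30"))
```

As suggested earlier in the thread, a change like this is best tried on a test VM first.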

                    • 7. Re: Snapshot stun issue
                      pragg12 Enthusiast
                      vExpert

                      I have a few suggestions to check this further.

                       

                      1. Migrate the affected VM to another host in the cluster and wait for the backup to run at its usual time to see whether the stun time is still the same.

                       

                      2. If the stun time is still the same, move the backup to a different time within the same backup window, after the backups of the other VMs on the same datastore have finished.

                       

                      Share your findings once you have checked. Also, let us know the RF (replication factor) in place for the underlying Nutanix datastore cluster.
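
Suggestion 2 above amounts to staggering backup start times inside the midnight to 8 AM window so that the affected VM runs after the other VMs on the same datastore. A minimal sketch of that planning step follows, assuming a fixed slot length per VM. The function name `stagger_backups`, the VM names, and the 120-minute slot are hypothetical; real backup durations would have to come from the backup product's own job history.

```python
from datetime import datetime, timedelta


def stagger_backups(other_vms, affected_vm, window_start, window_end, slot_minutes):
    """Assign non-overlapping start times, placing the affected VM last.

    Hypothetical planner: every VM gets one fixed-length slot, and a
    ValueError is raised if the slots do not fit inside the window.
    """
    slot = timedelta(minutes=slot_minutes)
    schedule = {}
    current = window_start
    # The affected VM goes last so the other backups on the datastore are done.
    for name in list(other_vms) + [affected_vm]:
        if current + slot > window_end:
            raise ValueError(f"backup window too short to fit {name}")
        schedule[name] = current
        current += slot
    return schedule


# Example values are assumptions for illustration only.
start = datetime(2019, 7, 16, 0, 0)
end = datetime(2019, 7, 16, 8, 0)
plan = stagger_backups(["orchestrator-vm", "custom-app-vm"], "sap-vm", start, end, 120)
for vm, when in plan.items():
    print(vm, when.strftime("%H:%M"))
```

Comparing the stun time of the affected VM in its new, isolated slot against the original 40 s would show whether concurrent backups on the datastore contribute to the problem.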

                      • 8. Re: Snapshot stun issue
                        Linjo Champion
                        User Moderators

                        I unmarked the "Correct" answer since it seems it was set in error.

                        I would also recommend opening an SR with Nutanix, since they will be better able to trace this in their storage stack.

                         

                        Best Regards, Linjo