27 Replies, latest reply on Mar 18, 2011 10:49 AM by lshelton

    Snapshot removal timeout

    lshelton Novice

      Good Morning,

       

      We've been running ESXi 3.5 for a couple of years with our VMs stored on NFS.  We also use the VCB scripts, which have been working flawlessly after making the well-documented changes to the VMware config.

       

      Recently I've been testing vSphere 4.1 with the new VCB script and ran into an issue.  If I simply take a snapshot and then remove it, I typically lose 1 ping when the snapshot is taken and 1 when it is removed.  But if I run the VCB script, the VM becomes unusable for about 30 seconds when the backup finishes and the snapshot is removed.  On the client, nothing is changing while the backup is running, and the backup only takes 15 minutes.

       

      I did some Google searches and found several links that indicated that this was a bug in vSphere 4.1, and that this has not been addressed in the recently released update 1.

       

      I would like to get feedback on this and hear if anyone knows of a workaround.  This would prevent us from going to 4.1 if there is no solution.

       

      Thank you,

      Lewis

        • 1. Re: Snapshot removal timeout
          lamw Guru
          VMware Employees, Community Warriors

          Hello,

           

          This is a VMware community group for ghettoVCBg2 script and not VMware's VCB utility. You may want to post your inquiry on the main VMTN forum or a sub-forum related to your topic - http://communities.vmware.com/community/vmtn

          • 2. Re: Snapshot removal timeout
            lshelton Novice

            Sorry for not replying to this sooner, but I believe that it is related to the ghetto script.

             

            In the older versions of ESX you had to tweak the ESX configuration when using NFS as the data store so that you did not get timeouts when removing snapshots.  I had to do this when I originally started using ghetto on my 3.5 servers.  It appears to have been fixed in 4.1.  On my 4.1 test server I can manually create and remove snapshots with only a single ping lost for each, which I can live with.

             

            The problem that I am having is when the ghetto script completes I get 8 ping timeouts.  So I would like to know if there is something I am doing wrong in the script, or if there is something else I need to do to overcome this.

             

            I'm not including memory in the snapshot, nor am I quiescing the VM.

             

            The NFS store is persistent on the VMware host.

             

            Thanks in advance,

            Lewis

            • 3. Re: Snapshot removal timeout
              lamw Guru
              VMware Employees, Community Warriors

              This really depends on the VM and how busy it is. There have been some posts on VMTN regarding this, and it's outside of the ghettoVCB scripts, as they just leverage the VMware snapshot creation/removal.

               

              Here's a post that you may want to follow up on as well - http://communities.vmware.com/thread/274479

               

              What you can do is manually create the snapshot using the vSphere Client (that's all the script is doing via the APIs) and see how many pings you're losing within the guest.

              • 4. Re: Snapshot removal timeout
                lshelton Novice

                Thanks for the link, but unfortunately this does not help.

                 

                The VMware host is only running 1 VM (the one I am testing)

                 

                The VM:

                Small (40GB)

                No activity while the backup is happening

                A manual snapshot create/delete results in 2 ping losses (one when the snapshot is taken, one when it is removed)

                 

                What I have not done is wait the 20 minutes that it takes for the ghetto script to complete before removing the manually generated snapshot.  However, I don't think this will make a difference, as nothing is accessing the VM.

                 

                Thanks,

                Lewis

                • 5. Re: Snapshot removal timeout
                  lamw Guru
                  Community Warriors, VMware Employees

                  I just want to make sure, are you using ghettoVCB or ghettoVCBg2 script? I assume the latter as you posted in the ghettoVCBg2 forum but I just want to make sure as it may have an impact on what you're seeing.


                  Thanks

                  • 6. Re: Snapshot removal timeout
                    lshelton Novice

                    I downloaded the latest one this morning and had the same results.  When the backup completes and deletes the snapshot, I lose 8 pings.

                     

                    The only things that I have changed from the defaults are:

                    1) The backup destination (to a Linux hosted NFS location)

                    2) The rotation count to 1

                    3)  The DISK_BACKUP_FORMAT is thin

                     

                    I am using "-f" to specify the host(s) to back up, which is just a single Windows 7 VM right now.

                     

                    Both source and destinations are NFS.

                     

                    I also tested by taking a manual snapshot, then waited 20 minutes (the same as it took for the Ghetto script to complete), then deleted the snapshot.  I lost only one ping.

                     

                    Thank you,

                    Lewis

                    • 7. Re: Snapshot removal timeout
                      vmbru Enthusiast

                      It could be a combination of the script/appliance and VMware.  Overall, the snapshot and removal timeout is a VMware process.  Using the appliance may introduce more lag, but not much unless it is on a bad network link.  We really need a retry feature in this script. We have bad snapshots using VDR 1.2 too, BUT it retries failed backups 3 times every 30 minutes.  I'd recommend that this script create a failed-backup file that we, the end users, could then run another backup on later (or via cron).  Not sure we'll ever get 100% good backups on the 1st try with any solution (quiesce issues).  A retry feature is needed.

                       

                      We usually have 3 VMs that fail from time to time.  Usually 1 every 4-5 days.  You could create a snappy .sh script that parses through the log file for the names of the failed VM backups and creates a file for backup retry.
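
A minimal sketch of the log-parsing idea above, written in Python rather than as a .sh script. The failure-line format is an assumption: the pattern below is hypothetical and expects an "ERROR:" line ending in "backup for <name>" or "backup of <name>", so it would need adjusting to whatever the script's log actually prints. The function names and file paths are also illustrative, and VM names containing spaces are not handled.

```python
import re
from pathlib import Path

# Hypothetical failure pattern (an assumption, not the script's documented
# log format): an "ERROR:" line that ends with "backup for <vm>" or
# "backup of <vm>", optionally followed by "!" or ".".
FAILED_PATTERN = re.compile(r"ERROR: .*backup (?:for|of) (?P<vm>\S+?)[!.]?$")


def failed_vms(log_text, pattern=FAILED_PATTERN):
    """Return VM names from failure lines, in log order, without duplicates."""
    names = []
    for line in log_text.splitlines():
        match = pattern.search(line.strip())
        if match and match.group("vm") not in names:
            names.append(match.group("vm"))
    return names


def write_retry_list(log_path, retry_path):
    """Write one failed VM name per line to retry_path and return the names."""
    names = failed_vms(Path(log_path).read_text())
    Path(retry_path).write_text("".join(name + "\n" for name in names))
    return names
```

The retry file holds one VM name per line, so it could be fed back to a later run (for example from cron) in the same way as the list passed with "-f" earlier in the thread, assuming that list uses the same one-name-per-line layout.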

                      • 8. Re: Snapshot removal timeout
                        lshelton Novice

                        With the older script we see errors from time to time as well.  Usually related to a snapshot not getting removed properly.

                         

                        However in this case, the backup is running fine.  The only problem being that the snapshot removal interferes with the VM availability.

                         

                        Thanks,

                        Lewis

                        • 9. Re: Snapshot removal timeout
                          lamw Guru
                          VMware Employees, Community Warriors

                          @lshelton,

                           

                          Thanks for info, I just wanted to double check we were talking about the same script as ghettoVCB and ghettoVCBg2 are two different scripts.

                           

                          Thanks for also running the test manually. I did have one additional question. You mentioned this VM is pretty much idle? This means that no matter when you run the ghettoVCBg2 backup script, you're seeing 8 pings lost? Are you manually pinging from vMA, from another host, or from within the VM to a gateway of some sort? I'm also assuming that you have the latest VMware Tools installed on this host?

                          • 10. Re: Snapshot removal timeout
                            lamw Guru
                            Community Warriors, VMware Employees

                            @vmbru,

                             

                            Thanks for the feedback, I'll definitely take "retry" into consideration for a future release. It's definitely something I can look into, though there may be others who feel that if a user specified a particular backup window, the backup should only execute during that period. I'll have to weigh the pros and cons and figure out how best to incorporate such a feature.

                            • 11. Re: Snapshot removal timeout
                              lshelton Novice

                              OK, I am officially a knucklehead.  I’m using ghettoVCB not ghettoVCBg2.  My apologies.

                               

                              Thank you,

                              Lewis Shelton

                              • 12. Re: Snapshot removal timeout
                                lamw Guru
                                VMware Employees, Community Warriors

                                ok, that's fine.

                                 

                                Can you also provide answers to these questions:

                                 

                                You mentioned this VM is pretty much idle? This means that no matter when you run the ghettoVCB backup script, you're seeing 8 pings lost? Are you manually pinging from vMA, from another host, or from within the VM to a gateway of some sort? I'm also assuming that you have the latest VMware Tools installed on this host?

                                • 13. Re: Snapshot removal timeout
                                  lshelton Novice

                                  Yes, I've run the script multiple times over the course of a couple of days.  The VMware host is only running this one VM, and it always loses 8 pings when the snapshot is removed.

                                   

                                  I'm running a continuous ping from my desktop to the VM.

                                   

                                  Yes, I verified the VMware tools install this morning.

                                   

                                  Thanks,
                                  Lewis
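
To compare the manual test with the scripted one more precisely than by eye, the continuous ping can be reduced to two numbers: total replies lost and the longest run of consecutive losses. The sketch below assumes the sequence numbers of the replies that arrived have already been collected into a list; how they are extracted depends on the ping tool in use and is not shown. The function name is illustrative.

```python
def ping_loss_summary(received_seqs, first_seq, last_seq):
    """Return (total_lost, longest_consecutive_lost) for a ping run.

    received_seqs: sequence numbers of the replies that arrived.
    first_seq, last_seq: first and last sequence numbers sent (inclusive).
    """
    received = set(received_seqs)
    total_lost = 0
    longest = 0
    current = 0
    for seq in range(first_seq, last_seq + 1):
        if seq in received:
            current = 0  # a reply arrived, so the outage run ends here
        else:
            total_lost += 1
            current += 1
            longest = max(longest, current)
    return total_lost, longest
```

A run where replies 4 through 11 are missing gives (8, 8), matching the scripted case described above, while a manual create/remove that drops one ping each time gives a total of 2 with a longest run of 1. How long the outage lasts in seconds depends on the ping interval and timeout, which this sketch does not model.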

                                  • 14. Re: Snapshot removal timeout
                                    lamw Guru
                                    VMware Employees, Community Warriors

                                    You can try taking a look at your hostd and vmware.log logs to see if there's anything odd. Is this running on ESX or ESXi? You also said that if you manually created this snapshot and removed it you _always_ just lose a single ping on each operation?

                                     

                                    The only other difference is that the script is running in the Service Console/Busybox rather than going through the APIs, which is what the vSphere Client is doing. This consumes some amount of resources, but it shouldn't be anything that would impact a single VM unless your ESX(i) host is constrained on resources.
