10 Replies, latest reply on Nov 14, 2016 2:30 AM by AntonKr

    How to free up space in VDP when the capacity health check limit is reached?

    racom Enthusiast

      Used capacity on one of our VDP appliances has reached 96.46% and the Backup Scheduler was stopped. I am able to start the Backup Scheduler again, but although I have deleted many backups the used capacity is still the same. Is there any way to free up space, or would I be better off rolling back to the last validated checkpoint?

        • 1. Re: How to free up space in VDP when the capacity health check limit is reached?
          snekkalapudi Expert
          VMware Employees

          See if you can delete some of your older backups and allow a long enough maintenance window for garbage collection to clean up the deleted data.

          Note:

          If you are using VDP 5.1, temporarily increase the blackout window; in 5.1, garbage collection (which cleans deleted backups out of storage) runs during the blackout window.

          If you are using VDP 5.5, just increase the maintenance window temporarily.

           

           

          In the longer term you may have to rework your retention policies so that you do not end up with too many backups.
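
          For example, from the appliance shell you can check where the windows currently sit and whether the maintenance windows scheduler is enabled (a rough sketch, exact output varies by version):

          status.dpn | grep -i "window"     # the tail of status.dpn lists the next backup, blackout and maintenance window start times
          dpnctl status                     # should report "Maintenance windows scheduler status: enabled."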

          • 2. Re: How to free up space in VDP when the capacity health check limit is reached?
            racom Enthusiast

            Thanks for the answer.

             

            I'm using VDP 5.1.10.32, so I've increased the blackout window. Do I have to wait for the next blackout window to begin, or can I trigger garbage collection manually? I can see that it failed this morning:

             

            505923 2013-12-05 08:15:06 CET ERROR   4202  SYSTEM   PROCESS  /      failed garbage collection with error MSG_ERR_DISKFULL

             

            I'm not sure whether the "ConnectEMC is not running." error is related to this as well. Can I try to start it by running "dpnctl start mcs"? I'm not very familiar with Avamar.
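
            For the record, this is what I plan to try before the next blackout window; I'm not sure the avmaint options are exactly right on this build, so I'll double-check them against avmaint's help first:

            dpnctl status                                     # see which services (MCS, backup scheduler) are actually down
            dpnctl start mcs                                  # bring MCS back up if it is down
            avmaint garbagecollect --timelimit=3600 --ava     # try to start garbage collection by hand (option names assumed, verify on your build)
            avmaint gcstatus --ava                            # check how the last garbage collection pass ended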

            • 4. Re: How to free up space in VDP when the capacity health check limit is reached?
              racom Enthusiast

               Thanks again. But it looks like I was caught in a trap.


               The checkpoints are valid but old:

               

              root@vm-vdp:~/#: cplist

              cp.20131202100237 Mon Dec  2 11:02:37 2013   valid rol ---  nodes   1/1 stripes   1916

              cp.20131202103632 Mon Dec  2 11:36:32 2013   valid rol ---  nodes   1/1 stripes   1916
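
               If I read the Avamar documentation correctly, a fresh checkpoint can be requested from the shell roughly like this (I have not verified the exact options on VDP 5.1, so take it as a sketch only):

               avmaint checkpoint --ava     # ask the server to create a new checkpoint
               cplist                       # the new checkpoint should appear here once it completes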

               

               

               There has been no hfscheck yet (since the last reboot?) and the gsan status is degraded:

               

              root@vm-vdp:~/#: status.dpn|less

              Čt pro  5 15:26:42 CET 2013  [vm-vdp.racom.cz] Thu Dec  5 14:26:42 2013 UTC (Initialized Wed Nov

                7 19:55:37 2012 UTC)

              Node   IP Address     Version   State   Runlevel  Srvr+Root+User Dis Suspend Load UsedMB Errlen

              %Full   Percent Full and Stripe Status by Disk

              0.0   192.168.20.17 6.1.81-130  ONLINE fullaccess mhpu+0hpu+0000   2 false   0.28 3594 27424967

              62.6%  62%(onl:644) 62%(onl:648) 62%(onl:642)

              Srvr+Root+User Modes = migrate + hfswriteable + persistwriteable + useraccntwriteable

               

              All reported states=(ONLINE), runlevels=(fullaccess), modes=(mhpu+0hpu+0000)

              System-Status: ok

              Access-Status: admin

               

              No checkpoint yet

              No GC yet

              No hfscheck yet

               

              Maintenance windows scheduler capacity profile is active.

                WARNING: Scheduler is WAITING TO START until Fri Dec  6 08:00:00 2013 CET.

                Next backup window start time: Fri Dec  6 20:00:00 2013 CET

                Next blackout window start time: Fri Dec  6 08:00:00 2013 CET

                Next maintenance window start time: Fri Dec  6 16:00:00 2013 CET

               

              root@vm-vdp:~/#: dpnctl status

              Identity added: /home/dpn/.ssh/dpnid (/home/dpn/.ssh/dpnid)

              dpnctl: INFO: gsan status: degraded

              dpnctl: INFO: MCS status: up.

              dpnctl: INFO: Backup scheduler status: down.

              dpnctl: INFO: axionfs status: up.

              dpnctl: INFO: Maintenance windows scheduler status: enabled.

              dpnctl: INFO: Unattended startup status: enabled.
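
              The backup scheduler itself can be started again with the same dpnctl tool, although I expect backups to keep failing while gsan stays degraded and no space has been freed:

              dpnctl start sched     # restart the backup scheduler
              dpnctl status          # should now report "Backup scheduler status: up."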

               

               

               I've tried to get GC into an active state as suggested in reply 5, but it looks like the used capacity of 96.4% is too high. I suppose GC will not start in the morning, am I right?

               

               root@vm-vdp:~/#: avmaint config --ava | grep disk

                disknocreate="90"

                disknocp="96"

                disknogc="85"

                disknoflush="94"

                diskwarning="50"

                diskreadonly="65"

                disknormaldelta="2"

                freespaceunbalancedisk0="30"

                diskfull="30"

                diskfulldelta="5"

                balancelocaldisks="true"

               

              root@vm-vdp:~/#: avmaint config disknogc=97  --ava

              2013/12/05-13:55:10.94029 [avmaint]  ERROR: <0949> Command failed because these config values do not meet the following criteria:

              2013/12/05-13:55:10.94040 [avmaint]  ERROR: <0001> 0 < diskwarning(50) < diskreadonly(65) < disknogc(97) < disknocreate(90) < disknoflush(94) < disknocp(96) < 100

              ERROR: avmaint: config: server_exception(MSG_ERR_INVALID_PARAMETERS)

               

              root@vm-vdp:~/#: avmaint config disknocp=99  --ava

              2013/12/05-13:55:41.90331 [avmaint]  ERROR: <0949> Command failed because these config values do not meet the following criteria:

              2013/12/05-13:55:41.90342 [avmaint]  ERROR: <0001> disknocp(99) <= diskfulldelta(5 -> 96.5) < diskfull(30 -> 97.0) < poolnocreate(20 -> 98.0) < 100

              ERROR: avmaint: config: server_exception(MSG_ERR_INVALID_PARAMETERS)
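
               As far as I can read these two errors, the thresholds must keep the order diskwarning < diskreadonly < disknogc < disknocreate < disknoflush < disknocp < 100, and disknocp is additionally capped at the 96.5% limit derived from diskfull/diskfulldelta. So to let GC run at 96.46% used, every value from disknogc upwards would have to be squeezed between 96.46 and 96.5, changed one at a time from the top of the chain down, something like the sketch below. I have not verified that fractional values are accepted here, and with a margin this thin it probably would not help anyway, which is why I am leaning towards a rollback or a redeploy instead:

               avmaint config disknocp=96.5 --ava        # upper bound from the diskfulldelta-derived 96.5% limit
               avmaint config disknoflush=96.49 --ava    # must stay below disknocp
               avmaint config disknocreate=96.48 --ava   # must stay below disknoflush
               avmaint config disknogc=96.47 --ava       # must stay below disknocreate and above the 96.46% currently used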

              • 5. Re: How to free up space in VDP when the capacity health check limit is reached?
                basteku73 Enthusiast

                Hi,

                Have you resolved your issue?

                I'm asking because I have the same problem: 96% used capacity. Is the only solution to open a support case?

                 

                Regards,

                Sebastian Ulatowski

                • 6. Re: How to free up space in VDP when the capacity health check limit is reached?
                  racom Enthusiast

                  I've deployed a new VDP appliance and started new backup jobs. It was the simplest and fastest way for me. I didn't open a support case.

                  • 7. Re: How to free up space in VDP when the capacity health check limit is reached?
                    Andre443 Lurker

                    Hi,

                    I have the same problem with VDP 5.5.

                    Is there a solution to free up space?

                    • 8. Re: How to free up space in VDP when the capacity health check limit is reached?
                      racom Enthusiast

                      I'm afraid that is available for VDPA only. Try to contact support if deploying a new VDP isn't possible for you.

                      • 9. Re: How to free up space in VDP when the capacity health check limit is reached?
                        wreigle2 Novice

                        FYI, this is still an issue in VDP 6.1.2.

                        My appliance hit 96.15% capacity. Backups failed. The 'Backup Scheduler' service will not start.

                        I have a case open with VMware. We have spent 2+ hours on a WebEx trying to get this thing back online. Finally escalated the case to EMC. Waiting on a resolution.

                        • 10. Re: How to free up space in VDP when the capacity health check limit is reached?
                          AntonKr Lurker

                          Here is a sad story with a happy ending about VDP 6.1.2.19. One unlucky day I fed a couple of OLAP VMs to VDP and it choked, failing to deduplicate properly. It ended up with the nodes full at 98, 97 and 98% respectively.

                          I do not know exactly what helped, but here is the full list of my actions (add reboots as needed):

                          1. Delete big backups, run manual checkpoint, integrity check and garbage collection. Everything failed with MSG_DISK_FULL.

                          2. Rollback to earlier checkpoint, run manual checkpoint, integrity check and garbage collection. Everything failed with MSG_DISK_FULL.

                          3. Modify the configuration thresholds to allow garbage collection to run (as described above). Same errors when trying to set the values to 99%.

                          4. Expand storage! The wizard completed successfully, but only node1 (/dev/sdc1) expanded. Run manual checkpoint, integrity check and garbage collection. Only the checkpoint succeeded; hfscheck and GC failed with MSG_DISK_FULL.

                          5. At this point I gave up and let the system run over the weekend.

                          6. On Monday it had magically repaired itself. There was a good checkpoint, a good hfscheck and a good GC. Admin mode persisted, though.

                          7. Rebooting several times and running manual checkpoints, including unmounting the disks and running xfs_check, helped at last. Fullaccess mode was back.

                          8. The last bit was using xfs_growfs on /dev/sdb1 and /dev/sdd1 to fix the wrong size of those nodes.
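
                          For reference, that last step looked roughly like this (the device names are the ones from my appliance; xfs_growfs operates on a mounted filesystem, so check with df which device backs which data partition before running it):

                          df -h                   # confirm which device backs which data partition
                          xfs_growfs /dev/sdb1    # grow the XFS filesystem on a node that kept its old size
                          xfs_growfs /dev/sdd1    # same for the other node that did not expand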

                           

                          Edit: I think that checkpoint rollback and waiting was enough...
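
                          If you try the rollback route, cplist (shown earlier in this thread) lists the checkpoints and marks which ones are valid, so you can pick the newest valid one as the rollback target; I won't quote the exact rollback command from memory, so follow the documented procedure for your VDP version.

                          cplist     # the newest checkpoint flagged "valid" is the safest rollback target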