racom
Enthusiast

How to free up space in VDP when the capacity health check limit is reached?

Used capacity on one of our VDP appliances reached 96.46% and the Backup Scheduler was stopped. I'm able to start the Backup Scheduler, but although I deleted many backups, the used capacity is still the same. Is there any way to free up space, or should I rather roll back to the last validated checkpoint?

10 Replies
snekkalapudi
VMware Employee

See if you can delete some of your older backups and allow a long enough maintenance window for garbage collection to clean up.

Note -

If you are using VDP 5.1, then temporarily increase the blackout window - that is when garbage collection runs (it cleans deleted backups from storage).

If you are using VDP 5.5, then just temporarily increase the maintenance window.

In the longer term you may have to rework your retention policies so that you don't end up with too many backups.

-Suresh
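The reason deleting backups alone doesn't free space can be sketched with a quick check (a sketch, not appliance code; 96% is the poster's reported usage and 85% is the disknogc default that appears later in the thread):

```shell
#!/bin/sh
# Sketch: GC refuses to run once used capacity exceeds the disknogc threshold,
# so deleting backups does not reclaim space until GC can actually complete.
used=96      # reported used capacity, percent
disknogc=85  # GC cut-off from 'avmaint config --ava' (value shown later in thread)

if [ "$used" -gt "$disknogc" ]; then
  msg="GC blocked: ${used}% used > disknogc ${disknogc}%"
else
  msg="GC eligible"
fi
echo "$msg"
```

This is why the capacity stays flat after deletions: the deleted data is only reclaimed when a garbage collection pass succeeds during the blackout/maintenance window.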
racom
Enthusiast

Thanks for the answer.

I'm using VDP 5.1.10.32, so I've increased the blackout window. Must I wait for the next blackout window to begin, or can I trigger garbage collection manually? I can see it failed this morning:

505923 2013-12-05 08:15:06 CET ERROR   4202  SYSTEM   PROCESS  /      failed garbage collection with error MSG_ERR_DISKFULL

I'm not sure if the "ConnectEMC is not running." error is related to it too. Can I try to start it by running "dpnctl start mcs"? I'm not too familiar with Avamar.

snekkalapudi
VMware Employee

Check reply-6 in this thread - https://community.emc.com/thread/116610

-Suresh
racom
Enthusiast

Thanks again. But it looks like I was caught in a trap.


Checkpoints are valid but old:

root@vm-vdp:~/#: cplist

cp.20131202100237 Mon Dec  2 11:02:37 2013   valid rol ---  nodes   1/1 stripes   1916

cp.20131202103632 Mon Dec  2 11:36:32 2013   valid rol ---  nodes   1/1 stripes   1916

No hfscheck yet (since last reboot?) and gsan status is degraded:

root@vm-vdp:~/#: status.dpn|less

Čt pro  5 15:26:42 CET 2013  [vm-vdp.racom.cz] Thu Dec  5 14:26:42 2013 UTC (Initialized Wed Nov

  7 19:55:37 2012 UTC)

Node   IP Address     Version   State   Runlevel  Srvr+Root+User Dis Suspend Load UsedMB Errlen

%Full   Percent Full and Stripe Status by Disk

0.0   192.168.20.17 6.1.81-130  ONLINE fullaccess mhpu+0hpu+0000   2 false   0.28 3594 27424967

62.6%  62%(onl:644) 62%(onl:648) 62%(onl:642)

Srvr+Root+User Modes = migrate + hfswriteable + persistwriteable + useraccntwriteable

All reported states=(ONLINE), runlevels=(fullaccess), modes=(mhpu+0hpu+0000)

System-Status: ok

Access-Status: admin

No checkpoint yet

No GC yet

No hfscheck yet

Maintenance windows scheduler capacity profile is active.

  WARNING: Scheduler is WAITING TO START until Fri Dec  6 08:00:00 2013 CET.

  Next backup window start time: Fri Dec  6 20:00:00 2013 CET

  Next blackout window start time: Fri Dec  6 08:00:00 2013 CET

  Next maintenance window start time: Fri Dec  6 16:00:00 2013 CET

root@vm-vdp:~/#: dpnctl status

Identity added: /home/dpn/.ssh/dpnid (/home/dpn/.ssh/dpnid)

dpnctl: INFO: gsan status: degraded

dpnctl: INFO: MCS status: up.

dpnctl: INFO: Backup scheduler status: down.

dpnctl: INFO: axionfs status: up.

dpnctl: INFO: Maintenance windows scheduler status: enabled.

dpnctl: INFO: Unattended startup status: enabled.

I've tried to get GC into an active state following reply-5, but it looks like the used capacity of 96.4% is too high. I suppose GC won't start in the morning, am I right?

root@vm-vdp:~/#: avmaint config --ava | grep diskrep disk

  disknocreate="90"

  disknocp="96"

  disknogc="85"

  disknoflush="94"

  diskwarning="50"

  diskreadonly="65"

  disknormaldelta="2"

  freespaceunbalancedisk0="30"

  diskfull="30"

  diskfulldelta="5"

  balancelocaldisks="true"

root@vm-vdp:~/#: avmaint config disknogc=97  --ava

2013/12/05-13:55:10.94029 [avmaint]  ERROR: <0949> Command failed because these config values do not meet the following criteria:

2013/12/05-13:55:10.94040 [avmaint]  ERROR: <0001> 0 < diskwarning(50) < diskreadonly(65) < disknogc(97) < disknocreate(90) < disknoflush(94) < disknocp(96) < 100

ERROR: avmaint: config: server_exception(MSG_ERR_INVALID_PARAMETERS)

root@vm-vdp:~/#: avmaint config disknocp=99  --ava

2013/12/05-13:55:41.90331 [avmaint]  ERROR: <0949> Command failed because these config values do not meet the following criteria:

2013/12/05-13:55:41.90342 [avmaint]  ERROR: <0001> disknocp(99) <= diskfulldelta(5 -> 96.5) < diskfull(30 -> 97.0) < poolnocreate(20 -> 98.0) < 100

ERROR: avmaint: config: server_exception(MSG_ERR_INVALID_PARAMETERS)
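Both failures follow from the ordering constraint the server enforces on the capacity thresholds. With the values printed by `avmaint config --ava`, the check can be reproduced offline (a sketch of the rule, not the actual server code):

```shell
#!/bin/sh
# Sketch: the server requires
#   0 < diskwarning < diskreadonly < disknogc < disknocreate < disknoflush < disknocp < 100
# Setting disknogc=97 breaks the chain because disknocreate is only 90.
diskwarning=50; diskreadonly=65; disknogc=97
disknocreate=90; disknoflush=94; disknocp=96

ok=yes
prev=0
for v in $diskwarning $diskreadonly $disknogc $disknocreate $disknoflush $disknocp; do
  if [ "$prev" -ge "$v" ]; then ok=no; fi
  prev=$v
done
if [ "$prev" -ge 100 ]; then ok=no; fi
echo "constraint satisfied: $ok"
```

So raising one threshold past its neighbour always fails; the values would have to be raised together, in order, to stay inside the chain.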

basteku73
Enthusiast

Hi,

Have you resolved your issue?

I'm asking because I have the same problem, 96% used capacity. Is opening a support case the only solution?

Regards,

Sebastian Ulatowski

racom
Enthusiast

I've deployed a new VDP and started new backup jobs. It was the simplest and fastest way for me. I didn't open a support case.

Andre443
Contributor

Hi,

I have the same problem with VDP 5.5.

Is there a solution to free up space?

racom
Enthusiast

I'm afraid it's available for VDPA only. Try contacting support if deploying a new VDP isn't possible for you.

wreigle2
Contributor

FYI, this is still an issue in VDP 6.1.2.

My appliance hit 96.15% capacity. Backups failed. The 'Backup Scheduler' service will not start.

I have a case open with VMware. We have spent 2+ hours on a WebEx trying to get this thing back online. The case was finally escalated to EMC; waiting on a resolution.

AntonKr
Contributor

Here is a sad story with a happy ending about VDP 6.1.2.19. One unlucky day I fed a couple of OLAP VMs to VDP and it choked, failing to deduplicate properly. It ended up with nodes full at 98%, 97% and 98% respectively.

I do not know what exactly helped but here is a full list of my actions (add reboots as needed):

1. Delete big backups, run manual checkpoint, integrity check and garbage collection. Everything failed with MSG_DISK_FULL.

2. Rollback to earlier checkpoint, run manual checkpoint, integrity check and garbage collection. Everything failed with MSG_DISK_FULL.

3. Modify configuration threshold amounts to allow garbage collection run (as described above). Same errors when trying to set values to 99%.

4. Expanded storage! The wizard completed successfully, but only node1 (/dev/sdc1) expanded :(. Ran manual checkpoint, integrity check and garbage collection. Only the checkpoint succeeded; hfscheck and GC failed with MSG_DISK_FULL.

5. At this point I gave up and let the system run over a weekend.

6. On Monday it had magically repaired itself. There was a good checkpoint, a good hfscheck and a good GC. Admin mode persisted, though.

7. Rebooting and running a manual checkpoint several times, including unmounting the disks and running xfs_check, helped at last. Fullaccess mode was back.

8. The last bit was running xfs_growfs on /dev/sdb1 and /dev/sdd1 to fix the wrong size of those nodes.
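Step 8 can be sketched as a dry-run check. The device names are from the post; the sizes are made-up example values, and xfs_growfs is only echoed as a suggestion, never executed:

```shell
#!/bin/sh
# Sketch: flag nodes whose XFS filesystem is smaller than its partition,
# i.e. candidates for xfs_growfs after a storage expansion.
# On a real appliance the sizes would come from blockdev/xfs_info;
# here they are hard-coded example values in KB.
check_node() {
  dev=$1; part_kb=$2; fs_kb=$3
  if [ "$fs_kb" -lt "$part_kb" ]; then
    echo "grow needed: xfs_growfs $dev"
  else
    echo "ok: $dev"
  fi
}
check_node /dev/sdb1 1073741824 536870912   # partition expanded, fs still old size
check_node /dev/sdc1 1073741824 1073741824  # already grown by the wizard
check_node /dev/sdd1 1073741824 536870912
```

This matches the symptom described: the expansion wizard grew only one node's filesystem, leaving the other partitions bigger than the filesystems on them.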

Edit: I think the checkpoint rollback plus waiting would have been enough...
