VMware Cloud Community
minimega
Contributor

Fast way to reclaim unused disk space from a thin-provisioned VMDK

Hello all, I have an ESXi server running 5 VMs, all Windows 10/Server 2019.

Each VM has a single thin-provisioned disk of 500 GB.

The datastore has 1 TB of disk space, shared between all the VMs.

Each VM's disk uses only the real size of the data it contains (about 20-30 GB), not the full 500 GB.

Once a year I do some cleaning inside the OS, deleting unused/temp files, service packs, Windows updates and so on. At the end of this process I always delete the old VM snapshot and create a new one, so I always have a restore point in case of future problems.

Once the VM maintenance has completed, I would like to reclaim and free the space originally used by the deleted files.

As explained in the official procedure at https://kb.vmware.com/s/article/2004155, the solution is:

- delete all the snapshots to get a single -flat.vmdk file (vmkfstools does not work with snapshots)

- run "sdelete.exe -z c:" to fill the unused space with zeroes

- run "vmkfstools -K" to free the zeroed disk space

OK, it works, but when I "Delete All" snapshots, the thin-provisioned disk grows to the full disk size (500 GB) and the task takes a very long time to complete (inside the OS the space used is about 20-30 GB). I have to do it 5 times (once per VM), and it takes several days to manually complete the server maintenance.

Moreover, since the disk grows to 500 GB, this method only works as long as the total space occupied by the other VMs is less than 500 GB. Once it exceeds 500 GB, I will no longer be able to delete snapshots, because there will not be enough space left for the disk to grow to 500 GB.
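
The free-space limit described above comes down to simple arithmetic. Here is a minimal Python sketch of that check, assuming the worst case from this post (the disk being consolidated inflates to its full provisioned size) and ignoring the space still held by the snapshot file itself. The function name and the "later" usage figures are hypothetical and are not part of any VMware tool.

```python
def can_inflate(datastore_gb, used_by_vm_gb, vm_index, provisioned_gb):
    """Return True if the datastore can hold one VM's disk inflated to its
    full provisioned size while the other VMs keep their current usage."""
    others = sum(u for i, u in enumerate(used_by_vm_gb) if i != vm_index)
    return others + provisioned_gb <= datastore_gb


# Figures from this post: 1 TB datastore (taken as 1000 GB), 5 VMs, 500 GB disks.
today = [30, 30, 30, 30, 30]        # about 30 GB really used per VM
later = [130, 130, 130, 130, 130]   # hypothetical future usage

print(can_inflate(1000, today, 0, 500))   # True: the other VMs use 120 GB
print(can_inflate(1000, later, 0, 500))   # False: the other VMs use 520 GB
```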

So, is there a different way to free unused disk space that avoids "Delete All" snapshots growing the disk to the full 500 GB?

At the moment my workaround is to use a VM with Clonezilla to clone the VM disk (attaching the disk that refers to the last (current) snapshot as the source) to a new, empty, thin-provisioned disk (the destination). At the end of the copy, the destination .vmdk has the real size of the data contained in the source disk, not 500 GB. Then I manually edit the .vmx so that the VM points to the newly created .vmdk. Once I have verified that the VM starts and is working fine, I delete the -flat.vmdk and -sesparse.vmdk files, freeing the space on the datastore.

Is this a wrong approach? It is very fast (the copy takes less than 30 minutes per VM), and the disk stress is lower (it makes no sense to grow the disk from 30 GB to 500 GB, filling it with zeroes, when a minute later the same space will be released).

What do you think? What other solutions would you suggest?

Thanks a lot

9 Replies
kastlr
Expert

Hi minimega,

 

how much free space did your datastore report before you started your activities?

And how many snapshots are in place (per VM / in total)?

Deleting snapshots temporarily increases disk usage, simply because the snapshot content is consolidated into the base file.
During that process the snapshot is still in place; it is deleted only after consolidation.

This is by design, as it's the best way to guarantee the data is still valid even if an error occurs during consolidation.

Btw, VMware recommends keeping snapshots for no more than 72 hours.

If you use multiple snapshots over a longer timeframe, each snapshot could grow to the full vmdk size.

Besides the fact that such a design could consume more disk space than is really used by the guest OS, it would also have a negative impact on guest VM performance.

It sounds like this problem is caused by the way you're using snapshots.


Hope this helps a bit.
Greetings from Germany. (CEST)
minimega
Contributor

Hi kastlr, thanks for your reply.

I'm using 1 snapshot per VM, and the snapshot is deleted and re-created at the beginning of each year, when I do some maintenance on the OS filesystems. I want to always have a "fallback" snapshot in case of an OS fault, and restarting from the beginning-of-year snapshot is good enough for me. I cannot do a "delete/create new" every 3 days to merge the current snapshot into the base disk; once a year is fine for me.

So, for example, I have:

  • "*-flat.vmdk", thin provisioned, with a nominal size of 500 GB, holding 50 GB of data
  • the "ls" command shows a size of 500 GB, while the "du" command shows a size of 50 GB
  • then I create a new snapshot, so all future filesystem changes are stored in a separate "*-000001-sesparse.vmdk" file
  • at the end of the year, the snapshot file has grown by about 10 GB, due to Windows updates, newly stored files and so on
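
The gap between the "ls" and "du" figures above is the same thing you see with any sparse file: one number is the nominal size, the other is the space really allocated. Here is a minimal Python sketch that reproduces the effect on an ordinary POSIX filesystem. It is only an analogy for a thin-provisioned vmdk, not a description of how VMFS works internally; the helper name is hypothetical, and it relies on `st_blocks`, which is not available on Windows.

```python
import os
import tempfile

MIB = 1024 * 1024


def sparse_file_sizes(apparent_bytes, written_bytes):
    """Create a sparse temp file and return (apparent size, allocated size)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.truncate(apparent_bytes)          # sets the nominal size, writes nothing
            f.write(os.urandom(written_bytes))  # only this data needs real blocks
            f.flush()
            os.fsync(f.fileno())
        st = os.stat(path)
        # st_size is the figure "ls -l" reports; st_blocks counts the 512-byte
        # units actually allocated, which is the figure "du" is based on.
        return st.st_size, st.st_blocks * 512
    finally:
        os.remove(path)


apparent, allocated = sparse_file_sizes(64 * MIB, 1 * MIB)
print(f"ls-style size: {apparent // MIB} MiB")
print(f"du-style size: {allocated // MIB} MiB (depends on the filesystem)")
```

On a filesystem that supports sparse files, the second figure should be close to the 1 MiB actually written rather than the 64 MiB nominal size.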

When I start the maintenance, at the beginning of the next year, I power off the VM, so no extra snapshot is needed for the disk consolidation. From the snapshot manager I choose "Delete All", and from that point the task needs a long time to complete.
For a 500 GB disk it takes more than 10 hours! At the end of the process I get a single "*-flat.vmdk" file (the "*-000001-sesparse.vmdk" has been deleted), but the "ls" and "du" commands show the same size, 500 GB!

Moreover, in the VM details, under "Hard Disk 1", the property "Thin provisioned" switches from Yes to No after the "Delete All" snapshot task.

I have to start the VM, run the "sdelete -z c:" command, then power off and run the "vmkfstools -K" command from the ESXi SSH session. At the end of the process the du command shows a size of about 60 GB, which is the expected size.

I hope I have made my problem clear: the "Delete All" task takes a very long time to complete, and as a result the .vmdk grows to 500 GB. In the future, when the amount of data stored in the VMs increases, this task will run out of datastore space.

Now, why does the task that "merges" the original -flat.vmdk with the (one-year-old) "*-000001-sesparse.vmdk" snapshot need to grow the disk to 500 GB and convert it from thin provisioned to thick provisioned? My expected result is to "add (append?)" the current year's 10 GB of new data to the previous year's 50 GB base disk, for a final size of 60 GB, not 500 GB!

To avoid this, I'm using the Clonezilla approach, which in less than 15 minutes creates a new 60 GB .vmdk using the "*-000001-sesparse.vmdk" disk as the source. A bit of manual work editing the .vmx file and voilà, the job is done with less time and less disk stress.

Now I'm investigating the possibility of cloning the "*-000001-sesparse.vmdk" (and the "*-flat.vmdk" too) to a single new .vmdk using the command "vmkfstools -i source-snapshot.vmdk dest-flat.vmdk -d thin".
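
For repeating this across five VMs, the clone command above can be assembled by a small helper. This is only a sketch: the datastore and file paths below are hypothetical, and it assumes that vmkfstools is pointed at the small snapshot descriptor .vmdk rather than at the "-sesparse" or "-flat" data file, which should be verified on your own host before relying on it. The helper only builds and prints the command; it does not run anything.

```python
import shlex


def clone_command(source_descriptor, dest_descriptor, disk_format="thin"):
    """Build the argument list for: vmkfstools -i <src> <dst> -d <format>."""
    return ["vmkfstools", "-i", source_descriptor, dest_descriptor, "-d", disk_format]


# Hypothetical paths, for illustration only.
src = "/vmfs/volumes/datastore1/vm1/vm1-000001.vmdk"
dst = "/vmfs/volumes/datastore1/vm1/vm1-new.vmdk"

print(shlex.join(clone_command(src, dst)))
```

Printing the command first makes it easy to review each path before pasting it into the ESXi SSH session.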

If you have any suggestions, please let me know.

Thanks.

kastlr
Expert

Hi,

thanks for the detailed information.

 

In general, the vmdk shouldn't grow to its maximum size with your procedure.

Delete all Snapshots and Consolidate Snapshots feature FAQ (1023657)

 

Please check the following article; it sounds like it could solve your problem.

Thinly provisioned Virtual Disks inflates to a larger size during snapshot removal process (56608)

 

Regards,

Ralf

 


minimega
Contributor

Hi, thanks for the reply.

I had already seen KB https://kb.vmware.com/s/article/56608 and disabled disk optimization, but I will only see the results next year 🙂 (this year's maintenance has already been completed).

Regarding KB https://kb.vmware.com/s/article/1023657, the following paragraph is not clear to me:

Delete all snapshots operation will commit every snapshot of the chain directly to the Base Disk(s) of the virtual machine. With this new algorithm:
  • If the Base Disk is preallocated (thick provision), no extra space is required for the Delete all operation. The Base Disk will not grow as it is preallocated or thick.
  • If the Base Disk is non-preallocated (thin provision), - my case - the base disk will grow only on committing information from the snapshots. Each thin provision disk may grow up to its maximum size as mentioned in the Provisioned Size option in the virtual machine settings for the disk.

This makes me think that the base disk should only grow by the amount of data contained in the snapshot I'm deleting, which is what I'm expecting. Is it "may" grow, or "will" grow? Why are ALL my VMs' disks growing to the maximum size?

 

Note: Time taken in Converting a Thin provisioned disk to a Thick provisioned disk depends on the size of the virtual machine disks, performance of the underlying storage device and also if the VM is running on snapshot, the snapshots would also go through a consolidation process. 

But nobody asked to convert from thin to thick! That is exactly what I don't want! The disk should remain thin provisioned. It seems to me that snapshot deletion implies that the base disk has to be switched from thin to thick for this task, or am I wrong? And I'm really surprised that with disk optimization active the result is a thick disk of the full VM disk size (500 GB), while with it disabled the result is a thin disk of 60 GB. I think there are other variables in play that are unknown to "common mortals" 🙂

kastlr
Expert

Hi,

 

the reason why your vmdks grow to their maximum size is explained in that KB article.

[Attached image: kastlr_0-1640089006772.png]

Using sdelete could send UNMAP commands for any track that doesn't contain data.

As those tracks are handled as zero-filled data, the "Delete All" snapshot consolidation writes zeros to each such track on the base vmdk.

When a vmdk is created as eager-zeroed thick, each track is also overwritten with zeros; that's why your thin vmdk ends up as a thick vmdk after the consolidation is completed.

This is by design, and might be one of the reasons why VMware doesn't recommend using long-term snapshots.

😉

 

Ralf


IPPDJ
Contributor

I know this isn't really the answer to what you're asking directly, but as mentioned, long-term snapshots aren't beneficial.

Snapshots aren't backups or meant to be used as backups. Have you considered backing up to a service or even doing a local backup to a hard drive to keep on hand, or better yet, both?

Snapshots get big and bulky and hurt performance. 

 

I'm curious though: have you tried just cloning the VM instead of consolidating snapshots?

minimega
Contributor

Hi Ralf, thanks for the reply.

I'm not sure I have understood your explanation well; I'm not as much of a "technician" as you! 🙂

You spoke about the "sdelete" command, but I run it only AFTER the snapshot has been deleted and the disk consolidated, because it should be run once you have a single -flat.vmdk disk without any snapshot.

It also sounds a bit strange to me that "disk optimization" is the root cause of this problem. Each year I start with a new (completely empty) snapshot, where data is written sequentially: new files are added and deletions are very uncommon. Suppose that by the end of the year my snapshot has grown by 10 GB of new data. If "disk optimization" is running, the worst case is that files are moved to a totally different area of the disk, but the maximum amount of data moved is still 10 GB, so the snapshot should grow to at most 20 GB, and the consolidation should add at most 20 GB to the base disk (unmapped space + 10 GB of data), not 500 GB.

In every KB I have read about this problem, I always found wording like "should increase" or "is growing", which suggests that the file grows a bit more than expected, but nothing like "the disk grows exactly to its full size". If this is by design, why isn't it clearly described that way?

However, I have my Clonezilla solution, and I will keep using it as long as it works fine. Next year, with "disk optimization" disabled, maybe the snapshot deletion will be really fast and without the disk growing. We'll see! 🙂

minimega
Contributor

Hi IPPDJ,

yes, I understand (now) that snapshots are not the way to "back up" and "secure" a VM state, but they are so fast and convenient that I can't stop using them! 🙂

Yes, I also have an OS backup image made with Veeam Backup, so I can restore the OS in case of damage or a cryptolocker attack, but "snapshotting" the image, parking it in a "safe state", and using a new snapshot where all the mistakes can be made without breaking the original image is very fast and convenient.

It's a pity that using snapshots this way results in a disk growth problem. However, yes, the Clonezilla solution works fine (and I mean with other cloning software too, like MiniTool, EaseUS...)

I'm waiting to confirm that disabling "disk optimization" will resolve (or reduce) the impact of the problem at next year's disk consolidation!

kastlr
Expert

Hi,

 

Windows disk optimization also uses TRIM (or UNMAP) commands, like sdelete, but it runs weekly by default.

As TRIM or UNMAP SCSI commands don't carry a payload (only the information about which track/sector should be cleared), the SEsparse file wouldn't really grow; all it has to store is which track or sector was cleared (i.e. update $mft only).

But when you run a snapshot consolidation, that information is used to send zeroes to the track/sector, as would happen when creating a vmdk in eager-zeroed thick format.

 

So if the track table (like $mft) inside the SEsparse file received TRIM/UNMAP commands for almost every track during the year, the consolidation would wipe each track by sending zeros, which

  • would grow your thin vmdk to its maximum size
  • would take awfully long, as (to the best of my knowledge) writes are sent sequentially without using outstanding IOs
  • as all tracks are zeroed out, this is identical to a thick vmdk, and that's why it looks like it was converted
  • to regain space you have to run vmkfstools again
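
The effect described in the list above can be illustrated with a toy model in Python. This is only a sketch of the explanation given in this post, not VMware's actual consolidation algorithm: the function name, the 1 GB block granularity and the assumption that the guest trimmed every free block are all hypothetical simplifications.

```python
def consolidate(base_blocks, snap_writes, snap_unmaps):
    """Toy model: return the set of blocks allocated in the thin base disk
    after the snapshot has been merged into it.

    base_blocks -- block numbers already allocated in the base disk
    snap_writes -- blocks written while the snapshot was active
    snap_unmaps -- blocks the guest marked as cleared (TRIM/UNMAP); in this
                   model each one is committed as an explicit zero write,
                   which forces a thin disk to allocate the block
    """
    return set(base_blocks) | set(snap_writes) | set(snap_unmaps)


TOTAL = 500                      # 500 blocks of 1 GB each (hypothetical scale)
base = set(range(50))            # 50 GB of data in the base disk
writes = set(range(50, 60))      # 10 GB of new data during the year
unmaps = set(range(60, TOTAL))   # guest trimmed every free block

print(len(consolidate(base, writes, unmaps)))   # 500: disk fully allocated
print(len(consolidate(base, writes, set())))    # 60: no trims recorded
```

The second call corresponds to a year in which no TRIM/UNMAP information reached the snapshot, for example with disk optimization disabled.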

I'm pretty sure that next year the process will run more smoothly.

