VMware Communities
LevaEng
Contributor

Trim/Unmap support on GuestOS using FALLOC_FL_PUNCH_HOLE on HostOS (Thick provisioned vmdk)

Hi

I'm running multiple VMs on top of a ZFS filesystem dataset.

Each VM has its disk fully allocated (thick provisioned), but since ZFS supports native compression (and data deduplication too), the actual on-disk size of each vmdk file is only as large as the data written (or even 1/N of that with dedup).
Generally speaking, this is what should happen on any modern filesystem that supports sparse files / fallocate / hole punching.
(This setup is better than a sparse vmdk because it generally has superior performance, and far better storage efficiency when combined with ZFS native compression and online data deduplication.)

This works well while the GuestOS writes data, but when that data is later deleted it is never freed up on the host side, because the HostOS can never know (unless hinted) that a range of the vmdk file has been released by the GuestOS above.

For this, at least on Linux-based OSes, there is fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, offset, len), which can "punch holes" in the file and free the unused space (the logical size of the file stays the same, but the physical on-disk size shrinks as needed).
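As an illustration (not from the original post), here is a minimal shell sketch of this behaviour using the util-linux fallocate(1) tool; the path is arbitrary, and the filesystem must support hole punching (ext4, XFS, tmpfs, ZFS on Linux, ...):

```shell
# Create a 64 MiB file of incompressible data.
f=/tmp/punch-demo.img
dd if=/dev/urandom of="$f" bs=1M count=64 status=none

stat -c 'logical=%s bytes, physical=%b blocks' "$f"

# Punch a 32 MiB hole at the start. --punch-hole implies --keep-size,
# so the file's logical length does not change.
fallocate --punch-hole --offset 0 --length 32MiB "$f"

stat -c 'logical=%s bytes, physical=%b blocks' "$f"   # same size, fewer blocks

rm -f "$f"
```

This is exactly the call a hypervisor would need to issue against the vmdk when the guest sends a TRIM/UNMAP for a range it has freed.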

But when configuring a VM on Workstation 17 with:
- Thick Provisioned Disk (single file)
- I/O controller : LSI Logic SAS
- Disk Type : NVMe

The GuestOS (Windows 11) reports that the disk does not have Trim support.
But Trim should be supported in this case, and then converted into a fallocate call on the vmdk file by the hypervisor (VMware Workstation).

How can I enable this behaviour? I already tried adding the following to the .vmx configuration file:
- nvme0:0.virtualSSD = "TRUE"
- disk.scsiUnmapAllowed = "TRUE"

But the GuestOS still reports no Trim support.

Thanks,
Luca

Also posted here 

6 Replies
rm_bk
Enthusiast

I've never seen "hole punch" work in a VM with thick provisioned disks with any virtualization technology.  Not with Workstation, not with ESXi, and not with KVM/QEMU. Thin provisioning has always been a requirement.  IMHO the best substitute is to write zeros within the guest (e.g. Microsoft SDelete).
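The zero-fill workaround can be sketched for a Linux guest like this (SDelete's `sdelete -z` is the Windows equivalent); the path and sizes here are illustrative, and in practice you would let dd run until the disk is full:

```shell
# Zero-fill free space so a compressing/deduplicating host (or a later
# shrink pass) can reclaim the blocks the guest no longer uses.
target=/tmp/zerofill-demo       # illustrative; in a real guest, a dir on the disk to clean
mkdir -p "$target"

# In practice: omit 'count=' and let dd stop at ENOSPC, i.e. fill all free space.
# 'conv=fsync' makes sure the zeros actually reach the backing storage.
dd if=/dev/zero of="$target/zero.fill" bs=1M count=16 conv=fsync status=none

# Deleting the filler leaves the freed ranges full of zeros on the backing store.
rm -f "$target/zero.fill"
```

A host filesystem with compression enabled stores those zeroed ranges at near-zero cost even inside a thick vmdk, which is why this works as a substitute for real TRIM passthrough.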

Are you sure thin provisioning is meaningfully detrimental to performance?  My understanding is the only hit to performance is on initial allocation of a block; after this occurs the performance going forward [for this block] is the same.

Technogeezer
Immortal

TRIM/UNMAP does not appear to be supported by any VMware virtual disk technology, and it is certainly not passed through to the underlying host disks. The only way to reclaim unallocated space in a virtual disk on VMware's hosted products is to use VMware's shrinking capabilities, which tends to require defragmenting the file system, writing zeros to unallocated space, and then running the shrink (often using the vmware-vdiskmanager CLI). You also have to take SSD wear into account if you do this process too often.
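As a sketch of that shrink cycle (command names from VMware's hosted-product tooling and Sysinternals; paths are illustrative, and the VM must be powered off for the host-side steps):

```
# Inside the Windows guest: zero out free space first (Sysinternals SDelete)
sdelete -z C:

# On the host, with the VM powered off:
vmware-vdiskmanager -d /path/to/disk.vmdk    # defragment the virtual disk
vmware-vdiskmanager -k /path/to/disk.vmdk    # shrink: reclaim zeroed space
```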

And "modern file systems" tend to be copy-on-write, which means that when they write new blocks or update old ones, they write the updated block to unallocated space on the disk. The original block is then either deallocated (returned to the free block chain) or kept around, depending on how many other files and/or snapshots reference it. That's great for SSDs but not so great for HDDs. So even though your VM may report it's using a certain amount of disk space, that's not what the virtual disk files consume on the host, because previously used blocks are never released.

 

- Paul (Technogeezer)
Editor of the Unofficial Fusion Companion Guides
rm_bk
Enthusiast

I'll have to experiment with this again. 3-4 years ago in my homelab I had a Windows VM on a thinly provisioned disk, and I was able to watch it push unmaps down to storage in real time. It was an automatic process: delete a large file inside the VM and watch the unmaps start flowing maybe 15-30 seconds later. It only worked if Windows could detect that it was running on a thinly-provisioned disk.

I don't remember what storage I was using at the time; it was either iSCSI to FreeNAS or iSCSI to a Windows Server running the iSCSI target service.  I also don't remember if said Windows VM was on VMware Workstation or on ESXi.

But now I'm curious and I plan to run a few experiments...

Some light reading:
Thin Provisioning and Trim Storage Overview | Microsoft Learn
Plan and Deploy Thin Provisioning | Microsoft Learn
Thin Provisioning Performance Test - NTFS (LOGO) | Microsoft Learn
Thin Provisioning SCSI Compliance Test (LOGO) | Microsoft Learn

Of interest to OP regarding thin/thick performance:
Thin Provisioning Performance Test - RAW Disk (LOGO) | Microsoft Learn

rm_bk
Enthusiast

Confirmed working. Server 2019 on a thin vmdk on VMFS-6 on ESXi, backed by a Pure Storage LUN presented via FC:

[screenshot: rm_bk_1-1693440455688.png]

This occurred maybe 10 seconds after deleting a 96GB file. No intervention required... "It just works."

Now I need to repeat this experiment with VMware Workstation somehow. There's no esxtop to watch unmaps with...

Technogeezer
Immortal

That’s good news for ESXi - Pure Storage is very, very good stuff, and it’s nice that they have the integration with ESXi to take advantage of thin provisioning.

My experiences have been with Workstation and Fusion, so it’ll be interesting to see if you find that VMware has gotten it to work in the desktop hypervisor products. 

- Paul (Technogeezer)
Editor of the Unofficial Fusion Companion Guides
LevaEng
Contributor

Hi

Thank you for your answers.
I will try to expand on and clarify why thick storage is better in this case.

As for support in the wild: I can't verify right now, but I'm 99% sure KVM/QEMU has TRIM support in the guest and propagates it to the host (I've used it before) (docs: driver -> discard).
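For reference, the "driver -> discard" knob mentioned here is the discard attribute on the disk's <driver> element in the libvirt domain XML (a sketch only; the file path, bus, and image format are illustrative):

```xml
<disk type='file' device='disk'>
  <!-- discard='unmap' passes guest TRIM/UNMAP through, punching holes in the image -->
  <driver name='qemu' type='qcow2' discard='unmap'/>
  <source file='/var/lib/libvirt/images/guest.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>
```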


So why not a thin-provisioned disk?

My architecture stack is:

  • VMs: Windows 11 with NTFS 64K cluster size
  • Host: Workstation 17 on Linux
  • NFS over dedicated 10GbE
  • NAS: ZFS dataset with 64K record size & deduplication

With this solution I get the best performance from my system and really good disk savings.
Why: I found that a 64K sector size is the sweet spot for my usage; it gives a good enough balance that, after deduplication and compression, all the VM disks stay nicely hot in the ARC and L2ARC (RAM and NVMe cache respectively).

The problem with thin provisioning is that I have no control over the allocation, granularity and, most importantly, the alignment of its internal chunks.
If I put a thin disk on top of ZFS I'm putting a de facto sparse data structure on top of a CoW filesystem; this creates really big write amplification and also completely nullifies deduplication, given that there is no longer any sector/chunk alignment from Windows NTFS down to the actual storage file.

Why ZFS as backing storage?

  • Deduplication
  • Transparent compression
  • Automatic snapshots with 0% performance penalty

I didn't use iSCSI on top of a ZVOL, mostly to simplify sysadmin operations for others on the team.

The absence of TRIM/Discard/UnMap means that blocks are allocated but never freed up.
Yes, you could enter every single machine and run SDelete on each of them, and at the end of the stack ZFS would then recognise and free up those chunks, but IMHO this is a really ugly hack.


I hope this better explains why TRIM/Discard/UnMap support is important on filesystems supporting FALLOC_FL_PUNCH_HOLE.
