VMware Cloud Community
lulu62
Enthusiast

thin provisioned VMDKs inflated to max size after snapshot deletion job ran (powercli)

Hello VMware community,

We observed an issue with some of our thin provisioned VMs and snapshots:

We have a PowerCLI script that runs every day and deletes snapshots older than x days:

    Import-Module VMware.VimAutomation.Core
    Connect-VIServer -Server vcenter.company.local

    # This script deletes snapshots older than x days.
    # Exception added for VMtest (snapshots of this VM are never deleted), at the request of xxx, 19-04-2019.
    $vm_range = (Get-VM | Where-Object Name -NotLike "VMtest").Name
    $snapshots_of_vm_range = (Get-VM $vm_range | Get-Snapshot)

    # Adjust the number of days below
    $snapshots_to_delete = $snapshots_of_vm_range | Where-Object { $_.Created -lt (Get-Date).AddDays(-5) }

    # Delete every matching snapshot in one asynchronous call
    if ($snapshots_to_delete) {
        Remove-Snapshot $snapshots_to_delete -Confirm:$false -RunAsync
    }

We are aware that this script deletes all snapshots at once and therefore puts some pressure on our storage (I/O, latency, etc.).

We have an all-flash storage array (Pure Storage FA-m20r2) and it seems to hold up well.
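If the storage pressure from deleting everything at once ever becomes a concern, one option is to delete in small batches and wait for each batch to finish. The sketch below is only an illustration: `Split-IntoBatch` is a hypothetical helper, not a PowerCLI cmdlet, and the batch size of 5 is an arbitrary assumption. It addresses storage load only and is not expected to change the disk inflation described below.

```powershell
# Hypothetical helper (not part of PowerCLI): split a list into batches
# of at most $Size items so that work can be done a few items at a time.
function Split-IntoBatch {
    param(
        [Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $Items,
        [ValidateRange(1, 1000)] [int] $Size = 5
    )
    $batches = @()
    for ($i = 0; $i -lt $Items.Count; $i += $Size) {
        $end = [Math]::Min($i + $Size, $Items.Count) - 1
        # The leading comma keeps each batch as a single array element.
        $batches += , @($Items[$i..$end])
    }
    # The leading comma stops PowerShell from unrolling the outer array.
    return , $batches
}

# Stand-alone demonstration with plain numbers instead of snapshots.
$demo = Split-IntoBatch -Items (1..12) -Size 5
foreach ($batch in $demo) {
    Write-Output ("Batch of {0}: {1}" -f $batch.Count, ($batch -join ', '))
}

# With the script above (requires a vCenter connection), usage might look like:
# foreach ($batch in (Split-IntoBatch -Items $snapshots_to_delete -Size 5)) {
#     # No -RunAsync here, so each batch completes before the next one starts.
#     Remove-Snapshot -Snapshot $batch -Confirm:$false
# }
```

Dropping `-RunAsync` inside the loop is what makes the batches run one after another; with `-RunAsync` all removal tasks would still be queued immediately.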

Last Sunday the script ran and deleted 53 snapshots at once.

Some VMs had 2 snapshots deleted at once, and I noticed that the thin-provisioned VMDKs of these VMs inflated to their maximum size after the snapshots were deleted. The disks are still reported as thin provisioned, but their size did not shrink back.

Prior to the snapshot deletion job, the VMDK sizes matched the true used size in Windows.

After the snapshot deletion job, the VMDKs increased to their maximum size.

VM properties:

Guest OS: Microsoft Windows Server 2012 (64-bit)

Compatibility: ESXi 6.5 and later (VM version 13)

VMware Tools: Running, version:10341 (Current)

All disks thin provisioned - dependent mode

Datastore properties:

Type: VMFS 6.81
Drive type: Flash
Thin provisioning: Supported
Space reclamation priority: Low (deleted or unmapped blocks are reclaimed on the LUN at low priority)

Device backing:

Device: PURE Fibre Channel Disk (naa.624[...])
Capacity: 5 TB
Partition format: GPT
Drive type: Flash
Sector format: 512n

Hardware acceleration is supported on all hosts.

Why did the thin-provisioned disks inflate after the last snapshot deletion by the PowerCLI script above?

26 Replies
lulu62
Enthusiast

Also, I have no issues when defrag runs in a thin-provisioned VM with no snapshots.

wenqi22
Contributor

Hi Lulu62,

I have exactly the same problem now. My original used space was shown as less than 120 GB. After deleting all 3-4 snapshots, the used space of my servers showed as 3.3 TB and 4.5 TB respectively. I deleted snapshots on 2 VMs.

Both VMs had a C: and an E: drive, with C: at 500 GB and E: at 4 TB. Both disks are thin provisioned.

I read that you ran a defrag to resolve this. Do you mean that you deleted all snapshots, ran a defrag on E:, power-cycled the servers, and the disk space was returned?

Many thanks and hope for your reply.

wenqi22
Contributor

I powered the VM off and on again and ran defrag E: /L, but the size is still the same. Any help?

wenqi22
Contributor

Issue: After deleting ALL snapshots, the used disk space of thin-provisioned VMs grows to the full provisioned size.

I managed to resolve it with the steps below:

1. Power off VM

2. Power on VM.

3. Do a defrag on C:\ and E:\.

4. Take a snapshot.

5. Delete all snapshots. This step must be done immediately after step 4.

Space usage went back to normal.

dozoekgr
Contributor

I first saw this problem about 2 years ago (when in-guest UNMAP became a thing and was fixed in 6.5U1), but I couldn't pinpoint the issue back then because it appeared very rarely and I couldn't reliably reproduce it on Windows.

Today I got bitten again, this time on Linux. A new, quite large disk was added to a VM and, coincidentally, was snapshotted almost immediately by the backup software. The admin then created a file system on it (which UNMAPs/TRIMs the whole block device). A few minutes later the snapshot was deleted and the new disk grew to its full size, very nearly exhausting the datastore.

I can reliably reproduce this:

  • Add disk to VM
  • Snapshot
  • Create file system
  • Delete snapshot - VMDK grows to full size

fstrim will clean it up, but if you take a snapshot and then run fstrim or recreate the file system, the disk will grow again once the snapshot is deleted.

As snapshot deletion is not a cancelable operation, this is quite a dangerous bug/edge case if a thin VMDK grows larger than the datastore.
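To gauge that risk before deleting a snapshot, the worst-case growth can be estimated as the sum of (provisioned size minus currently used size) over all thin disks and compared with the free space on the datastore. The sketch below is a simplified illustration under stated assumptions: `Test-ThinGrowthFits` is a hypothetical helper, the disk objects and their `ProvisionedGB` and `UsedGB` properties are made up for the example, the numbers are illustrative, and space held by the snapshot delta files is ignored.

```powershell
# Hypothetical helper (not part of PowerCLI): estimate how much thin disks
# could grow if they all inflated to full size, and whether that still fits.
function Test-ThinGrowthFits {
    param(
        # Each object needs ProvisionedGB and UsedGB properties (assumed names).
        [Parameter(Mandatory)] [object[]] $Disks,
        [Parameter(Mandatory)] [double] $DatastoreFreeGB
    )
    $growth = 0.0
    foreach ($d in $Disks) {
        # A disk can grow by at most its unallocated remainder.
        $growth += [Math]::Max(0.0, [double]$d.ProvisionedGB - [double]$d.UsedGB)
    }
    [pscustomobject]@{
        WorstCaseGrowthGB = $growth
        Fits              = ($growth -le $DatastoreFreeGB)
    }
}

# Illustrative numbers only: a 500 GB and a 4 TB thin disk, 60 GB used on each,
# on a datastore with 1000 GB free.
$disks = @(
    [pscustomobject]@{ ProvisionedGB = 500;  UsedGB = 60 },
    [pscustomobject]@{ ProvisionedGB = 4096; UsedGB = 60 }
)
$result = Test-ThinGrowthFits -Disks $disks -DatastoreFreeGB 1000
# WorstCaseGrowthGB is 4476 here, so Fits is False.
$result
```

In a real environment the provisioned size could come from the `CapacityGB` property of `Get-HardDisk` and the free space from the `FreeSpaceGB` property of `Get-Datastore`. The current on-disk size of a thin VMDK is not exposed by those two properties and would have to be obtained separately.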

I didn't notice anyone opening an SR (support request), so I'm going to create one to investigate further.

Edit: it seems to be documented, expected behavior; this came up in the search while creating the SR: VMware Knowledge Base

JDukes10
Contributor

Hi,

This issue is still happening to us with the latest 6.7 installed. I cannot believe that VMware still has not fixed this. We have had it happen in 2 separate networks, exactly as described. Is there a new way to fix this? The provisioned VM disk is larger than the datastore, so I cannot consolidate the VM!

Thanks,

Steeve_Savard
Contributor

Hello everyone.

It looks like thin provisioning is a real challenge.

I have some questions, since I am experiencing the same issue, but with some differences:

  • I don't see the problem when I revert to a snapshot.
  • I only see it after 3 snapshot creations and deletions (so far).
  • I only see it when I delete the first snapshot (the one just after the root).

Also, is the vCenter you are running itself a VM, and which OVA/OVF build is it? I ask because that is what I am currently running (vCenter as a VM), and that VM has hardware version 10, while the VMs I run have hardware versions 10, 13, and 14. I am not sure whether this is a problem. In my opinion it could be one if the vCenter VM is not built at the same (or a newer) hardware version as the VMs running on the managed hosts, but I could be wrong. It is also possible that a vCenter at hardware version 10 has no effect on how it creates VMs at hardware version 13 or 14 on a managed host. That is a good question.

One thing is for sure: from now on I will look closely at the compatibility setting when creating a VM.

Originally my vCenter was at 31000, and it is now at 48000. Again, I am not sure whether the original 31000 sets the bar for the future hardware-version-wise, or whether that changes with vCenter build updates.

I also have to keep in mind that we sometimes build VMs from templates. If the hardware version does turn out to be a VMware "challenge" (not to say a known issue), a lot of work will be involved.

Lastly, has anyone tried migrating the VMs from one host to another and selecting "thin" when doing so? Would this be possible, since it is an option you can use when migrating a VM from thick to thin?

 
