Hello VMware community,
We observed an issue with some of our thin-provisioned VMs and their snapshots.
We have a PowerCLI script that runs every day and deletes snapshots older than x days:
Import-Module VMware.VimAutomation.Core
Connect-VIServer -Server vcenter.company.local
# This script deletes snapshots older than x days
# Exception added for VMtest (snapshots of this VM are never deleted), at the request of xxx, 19-04-2019
$vm_range = Get-VM | Where-Object Name -NotLike "VMtest"
$snapshots_of_vm_range = $vm_range | Get-Snapshot
# Adjust the number of days below
$snapshots_to_delete = $snapshots_of_vm_range | Where-Object { $_.Created -lt (Get-Date).AddDays(-5) }
if ($snapshots_to_delete)
{
    # -RunAsync starts all removals at once and returns without waiting for them
    Remove-Snapshot -Snapshot $snapshots_to_delete -Confirm:$false -RunAsync
}
We are aware that this script deletes all snapshots at once and therefore puts some pressure on our storage (I/O, latency, etc.).
We have an all-flash storage array (Pure Storage FA-m20r2) and it seems to hold up well.
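If the array ever does start to struggle, one option is to drop -RunAsync and remove the snapshots one after another. The following is only a minimal sketch, assuming the $snapshots_to_delete collection from the script above. Remove-SnapshotSequentially, its -RemoveAction parameter and the pause value are hypothetical names introduced here for illustration; they are not part of PowerCLI.

```powershell
# Hypothetical helper: removes snapshots one at a time instead of all at once.
function Remove-SnapshotSequentially {
    [CmdletBinding()]
    param(
        # Snapshot objects, e.g. the $snapshots_to_delete collection.
        [Parameter(Mandatory)]
        [object[]] $Snapshot,

        # Action run per snapshot. The default calls PowerCLI's Remove-Snapshot
        # without -RunAsync, so each removal finishes before the next one starts.
        [scriptblock] $RemoveAction = { param($s) Remove-Snapshot -Snapshot $s -Confirm:$false },

        # Optional pause between removals (assumed value; tune for your storage).
        [int] $PauseSeconds = 0
    )

    foreach ($s in $Snapshot) {
        Write-Verbose "Removing snapshot '$($s.Name)' of VM '$($s.VM)'"
        & $RemoveAction $s
        if ($PauseSeconds -gt 0) { Start-Sleep -Seconds $PauseSeconds }
    }
}

# Usage, in place of the Remove-Snapshot call in the script:
#   if ($snapshots_to_delete) { Remove-SnapshotSequentially -Snapshot $snapshots_to_delete -PauseSeconds 30 }
```

The trade-off is run time: removals that used to overlap now run back to back, so the job takes longer but consolidates only one snapshot at a time.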
Last Sunday the script ran and deleted 53 snapshots at once.
Some VMs had 2 snapshots deleted at once, and I noticed that the thin-provisioned VMDKs of these VMs had inflated to their maximum size after the snapshots were deleted. The disks are still reported as thin provisioned, but their size did not shrink back.
Prior to the snapshot deletion job, the VMDK sizes matched the space actually used in Windows.
After the snapshot deletion job, the VMDKs had increased to their maximum size.
VM properties:
Guest OS: Microsoft Windows Server 2012 (64-bit)
Compatibility: ESXi 6.5 and later (VM version 13)
VMware Tools: Running, version:10341 (Current)
All disks thin provisioned - dependent mode
Datastore properties:
Type VMFS 6.81
Drive type Flash
Thin Provisioning Supported
Space Reclamation Priority Low: Deleted or unmapped blocks are reclaimed on the LUN at low priority
Device Backing
Device: PURE Fibre Channel Disk (naa.624[...])
Capacity 5 TB
Partition Format: GPT
Drive Type Flash
Sector format 512n
Hardware acceleration is supported on all hosts.
Why did the thin-provisioned disks inflate after the last snapshot deletion with the PowerCLI script above?
What StorageFormat is shown for the harddisks on such a VM?
Do a Get-HardDisk.
Blog: lucd.info Twitter: @LucD22 Co-author PowerCLI Reference
Thin
Ex.
StorageFormat : Thin
Persistence : Persistent
DiskType : Flat
Filename : [...].vmdk
CapacityKB : 75497472
CapacityGB : 72
ParentId : VirtualMachine-vm-356830
Parent : VMname
Uid : /VIServer=xxx\xxx@xxx.xxx.xxx:443/VirtualMachine=VirtualMachine-vm-356830/HardDisk=2000/
ConnectionState :
ExtensionData : VMware.Vim.VirtualDisk
Id : VirtualMachine-vm-356830/2000
Name : Hard disk 1
I suspect this might be due to the consolidation.
From the vSphere documentation
"If the Base Disk is non-preallocated (thin provision), the base disk will grow only on committing information from the snapshots. Each thin provision disk may grow up to its maximum size as mentioned in the Provisioned Size option in the virtual machine settings for the disk."
It depends on how much was contained in the snapshots.
"It depends on how much was contained in the snapshots."
I don't have that info, unfortunately.
Also, it is not just one disk that got inflated. Those VMs have 3 thin disks and all 3 were inflated to their maximum size. Users didn't create enough data across all 3 disks to fill them to their maximum size (these are actually new VMs that were deployed recently).
And if I look now at the disk space usage in Windows (1 partition per disk), only 1 disk (C:) contains data and the other 2 disks are empty, yet in the datastore they are all at their maximum size.
I have never seen that kind of behaviour.
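To find out how many other VMs are in the same state, the used and provisioned space that PowerCLI reports per VM can be compared. This is a rough sketch only: Find-InflatedVM and the 95% threshold are hypothetical choices made here, and the provisioned figure covers more than the VMDKs (for example swap and snapshot files), so the ratio is an approximation.

```powershell
# Hypothetical helper: flags VMs whose datastore usage is close to their provisioned size.
function Find-InflatedVM {
    [CmdletBinding()]
    param(
        # VM objects as returned by Get-VM; only Name, UsedSpaceGB and
        # ProvisionedSpaceGB are read.
        [Parameter(Mandatory, ValueFromPipeline)]
        [object] $VM,

        # Assumed cut-off: report VMs using at least 95% of their provisioned space.
        [double] $Threshold = 0.95
    )
    process {
        if ($VM.ProvisionedSpaceGB -le 0) { return }
        $ratio = $VM.UsedSpaceGB / $VM.ProvisionedSpaceGB
        if ($ratio -ge $Threshold) {
            [pscustomobject]@{
                Name          = $VM.Name
                UsedGB        = [math]::Round($VM.UsedSpaceGB, 1)
                ProvisionedGB = [math]::Round($VM.ProvisionedSpaceGB, 1)
                UsedRatio     = [math]::Round($ratio, 2)
            }
        }
    }
}

# Usage against a connected vCenter:
#   Get-VM | Find-InflatedVM | Sort-Object UsedRatio -Descending
```

Thick-provisioned VMs will always be flagged, so the result still has to be cross-checked against the StorageFormat shown by Get-HardDisk.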
Out of curiosity, can you svMotion such a VM, and check what happens with the VMDK sizes?
"Out of curiosity, can you svMotion such a VM, and check what happens with the VMDK sizes?"
Nothing. They don't shrink. And I have the impression that it's related to the initial issue:
I found a VM with the same symptoms in our POC (inflated thin disks which don't shrink).
It's a W2K12 VM. The size of the E: drive in Windows is 59.9 GB and used space is 301 MB.
In the vCenter the size of the VMDK that contains the E: drive is 61 GB.
All my attempts to reduce the storage usage of that VM (shrink VMDK) failed:
First of all, on the host:
esxcli storage core device vaai status get -d naa[...]
naa[...]
VAAI Plugin Name:
ATS Status: supported
Clone Status: supported
Zero Status: supported
Delete Status: supported
(I don't know if it's important but it's blank next to VAAI Plugin Name)
Then, on the VM:
defrag E: /L
Optimize-Volume -DriveLetter E -ReTrim -Verbose
And on the host:
esxcli storage vmfs unmap -l mydatastore
Nothing happened, the size of the VMDK is still the same, it didn't shrink.
Then I did a Storage vMotion: I migrated from the shared datastore to local storage, specifying thin, and then migrated back to the shared datastore, again specifying thin.
The size of the VMDK is still the same; it didn't shrink.
The missing VAAI plugin name might be an issue.
At some point, during the snapshot removal, the VMDKs might have grown, and they are not shrinking automatically.
That is normal.
Perhaps have a go at the procedure outlined in Reclaim disk space from thin provisioned VMDK files in ESXi Server
I know it is another storage vendor, but the procedure should be the same for you.
If in doubt contact your storage vendor.
I've been troubleshooting some more and found something:
I managed to shrink the VMDKs.
Running
defrag E: /L
after power cycling the VM (power off, then power on) works and shrinks the VMDKs!
I tested on 2 VMs and I got the same positive result.
Don't know exactly what that means but I'll keep investigating...
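The workaround could be scripted from PowerCLI roughly as follows. This is only a sketch under several assumptions: VMware Tools is running, guest credentials are available, and a guest shutdown followed by a power-on counts as the power cycle described above. Invoke-PowerCycleAndRetrim, the polling interval and the 600-second Tools timeout are hypothetical choices introduced here.

```powershell
# Hypothetical helper: power cycles a VM, then re-trims one volume from inside the guest.
function Invoke-PowerCycleAndRetrim {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string] $VMName,
        [Parameter(Mandatory)] [string] $DriveLetter,   # e.g. 'E'
        [Parameter(Mandatory)] [pscredential] $GuestCredential,
        [int] $PollSeconds = 5                          # assumed polling interval
    )

    $vm = Get-VM -Name $VMName

    # Shut the guest down completely; a restart is not a power cycle.
    Stop-VMGuest -VM $vm -Confirm:$false | Out-Null
    while ((Get-VM -Name $VMName).PowerState -ne 'PoweredOff') {
        Start-Sleep -Seconds $PollSeconds
    }

    # Power the VM back on and wait until VMware Tools responds.
    Start-VM -VM $vm | Out-Null
    Wait-Tools -VM $vm -TimeoutSeconds 600 | Out-Null

    # defrag /L is the retrim command used in this post.
    $guestArgs = @{
        VM              = $vm
        ScriptText      = "defrag.exe ${DriveLetter}: /L"
        ScriptType      = 'Bat'
        GuestCredential = $GuestCredential
    }
    Invoke-VMScript @guestArgs
}

# Usage:
#   Invoke-PowerCycleAndRetrim -VMName 'VMname' -DriveLetter 'E' -GuestCredential (Get-Credential)
```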
Moving this post to ESXi because the issue is not related to the PowerCLI script but to disk consolidation.
I'm still investigating this issue.
I haven't yet found what causes our thin VMDKs to expand to their maximum size when the VM snapshot is deleted.
Ex. This is what the VM storage usage looks like when the snapshot is deleted and VMDKs expand:
VMware support has been contacted and confirmed that it's a known issue: thin VMDK expansion can happen if a task such as defragmentation, an anti-malware scan, or Windows Update runs while the disks are being consolidated.
However, this doesn't seem to be our case. We do have an antivirus, Trend Micro OfficeScan, but real-time scan is disabled.
Windows Updates are deployed with SCCM. The VM is rebooted after the Windows Updates are installed and is normally in a clean state.
If someone can help or suggest a troubleshooting plan, I'd appreciate it, because I'm running out of ideas...
Hi,
What disk controller are you using with the Windows VMs? pvSCSI or SAS?
Have you tried sdelete?
Refer to the forum post below, where this issue has been discussed.
Solved: Sdelete a thin provisioned Disk
I found interesting deep dive posts that may interest you to check further.
WHAT’S NEW IN ESXI 6.5 STORAGE PART I: UNMAP
IN-GUEST UNMAP FIX IN ESXI 6.5 PART I: WINDOWS
This may be a long-standing known issue. Have a look at another forum post on the same issue. If nothing works, it would be better to ask VMware support for a public-facing KB article for this issue and its fix.
Deleting snapshots on thin provisioned disks results in fully allocated vmdk (ESXi 6.5)
Hi,
"What disk controller are you using with the Windows VMs? pvSCSI or SAS?"
LSI Logic SAS.
"Have you tried sdelete?"
No, I tried defrag /L (retrim).
defrag /L works only after power cycling the VM (power off, then power on). The space is reclaimed and I can see the VMDK shrinking back to its normal size.
A standard reboot doesn't make defrag /L work.
"I found interesting deep dive posts that may interest you to check further."
Thanks for the links. I had actually found them before your comment and went over them. The thing is, we are using the latest ESXi 6.5.0 build, 13004031, so we should not be affected by misaligned UNMAP commands.
"This may be a long-standing known issue. Have a look at another forum post on the same issue. If nothing works, it would be better to ask VMware support for a public-facing KB article for this issue and its fix."
"Deleting snapshots on thin provisioned disks results in fully allocated vmdk (ESXi 6.5)"
This post looks very similar to the situation we are facing (but the ESXi version is different).
I contacted VMware support and was told that it's a guest OS issue: a task is running during consolidation. I doubt it; I disabled the antivirus and the thin VMDKs still inflate to their maximum size after snapshot deletion.
As said above, I know how to shrink the VMDKs back (defrag /L after power cycling the VM), which is not ideal.
I'm looking for a fix to stop the VMDKs from inflating to their maximum size after snapshot deletion, because if I can prevent the inflation in the first place, I don't need to reclaim the empty space.
Can you run a test with a Windows-based VM using a pvSCSI controller and try to recreate the issue?
This will help us confirm whether the issue is with the controller or the VM, since I read in a few posts (links shared in my previous post) that with a pvSCSI controller, a Windows Server (2012 or later) guest can send TRIM commands to ESXi, which may help.
Also, can you check the affected Windows-based VMs and list the third-party software they have in common?
Is there any other common Windows feature or role that was enabled or configured around the time you started facing this issue?
Hi pragg12,
Thank you for your support.
I did another test, which I hope will help with the investigation:
I powered off one of the VMs that had a snapshot and that I knew might inflate.
With the VM powered off, I then deleted the snapshot. The thin VMDKs inflated to their maximum size!
I suppose this test rules out an issue with the guest OS?
This certainly rules out any interference from the guest OS in the VMDK size inflation, and it gives you a strong argument to go back to VMware support on the existing support case.
Do keep us updated on how this unfolds further.
Hi lulu62
Do you have any further update on this issue ?
Yes I do.
I managed to identify what's causing the inflation of our VMDKs and I have to come up with an action plan to stop it.
I will post more this evening when I'm back home.
I can reproduce the inflation of our VMDKs (in our POC, in my homelab etc.)
Windows Server 2012 R2 with latest patches
ESXi650-201903001 03/28/2019
Thin-provisioned VM
1) Take a snapshot of the thin-provisioned VM
2) Run the Windows disk optimization tool, either with the GUI or from the CLI (defrag.exe with the same switches as those defined in the scheduled task)
3) Delete snapshot
4) VMDKs inflate to maximum allocated size
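The four steps can be driven from PowerCLI to make the test repeatable. What follows is a sketch under stated assumptions, not the exact procedure used here: Test-SnapshotDefragInflation and the snapshot name are hypothetical, guest credentials and running VMware Tools are assumed, and defrag.exe /O (optimize according to media type) stands in for the scheduled task's switches, which are not listed in this post.

```powershell
# Hypothetical helper: snapshot, optimize in the guest, delete the snapshot, compare usage.
function Test-SnapshotDefragInflation {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string] $VMName,
        [Parameter(Mandatory)] [pscredential] $GuestCredential,
        [string] $DriveLetter = 'C'
    )

    $vm = Get-VM -Name $VMName
    $usedBefore = $vm.UsedSpaceGB

    # 1) Take a snapshot of the thin-provisioned VM.
    $snapshot = New-Snapshot -VM $vm -Name 'defrag-inflation-test'

    # 2) Run the Windows optimizer in the guest (/O is an assumed stand-in).
    $guestArgs = @{
        VM              = $vm
        ScriptText      = "defrag.exe ${DriveLetter}: /O"
        ScriptType      = 'Bat'
        GuestCredential = $GuestCredential
    }
    Invoke-VMScript @guestArgs | Out-Null

    # 3) Delete the snapshot synchronously, so the sizes read next are final.
    Remove-Snapshot -Snapshot $snapshot -Confirm:$false

    # 4) Re-read the VM and report how much its datastore usage changed.
    $usedAfter = (Get-VM -Name $VMName).UsedSpaceGB
    [pscustomobject]@{
        VM           = $VMName
        UsedGBBefore = [math]::Round($usedBefore, 2)
        UsedGBAfter  = [math]::Round($usedAfter, 2)
        GrowthGB     = [math]::Round($usedAfter - $usedBefore, 2)
    }
}

# Usage:
#   Test-SnapshotDefragInflation -VMName 'VMname' -GuestCredential (Get-Credential)
```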
The Windows disk optimization is set to run weekly in our company; I'm not sure whether that is the Windows default or set by a GPO...
Anyway, since it runs automatically once a week, there's a good chance that it will run on VMs that have a snapshot, most likely when we patch our servers with Windows Updates (a snapshot is created before installing updates and is deleted 5 days later).
Now I'm going to check whether there's a particular parameter (switch) in the scheduled Windows optimization task that can be removed to avoid the VMDK inflation; otherwise I'm just going to disable the scheduled task.
I know that the defrag tool in Windows 2012 can do TRIM, so it's nice to have, but I wonder whether that isn't already covered by in-guest UNMAP.
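To see which switches the scheduled optimization task actually uses, its action can be read from inside the guest. A minimal sketch, assuming the task in question is the built-in one at \Microsoft\Windows\Defrag\ScheduledDefrag (a GPO-deployed task could live elsewhere); Get-ScheduledDefragCommandLine is a hypothetical helper name.

```powershell
# Hypothetical helper: returns the command line(s) the built-in defrag task runs.
function Get-ScheduledDefragCommandLine {
    # Path and name of the built-in task (Task Scheduler Library > Microsoft >
    # Windows > Defrag). Adjust both if a differently named task is in use.
    $task = Get-ScheduledTask -TaskPath '\Microsoft\Windows\Defrag\' -TaskName 'ScheduledDefrag'
    foreach ($action in $task.Actions) {
        '{0} {1}' -f $action.Execute, $action.Arguments
    }
}

# Usage, inside the guest (Windows Server 2012 or later):
#   Get-ScheduledDefragCommandLine
# To disable the task instead of editing its switches:
#   Disable-ScheduledTask -TaskPath '\Microsoft\Windows\Defrag\' -TaskName 'ScheduledDefrag'
```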
I must be missing something here - so just curious ...
Why are you surprised that running defrag in the guest inflates a thin-provisioned flat.vmdk, delta.vmdk or sesparse.vmdk?
Any move-file operation obviously inflates those VMDKs, and defrag is a large-scale move operation.
So if you decide to use thin provisioning, you should not use defragmentation tools, as that is counterproductive.
Hello continuum,
In our environment, Windows has a weekly scheduled defrag task. I don't run defrag myself, but Windows does. It remains to be confirmed whether this weekly defrag task is created by default in Windows or not.