We are using 4TB thin-provisioned LUNs on our NetApp. I have a 4TB datastore mapped to this LUN (snapshots/reserve are turned off on the NetApp side).
I am testing space reclamation with the esxcli storage vmfs unmap command. I filled the datastore to about 95% full, around 3.8TB, then deleted half of everything; the datastore in ESXi now shows 2TB free. When I run the esxcli unmap command, it recovers only about 100GB of space. If I run it again, it reclaims another 70GB, and each subsequent run reclaims roughly another 70GB, and so on.
So my question is: why is unmap not reclaiming the full 2TB of space I freed up, and why do I need to keep running the command over and over?
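Just to put numbers on "over and over": a quick back-of-the-envelope sketch, using the per-run figures from the post above (~100GB on the first pass, ~70GB on each pass after that), shows roughly how many passes it would take to reclaim the full 2TB at that rate.

```shell
# Rough arithmetic on the reclaim rate described above (values from the
# post; this is an illustration, not measured data).
first=100    # GB reclaimed on the first unmap run
rest=70      # GB reclaimed on each later run
target=2048  # total GB freed in the datastore (~2 TB)
# Ceiling division for the remaining passes after the first one.
runs=$(( 1 + (target - first + rest - 1) / rest ))
echo "$runs"   # prints 29
```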
Maybe it's due to some failed unmap iterations. Can you please look at ESXTOP in the device view (press u) with the "a" and "o" fields enabled?
Then check what values are in columns:
DELETE - running total of successful UNMAP operations
DELETE_F - failed UNMAP operations
MBDEL/s - MB deleted per second
In addition, please post info about the affected VMFS volume:
# vmkfstools -Ph -v 1 /vmfs/volumes/<volume label>
Did you run the command with the default values, e.g. for --reclaim-unit=xxx?
Please see http://kb.vmware.com/kb/2057513 for details about the command-line options, and check what's supported/recommended by your storage vendor.
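For what it's worth, here is one hedged way to pick a --reclaim-unit value instead of the default: size it from the datastore's free space and the VMFS file block size, keeping it well below the free space so the temporary file the unmap process creates always fits. The numbers below are taken from this thread's datastore (~2TB free, 1MB block size); the 1/10 fraction is an arbitrary illustrative choice, not a vendor recommendation.

```shell
# Sketch: derive a --reclaim-unit value (in VMFS file blocks) from free
# space. Values are from the datastore discussed in this thread; the
# divisor of 10 is an assumption, not a NetApp or VMware recommendation.
free_gb=2048    # free space reported by ESXi, in GB
block_mb=1      # VMFS-5 file block size is 1 MB
free_blocks=$(( free_gb * 1024 / block_mb ))
reclaim_unit=$(( free_blocks / 10 ))   # stay well under free space
echo "$reclaim_unit"   # prints 209715 (~200 GB per pass)
# On the ESXi host you would then run something like:
# esxcli storage vmfs unmap -l prodsan01aa_vmlun9_SAS --reclaim-unit=$reclaim_unit
```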
ESXTOP shows the following for my test lun device:
DELETE = 3702810
DELETE_F = 0
MBDEL/s = 0
Should I try watching esxtop while the vmfs unmap is actively running?
Here is the volume info.
VMFS-5.58 file system spanning 1 partitions.
File system label (if any): prodsan01aa_vmlun9_SAS
Mode: public ATS-only
Capacity 4 TB, 2.4 TB available, file block size 1 MB, max file size 64 TB
Volume Creation Time: Mon Oct 7 01:00:26 2013
Files (max/free): 130000/129674
Ptr Blocks (max/free): 64512/62788
Sub Blocks (max/free): 32000/31902
Secondary Ptr Blocks (max/free): 256/256
File Blocks (overcommit/used/overcommit %): 0/1728124/0
Ptr Blocks (overcommit/used/overcommit %): 0/1724/0
Sub Blocks (overcommit/used/overcommit %): 0/98/0
Volume Metadata size: 825131008
Partitions spanned (on "lvm"):
Is Native Snapshot Capable: YES
OBJLIB-LIB: ObjLib cleanup done.
We are using NetApp, and I was unable to find a recommendation from them on what size to use for the reclaim unit. I have tried running it with the default of 200, though, and various other sizes all the way up to 3000, with no change in the behavior I described in my original post.
Also, here is what the NetApp reports for the LUN. Notice it says 3.3TB used. It was at 3.9TB, but after running the vmfs unmap a bunch of times I've gotten it down to 3.3TB. However, it should say 1.6TB used.
lun show -v /vol/aggr1_vol0/prodsan01aa_vmlun9_SAS
/vol/aggr1_vol0/prodsan01aa_vmlun9_SAS 4.0t (4398314946560) (r/w, online, mapped)
Space Reservation: disabled
Multiprotocol Type: vmware
Occupied Size: 3.3t (3618815295488)
Creation Time: Sun Oct 6 20:58:42 EDT 2013
Cluster Shared Volume Information: 0x0
As mentioned, snapshots are not enabled on the underlying volume on the NetApp. Fractional reserve is also disabled.
We had a similar problem with ESXi 5.5 and 3PAR storage. What our support tech suggested was to run
esxcli storage vmfs unmap -l datastorename --reclaim-unit=999999
to find the maximum value for --reclaim-unit (the command should fail and state that the max number is ...).
Then edit the previous command to use that max value as --reclaim-unit, run it, and wait for it to finish.
In our case, used storage on the 3PAR LUN decreased by approx. 400GB (out of about 4TB) in ~20 hours after the unmap finished, and the process is still continuing. Results on other storage may vary.
Thanks for the suggestion. I gave it a try: found the max blocks number, ran the unmap, and still got the same result. I do have a ticket open with both NetApp and VMware, so I'll respond here if a solution is found.
Were you able to determine the correct max number for the reclaim-unit parameter? If so, what came back? I ran the command
esxcli storage vmfs unmap -l datastore -n 999999
However, it did not error out and suggest a max number; it simply ran. This is a 3.75TB datastore, so it can probably handle the large number (999999 x 1MB is ~1TB), but it was a bit disconcerting that it didn't come back with a recommendation. Is there a way to properly determine or calculate this value for NetApp-hosted datastores/volumes?
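A quick sanity check suggests why the command ran without complaining: with a 1MB file block size, a reclaim unit of 999999 blocks is under 1TB, which fits comfortably in this datastore's ~2.4TB of free space, so there was nothing for esxcli to reject. (This is just arithmetic on the numbers in the thread, not a statement about how esxcli validates the parameter.)

```shell
# Sanity check on the 999999 value: convert reclaim-unit blocks to GB
# given the 1 MB VMFS-5 file block size reported earlier in the thread.
reclaim_unit=999999
block_mb=1
reclaim_gb=$(( reclaim_unit * block_mb / 1024 ))
echo "$reclaim_gb"   # prints 976 (i.e. just under 1 TB)
```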
Did you ever manage to find a solution for this? I am having the same issue with our NetApp FAS8020 storage.
Would really appreciate any advice on this, as it's driving me crazy! I have tickets open with VMware and NetApp myself, and so far VMware has washed its hands of it, saying it's an issue with the storage.
I have tried all the options outlined here but see the same behavior you described.