VMware Cloud Community
BillyK66
Contributor

Deduplication / Compression - Thick Objects

Hello Everyone,

We have a 500TB+ vSAN consisting of SSD capacity drives. I am presently only seeing dedupe/compression savings of around 12TB and a 1.04x ratio.

USED BEFORE: 329TB

USED AFTER: 317TB
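For reference, the savings and ratio quoted above follow directly from these two figures. The snippet below is a minimal sketch of that arithmetic; `dedupe_savings` is an illustrative helper name, not part of any vSAN tooling, and it assumes the ratio is simply used-before divided by used-after.

```python
from typing import Tuple


def dedupe_savings(used_before_tb: float, used_after_tb: float) -> Tuple[float, float]:
    """Return (savings in TB, reduction ratio) for two capacity figures."""
    if used_after_tb <= 0:
        raise ValueError("used_after_tb must be positive")
    savings_tb = used_before_tb - used_after_tb
    # Assumption: ratio = used before / used after
    ratio = used_before_tb / used_after_tb
    return savings_tb, ratio


savings, ratio = dedupe_savings(329, 317)
print(f"savings: {savings} TB, ratio: {ratio:.2f}x")  # savings: 12 TB, ratio: 1.04x
```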

The vast majority of the usage is from Microsoft / Linux cluster servers with disks formatted as thick provision eager zeroed (based on vendor recommendation). The storage policy that the drives are assigned to does not have OSR (Object Space Reservation) set, so it defaults to 0%. Can someone please confirm whether disks formatted as thick provision eager zeroed or thick provision lazy zeroed benefit from dedupe/compression? I've done a bit of research and stumbled across a few articles that seem to imply that "thick objects" do not benefit from dedupe/compression. If that's the case, it would probably explain why our savings are so low.

If they don't benefit from dedupe/compression, what are my options for reaping that benefit? Do I need to rebuild / clone them and set the disk formatting to thin provision? Since we're using SSDs, I'm thinking that eager zeroed probably isn't even necessary.

Thanks,

- Bill

4 Replies
TheBobkin
Champion

Hello Bill,

Welcome to Communities.

If disks are set as Thick (of any option) at the VM 'Hard Disk' level, then these will be thick-provisioned Objects, overriding the OSR rules set in the Storage Policy. These will not dedupe, as all of the provisioned space is reserved.

You can verify that this is the case by looking at one of these data-Objects via cmmds-tool/objtool/RVC/Web-Client:

RVC:

# vsan.vm_object_info <pathToVM>

If they are truly Thick then these will have the property 'proportionalCapacity = 100', the same as any Object thick-provisioned via OSR=100 would have.
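If you have a lot of Objects to check, you could save the command output to a text file and scan it for that property. The following is only a rough sketch: the helper names and the sample text are hypothetical, and it assumes the property appears literally as 'proportionalCapacity = <number>' as described above. Real output contains many more fields and may be laid out differently between versions, which this does not handle.

```python
import re

# Matches the property as quoted in the post, e.g. "proportionalCapacity = 100".
PROP_CAP = re.compile(r"proportionalCapacity\s*=\s*(\d+)")


def reserved_percentages(rvc_output: str) -> list:
    """Return every proportionalCapacity value found in the text, as ints."""
    return [int(value) for value in PROP_CAP.findall(rvc_output)]


def is_fully_reserved(rvc_output: str) -> bool:
    """True only if at least one value was found and all of them are 100."""
    values = reserved_percentages(rvc_output)
    return bool(values) and all(v == 100 for v in values)


# Hypothetical, heavily abbreviated sample text (not real command output).
sample = "policy: hostFailuresToTolerate = 1, proportionalCapacity = 100"
print(is_fully_reserved(sample))  # True
```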

Web-Client:

Cluster > Monitor > Capacity > Note the 'Over-reserved' data amount - this is reserved space beyond what has been written to disk by the Guest-OS.

"Do I need to rebuild / clone them and set the disk formatting to thin provision?"

This may depend on how this is applied. Do you have some amount of storage aside from this vsanDatastore that might be used as swing-storage for SvMotioning the VMs off, changing the disk-provisioning format and SvMotioning them back? (Yes, I am aware that this is 2018 and there is *likely* a way of doing this in place, but I think space-reclaim might be an issue.)

Bob

BillyK66
Contributor

Thanks Bob. Are you implying that if OSR=100% was set on a policy and we left the vmdks at the default of thin provision, they also would not benefit from dedupe / compression? I thought I had read that OSR needs to be set to either 0% OR 100% to benefit from dedupe. I’m just wondering if OSR NEEDS to be set to 0% (default) to benefit from dedupe. We set thick provisioning on most of our vmdks, so it sounds like we’re hurting ourselves for deduplication. The difference between assigning a thick format at the vmdk level and choosing thin provision with OSR=100% at the policy level is a bit confusing with regard to dedupe benefits. Thanks again.

BillyK66
Contributor

I know it varies from site to site, but can someone give me a ballpark number as to what we should expect for savings (percentage) from compression / dedupe? Thanks!

TheBobkin
Champion

Hello Bill,

"Are you implying that if OSR=100% was set on a policy and we left the vmdks at the default of thin provision, they also would not benefit from dedupe / compression?"

Yes.

"I thought I had read that OSR needs to be set to either 0% OR 100% to benefit from dedupe."

No, any reserved space can't benefit from dedupe+compression.

https://kb.vmware.com/s/article/52839

"We set thick provisioning on most of our vmdks, so it sounds like we’re hurting ourselves for deduplication."

It sure does.

"I know it varies from site to site but can someone give me a ballpark number as to what we should expect for savings (percentage) from compression / dedupe?"

Unfortunately the only answer for this is 'how long is a piece of string?'

What ratio is achievable depends on the commonality of the data, the frequency of this common data and the quantity of this data per Disk-Group (as this functions on a per Disk-Group basis).

Generally I see ratios in the range of 1.5x to 8x (the former being a general mixed environment and the latter being homogeneous VDIs).
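As a rough illustration of what that range could mean for the figures in this thread, here is a minimal sketch. It assumes, hypothetically, that all 329TB currently used would become eligible for dedupe/compression and that the ratio is simply used-before divided by used-after; neither is guaranteed in practice, and `projected_used_tb` is an illustrative helper, not a vSAN tool.

```python
def projected_used_tb(used_before_tb: float, ratio: float) -> float:
    """Estimate space used after dedupe/compression at a given ratio.

    Assumption: ratio = used before / used after, applied to all of the data.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return used_before_tb / ratio


for ratio in (1.5, 8.0):
    used_after = projected_used_tb(329, ratio)
    saved = 329 - used_after
    print(f"{ratio}x: ~{used_after:.1f} TB used, ~{saved:.1f} TB saved")
```

Treat the results as upper and lower bounds for a back-of-the-envelope estimate only; any space that stays reserved will not contribute to the savings.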

There are methods to pull information from the Dedupe Uniqueness stats of any Object but this is not something that can be done via the GUI.

"Thanks again."

Happy to help, feel free to use the buttons if you find anyone's comments helpful.

Out of interest - how many TB of data is showing as 'Over-reserved' in the Web Client?

Bob
