VMware Cloud Community
FM19999999
Enthusiast

The case for not using Dedupe and Compression

I was asked why I would not use Dedupe and Compression on an all-flash 4-node cluster.

I can think of a host of reasons why I would use it, but I can't really think of many solid reasons not to.

To me, one would be that you plan to build your cluster initially with small SSDs and swap them out later. Once the larger, better disks are in, you would want to squeeze every penny out of them, and that is when you would enable D+C.

Another reason I could think of is avoiding write IO amplification. But that's not really a good point, since that data would be cold and VMs wouldn't need access to it. Only very large sets of data would start to impact your IOPS.

What other cases am I missing here?

Sincerely,

Frank


Accepted Solutions
TheBobkin
Champion

Hello Frank,

There are impacts/considerations from a fault tolerance/recoverability and manageability perspective when D+C is enabled:

"You cannot remove a single disk from a disk group. You must remove the entire disk group to make modifications."

"A single disk failure causes the entire disk group to fail."

https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.virtualsan.doc/GUID-3D2D80CC-444E-4...
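
To put the disk group point in rough numbers, here is a minimal Python sketch. The cluster layout (4 hosts, 2 disk groups per host, 4 capacity disks of 2 TB each) is purely hypothetical, and the non-D+C case assumes that a capacity disk failure only takes that one disk offline, which is my assumption and not part of the quoted docs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskGroup:
    """Hypothetical disk group layout, for illustration only."""
    capacity_disks: int   # number of capacity-tier disks in the group
    disk_size_tb: float   # raw size of each capacity disk, in TB

    @property
    def raw_tb(self) -> float:
        # Total raw capacity of the group.
        return self.capacity_disks * self.disk_size_tb


def raw_tb_offline(group: DiskGroup, dedupe_enabled: bool) -> float:
    """Raw capacity lost when ONE capacity disk fails.

    With D+C enabled, the quoted docs say the entire disk group fails.
    Without D+C, this sketch ASSUMES only the failed disk is lost.
    """
    if dedupe_enabled:
        return group.raw_tb
    return group.disk_size_tb


def fraction_of_cluster_offline(
    group: DiskGroup, groups_per_host: int, hosts: int, dedupe_enabled: bool
) -> float:
    """Share of the cluster's raw capacity affected by one disk failure."""
    cluster_raw_tb = group.raw_tb * groups_per_host * hosts
    return raw_tb_offline(group, dedupe_enabled) / cluster_raw_tb


if __name__ == "__main__":
    group = DiskGroup(capacity_disks=4, disk_size_tb=2.0)
    for enabled in (False, True):
        lost = raw_tb_offline(group, enabled)
        share = fraction_of_cluster_offline(group, 2, 4, enabled)
        print(f"D+C enabled={enabled}: {lost} TB offline ({share:.1%} of cluster)")
```

The more capacity disks you put in each disk group, the larger the gap between the two cases, and the more data has to be resynced after a single disk failure when D+C is on.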

Additional compute overhead of up to ~5%.

"The processes of deduplication and compression on any storage platform incur overhead and potentially impact performance in terms of latency and maximum IOPS."

https://storagehub.vmware.com/#!/vmware-vsan/vsan-space-efficiency-technologies/deduplication-and-co...
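
For a feel of what "up to ~5%" could mean on a small cluster, here is a rough back-of-the-envelope sketch in Python. It assumes the overhead applies to total host CPU capacity, which the figure above does not specify, and the host size (32 cores) is a hypothetical example rather than a recommendation.

```python
def dedupe_cpu_overhead_cores(
    cores_per_host: int, hosts: int, overhead_fraction: float = 0.05
) -> float:
    """Upper-bound estimate of CPU cores consumed by D+C across the cluster.

    overhead_fraction defaults to the ~5% worst case mentioned above.
    ASSUMPTION: the percentage is taken against total host CPU capacity.
    """
    if not 0.0 <= overhead_fraction <= 1.0:
        raise ValueError("overhead_fraction must be between 0 and 1")
    return cores_per_host * hosts * overhead_fraction


if __name__ == "__main__":
    # Hypothetical 4-node cluster with 32 cores per host.
    cores = dedupe_cpu_overhead_cores(cores_per_host=32, hosts=4)
    print(f"Worst-case D+C overhead: about {cores:.1f} cores cluster-wide")
```

Whether that cost is acceptable depends on how CPU-bound the cluster already is; on a lightly loaded cluster it is unlikely to be the deciding factor.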

As to the extent of the possible performance hit, I cannot find much comparable data online, but this blog suggests it may be significant when using RAID1 as the Fault Tolerance Method (FTM):

http://blog.chrischua.net/2017/03/16/comparing-performance-r5-vs-r1-with-and-without-compressiondedu...

However, as pointed out by the author in the comments on the above post, and elsewhere (Spiceworks), the jury seems to be out on whether these forms of synthetic test tell the full picture. Additionally, workload type and/or IO size can be significant factors, and I see no reference to what was used in the above tests.

Some other good examples of things that may need to be considered can be found here:

Storage and Availability Technical Documents

So yes, there are a few potential costs to pay for using dedupe+compression, but in my opinion (based on working with ~100 clusters/customers per quarter) these are reasonable trade-offs, unless something specific is required that conflicts with any of the above points.

Bob
