VMware Cloud Community
Stevester
Contributor

Opendedup - New deduplication product - Question

Good Evening Everyone,

A new open-source project named Opendedup has been released. It provides a file system that performs automatic inline and batch deduplication when data is copied or written to it. This Linux-based file system is called SDFS. The project claims it can successfully deduplicate VMDK files WITHOUT using the vStorage API.
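To make the idea of inline block-level deduplication concrete, here is a minimal Python sketch. It is not how SDFS is actually implemented: the fixed 4 KiB block size, SHA-256 fingerprints and the hypothetical DedupStore class are assumptions for illustration only. Each written file is split into blocks, each block is hashed, and a block is stored only the first time its hash is seen, so two VMDK-like files that share most of their content cost little extra space.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; SDFS's real chunking may differ


class DedupStore:
    """Toy inline dedup store (hypothetical): each unique block is kept only once."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # SHA-256 hex digest -> block bytes (unique blocks only)
        self.files = {}   # file name -> ordered list of block digests

    def write(self, name, data):
        """Split data into fixed-size blocks and store only blocks not seen before."""
        digests = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # inline: dedup happens at write time
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        """Rebuild a file from its list of block references."""
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self):
        """Physical bytes actually held by the store."""
        return sum(len(b) for b in self.blocks.values())


if __name__ == "__main__":
    store = DedupStore()
    # Four distinct blocks standing in for an OS image shared by two VMs.
    shared = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
    store.write("vm1.vmdk", shared + b"A" * BLOCK_SIZE)
    store.write("vm2.vmdk", shared + b"B" * BLOCK_SIZE)
    print(store.stored_bytes())  # 24576 bytes stored for 40960 logical bytes
    assert store.read("vm2.vmdk") == shared + b"B" * BLOCK_SIZE
```

With the sample data, the two 20 KiB files share four identical blocks, so the store holds six unique blocks instead of ten. Batch (post-process) deduplication would do the same hashing later, in a background pass, rather than at write time.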

My question is: can a VMware virtual machine be successfully deduplicated without using the vStorage API?

Thanks

Steve

AndreTheGiant
Immortal

My question is can a vmware virtual machine be successfully deduplicated without using the vStorage API?

Those APIs are for backup operations.

One of the interesting functions is Changed Block Tracking (CBT), which allows an "incremental" VM download.
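A rough Python sketch of the idea behind Changed Block Tracking. In vSphere the tracking happens inside the hypervisor and backup software asks for the changed areas through the vStorage APIs; the TrackedDisk class, the apply_delta helper and the 4 KiB tracking granularity below are hypothetical. The sketch only shows why a tracked incremental is much smaller than downloading the whole VM again.

```python
BLOCK_SIZE = 4096  # assumed tracking granularity for this sketch


class TrackedDisk:
    """Hypothetical virtual disk that records which blocks were written since the last backup."""

    def __init__(self, num_blocks, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.data = bytearray(num_blocks * block_size)
        self.changed = set()  # indices of blocks written since the last backup

    def write(self, offset, payload):
        """Guest write: update the data and mark every block the write touches."""
        self.data[offset:offset + len(payload)] = payload
        first = offset // self.block_size
        last = (offset + len(payload) - 1) // self.block_size
        self.changed.update(range(first, last + 1))

    def incremental_backup(self):
        """Return only the changed blocks, then reset the tracking set."""
        delta = {
            i: bytes(self.data[i * self.block_size:(i + 1) * self.block_size])
            for i in sorted(self.changed)
        }
        self.changed.clear()
        return delta


def apply_delta(image, delta, block_size=BLOCK_SIZE):
    """Patch a previously downloaded full image with an incremental delta."""
    for index, block in delta.items():
        image[index * block_size:(index + 1) * block_size] = block
    return image


if __name__ == "__main__":
    disk = TrackedDisk(num_blocks=8)
    full_copy = bytearray(disk.data)   # first "full" download of the VM disk
    disk.write(5000, b"hello")         # guest write that lands inside block 1
    delta = disk.incremental_backup()  # only block 1 needs to be copied
    print(sorted(delta))               # [1]
    apply_delta(full_copy, delta)
    assert full_copy == disk.data
```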

I'm not sure there is a native function for block-level "de-duplication".

You can also dedupe a VM backup on the backup server without those APIs, but this means you first have to "download" the entire VM (as in VCB mode).

Note that for running VMs there can be a second level of deduplication.

In this case, if your storage provides this function, the VMs will benefit from deduplication (even on older ESX versions).

Andre

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
Stevester
Contributor

So basically you can dedupe a VMDK file without using the vStorage API?

Thanks

Steve

Reply
0 Kudos
AndreTheGiant
Immortal

Yes, you can dedupe the blocks of the VMDK at the storage level.

For example, by using NetApp.
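If you want a rough idea of how much a storage-level, block-based dedupe could save, a small script like this can count duplicate blocks across several VMDK files. It is only an estimate under stated assumptions: a fixed 4 KiB block aligned to the start of each file and SHA-256 as the fingerprint. The script name dedupe_estimate.py and the count_blocks function are hypothetical. Real arrays such as NetApp use their own block size and metadata, so actual savings will differ.

```python
import hashlib
import sys

BLOCK_SIZE = 4096  # assumed fixed block size; real arrays choose their own granularity


def count_blocks(paths, block_size=BLOCK_SIZE):
    """Return (logical_blocks, unique_blocks) across all given files."""
    seen = set()  # fingerprints of every distinct block encountered
    logical = 0
    for path in paths:
        with open(path, "rb") as f:
            while True:
                block = f.read(block_size)
                if not block:
                    break
                logical += 1
                seen.add(hashlib.sha256(block).digest())
    return logical, len(seen)


if __name__ == "__main__":
    paths = sys.argv[1:]
    if not paths:
        print("usage: dedupe_estimate.py FILE [FILE ...]", file=sys.stderr)
    else:
        logical, unique = count_blocks(paths)
        ratio = logical / unique if unique else 1.0
        print(f"{logical} blocks, {unique} unique, estimated ratio {ratio:.2f}:1")
```

For example, run it as python dedupe_estimate.py vm1-flat.vmdk vm2-flat.vmdk against copies of powered-off disks (hypothetical file names).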

Andre

wila
Immortal

Hi,

Without knowing anything about Opendedup, and going only on what you described here, the answer is YES: your VM can use deduplication techniques, but only at the guest level, within that VM.

That's what you are saying: it uses its own file system, which handles the deduplication.

So basically the OS is unaware of the virtualisation layer and handles the deduplication within the file system.

It would work exactly the same if it was directly running on hardware without a virtualisation layer.

Since the deduplication works on physical hardware, it will also work for VMDK files.

How well it does that at the VMDK level is the question, and I have no idea without testing.

The vStorage API works at a higher level than this and is normally not invoked directly from a guest OS; it is more commonly used from a management VM or host, such as a backup VM.

Hope this helps,



--
Wil
_____________________________________________________
VI-Toolkit & scripts wiki at http://www.vi-toolkit.com

Contributing author at blog www.planetvm.net

Twitter: @wilva

| Author of Vimalin. The virtual machine Backup app for VMware Fusion, VMware Workstation and Player |
| More info at vimalin.com | Twitter @wilva
Stevester
Contributor

That's just what I thought. I have suggested to the author of Opendedup that the vStorage API should be used within SDFS (the deduplicated file system). So I guess that when backups are performed, one would expect multiple, smaller VMDK files? Or maybe that depends on how the Opendedup system works.

Thanks

Steve

Saturnous
Enthusiast

Has anyone considered this: use 2 ESX hosts with local storage and 2 Linux VMs running DRBD and SDFS, with the datastore exported over NFS? This would make an awesome, cheap cluster.
