6 Replies Latest reply on Feb 5, 2011 9:36 AM by lamw

    Deduplication with GhettoVCBg2

    nweeks Lurker

      Hi All,

       

      I've built a simple sender-side deduplicated backup system for ESX (vSphere 4), but it runs on the ESX host console, so it is limited by its resource constraints.

      I've moved over to GhettoVCBg2, but I miss the fact that backing up my 200GB volumes moved only a hundred or so MB across the network per day.

       

      Is it possible to leverage the vMA, and somehow:

      Take the snapshot (this is easy)

      Mount the snapshot volume VMDK as a read-only filesystem in the vMA under /mnt

      Instead of using vmkfstools to copy the complete snapshot file, run my dedupe code against the locally mounted VMDK, and then copy the result to NAS/NFS/RSYNC/etc.

       

      I've been looking at the raw mode connections in vmkfstools, and it's so close to how VMware's VDR system used to mount the snapshot images, but I can't quite work it out.

       

      Looking forward to your replies!

       

      Nige.

        • 1. Re: Deduplication with GhettoVCBg2
          lamw Guru
          Community Warriors, VMware Employees

          Hello,

           

          ghettoVCBg2 does not run on ESX(i); I believe you're referring to ghettoVCB. ghettoVCBg2 runs on vMA and leverages the vSphere API, rather than running directly on the Service Console for classic ESX or the BusyBox console for ESXi. The actual transfer still happens within the VMkernel on the host, but the operations are requested from vMA.

           

          To answer your other question about mounting a VMDK on vMA: it's possible, and you should be able to use the VDDK (http://www.vmware.com/support/developer/vddk/) and a licensed version of ESX(i) to mount it remotely. You may or may not get better speeds, since it uses FUSE to mount the remote VMDK. I've not really looked into this, but it's possible.

          • 2. Re: Deduplication with GhettoVCBg2
            nweeks Lurker

            Yep, I'm aware that it runs inside the vMA, and it's quite handy not having to set up scripts inside each host (we're using ESX 4.0 on the hosts). I'm liking it so far!

             

            I'll go and have a look at VDDK, and see if it'll even work.  The code I have starts at byte 0 of the snapshotted file, breaks it into N-byte pieces, and compares each piece's SHA1 against the last known block in the backup location (querying the database at that end).
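
            A minimal sketch of that block comparison in Python, for illustration only. The function name `changed_blocks`, the `known_hashes` dict (standing in for the database at the backup end), and the block size are all assumptions, not the actual code described here.

```python
import hashlib
import io


def changed_blocks(stream, known_hashes, block_size=4 * 1024 * 1024):
    """Yield (index, digest, data) for every block whose SHA-1 differs
    from the digest recorded for that block index in known_hashes.

    known_hashes is a hypothetical stand-in for the backup-side database:
    a dict mapping block index -> hex SHA-1 of the last backed-up block.
    """
    index = 0
    while True:
        data = stream.read(block_size)
        if not data:
            break  # end of the snapshot file
        digest = hashlib.sha1(data).hexdigest()
        if known_hashes.get(index) != digest:
            yield index, digest, data  # only these blocks cross the network
        index += 1


# Usage: build the hash table from a first pass, then diff a second pass.
previous = io.BytesIO(b"A" * 8 + b"B" * 8 + b"C" * 8)
current = io.BytesIO(b"A" * 8 + b"X" * 8 + b"C" * 8)
known = {i: d for i, d, _ in changed_blocks(previous, {}, block_size=8)}
to_send = [i for i, _, _ in changed_blocks(current, known, block_size=8)]
```

            With fixed-size blocks only the block index needs to be sent alongside the data, but the whole source still has to be read and hashed on every run, which is why local block-level access to the VMDK matters.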

             

            There's plenty of other operations it does as well, but that's not really in the scope of this discussion.

            Anything to save copying the whole snapshot file off the SAN before I start comparing blocks will be a great benefit.

             

            Thanks for your input!

             

            Nige.

            • 3. Re: Deduplication with GhettoVCBg2
              lamw Guru
              Community Warriors, VMware Employees

              What you're discussing here is exactly what the vStorage API (vSphere API + VDDK) provides. It uses a new feature called Changed Block Tracking (CBT), which lets you track the blocks that have changed since the last backup so that you only back up the deltas. VDDK is definitely what you'll want, and there's also a document on creating your own backup solution leveraging this technology - http://www.vmware.com/support/developer/vddk/vadp_vsphere_backup12.pdf
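
              To illustrate the "only back up the deltas" idea, here is a rough Python sketch that copies just a list of changed byte ranges from one file-like object to another. It does not call the vSphere API or VDDK; the function name `copy_changed_extents` and the hand-written `(offset, length)` list are assumptions standing in for whatever changed areas CBT would report.

```python
import io


def copy_changed_extents(source, target, extents, chunk_size=1024 * 1024):
    """Copy only the given (offset, length) byte ranges from source to
    target, writing each range at the same offset. Returns bytes copied.

    extents is a hypothetical list of changed areas; in a real backup it
    would come from the change-tracking query, not be written by hand.
    """
    copied = 0
    for offset, length in extents:
        source.seek(offset)
        target.seek(offset)
        remaining = length
        while remaining > 0:
            data = source.read(min(chunk_size, remaining))
            if not data:
                break  # source ended before the extent did
            target.write(data)
            remaining -= len(data)
            copied += len(data)
    return copied


# Usage: the target holds the previous backup; only two ranges changed.
source = io.BytesIO(b"0123456789")
target = io.BytesIO(b"abcdefghij")
copied = copy_changed_extents(source, target, [(2, 3), (8, 2)])
```

              Unlike hashing every block, this approach never reads the unchanged regions at all, which is where the savings over a full snapshot copy come from.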

              • 4. Re: Deduplication with GhettoVCBg2
                vmbru Enthusiast

                Imagine...GhettoVCBg3 utilizing CBT.  YES!  One can hope.

                • 5. Re: Deduplication with GhettoVCBg2
                  nweeks Lurker

                  As an interim step, it looks like I can leverage vmware-cmd's setconfig and connectdevice features to add the VMDK from the VM I wish to back up, and connect it to the vMA as a non-persistent disk, so that the vMA can have block-level access to it.

                   

                  This is a start, until I re-learn enough Perl to look at the VDDK SDK - it's been 12 years since I last wrote any Perl.

                   

                  But yes, CBT might be handy. Have to wait for a rainy day.

                   

                  Nige.

                  • 6. Re: Deduplication with GhettoVCBg2
                    lamw Guru
                    VMware Employees, Community Warriors

                    Well, if you're looking to remotely mount VMDKs to vMA, VMware has already built a suite of tools called VDDK (Virtual Disk Development Kit). In conjunction with the vSphere API, it makes up the vStorage API for Data Protection, which is how they're leveraging CBT functionality. The issue is how you guarantee that your block-level copy is in fact consistent, and there's quite a bit of work involved if you're going to re-implement what VMware already provides in their native API. It's not an easy scripting task; implementing something this complex calls for a more programmatic approach.