davidjerwood
Enthusiast
Enthusiast

NFS - Thin Provisioned Disks

I was originally running iSCSI datastores with thick VMDKs; my default VMDK filename was "servername.vmdk". I added some NetApp NFS storage and migrated my virtual machines. I understand that the default file type on NFS is thin, so using vmkfstools I converted the previously thick VMDK files to thin provisioned ones. I renamed the files to "servername_thin.vmdk" and deleted the original VMDKs. Great - everything was working with thin provisioned disks.
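For anyone wanting to do the same conversion, this is roughly what I ran from the ESX 3.x service console (paths and names are examples - adjust for your datastore - and the VM must be powered off first):

```shell
# Clone the thick disk to a new thin-provisioned vmdk (ESX 3.x syntax)
vmkfstools -i /vmfs/volumes/nfs01/servername/servername.vmdk \
           -d thin \
           /vmfs/volumes/nfs01/servername/servername_thin.vmdk

# Point the VM at the new disk, then remove the thick original
sed -i 's/servername.vmdk/servername_thin.vmdk/' \
    /vmfs/volumes/nfs01/servername/servername.vmx
vmkfstools -U /vmfs/volumes/nfs01/servername/servername.vmdk
```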

However, I have just migrated some virtual machines from one host to another across NFS datastores (shut down the machine, migrated to the new host, selected the new datastore, started the machine).

I have now noticed that the VMDK files have somehow converted back to thick disks, the filenames have somehow reverted to my original naming convention, and the VMX file changed to match.

What is going on?

0 Kudos
19 Replies
RussH
Enthusiast
Enthusiast

I've noticed the same behaviour. As soon as you clone, deploy (from a template) or migrate a thin provisioned VMDK on an NFS datastore, it converts back to thick. I think it's an ESX bug and very annoying - thin provisioning is one of the reasons I chose NFS.

Log a service request with VMware - the more people that do, the more likely they are to fix it.

Cheers

davidjerwood
Enthusiast
Enthusiast

RussH, what versions are you using? I am using VC 2.5 and ESX 3.0.2 (just about to implement 3.5). I thought I had tested this and it worked when I was using VC 2.0.1.

0 Kudos
RussH
Enthusiast
Enthusiast

Running VMware ESX 3.0.2 and VC 2.0.2 (Update 2) on NetApp here... Haven't tested on 3.5 yet.

0 Kudos
davidjerwood
Enthusiast
Enthusiast

I have raised an SR with VMware and will update when I get something back, although if anyone knows the problem or solution please post, as I would like this resolved ASAP.

0 Kudos
dalepa
Enthusiast
Enthusiast

That's the way it currently works... if you clone a VM or migrate to another datastore, the disks go thick. However, since you are using NetApp, you can use A-SIS, which will make the VMs much thinner than thin...

virtualOptics

0 Kudos
bobross
Hot Shot
Hot Shot

Dalepa said:

However since you are using Netapp, you can use [A-SIS|http://viroptics.blogspot.com/2007/11/advanced-single-instance-storage-sis.html] which will make the VMs much thinner than thin...

How much does an A-SIS license cost? Is it licensed by the type of filer you own, by the TB, or something else? I read you had to have a NearStore license too, so you really have to buy two licenses. Thanks in advance for any advice.

0 Kudos
dalepa
Enthusiast
Enthusiast

A-SIS is part of the NearStore license, which I believe is under $3K, but don't quote me... Your rep should be able to give you a quote. You need one license per filer, with no other cost.

Your VMware volumes will be reduced (calculate) by 50-80+%, which will more than pay for the license...

More Information Here on A-SIS

virtualOptics

Table 1) A-SIS requirements overview.

Hardware
- NearStore R200
- FAS3020, FAS3040, FAS3050, FAS3070, FAS6030, FAS6070
- FAS2020, FAS2050 (requires Data ONTAP 7.2.2L1)
- IBM: N5200, N5300, N5500, N5600, N7600, N7800

Data ONTAP
- Data ONTAP 7.2.2 or later

Software
- nearstore_option license (for all platforms except R200)
- a_sis license

Maximum Flexible Volume Size
- FAS6070, N7800: 16TB
- FAS6030, N7600: 10TB
- FAS3070, N5600: 6TB
- NearStore R200: 4TB
- FAS3040, N5300: 3TB
- FAS3050, N5500: 2TB
- FAS3020, N5200: 1TB
- FAS2050: 1TB
- FAS2020: 0.5TB

Protocols
- All file-based and block-based protocols supported by Data ONTAP

Applications
- Refer to the "A-SIS Target Environment" section

0 Kudos
bobross
Hot Shot
Hot Shot

dalepa said:

A-SIS is part of the Nearstore license which I believe is under $3K, but don't quote me... Your rep should be able to give you a quote. You need 1 license per filer and no other cost.
You vmware volumes will be reduced ([calculate|http://www.dedupecalc.com/]) by 50-80+% which will more than pay for the license...

Hmmm. I run 70 VMDKs that are roughly 10 GB each. If I reduce 5 GB (half) out of each VMDK, I save 350 GB. If the license is $2,000, I am not saving anything...I can get far more than 350 GB of disk for that $2,000. So it looks like while A-SIS may indeed reduce data, it certainly doesn't save me any money. Now, if the license was (say) $500, I'd be interested.

Thanks for the further information. I did a little reading on A-SIS but something doesn't add up. I understand it uses the WAFL checksum, which is a 16-bit quantity. That means it can identify 65,536 unique 4K blocks. After that, every 4K block will result in a hash collision, so it can't recognize any duplicates without doing a complete bit-for-bit comparison of the colliding 4K blocks - very slow. Doing the arithmetic, if it can index 64K blocks of 4K each, only 256MB of data can be indexed. That's not nearly enough; WAFL should use a 32-bit checksum. I also read that it doesn't dedup across flexvols. That's bad - it means I would have a bunch of little islands of deduped data, but my entire system could still have N copies of (entire flexvols full of) duplicate blocks, where N is the number of flexvols. Again, that's not good. If I buy a license to dedup, by george, I want to dedup everything, not just on a per-flexvol basis...
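My index-depth arithmetic above works out like this (a back-of-the-envelope sketch; that A-SIS relies solely on the 16-bit WAFL checksum as its fingerprint is my reading, not something NetApp has confirmed to me):

```shell
# 16-bit checksum -> 2^16 distinct fingerprint values
checksum_bits=16
block_size=4096                                   # WAFL block size in bytes
unique_fingerprints=$((1 << checksum_bits))       # 65536
indexable=$((unique_fingerprints * block_size))   # bytes coverable before
indexable_mib=$((indexable / 1024 / 1024))        # collisions dominate: 256 MiB
echo "${unique_fingerprints} fingerprints cover ${indexable_mib} MiB"
```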

0 Kudos
dalepa
Enthusiast
Enthusiast

The Nearstore license may only be $500 for smaller filers... I'm not sure.

We dedup 4TB volumes and 16TB is supported on the high end filers, so the math works out somehow.

Currently you must schedule a job to run per volume, which isn't great; however, we use 12 large 4TB volumes to hold all our 900+ VMs and backups, so it isn't a big deal (set it and forget it).

We use the saved deduped space for added daily snapshots. Currently we get 21 days of snapshots for "free".

At max capacity, A-SIS will save us more than 48TB.

Netapp is working on aggregate level deduplication which should get even a higher level of deduplication and more space!

If you price 48TB of any NetApp filer, ~$2,000 for 48TB is "virtually free".

virtualOptics

0 Kudos
bobross
Hot Shot
Hot Shot

dalepa said:

We dedup 4TB volumes and 16TB is supported on the high end filers, so the math works out somehow.

Yes, you can dedup large volumes; I am referring to the index depth. Beyond the index depth, the machine has to compare the 4K chunks bit-for-bit instead of comparing the indices, since there will be a collision every time. That slows the process down greatly; it does not stop it.

Are your VMs and backups visible via NFS or SAN? I understand you use the saved space for daily snapshots, but that's all you can do with it. You can't use it for more VMs or files, for example. At least that's according to NetApp, so any experience you have to the contrary is much appreciated. It sounds like you know it well.

At max capacity (deduping/saving 48TB) how long does an A-SIS run take?

Also, do you plan to actually remove (paid for) disks from your filer since you have saved so much space? I certainly understand you are saving 'space' in terms of allocated, but you've paid for the disks from which the A-SIS routine runs, so are you really saving money? I ask because my CFO is wondering if NetApp will really take disks back and give a refund since A-SIS has saved (so much) space. If so, that's a winning business proposition for him.

0 Kudos
dalepa
Enthusiast
Enthusiast

> At max capacity (deduping/saving 48TB) how long does an A-SIS run take?

A-SIS is an incremental process... it can take several hours initially on large volumes with existing data, but after the initial scan only changed blocks are deduped, so the process is quick. For example, we have over 48TB on one filer with 24TB free, and the nightly process runs in less than an hour on the largest volume. You can schedule A-SIS to run at any time - weekends if you prefer. The more changed blocks, the longer it takes.
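For reference, the per-volume enable/schedule setup looks roughly like this from the ONTAP 7.2 CLI (volume name and schedule are examples; check the sis man page for your exact release):

```shell
sis on /vol/vmvol01                     # enable dedup on the volume
sis config -s sun-sat@23 /vol/vmvol01   # run the dedup job nightly at 23:00
sis start -s /vol/vmvol01               # initial scan of existing blocks
sis status /vol/vmvol01                 # progress / last-run info
```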

The space that is saved is reallocated back to the aggregate, so other volumes in the aggregate can use this space. We have 16TB aggregates, with multiple volumes in each. We use NFS, so it's really nice to be able to resize our volumes on the fly.

A-SIS is real. It works, it's simple, and it saves you real $$. If we did not have A-SIS, our 24TB of deduped data would be over 50TB; instead of being 50% full we would be at 110% full and purchasing another 25TB today.

If you are purchasing a filer today, adding the A-SIS license is a no-brainer, since it will only add a few % to your purchase and the ROI can be higher than 50% of your overall purchase price!

More A-SIS info here

virtualOptics

0 Kudos
bobross
Hot Shot
Hot Shot

Thanks. So you are saving $$ by postponing future purchases, not on what you have already purchased. You have to purchase an amount of disk equal to or greater than the actual need (probably 2x if you thin provision, due to the fractional reserve), ingest the initial data, then dedup it down. After that, you add new data daily and dedup nightly. I get it. It sounds like your filers aren't too busy at night doing production work; that's nice. My arrays are busy doing production 24x7, so I don't know if I could run A-SIS at night and have it done by morning. But I'm glad to hear you can.

Using NFS makes sense; I read that if you use SAN, you can't recover the space back to the aggregate for normal use, only for snapshot (reserved) use.

In the future, though, if you wanted to migrate that data somewhere else, you'd have to un-dedup it, right? It sounds like A-SIS either locks you totally into NetApp or forces you to un-dedup everything (for example, to back it up to tape for offsite or to another disk array). Hmmm, I'll have to think about that.

Thanks for all the info, though, glad to hear your story. To the original poster, sorry we hijacked the thread.

0 Kudos
Nick_Triantos
Contributor
Contributor

Bobross,

You are postponing future purchases AND increasing current consolidation ratios on the storage side AND lowering the cost of storage (by increasing consolidation ratios). Like dalepa said, NetApp de-dupe is real, and the beauty of it is that it can be applied to ALL data, not just backups and archives, and is NOT a bolt-on solution. The other thing is that it acts as a "trash compactor" rather than a "shredder", so depending on the change rate of the data you don't have to run it every day, and you get to pick when.

There's no 2X fractional reserve in block deployments anymore; that practice has been deprecated since the introduction of ONTAP 7.1. It's X + Delta, with Delta being the rate of change of the data.

0 Kudos
bobross
Hot Shot
Hot Shot

Thanks Nick. I have read some of your stuff - you work for NetApp, right?

Anyway, as for dedup on "blocks" (which I was told by another NetApp guy are really files in WAFL presented by ONTAP as LUNs), I read that if you dedupe NFS, the space is returned for use in the aggregate, but if you dedupe blocks, that space is returned for use only in the reserved area, right? Or did I read that wrong (it was from a NetApp source)? Also, you cannot dedupe across aggregates, only within an aggregate, right?

About snap reserve space - thanks for the update, but even if it's X + delta, that means if I have a LUN of 500 GB, I have to reserve another 500 GB + delta, right? And that would be in addition to the fractional reserve I have to provide for thin provisioning of LUNs? I just want to know how much reserved space I need if I do 5 TB worth of thin LUNs and snaps of those LUNs.

Appreciate the fact that it is a 'trash compactor' - good analogy. I want to know more about what happens when I (try to) 'take the trash out' - i.e. replicate or copy that volume over to another (non-NetApp) array or to tape. Would I have to un-dedup everything?

0 Kudos
Nick_Triantos
Contributor
Contributor

Hi bobross,

Yes I work for NetApp.

The beginning is always a good place to start. :-)

So, LUNs are not files within the context of what a file is and means to most folks. LUNs have completely different attributes than files. LUNs have streams: some streams contain data, some contain metadata. LUNs also have a VTOC (Vdisk Table of Contents), which holds the in-memory structures of LUNs. Furthermore, read/write access to LUNs follows completely different codepaths within Data ONTAP than read/write access to a file.

Deduping an NFS filesystem is a little different from deduping a LUN, although the space savings are similar. The main difference is who "sees" and has access to those savings. When you dedup an NFS volume, both the Storage Admin and the ESX Admin can see the space savings, because the array controls the filesystem in this case.

With LUNs, the filesystem is controlled by the host. So when you dedup a LUN, the host filesystem has no clue as to what has happened; from an ESX admin's perspective, everything is as it was before. From a Storage Admin's perspective, however, it's different, in that there's now more available space that can be used either for more snapshots or for creating additional LUNs.

A NetApp volume has a few options that can be set. One of these options is called "fractional_reserve", which controls the amount of space reserved: at 0 you reserve 0% space, at 100 you reserve 100%. Given that we call for X + Delta, the fractional reserve should be set to 0. Setting the fractional reserve to 0 with de-dup returns the space savings to the volume rather than the reserve area.

Correct, you cannot dedup across aggregates... for now. However, there have been changes in the upcoming 7.3 release: the fingerprint file has been moved from the volume to the aggregate, so that when you snap the volume you don't have the fingerprint file in the snap - no point to it, really. We also won't be replicating the fingerprint file anymore; it isn't needed.

If you have a 500GB LUN and your daily change rate is, say, 1%, and you want to take and keep online 7 snapshots (one per day), then your total change rate is 7%, so you need an additional 35GB and the volume size should be 535GB. The formula is X + Delta.
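The X + Delta sizing above as a quick calculation (numbers straight from the example):

```shell
lun_gb=500
daily_change_pct=1      # 1% of the LUN changes per day
snapshots_kept=7        # one snapshot per day, kept for a week
delta_gb=$((lun_gb * daily_change_pct * snapshots_kept / 100))  # Delta = 35 GB
volume_gb=$((lun_gb + delta_gb))                                # X + Delta = 535 GB
echo "volume size needed: ${volume_gb} GB"
```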

Cheers

0 Kudos
dalepa
Enthusiast
Enthusiast

When you "take the compacted trash out" to another vendor's doorstep, you will end up with un-deduped data. Just copying the data over will un-dedup it... If you keep the "trash" in the Netapp house using snapmirror to replicate the data to another Netapp, the data stays compacted throughout the process and only dedup blocks get replicated.

virtualOptics

0 Kudos
Nick_Triantos
Contributor
Contributor

Correct dalepa.

Although, with the upcoming 7.3 release you can de-dup someone else's disk by utilizing the V-Series controller. You can also SnapMirror deduplicated volumes from a FAS system with NetApp disk to a V-Series system fronting someone else's disk, and vice versa. This functionality was not previously available; the main reason was qual cycles.

0 Kudos
bobross
Hot Shot
Hot Shot

Thanks Nick - that is very helpful. One thing still doesn't add up, though: you mentioned fractional reserve. If I have an existing LUN of 500 GB, where does that 35GB of space exist to hold the snapshot redirect-on-write blocks? I thought if you snapped a LUN, it went into fractional reserve, and that's why fractional reserve exists. Or am I confusing this with (volume) snap reserve? Or is "reserve" really just one space that contains both LUN fractional reserve and volume snap reserve?

Also, I read (in the VMware best practices) that with thin provisioned LUNs, the LUN fractional reserve should be set to 100%. So if the formula is X + delta for snapping normal (thick) LUNs, what is it for thin LUNs - 2X + delta?

thanks again - although I must say not being able to dedup across aggregates is a serious flaw IMHO.

0 Kudos
Nick_Triantos
Contributor
Contributor

The only time the fractional reserve is used is when the volume is full. The idea of the fractional reserve (aka overwrite reserve) is to guarantee that overwrites to blocks locked in snapshots will succeed. I don't know your config, but one of the configs we recommend is to set fractional_reserve to 0%, and then use policy-based space management to autogrow the volume and/or autodelete snapshots when the volume's used space reaches 98% (the default).
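In ONTAP 7.x CLI terms, that policy maps to roughly the following (volume name, max size, and increment are examples; verify the exact syntax against your release's documentation):

```shell
vol options vmvol01 fractional_reserve 0    # no overwrite reserve
vol autosize vmvol01 -m 6t -i 50g on        # autogrow in 50g steps up to 6t
snap autodelete vmvol01 trigger volume      # fire when the volume nears full
snap autodelete vmvol01 on                  # delete oldest snapshots first
```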

Thin provisioning is a different animal and there are various configurations that can be deployed as well.

Dedup... we have to crawl before we walk, meaning that for the past year engineering has paid close attention to how customers have been using it. There are already changes in the upcoming 7.3 release, and more changes will come in future releases.

0 Kudos