It might have been. From reading the post and associated links, it sounds like the penalty only applies to the first write even without VAAI. With VAAI (I'm using EqualLogic, so I have VAAI) that penalty is probably not something the average user would notice and, in general, is not worth worrying about. I'm better off keeping thin disks and saving space.
To test this, I added a 100 GB disk to a VM and marked it as eager zeroed. It took 5 seconds to create, and I barely had time to find which VAAI counter in esxtop was increasing. I also cloned a VM, which took about a minute, but I saw the VAAI counters moving during the creation as well, so I guess things are working as they should.
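For anyone who wants to repeat a similar test, a rough sketch of the commands involved (run in the ESXi shell; the datastore path is a placeholder, and the exact esxtop field labels may differ between ESXi versions):

```shell
# Create a 100 GB eager-zeroed thick disk on a VMFS datastore.
# Path and size are placeholders - adjust to your environment.
vmkfstools -c 100G -d eagerzeroedthick /vmfs/volumes/datastore1/testvm/test-ezt.vmdk

# To watch VAAI offload activity while the disk is created:
# start esxtop, press 'u' (disk device view), then 'f' and enable
# the VAAI stats field to see the offload counters incrementing.
esxtop
```

If the array supports the Block Zero primitive, the eager-zero pass is offloaded to the array, which is why creation finishes in seconds rather than the time a host-side zeroing write of 100 GB would take.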
>> is there any validity to thinking that thick provisioned disks might have their blocks less spread out and hence, faster disk access?
Results will vary over a very wide range - just consider this simple example:
A newly created 10 GB virtual disk is partitioned for the first time: effectively there will be writes to the first MB and to the last MB, i.e. writes to two one-MB blocks.
Thick format: the vmdk may be allocated in one piece, so there is one lookup in the vmdk-to-physical-location mapping table plus two one-MB writes. The distance between the two writes is 10 GB.
Thin format: the vmdk will use three fragments - two allocated blocks plus one large reference to /dev/zero. Effectively two lookups are required to find the two allocated blocks, plus two small writes.
Now the distance between the two writes will vary between just one MB and several TB, depending on the state of the VMFS volume and the size of your datastore.
If you now run benchmark tests you will probably get inconsistent results.
If the two needed one-MB blocks are allocated next to each other, you may get the best performance results with thin.
If the two blocks are allocated TBs apart, you may also get very poor results with thin.
In this example, results for thick would be just average.
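The effect described above can be illustrated with a toy model (all addresses below are made up for illustration - real VMFS allocation is far more complex than this):

```python
# Toy model of the two one-MB writes from the 10 GB disk example.
# All units are MB; the block addresses are invented for illustration.
MB = 1
GB = 1024 * MB

DISK_SIZE = 10 * GB  # the 10 GB virtual disk from the example

# Thick: the vmdk is one contiguous extent, so the first-MB and
# last-MB writes are always (almost exactly) the disk size apart.
thick_distance = DISK_SIZE - MB

def thin_distance(block1_addr, block2_addr):
    """Physical distance between two independently allocated 1 MB blocks."""
    return abs(block2_addr - block1_addr)

# Thin: each block lands wherever the datastore allocator puts it.
best_case = thin_distance(500 * GB, 500 * GB + MB)  # adjacent blocks
worst_case = thin_distance(0, 2 * 1024 * GB)        # 2 TB apart

print(f"thick: writes {thick_distance} MB apart")
print(f"thin, best case:  writes {best_case} MB apart")
print(f"thin, worst case: writes {worst_case} MB apart")
```

Thick is predictable; thin's distance is whatever the allocator happened to do, which is why benchmark runs on thin disks scatter so widely.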
I assume these effects will spoil any serious attempt to produce a widely accepted benchmark result.
Also, the thin-provisioning feature is unbeatable when it comes to buying the next piece of hardware.
A company that decides to allow thin provisioning may get away with a 10 TB SAN - if the decision is to allow only thick provisioning, it may have to buy a 30 TB SAN.
So from this point of view, thin vs. thick is easy to decide: thin is so much cheaper.
How large would the performance disadvantage of thin have to be to really make a difference in the purchasing decision?
I guess that often even a 20% performance disadvantage would not overrule the radically lower price of thin provisioning.
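As back-of-the-envelope arithmetic (the 10 TB / 30 TB figures are from the example above; the per-TB price is a made-up placeholder):

```python
# Capacity a SAN must hold: everything provisioned (thick) vs.
# only what is actually written today (thin). Figures from the
# example above; the price per TB is an invented placeholder.
provisioned_tb = 30    # sum of all virtual disk sizes
actually_used_tb = 10  # what thin provisioning consumes today
price_per_tb = 1000    # placeholder price, currency units per TB

thick_cost = provisioned_tb * price_per_tb
thin_cost = actually_used_tb * price_per_tb
savings = thick_cost - thin_cost

print(f"thick SAN: {thick_cost}, thin SAN: {thin_cost}, saved: {savings}")
```

A 3x difference in hardware spend is hard for a modest per-I/O performance penalty to outweigh.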
In my opinion this whole thin-vs-thick performance discussion is moot and pure theory.
From my point of view (my job is the repair/recovery of dead VMs) it simply boils down to:
Do I regard this VM as disposable? And in case of a "no" - do I have a tested and ready-to-use backup or replica of this VM?
If the answer is unsure or no, thin-provisioned vmdks should be avoided because of their significantly higher failure risk.
Unfortunately I see a lot of risky use of thin-provisioned vmdks, especially in small environments.
In environments that are large enough to expect failures as a regular part of the daily routine, and that have a policy of "we always replace any production VM that fails - we never repair or try to recover", the higher failure rate can be tolerated.
Thanks for your comments.
Your mention of a higher failure risk for thin-provisioned VMs is interesting. Can you elaborate? In my last couple of jobs the admins almost always set up VMs as thin provisioned (both in dev and prod) and I've never seen any issues, so I'm curious where this failure risk comes from. Maybe I'm living more dangerously than I'm comfortable with and don't even know it...
Your question really deserves a very long blog article. Please forgive me if I answer in several posts - just as time allows.
For the record: due to my job I believe I see more problems with VMFS than most users who just run their own vSphere environment, so of course I have a biased view.
From my point of view it would be tempting to make a statement like:
expected survival rate after one year:
- Windows 2008 system installed on certified hardware: 98%
- same system on a thick-provisioned, eager-zeroed vmdk running on ESXi: 97%
- same system on a thin-provisioned vmdk running on ESXi: 60%
- same as before, plus automatic backup by Veeam or similar: 50%
I don't think those numbers are completely off, but I don't have statistics to back up such a claim.
So I would rather ask these questions:
- what happens when problems occur ?
- how well are power failures or similar problems handled ?
- how well does the system check for errors, and how well can it repair itself ?
- which problems can a user handle himself ?
- which problems can be solved by VMware support ?
- is there any documentation for troubleshooting ?
- are there 3rd party tools that can be used if a problem occurs ?
- how severe do the problems have to be to result in a complete loss ?
- does the filesystem itself offer any repair or selfhealing features ?
I have done remote support for these problems since about 2007 - the last 4 years as a consultant for a VMware partner.
The experience I gathered in that time can be summarized like this:
- the smallest errors in a thin vmdk's mapping table render the vmdk unreadable
- the smallest errors in a snapshot's grain table render the snapshot unreadable
- loss of the partition table of the VMFS volume has to be expected when the system suffers a power failure
- for small and medium VMware customers, calling VMware support for help with damaged thin vmdks, snapshots or VMFS volumes is usually not worth the effort
- VMFS seems to have no redundant structures for fixing small problems after a reboot
- the heartbeat functions that enable cluster access cannot be reset by the user - that means ESXi often refuses to use/read a volume even when the reason for the lock no longer exists
- 99% of the vSphere admins I talk to in my job do not have the skills required to fix even the smallest problems with thin vmdks or snapshots
- most of the admins I talk to somehow expect VMFS to behave like NTFS - most of them are shocked when I tell them that there is no equivalent of chkdsk
So IMHO these aspects all sum up to:
- thin provisioned vmdks die without early warning
- the chance that a user can fix an error himself is almost non-existent
- trying repairs is a waste of time in most cases
- if mission-critical data has to be recovered within a predictable time frame, expect an invoice from Kroll Ontrack starting at $5000 or more
If my customers ask for recommendations on thin/thick provisioning, I think there is only one safe answer:
To be on the safe side thin provisioning should only be used when either:
- the VM is disposable - like a View-VM
- a solid and tested backup or replacement policy is in place, so that the loss of a thin VM just becomes a calculated loss of a few hours' worth of data
For thick provisioned VMs the story is very different.
A skilled admin can acquire the skills required to fix all problems caused by the vmdk layer and the VMFS filesystem.
So for a skilled admin, thick VMs behave almost the same as a Windows system running on physical hardware.
have to interrupt now - to be continued later ...