VMware Cloud Community
Simon_H
Enthusiast

Thin provisioning - size of next block?

I've always used thick disks for my virtual machines, simply because I don't want to cause disk fragmentation artificially.

However, I do think it would save me a fair amount of disk space across all my VMs if I could use thin provisioning. If I did, I'd want to do so in a way that, when a disk expanded, it took a large chunk of physical disk in one go. For example, let's say I have a Linux root disk requirement of 4GB. Typically I would create a 6 or 8GB thick disk, e.g.

vmkfstools -c 8G -d eagerzeroedthick -a lsilogic <vm_name>-root.vmdk

What I would like is to be able to create a thin provisioned disk of minimum size 4GB which would then extend, as space was required, in chunks of 1GB. In that way I'm thinking I could save some space, as well as give greater headroom for disks to grow, without causing lots of fragmentation. For those familiar with the traditional way an Oracle database worked, I am thinking of an equivalent to a tablespace's initial extent size (i.e. the amount contiguously allocated at creation time) and its next extent size (i.e. the amount contiguously allocated when more space is required).

Is this possible? Is the theory behind this still valid, or does VMFS scatter data across a device anyway?

TIA!

Simon

10 Replies
a_p_
Leadership

I thought about this too some time ago; however, there's no way to do that out of the box.

What you can do is create a thin provisioned disk and use a tool (e.g. "dd") to create and then delete a file of the initial disk size you want. Writing this file will allocate the blocks and also zero them during the first write, so you will end up with an "eagerzeroed" thin disk of the minimum size you want.
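
As a rough sketch of that approach, assuming a 4GB target and that the thin disk carries the guest's root filesystem (the file path and size are just examples):

# inside the guest, on the filesystem that lives on the thin disk
dd if=/dev/zero of=/zerofill bs=1M count=4096   # write ~4GB of zeros so the blocks get allocated and zeroed
sync                                            # make sure the zeros are flushed to disk
rm /zerofill                                    # the guest gets its space back, but the VMDK stays grown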

Regarding the block size, you may want to read http://www.yellow-bricks.com/2009/05/14/block-sizes-and-growing-your-vmfs/. Reading the comments, it seems there's no real benefit to choosing larger block sizes.

André

Simon_H
Enthusiast

Thanks for the reply, André.

Another, higher-maintenance, alternative would be to continue using thick disks, but then expand the disk more routinely, e.g. by 1GB, as required. It presumably means the partition you want to expand has to be at the end of the disk (or at least positioned so you can grow it without a lot of faffing around).
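
Something along these lines, for example (the sizes are illustrative, and the VM would need to be powered off for the extend):

vmkfstools -X 9G <vm_name>-root.vmdk   # grow the thick disk from 8GB to 9GB
# then, inside the guest, grow the last partition and its filesystem,
# e.g. with fdisk/parted followed by resize2fs for ext3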

I did read the post you suggested, though the block size debate didn't really relate to contiguous vs fragmented storage as far as I could tell. Satyam Vaghani's comment did make me start wondering, though: if setting the growth increment hasn't been implemented in such a mature product, does that mean it has no benefit (for some other subtle reason)?

Simon

a_p_
Leadership

From what I've read so far, it looks like the fragmentation impact is minimal, since the VMDK increments are not like those in the guest OS itself (4k, 32k, ...) but in MBs. In addition to this, fragmentation inside the guest OS's filesystem may cause more overhead than the fragmentation in VMFS.

André

PS: I asked these questions myself in http://communities.vmware.com/message/1484310, but no responses so far :-(

Simon_H
Enthusiast

Very interesting - you had exactly the same thought a year ago :-)

Maybe for smaller OS files there wouldn't be much benefit, but for larger files, like database ones, there might be, especially where sequential access is concerned. What worries me is the layering - fragments at the OS level on top of fragments at the VMDK/VMFS level, fragment alignment, etc. Trying to fathom out what's going on under heavy load would be very difficult indeed.

I suppose for bigger systems people carve up storage on the SAN into suitable LUNs and then pass the whole device through to the VM - this then means the VMFS question is irrelevant. Perhaps that's why there's not much interest.
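
For reference, that pass-through is essentially what a raw device mapping does; a rough sketch, with the LUN's device path as a placeholder:

vmkfstools -z /vmfs/devices/disks/<naa_id> <vm_name>-rdm.vmdk   # physical compatibility RDM; use -r instead for virtual compatibility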

I am wondering about the 8MB block size for datastores though - it's hard to see any downside (except for log and config files wasting a bit of space, though that could be managed with a different datastore if you could be bothered).
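
If I did go down that route, as far as I know the block size can only be chosen when the datastore is created, with something like this (device path and label are placeholders):

vmkfstools -C vmfs3 -b 8m -S <datastore_label> /vmfs/devices/disks/<device>:1   # format partition 1 as VMFS-3 with an 8MB block size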

Simon

a_p_
Leadership

"I am wondering about the 8MB block size for datastores though - it's hard to see any downside"

According to different blog posts on yellow-bricks and others, there is no downside to using an 8 MB block size.

André

mcowger
Immortal

Honestly, though, think about how your VMDKs are laid out on the disks of your array. For many arrays (including the 3PAR & EVA systems I'm most familiar with), a more fragmented filesystem will actually result in better performance, due to more physical spindles being used to service an IO.

--Matt VCDX #52 blog.cowger.us
Simon_H
Enthusiast

Well, in my case my disks are direct-attached and each device is a RAID 0 striped disk pair, so I'm pretty sure fragmentation won't improve things here.

On a larger scale, I'd suggest it's usually up to the storage administrator to carve a LUN from a disk group distributed across an appropriate number of spindles. If I remember correctly, an EMC stripe element size is typically 64kB, so a 1MB write will lead to 16 physical writes (albeit they should be cached and batched up) and 8MB could be 128 writes. In other words, if a disk group consists of, say, 10 disks, then a contiguous write would be spread across all spindles anyway.

Simon_H
Enthusiast

Anyone who's read this thread and is thinking about how VMDKs map to LUNs might also be interested, if they've not already seen them, in these articles by Duncan Epping: http://www.yellow-bricks.com/2010/04/08/aligning-your-vms-virtual-harddisks/ and Vaughn Stewart (NetApp): http://blogs.netapp.com/virtualstorageguy/2010/04/raising-awareness-around-the-misalignment-of-data....

depping
Leadership

I wouldn't bother worrying about fragmentation from a VMDK perspective. The block size that is used (1MB is the default) is larger than most block sizes guest OSes use, so the chances of running into degraded performance are slim. On top of that, you will have multiple workloads running on the same host and accessing the same VMFS volumes and the same array, so IO will be random anyway.

There is no real downside to using 8MB blocks for VMFS. For the OS it won't make a difference whether you use 1MB or 8MB blocks, as the block size is only used for allocating the disk and for the size of the chunks that are zeroed out, not for actual guest OS related IO. Even for configuration files and log files it won't make a difference, as VMFS uses 64k sub-blocks for the smaller files.
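
If you want to double-check what block size an existing datastore was formatted with, something like this will report it (the datastore name is a placeholder):

vmkfstools -P /vmfs/volumes/<datastore_name>   # prints the VMFS version, capacity and file block size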

Duncan (VCDX)

Available now on Amazon: vSphere 4.1 HA and DRS technical deepdive

depping
Leadership

Simon H: Also note that a 1MB write from a VMFS perspective only occurs when it zeroes out the actual block; for the rest, it is the guest OS that dictates the sizes of the reads and writes... it is not VMFS which does this.

Duncan (VCDX)

Available now on Amazon: vSphere 4.1 HA and DRS technical deepdive
