VMware Cloud Community
Saturnous
Enthusiast

Just a stupid thought on alignment ... or maybe not?

I attended a NetApp event where they told me that NetApp addresses (or will address) the alignment problem by offering datastores which are artificially misaligned underneath, to neutralize the misalignment inside the vmdks. They also stated that snapshot deltas, as they show up in linked clones, are also "misaligned".

Now my question: how does a misaligned vmdk (which has a 63-sector offset) behave on a datastore when I create the VMFS volume manually using a 65-sector offset? Wouldn't that move all partitions in the vmdks to an aligned boundary? Or where is the mistake in this simple and stupid theory?
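The arithmetic behind this idea can be sketched in a few lines of Python. This is only a sketch under assumptions that nobody in this thread has confirmed: 512-byte sectors, a 4 KB block size on the array, a conventional VMFS start at sector 128, and a vmdk whose data begins at a block-aligned offset inside the VMFS partition. The name `is_aligned` is made up for illustration.

```python
SECTOR = 512        # bytes per sector (assumed)
ARRAY_BLOCK = 4096  # block size of the storage array in bytes (assumed 4 KB)


def is_aligned(vmfs_start_sectors, guest_partition_sectors, block=ARRAY_BLOCK):
    """Return True if the guest partition starts on an array block boundary.

    Assumes the vmdk data itself begins at a block-aligned offset inside
    the VMFS partition, so only the two offsets passed in matter.
    """
    absolute_bytes = (vmfs_start_sectors + guest_partition_sectors) * SECTOR
    return absolute_bytes % block == 0


# VMFS partition at sector 128 (assumed default), guest partition at sector 63
print(is_aligned(128, 63))  # False

# VMFS partition at sector 65, guest partition at sector 63: 128 sectors = 64 KB
print(is_aligned(65, 63))   # True
```

If the assumption about where the vmdk data starts inside VMFS does not hold, the result changes, so this only shows that the sector arithmetic of the theory adds up.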

Can someone check this in a lab, or comment on the theory?

8 Replies
EdWilts
Expert

This doesn't make sense.  It's easier to fix the alignment issues (although it can take a LONG time with a lot of guests).

One serious issue this will create is that if you were to create a Windows 2008 or RHEL 6 vmdk on one of these datastores, data that was already aligned by default by the operating system would now be misaligned.
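A small Python sketch of that side effect, assuming 512-byte sectors, a 4 KB array block size, guest partitions starting at sector 63 (Windows 2003 style) or sector 2048 (1 MB, Windows 2008 style), and vmdk data that begins block-aligned inside the datastore. The function name `misalignment_bytes` is hypothetical.

```python
SECTOR = 512   # bytes per sector (assumed)
BLOCK = 4096   # array block size in bytes (assumed 4 KB)


def misalignment_bytes(datastore_offset_sectors, guest_partition_sectors):
    """Bytes by which the guest partition start misses a block boundary."""
    start = (datastore_offset_sectors + guest_partition_sectors) * SECTOR
    return start % BLOCK


guests = [("63-sector guest", 63), ("2048-sector guest", 2048)]
datastores = [("datastore at sector 128", 128), ("datastore at sector 65", 65)]

for ds_name, ds_offset in datastores:
    for guest_name, guest_offset in guests:
        off = misalignment_bytes(ds_offset, guest_offset)
        state = "aligned" if off == 0 else f"misaligned by {off} bytes"
        print(f"{ds_name}, {guest_name}: {state}")
```

Under these assumptions the 65-sector datastore fixes the 63-sector guest but pushes the 2048-sector guest 512 bytes off a block boundary.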

I see this creating more problems than it fixes.  I just bounced it off our NetApp support guy to see what he says.

.../Ed (VCP4, VCP5)
DSTAVERT
Immortal

If the datastore was created using the vSphere Client, it will automatically be aligned.

-- David -- VMware Communities Moderator
EdWilts
Expert

Here's what I heard from our NetApp reseller:

So I actually did just find out about a way to manually offset a LUN on creation on the filer (obviously won't apply to NFS).  This would really only be useful if you knew the datastore was going to be dedicated to a certain OS type, and only if you couldn't fix it through alignment with the tools.  For instance, if you've got a ton of 2k3 boxes that are misaligned, and you can't take them offline for some reason, you could create an offset LUN as a temporary stopgap and storage vmotion over to it.

I'm not sure what he's talking about with the "vmfs manual using a 65 sector offset", I'm assuming he's talking about offsetting at the vmfs layer (horrible idea). 

So he's partially right, you can create a LUN on the filer that's manually offset to compensate for win2k3 or older linux guests.  However it's a *DANGER WILL ROBINSON*.  You're right, you would have to KNOW it was offset and only put the appropriate guests on it.

The other big thing with filers is to be at 8.0.1 or 7.3.6 where they no longer put you into the penalty box after a very small amount of misaligned I/O's.

If your concern is NetApp doing this to ALL LUNs/filesystems, the answer is absolutely not.  My understanding is they will add an option to the VSC to allow you to create a "win2k3 LUN" which would have the proper default offset for misaligned win2k3 boxes.  It will NOT be the default LUN setting though, you'd have to manually create it as a special use-case.

.../Ed (VCP4, VCP5)
Saturnous
Enthusiast

1. I am not talking specifically about NetApp. This would be a universal solution for block-based storage; it is just that NetApp brought up the "delta" point.

2. Can someone confirm that deltas (and therefore linked clones) are not aligned?

I imagine that deltas do not start with the same metadata as a full disk (a full disk contains an MBR and a partition table; a delta should start with just the first productive block that was redirected, so 4 KB). If there is some metadata at the beginning, I need its exact size in 512-byte sectors.

I think I have to do some reverse engineering to find out: zero out a disk, open a snapshot, and use a hex editor to put some easy-to-find pattern in several places on the raw disk (straddling a 4 KB boundary, on a 32 KB boundary, etc.).

3. I would use such a "counter-aligned" datastore only in View, Lab Manager or vCD environments to host the linked clones.

a_p_
Leadership

Just my thoughts about misaligned disks/partitions (63 blocks/31.5 KB).

This only applies to Windows operating systems like XP/2003 (and older) and a couple of other operating systems. Starting with Windows Vista/2008, the OS partitions are aligned to 1024 KB by default. To avoid misalignment for e.g. XP/2003, you can just create aligned partitions and then install the OS on them.

Regarding misaligned snapshot vmdks. Since delta files consist of modified data blocks (independent of the underlying OS), any misaligned block from the base virtual disk will be copied 1:1 to the delta file. With an aligned base disk - to be precise, aligned OS partitions within the base disk - you should not see any misalignment in the delta disks.

André

Saturnous
Enthusiast

"I'm not sure what he's talking about with the "vmfs manual using a 65 sector offset", I'm assuming he's talking about offsetting at the vmfs layer (horrible idea)." - WHY would it be a horrible idea?

Saturnous
Enthusiast

"Regarding misaligned snapshot vmdks. Since delta files consist of modified data blocks (independent of the underlying OS), any misaligned block from the base virtual disk will be copied 1:1 to the delta file. With an aligned base disk - to be precise, aligned OS partitions within the base disk - you should not see any misalignment in the delta disks."

Either they are independent or they are not. I think they are independent, and are aware of neither the starting offset nor the NTFS (or whatever) allocation unit size.

Imagine you change bytes 4098 to 8195 of a file on an NTFS filesystem with a 4 KB cluster size. You change 4098 bytes and touch 2 NTFS allocation units (which can be anywhere on disk because of fragmentation), each containing 8 SCSI blocks, but from the disk's perspective you change only 9 SCSI blocks.

But what happens when this change hits the delta? In addition to the 4098 bytes of changed data, there must be information about which 9 SCSI blocks in the flat file they correlate to.

A 2 TB vmdk can contain 4294967296 (2^32) SCSI blocks of 512 bytes, and that holds for both the base and the delta, so there must be information about which block in the delta maps to which block in the flat file. You always need 32 bits to describe the position of a block on a 2 TB disk with 512-byte blocks.

For the example above, that means between 8 bytes (for a non-fragmented file) and 16 bytes (fragmented into 2 pieces) of metadata must be written into the delta, counting a start and a length per piece. But HOW?

A FULL table storing the x:y correlation between the blocks in the base disk and the delta disk would need 16 GB (2^32 SCSI blocks * 32 bits to store their 'real' position). This doesn't seem to be the way VMware took.

It would make sense to think in "ranges" of blocks: because 512 bytes is nowadays a very rare size for a filesystem allocation unit, this would save information in every case. Storing a range means storing only 2 parameters: the beginning of the mapped range and its size. So twice 32 bits.

As you can't forecast how much data will arrive in one batch, you cannot calculate the size of the metadata in advance. In the VERY unlikely worst case it will again be 16 GB or more. This metadata must be generated and saved at runtime, so it is saved SOMEWHERE in the stream. It would make sense to write it at the beginning, so that whenever more writes hit the same area and the range gets bigger, it does not need to be newly allocated.
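The numbers in this example can be checked with a short Python sketch. It assumes that the byte positions are zero-based offsets into a file that starts on a cluster boundary, that the partition itself is aligned, that sectors are 512 bytes and clusters 4 KB, and that the disk is exactly 2 TB (2 * 2^40 bytes). The per-range cost of "start plus length" is this post's guess, not the documented vmdk format, and the helper name `touched_units` is made up.

```python
SECTOR = 512    # bytes per SCSI block (assumed)
CLUSTER = 4096  # NTFS allocation unit in bytes (assumed 4 KB)


def touched_units(first_byte, last_byte, unit):
    """Number of fixed-size units covered by an inclusive byte range."""
    return last_byte // unit - first_byte // unit + 1


first, last = 4098, 8195
changed = last - first + 1
clusters = touched_units(first, last, CLUSTER)
sectors = touched_units(first, last, SECTOR)
print(changed, clusters, sectors)  # 4098 2 9

# Addressing every 512-byte block of a 2 TB disk
disk_bytes = 2 * 2**40
total_sectors = disk_bytes // SECTOR
address_bits = (total_sectors - 1).bit_length()
print(total_sectors, address_bits)  # 4294967296 32

# Full block-by-block mapping table versus one (start, length) range entry
full_table_bytes = total_sectors * address_bits // 8
range_entry_bytes = 2 * address_bits // 8
print(full_table_bytes / 2**30, range_entry_bytes)  # 16.0 8
```

With one range entry per contiguous piece, the metadata for a single write stays in the range of a few bytes, while a full table would cost gigabytes up front.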

I can only guess. Does anyone know a good source for looking deeper into the architecture of a snapshot?

In the end I assume that every chunk of remapped data in the delta file will also need metadata, which will mess up the alignment. If they "realigned" after every piece of metadata it would waste a lot of space oO, and I can't imagine they invested so much brainpower to avoid this. But I assume there is some indexing mechanism to avoid long search times. But where? A freshly allocated delta takes 64 bytes according to "ls -l".

The NetApp guy claimed to have seen the "overhead" caused by the misalignment in his lab, but maybe he saw metadata updates and falsely interpreted them as misalignment.

a_p_
Leadership

If you are interested in the vmdk format, you can request the specifications at http://www.vmware.com/technical-resources/interfaces/vmdk.html

André
