After the excitement of a large non-PlateSpin P2V project has settled down, or when VI admins arrive on the scene to find that an earlier wave of template- or script-based Windows VM builds has left hundreds (or more) of VDI desktops or servers with misaligned partitions, the following questions get asked:
- Should I bother to retrospectively align partitions on my Win 2K3 or XP VMs?
- Is the misalignment actually having an impact on the storage array or VM performance?
- Is it relevant to align Windows OS partitions?
A couple of prerequisites to understand if you are new to the topic:
- Thankfully, in the new era of Windows 2008 and Windows 7 we no longer need to worry about partition alignment, as both diskpart and Disk Management align new partitions for you (defaulting to a 1MB offset). Windows XP and 2003 tools, by contrast, start the first partition at sector 63 (a 32,256-byte offset), which is why so many older VMs are misaligned.
- Version 5 of VMware Converter can align partitions as part of a P2V or V2V conversion via its "Optimize disk layout" option - nice.
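To make the difference between the old and new defaults concrete, here is a minimal Python sketch (not from the original post) that checks whether a partition's starting offset is a multiple of a given boundary. It assumes classic 512-byte sectors; the 4KB and 64KB boundaries are the block and stripe sizes discussed below.

```python
SECTOR_BYTES = 512  # assumes classic 512-byte sectors


def is_aligned(start_sector: int, boundary_bytes: int,
               sector_bytes: int = SECTOR_BYTES) -> bool:
    """True if the partition's starting byte offset is a multiple of boundary_bytes."""
    return (start_sector * sector_bytes) % boundary_bytes == 0


if __name__ == "__main__":
    # 63 sectors: legacy XP / 2003 default; 2048 sectors: 1MB default in 2008 / 7.
    for label, start in [("XP / 2003", 63), ("2008 / 7", 2048)]:
        print(f"{label}: offset {start * SECTOR_BYTES} bytes, "
              f"4KB aligned: {is_aligned(start, 4096)}, "
              f"64KB aligned: {is_aligned(start, 65536)}")
```

Run as a script, this prints a 32256-byte offset that is aligned to neither boundary for XP / 2003, and a 1048576-byte offset that is aligned to both for 2008 / 7.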
So we come back to the original 3 questions. My personal views are as follows:
A1:
If you have misaligned VM partitions on NetApp storage (FC or NFS), then my answer to all 3 of the above is YES.
NetApp's WAFL filesystem uses a 4KB block size. With the Windows default 4KB NTFS cluster size and a misaligned partition, each cluster straddles two WAFL blocks, so most reads and writes in the guest OS result in additional IO overhead on the storage array.
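The geometry behind that overhead can be shown with a small Python sketch (an illustration, not a NetApp tool). It counts how many 4KB back-end blocks each 4KB guest IO covers for a given partition offset. It is a deliberately simplified model that ignores WAFL caching, write coalescing and everything else the array does; it only shows the extra blocks a misaligned cluster touches.

```python
WAFL_BLOCK = 4096  # WAFL block size in bytes, as described above


def blocks_touched(partition_offset: int, io_offset: int, io_size: int,
                   block_size: int = WAFL_BLOCK) -> int:
    """Count the back-end blocks covered by one guest IO.

    partition_offset: byte offset of the partition from the start of the disk
    io_offset:        byte offset of the IO from the start of the partition
    """
    start = partition_offset + io_offset
    end = start + io_size - 1  # last byte the IO touches
    return end // block_size - start // block_size + 1


def total_blocks(partition_offset: int, clusters: int, cluster: int = 4096) -> int:
    """Blocks touched when each of `clusters` consecutive 4KB clusters is accessed once."""
    return sum(blocks_touched(partition_offset, i * cluster, cluster)
               for i in range(clusters))


if __name__ == "__main__":
    print(total_blocks(63 * 512, 1000))    # misaligned (sector 63): 2000 blocks
    print(total_blocks(2048 * 512, 1000))  # aligned (1MB offset): 1000 blocks
```

With the 63-sector offset, every 4KB cluster spans two WAFL blocks, doubling the block-level work in this model.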
NetApp also has a little-documented threshold for "concurrent partial writes" (writes that result in split IOs at the WAFL filesystem level due to misaligned partitions). When this threshold is reached, the array suspends IO until the outstanding partial writes are cleared - not nice. I'm not sure whether other arrays have an equivalent limitation.
Googling 'NetApp partition alignment partial writes' will turn up plenty of related information from the likes of Duncan Epping and others.
The impact on NetApp NFS volumes is apparently greater still than on FC, and features such as caching and data deduplication can be affected (or deliver less benefit) when a large number of misaligned partitions reside on the array in question.
Because EMC and other vendors' arrays use larger stripe sizes (usually 64KB or more), the impact of misaligned partitions may be significantly less, to the point that a large retrospective alignment project may not be worthwhile. You may still want to target some of the high-IO VMs, particularly if their database or log partitions are misaligned.
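One way to see why a larger stripe size softens the impact is to count how many 4KB cluster IOs straddle a boundary of a given size. The Python sketch below is a simplified model (not vendor guidance): it assumes IOs start on 4KB multiples relative to the partition, as default NTFS clusters do, and treats the stripe size as a fixed boundary.

```python
def fraction_crossing(partition_offset: int, io_size: int, boundary: int,
                      n_ios: int = 16_000) -> float:
    """Fraction of consecutive, cluster-sized IOs that straddle a `boundary`.

    IOs are assumed to start on io_size multiples relative to the partition,
    which is what default 4KB NTFS clusters give you.
    """
    crossing = 0
    for i in range(n_ios):
        start = partition_offset + i * io_size
        end = start + io_size - 1
        if start // boundary != end // boundary:
            crossing += 1
    return crossing / n_ios


if __name__ == "__main__":
    misaligned = 63 * 512
    print(fraction_crossing(misaligned, 4096, 4 * 1024))   # 1.0
    print(fraction_crossing(misaligned, 4096, 64 * 1024))  # 0.0625
```

With a 63-sector offset, every 4KB IO crosses a 4KB boundary, but only 1 in 16 crosses a 64KB boundary, which is consistent with the smaller impact described above.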
Keep in mind that you should never attempt to align Windows dynamic disks/volumes. Most alignment tools will skip them, but I've come across cases where that did not happen and we had to revert to backups or snapshots to restore corrupted partition data after a failed alignment attempt.
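As a pre-flight check, a script can read a disk's MBR and refuse to touch anything that looks dynamic. The Python sketch below is an illustration under stated assumptions: it parses only the four primary MBR entries, treats partition type 0x42 (the MBR type Windows uses for dynamic/LDM disks) as "skip", and does not handle GPT disks or logical partitions inside an extended partition. The input would be the first 512 bytes of a raw disk image or flat VMDK extent; the `make_mbr` helper is hypothetical and only builds a synthetic sector for the demo.

```python
import struct

LDM_TYPE = 0x42                # MBR partition type Windows uses for dynamic (LDM) disks
EXTENDED_TYPES = {0x05, 0x0F}  # logical partitions inside these are not parsed here


def parse_mbr(mbr: bytes):
    """Yield (index, type, start_lba) for each used primary partition entry."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    for i in range(4):
        base = 446 + 16 * i  # four 16-byte entries start at offset 446
        ptype = mbr[base + 4]
        (start_lba,) = struct.unpack_from("<I", mbr, base + 8)
        if ptype != 0:
            yield i, ptype, start_lba


def alignment_report(mbr: bytes, boundary: int = 4096, sector_bytes: int = 512):
    """Classify each primary partition as aligned, misaligned, or to be skipped."""
    report = []
    for i, ptype, start in parse_mbr(mbr):
        if ptype == LDM_TYPE:
            report.append((i, "dynamic: skip"))
        elif ptype in EXTENDED_TYPES:
            report.append((i, "extended: not checked"))
        elif (start * sector_bytes) % boundary == 0:
            report.append((i, "aligned"))
        else:
            report.append((i, "misaligned"))
    return report


def make_mbr(partitions):
    """Hypothetical helper: build a synthetic MBR from (type, start_lba, sectors) tuples."""
    mbr = bytearray(512)
    for i, (ptype, start, sectors) in enumerate(partitions):
        base = 446 + 16 * i
        mbr[base + 4] = ptype
        struct.pack_into("<II", mbr, base + 8, start, sectors)
    mbr[510:512] = b"\x55\xaa"
    return bytes(mbr)


if __name__ == "__main__":
    mbr = make_mbr([(0x07, 63, 2000), (0x07, 4096, 2000), (LDM_TYPE, 8192, 2000)])
    print(alignment_report(mbr))
```

For the synthetic disk this reports the first partition as misaligned, the second as aligned, and the third as dynamic (skipped), even though its offset happens to be aligned.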
A2:
This is something you can quantify by measuring the IOPS on all misaligned partitions over time and giving this information to your storage vendor. The best tool for this is vscsiStats, because it reports both the number of IOs and their sizes. If the IOPS figure is large, you have NetApp storage, and vscsiStats shows the IO profile is made up largely of 4KB reads and writes (which it generally will be with Windows, at least on the OS partitions), then that's a bad combination that should definitely be addressed.
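Once you have that data, a simple ranking helps decide where to start. The Python sketch below is hypothetical: it assumes you have already turned your measurements into an average IOPS figure plus an IO-size histogram (IO size in bytes mapped to IO count) per disk, and it does not show how to parse vscsiStats output. It scores each misaligned disk by its estimated 4KB IOPS; the VM names and figures are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DiskStats:
    vm: str
    misaligned: bool
    avg_iops: float
    # IO size in bytes -> IO count; assumed already parsed from your collector
    io_sizes: Dict[int, int] = field(default_factory=dict)


def share_of_size(io_sizes: Dict[int, int], size: int = 4096) -> float:
    """Fraction of IOs of exactly `size` bytes."""
    total = sum(io_sizes.values())
    return io_sizes.get(size, 0) / total if total else 0.0


def alignment_priority(disks: List[DiskStats]) -> List[Tuple[str, float]]:
    """Rank misaligned disks by estimated 4KB IOPS, highest first."""
    scored = [(d.vm, d.avg_iops * share_of_size(d.io_sizes))
              for d in disks if d.misaligned]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    disks = [  # hypothetical figures, for illustration only
        DiskStats("vm-a", True, 400, {4096: 75, 65536: 25}),
        DiskStats("vm-b", True, 100, {4096: 100}),
        DiskStats("vm-c", False, 1000, {4096: 100}),
    ]
    print(alignment_priority(disks))
```

Aligned disks are excluded no matter how busy they are; the busy, 4KB-heavy misaligned disks rise to the top of the list.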
A3:
In most environments the OS partition contains the Windows pagefile. If your VMs are low on memory, or run MS SQL or Java with over-ambitious memory reservations set, you may see peaks of hundreds, and sometimes over 1,000, IOPS on these partitions. In these scenarios aligning your OS (or pagefile) partition becomes more important.
The same applies if you run bulk VM reboots or patching runs: during startup and patch installation you may see significant IOPS on the OS partition.
Having summarised my views above, I've recently had the "opportunity" to perform some retrospective bulk alignments.
We used the following tools:
- NetApp MBRAlign: cost effective (free with NetApp support), but only runs on NetApp and is more difficult to automate / script / schedule
- Quest vOptimizer: worked well but is relatively slow, and requires WMI access to the Windows guest. It is good for production / critical systems because it performs a full VM backup before alignment starts, and automatically rolls back to this backup if the alignment fails. So, provided you have capacity on your storage for the backups and the IO load during alignment doesn't upset anything, you can safely schedule alignments during outage windows with some confidence that the worst case is a VM that is back up and running fine in the morning, but whose alignment did not complete and has to be run again. The downside is that it's relatively costly (it does other things besides partition alignment, such as zeroing free space to reduce the size of thin disks on the SAN), and there is currently no way to select which VMDK files get backed up. That becomes a problem when you just want to align a 30GB OS partition but also have a 500GB data partition that is already aligned.
- Paragon PAT Cli (Partition Alignment Tool, command-line version): cost-effective, faster overall than vOptimizer, and, using the command-line version embedded in a WinPE ISO, it can be automated with PowerShell scripts without any interaction with the guest OS (e.g. no Windows authentication required). It lacks vOptimizer's nice GUI and scheduling functionality, but this can be worked around.
The attached sample script shows how this can be done (but with no automated rollback).
Only tested on vSphere 4.0x to date, via PowerShell 2.0 and PowerCLI 4.1.
Use at your own risk, and only once you fully understand the logic.
Pre-reqs:
- Purchase and upload the PAT Cli WinPE ISO to an appropriate shared datastore.
The ISO should have an embedded PAT Cli batch file that runs the correct command to automatically align any misaligned partitions on the VM (skipping dynamic volumes) and then reboots the VM back into the OS.
- Create the required input text file with one VM name per line.
- The script currently assumes that all VMs have a CD-ROM drive. If yours don't, it would be easy to modify the script to add one for the duration of the alignment and remove it at the end.
- Irrespective of the tool we use, we generally sVMotion a batch of VMs to a LUN on a dedicated disk subsystem ahead of an alignment run. If you don't have something similar, be careful about how many alignments you run concurrently, given the potential disruption to other systems using that storage.
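The input file is simple, but it is worth cleaning before a run, since a duplicated VM name could mean the same VM being processed twice at once. A minimal Python sketch (the function name is hypothetical, not part of the attached script) that reads one name per line, ignores blank lines and duplicates, and tolerates the byte-order mark some Windows editors add:

```python
from pathlib import Path
from typing import List


def load_vm_names(path: str) -> List[str]:
    """Read one VM name per line, ignoring blank lines and duplicate entries."""
    names: List[str] = []
    seen = set()
    # utf-8-sig strips a leading BOM if present and reads plain UTF-8 otherwise
    for line in Path(path).read_text(encoding="utf-8-sig").splitlines():
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names
```

The original order is preserved, so the batch still runs in the order the file lists the VMs.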
Script:
- Connects to a specified vCenter instance
- Reads a batch of VM names from the text file
- Processes only 4 VMs concurrently to limit the impact of the additional IO on storage during alignment (easily modified).
- For each VM:
- Connects the previously configured PAT Cli ISO image to the VM
- Updates the VM BIOS to boot from CD first
- Attempts to "shut down guest"
- If this has not completed after 3 minutes, powers off the VM
- Takes a snapshot of the VM
This provides a valid rollback in case the alignment is interrupted. Paragon has its own recovery process, which has worked reliably when I've deliberately interrupted the alignment, but having another rollback option for production systems is still vital.
Note that during testing in a single environment, we found the snapshots grew by about 4GB per 40GB of partition being aligned.
- Powers the VM on; it boots into the WinPE image and runs the automated PAT Cli script to perform the alignments. This script can be customised to align only certain partitions if required.
- While the alignment is running, reconfigures the VM BIOS back to the default boot order and sets the VM's CD drive back to pass-through
- Once the alignment has completed, PAT Cli reboots the VM automatically
- All going well, the VM will boot back into a nicely aligned set of partitions
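The per-VM sequence and the concurrency limit above can be sketched in Python. This is not the attached PowerShell script: `FakeVM` is a hypothetical stand-in that only records calls, and its method names would need to map onto whatever vCenter API or PowerCLI cmdlets you actually use. It shows the ordering (ISO and boot order first, guest shutdown with a 3-minute fallback to power-off, snapshot before power-on, boot settings restored afterwards) and uses a thread pool capped at 4 workers.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from functools import partial

MAX_CONCURRENT = 4         # VMs aligned at once, as in the original script
SHUTDOWN_TIMEOUT_S = 180   # wait up to 3 minutes for a clean guest shutdown


class FakeVM:
    """Hypothetical stand-in for a vCenter VM object; it only records calls."""

    def __init__(self, name, shuts_down_cleanly=True):
        self.name = name
        self.shuts_down_cleanly = shuts_down_cleanly
        self.powered_on = True
        self.log = []

    def _record(self, *action):
        self.log.append(action)

    def attach_iso(self, iso_path):
        self._record("attach_iso", iso_path)

    def set_boot_cd_first(self, enabled):
        self._record("boot_cd_first", enabled)

    def shutdown_guest(self):
        self._record("shutdown_guest")
        if self.shuts_down_cleanly:
            self.powered_on = False

    def power_off(self):
        self._record("power_off")
        self.powered_on = False

    def snapshot(self, name):
        self._record("snapshot", name)

    def power_on(self):
        self._record("power_on")
        self.powered_on = True

    def set_cd_passthrough(self):
        self._record("cd_passthrough")


def wait_for_power_off(vm, timeout_s, poll_s):
    """Poll until the VM is off or the timeout expires; True if it powered off."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if not vm.powered_on:
            return True
        time.sleep(poll_s)
    return not vm.powered_on


def align_vm(vm, iso_path, shutdown_timeout_s=SHUTDOWN_TIMEOUT_S, poll_s=5.0):
    vm.attach_iso(iso_path)
    vm.set_boot_cd_first(True)
    vm.shutdown_guest()
    if not wait_for_power_off(vm, shutdown_timeout_s, poll_s):
        vm.power_off()             # guest did not shut down in time
    vm.snapshot("pre-alignment")   # rollback point
    vm.power_on()                  # boots WinPE; PAT Cli aligns, then reboots
    vm.set_boot_cd_first(False)    # restore the default boot order...
    vm.set_cd_passthrough()        # ...and return the CD drive to pass-through
    return vm.log


def run_batch(vms, iso_path, max_workers=MAX_CONCURRENT, **kwargs):
    """Align a batch of VMs, at most max_workers at a time."""
    worker = partial(align_vm, iso_path=iso_path, **kwargs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip((vm.name for vm in vms), pool.map(worker, vms)))
```

As in the original script, nothing here checks that the guest OS comes back after PAT Cli's reboot, and the snapshot is left in place.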
The sample script does not automatically check that the OS has come back online after the alignment, nor does it delete the snapshot; currently that's a manual process. For high-IO VMs it would be useful (or even required) to have this functionality in order to avoid excessive snapshot growth, assuming you can reliably verify that the VM is fully functional before the snapshot is removed. If that's a concern and you still want to automate / schedule alignments to run unattended, it might be advisable to purchase a tool like vOptimizer.
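Until snapshot removal is automated, it helps to check before a run that the datastore can absorb the snapshots for a whole batch. The Python sketch below uses the roughly 4GB-per-40GB growth seen in our single test environment as its default ratio; that figure will differ elsewhere, and the 1.5x headroom factor is an arbitrary assumption, not a measured value.

```python
# Growth ratio observed in one test environment (about 4GB per 40GB aligned).
OBSERVED_GROWTH_RATIO = 4 / 40


def snapshot_space_gb(partition_sizes_gb, growth_ratio=OBSERVED_GROWTH_RATIO,
                      headroom=1.5):
    """Estimate datastore space needed for pre-alignment snapshots in one batch.

    headroom is an arbitrary safety factor (assumption), since snapshot growth
    varies by VM and by how long the snapshot is kept after the alignment.
    """
    return sum(partition_sizes_gb) * growth_ratio * headroom


def batch_fits(free_gb, partition_sizes_gb, **kwargs):
    """True if the estimated snapshot space fits in the datastore's free space."""
    return snapshot_space_gb(partition_sizes_gb, **kwargs) <= free_gb
```

For example, a batch of four VMs with 40, 40, 30 and 60GB partitions to align needs about 17GB at the observed ratio, or about 25.5GB with the default headroom.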