Our company has recently re-adopted a storage tiering solution, but this time with a focus on IO Density. It promises to be quite nice and save us a lot of money; however, the execs cut the monitoring tools (Watch4Net, vcOps) from the proposal while keeping the savings on disk purchases. Isn't that always the way?
BACKGROUND
Our storage engineers are already configuring the arrays with this new layout. We will have Bronze, Silver, Gold, and Platinum tiers, allowing 0.03, 0.4, 3.2 and 20 IOps per GB of provisioned disk respectively. We are coming from an environment where provisioning was done strictly based on capacity, so it really is the wild west in terms of performance of our systems. We are also a large healthcare provider, with almost 700 hosts spread across four datacenters, so there is no lack of scale to contend with in monitoring this, either. Our goal, as virtual infrastructure admins, is to accommodate this large cost-savings play without killing our operations staff. As you can imagine, having no tool and no prior performance data will make this a challenge.
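As a quick illustration of what those densities mean per VMDK, the sketch below multiplies a disk's provisioned capacity by each tier's IOps-per-GB figure. `$TierDensity` and `Get-TierIOps` are hypothetical names introduced only for this example; they are not part of the scripts in this thread.

```powershell
# IOps-per-GB density for each tier, as listed above
$TierDensity = [ordered]@{
    Bronze   = 0.03
    Silver   = 0.4
    Gold     = 3.2
    Platinum = 20
}

# Hypothetical helper: IOps a VMDK of the given size would be allowed on each tier
function Get-TierIOps {
    param([double]$CapacityGB)
    foreach ($tier in $TierDensity.Keys) {
        [pscustomobject]@{
            Tier    = $tier
            MaxIOps = [math]::Round($CapacityGB * $TierDensity[$tier], 1)
        }
    }
}

Get-TierIOps -CapacityGB 100 | Format-Table -AutoSize
```

With these figures a 20 GB disk on Silver works out to 8 IOps and an 8 GB disk to 3.2 IOps (3 after an `[int]` cast), which appears to line up with the CurrentIOLimit values shown for VM2 in the sample output.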
The approach has storage presenting LUNs to us from a given tier, and we'll name each datastore for the tier it is in. Datastore Clusters or Storage Profiles would be a good play for some of our non-Prod workloads; however, much of our Prod workload leverages SRM array-based replication, so which specific LUN a VMDK sits on matters. The architects also assumed we could easily move each VMDK onto the performance-appropriate tier of storage, which further compounds the complexity but saves capital dollars. Our first concern is "What happens when a mission-critical app complains of performance on a single VMDK?" With replication configured per LUN, the only performance relief we know of would be to svMotion the VMDK to a higher tier. With more than 15,000 VMDKs in our primary datacenter alone, this would be a nightmare for our operations staff. Storage is configured for a specific number of IOps per LUN, and svMotions draw from those array-limited IOps during data moves while the LUN is simultaneously servicing Prod workloads, so this was untenable. Our solution was to set VMDK IO limits at the advertised level, yet configure the storage frames with a slightly higher limit so that svMotions would not impact Prod. This also allows our operations staff to increase the IO limit on a VMDK temporarily and batch the svMotions for a maintenance window. It should work, but the next question is: how do we monitor the VMDKs to know whether they are in an appropriate tier or need to be promoted or demoted?
We need a way to monitor each VMDK's IO limit and how frequently it is pushing the high and low sides of it. To facilitate this, I wrote the following script, which runs hourly as a scheduled task. It pulls the read and write IOps for each VMDK, counts the intervals at or above 90% of the IO limit and at or below 5% of it (as both a count and a percentage), then records those values in a CSV file that is re-used on subsequent executions. With 15,000 VMDKs in our single largest datacenter, trying to keep multiple intervals of such granular data just proves exhausting. The idea is that this report runs hourly across the entire enterprise, can be loaded into Excel and sorted on the high and low watermarks to look for re-tiering candidates, and those candidates can then be scheduled to move during a maintenance window.
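The high/low watermark logic can be shown on plain numbers. This is a minimal sketch that assumes one array of IOps samples per VMDK and metric; `Measure-IOLimitWatermark` is a hypothetical helper and the sample values are made up. Unlike the script, it compares against the exact 90% and 5% thresholds rather than rounding them to integers first.

```powershell
# Hypothetical helper: count samples near the top and bottom of a VMDK's IO limit
function Measure-IOLimitWatermark {
    param(
        [double[]]$Samples,  # IOps samples for one VMDK and one metric (read or write)
        [double]$IOLimit     # the IO limit configured on that VMDK
    )
    if ($null -eq $Samples -or $Samples.Count -eq 0) { return }   # nothing to measure
    $high = @($Samples | Where-Object { $_ -ge 0.9 * $IOLimit }).Count
    $low  = @($Samples | Where-Object { $_ -le 0.05 * $IOLimit }).Count
    [pscustomobject]@{
        DataPoints       = $Samples.Count
        IOLimit90Count   = $high
        IOLimit90Percent = $high / $Samples.Count   # fraction of samples at or above 90%
        IOLimit5Count    = $low
        IOLimit5Percent  = $low / $Samples.Count    # fraction of samples at or below 5%
    }
}

# Made-up samples for a VMDK with a 40 IOps limit
Measure-IOLimitWatermark -Samples 0, 1, 10, 38, 40, 40, 2, 0 -IOLimit 40
```

The percentages are left as fractions here; the script formats them for display with `"{0:P2}" -f`.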
Any assistance or ideas will be greatly appreciated. I've performed as much due diligence as possible against these forums, blogs and general web searches. It seems now the best I can do is ask the experts and see if there is a way to build a better mousetrap.
PROBLEMS
SCRIPT
Connect-VIServer VCServer
### Output files; the report CSV is re-used across hourly executions
$OutFile = "c:\tmp\IOLimit-Rpt.csv"
$ArchiveFile = "c:\tmp\IOLimit-Archive.csv"
$endDTM = Get-Date ### Timestamp written with archived rows (previously referenced but never set)
### Test to see if our output file exists
$CSV = $null
if (Test-Path $OutFile) {
    $CSV = Import-Csv $OutFile
}
### Create report of VMs, HDDs, Read & Write IOps stats per VMDK and group like stats together so we can gather metrics
$report = @()
### Realtime stats are only collected for powered-on VMs
foreach ($VM in (Get-VM | Where-Object { $_.PowerState -eq "PoweredOn" })) {
    $VmHdds = Get-HardDisk -VM $VM | Select-Object Name, CapacityKB, ExtensionData
    $Stats = Get-Stat -Entity $VM -Stat "virtualdisk.numberreadaveraged.average","virtualdisk.numberwriteaveraged.average" -Realtime |
        Where-Object { $_.Instance } | Select-Object Entity, Instance, MetricId, Value
    ### Group the realtime samples (20-second intervals, roughly 180 per hour) per VM, device and metric
    $VmdkStats = $Stats | Group-Object -Property { $_.Entity.Name }, Instance, MetricId
    foreach ($VmdkStat in $VmdkStats) {
        $first = $VmdkStat.Group[0]
        $row = "" | Select-Object VmName, ScsiID, Type, Max, CapacityGB, CurrentIOLimit, IOLimit90, IOLimit5, HddName, DataPoints
        $row.VmName = $first.Entity.Name ### Read from the stat itself rather than split from the group name, so VM names containing spaces survive
        $row.ScsiID = $first.Instance -replace '^scsi', '' ### "scsi0:1" -> "0:1"
        ### Grab the corresponding VM HDD object for this stat; SCSI controller keys are normally 1000 + bus number
        $VmHdd = $VmHdds | Where-Object { ("{0}:{1}" -f ($_.ExtensionData.ControllerKey - 1000), $_.ExtensionData.UnitNumber) -eq $row.ScsiID } | Select-Object -First 1
        if (-not $VmHdd) { continue } ### No matching SCSI disk (e.g. an IDE/SATA device); skip rather than compare against a zero limit
        $row.Type = if ($first.MetricId -eq "virtualdisk.numberreadaveraged.average") { "r" } else { "w" } ### Shrink the longer stat name to a simple "r" or "w"
        $row.Max = ($VmdkStat.Group | Measure-Object -Property Value -Maximum).Maximum
        $row.CapacityGB = $VmHdd.CapacityKB / 1024 / 1024
        ### Supply a default limit to compare with (Silver, 0.4 IOps/GB) when none is set (null or -1)
        $limit = $VmHdd.ExtensionData.StorageIOAllocation.Limit
        $row.CurrentIOLimit = if ($null -eq $limit -or $limit -eq -1) { [int]($row.CapacityGB * .4) } else { $limit }
        $row.DataPoints = $VmdkStat.Count ### Actual number of samples returned, rather than a hard-coded 184
        ### How many data points equal or exceed 90%, or fall at or below 5%, of the limit; @() guarantees a numeric Count
        $row.IOLimit90 = @($VmdkStat.Group | Where-Object { $_.Value -ge [int](.9 * $row.CurrentIOLimit) }).Count
        $row.IOLimit5 = @($VmdkStat.Group | Where-Object { $_.Value -le [int](.05 * $row.CurrentIOLimit) }).Count
        $row.HddName = $VmHdd.Name
        $report += $row
    }
}
### Merge the latest Realtime report into the historic report
$report2 = @()
foreach ($record in $report) {
    $row = "" | Select-Object VmName, ScsiID, Type, Max, CapacityGB, CurrentIOLimit, IOLimit90Count, IOLimit90Percent, IOLimit5Count, IOLimit5Percent, HddName, DataPoints
    $row.VmName = $record.VmName; $row.ScsiID = $record.ScsiID; $row.Type = $record.Type
    $row.CapacityGB = $record.CapacityGB; $row.CurrentIOLimit = $record.CurrentIOLimit; $row.HddName = $record.HddName
    ### If the CSV file exists, find the prior record for this VMDK and stat type (-eq avoids wildcard surprises from -like)
    $OldRecord = $null
    if ($CSV) {
        $OldRecord = $CSV | Where-Object { $_.VmName -eq $record.VmName -and $_.ScsiID -eq $record.ScsiID -and $_.Type -eq $record.Type } | Select-Object -First 1
    }
    ### As our tiering is based on IO Density (IOps allowed per GB for a given tier), if CapacityGB or the IO limit changes, we archive the old data and start a fresh row
    $changed = $OldRecord -and ([single]$record.CapacityGB -ne [single]$OldRecord.CapacityGB -or [int]$record.CurrentIOLimit -ne [int]$OldRecord.CurrentIOLimit)
    if ($changed) {
        ### Write old data to the archive file, stamped with this run's time
        ('"{0}","{1}","{2}","{3}","{4}","{5}","{6}","{7}","{8}","{9}","{10}","{11}","{12}"' -f $OldRecord.VmName, $OldRecord.ScsiID, $OldRecord.Type, $OldRecord.Max, $OldRecord.CapacityGB, $OldRecord.CurrentIOLimit, $OldRecord.IOLimit90Count, $OldRecord.IOLimit90Percent, $OldRecord.IOLimit5Count, $OldRecord.IOLimit5Percent, $OldRecord.HddName, $OldRecord.DataPoints, $endDTM) | Add-Content -Path $ArchiveFile
    }
    if ($OldRecord -and -not $changed) {
        ### No change in capacity or IO limit: accumulate onto the old record (CSV values are strings, so cast before comparing or adding)
        $row.Max = [math]::Max([double]$OldRecord.Max, [double]$record.Max)
        $row.DataPoints = [int]$OldRecord.DataPoints + $record.DataPoints
        $row.IOLimit90Count = [int]$OldRecord.IOLimit90Count + $record.IOLimit90
        $row.IOLimit5Count = [int]$OldRecord.IOLimit5Count + $record.IOLimit5
    }
    else {
        ### First run, new VMDK, or archived row: start fresh with this hour's data
        $row.Max = $record.Max
        $row.DataPoints = $record.DataPoints
        $row.IOLimit90Count = $record.IOLimit90
        $row.IOLimit5Count = $record.IOLimit5
    }
    ### Puts the percentages in % format versus decimal, in every case
    $row.IOLimit90Percent = "{0:P2}" -f ($row.IOLimit90Count / $row.DataPoints)
    $row.IOLimit5Percent = "{0:P2}" -f ($row.IOLimit5Count / $row.DataPoints)
    $report2 += $row
}
$report2 | Export-Csv $OutFile -NoTypeInformation
Disconnect-VIServer VCServer -Confirm:$false -Force:$true
SAMPLE OUTPUT
Definitions
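IOLimit90 - How many data points are >= 90% of the CurrentIOLimit (reported as IOLimit90Count, and as a share of DataPoints in IOLimit90Percent)
IOLimit5 - How many data points are <= 5% of the CurrentIOLimit (reported as IOLimit5Count and IOLimit5Percent)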
VmName | ScsiID | Type | Max | CapacityGB | CurrentIOLimit | IOLimit90Count | IOLimit90Percent | IOLimit5Count | IOLimit5Percent | HddName | DataPoints |
VM1 | 0:00 | w | 7 | 20 | 6 | 76 | | 0 | | Hard disk 1 | 184 |
VM1 | 0:01 | w | 0 | 8 | 30 | 0 | | 180 | | Hard disk 2 | 184 |
VM1 | 0:00 | r | 0 | 20 | 6 | 0 | | 180 | | Hard disk 1 | 184 |
VM1 | 0:01 | r | 0 | 8 | 30 | 0 | | 180 | | Hard disk 2 | 184 |
VM2 | 0:00 | w | 158 | 20 | 8 | 3 | | 35 | | Hard disk 1 | 184 |
VM2 | 0:01 | w | 1 | 8 | 3 | 0 | | 176 | | Hard disk 2 | 184 |
VM2 | 0:00 | r | 28 | 20 | 8 | 0 | | 179 | | Hard disk 1 | 184 |
VM2 | 0:01 | r | 0 | 8 | 3 | 0 | | 180 | | Hard disk 2 | 184 |
At first glance, I would suggest replacing the Get-Stat call you make for each VM with a single Get-Stat call for all VMs.
After this Get-Stat you can split the result with the Group-Object cmdlet on the $_.Entity.Name property.
You would probably also save execution time by calling Get-HardDisk only once for all the VMs.
If you store the results in a hash table, you can easily look up the correct entry based on the VM name and the Harddisk name.
Blog: lucd.info Twitter: @LucD22 Co-author PowerCLI Reference
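A sketch of the single-call approach suggested above: fetch the stats once, then split them per VM with Group-Object. The objects here are hypothetical stand-ins shaped like Get-Stat output (Entity.Name, Instance, MetricId, Value) so the grouping can run without a vCenter; in the real script they would come from one Get-Stat call given the full VM list as its -Entity. `New-FakeStat` exists only for this illustration.

```powershell
# Hypothetical stand-ins shaped like Get-Stat output (Entity.Name, Instance, MetricId, Value)
function New-FakeStat($Vm, $Instance, $MetricId, $Value) {
    [pscustomobject]@{
        Entity   = [pscustomobject]@{ Name = $Vm }
        Instance = $Instance
        MetricId = $MetricId
        Value    = $Value
    }
}
$Stats = @(
    New-FakeStat 'VM1' 'scsi0:0' 'virtualdisk.numberwriteaveraged.average' 7
    New-FakeStat 'VM1' 'scsi0:1' 'virtualdisk.numberwriteaveraged.average' 0
    New-FakeStat 'VM2' 'scsi0:0' 'virtualdisk.numberwriteaveraged.average' 158
)

# Split the combined result set per VM; -AsString makes the keys plain strings
$StatsByVm = $Stats | Group-Object -Property { $_.Entity.Name } -AsHashTable -AsString

foreach ($vmName in $StatsByVm.Keys) {
    # Each value holds only that VM's samples, ready for the per-VMDK grouping
    $perVmdk = $StatsByVm[$vmName] | Group-Object -Property Instance, MetricId
    "{0}: {1} VMDK/metric group(s)" -f $vmName, @($perVmdk).Count
}
```

Looking up `$StatsByVm[$VM.Name]` inside the existing per-VM loop would then replace the per-VM Get-Stat call.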
Thanks for the tips. I'll play around with the Get-Stat.
As for the hash table on Get-Harddisk, you mention it would be based on VM name and Harddisk name. I'm unfamiliar with how to create a hash table with a combined key like that, as all the examples I can find are based on one value. Could you provide a snippet on how to build a hash table using a combined key?
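One way to build such a hash table (a sketch, not necessarily what LucD had in mind) is to join the two values into a single string key, using a separator that is assumed not to appear in either name. The disk objects below are hypothetical stand-ins for Get-HardDisk output, and `Get-DiskKey` is an illustrative helper.

```powershell
# Hypothetical stand-ins for Get-HardDisk output (Parent.Name, Name, CapacityKB)
$Disks = @(
    [pscustomobject]@{ Parent = [pscustomobject]@{ Name = 'VM1' }; Name = 'Hard disk 1'; CapacityKB = 20971520 }
    [pscustomobject]@{ Parent = [pscustomobject]@{ Name = 'VM1' }; Name = 'Hard disk 2'; CapacityKB = 8388608 }
    [pscustomobject]@{ Parent = [pscustomobject]@{ Name = 'VM2' }; Name = 'Hard disk 1'; CapacityKB = 20971520 }
)

# Combine VM name and disk name into one string key; '|' is assumed not to occur in either name
function Get-DiskKey([string]$VmName, [string]$DiskName) {
    "$VmName|$DiskName"
}

# Build the index once...
$DiskIndex = @{}
foreach ($d in $Disks) {
    $DiskIndex[(Get-DiskKey $d.Parent.Name $d.Name)] = $d
}

# ...then each lookup is a single hash-table hit instead of a Where-Object scan
$hit = $DiskIndex[(Get-DiskKey 'VM1' 'Hard disk 2')]
$hit.CapacityKB / 1024 / 1024   # capacity in GB
```

A string key is used because indexing a PowerShell hash table with an array is treated as a lookup of several separate keys, not as one combined key.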
I've added a second script for subsequent execution, for when I want to tier grade a given VMDK's performance. It has our company's IO density levels baked-in, but the values are easily changed for others. I'm also only keeping the higher of the read or write IOps values in the final report, as that is what we provision on.
$CSV = Import-Csv c:\tmp\IOLimit-RT-XrdcAug.csv
foreach ($CSVrow in $CSV) {
    ### IO density = peak IOps per provisioned GB; Import-Csv returns strings, so cast before doing math
    $IODensity = if ([double]$CSVrow.CapacityGB -gt 0) { [double]$CSVrow.Max / [double]$CSVrow.CapacityGB } else { 0 }
    if ($IODensity -le .03) {$Tier = "Bronze"}
    elseif ($IODensity -le .4) {$Tier = "Silver"}
    elseif ($IODensity -le 3.2) {$Tier = "Gold"}
    elseif ($IODensity -le 20) {$Tier = "Platinum"}
    else {$Tier = "Diamond"}
    Add-Member -InputObject $CSVrow -MemberType NoteProperty -Name "IODensity" -Value $IODensity -Force
    Add-Member -InputObject $CSVrow -MemberType NoteProperty -Name "Tier" -Value $Tier -Force
}
### Keep only the higher of the read and write rows for each VMDK; sort numerically, since comparing the raw CSV strings is lexical ("7" -gt "28" is True)
$report = $CSV | Group-Object -Property VmName, ScsiID | ForEach-Object {
    $_.Group | Sort-Object -Property { [double]$_.Max } -Descending | Select-Object -First 1
}
$report | Export-Csv c:\tmp\Xrdc-IOReport-Aug.csv -NoTypeInformation
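The tier thresholds in the grading script live in an if/elseif chain. A table-driven variant keeps them in one list, which may make them easier for other sites to change. This is a sketch with a hypothetical `Get-IOTier` helper, using the same densities and the same Diamond catch-all; unlike the script, it returns nothing for a zero-capacity row instead of grading it Bronze. Its [double] parameters convert CSV strings to numbers before any comparison.

```powershell
# Upper IOps-per-GB bound for each tier, lowest first (densities from this thread)
$TierCeilings = @(
    @{ Tier = 'Bronze';   MaxDensity = 0.03 }
    @{ Tier = 'Silver';   MaxDensity = 0.4 }
    @{ Tier = 'Gold';     MaxDensity = 3.2 }
    @{ Tier = 'Platinum'; MaxDensity = 20 }
)

# Hypothetical helper: grade a VMDK by peak IOps per provisioned GB.
# [double] parameters convert CSV strings such as '158' to numbers first.
function Get-IOTier {
    param([double]$MaxIOps, [double]$CapacityGB)
    if ($CapacityGB -le 0) { return $null }   # no capacity means no density to grade
    $density = $MaxIOps / $CapacityGB
    foreach ($t in $TierCeilings) {
        if ($density -le $t.MaxDensity) { return $t.Tier }
    }
    'Diamond'   # above the highest defined tier
}

Get-IOTier -MaxIOps '158' -CapacityGB '20'   # 7.9 IOps/GB -> Platinum
```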