VMware Cloud Community
eggie
Contributor

SAN sizing documentation request...

This was posted a couple of years back, but it seems a proper response was never given. Is there any industry documentation available concerning SAN sizing/planning?

"3. SAN sizing - VMware states a range that no more than 32 VMs per LUN

if they are highly utilized then spans to 100 VMs per LUN if the VMs

are not highly utilized. This is a very large range. When sizing for

performance, VMware needs to provide the info to back up the above,

that includes what the LUN's RAID level was, LUN size, disk count per

the LUN, disk speed, proc I/O maximum, etc so we know why this is being

stated. Example, I have a SAN with 1.3TB space - why can't I stripe,

say 28 disks as RAID 10 and have 1 large LUN to host 100 heavily

utilized servers? What if the LUN was striped with RAID 4? Simple

guideline(s) that address LUN sizing based on spindle count based on

RAID levels to assess proper sizing for performance for x number of VMs

shouldn't be too much to ask for."

Thanks in advance.


Accepted Solutions
jjkrueger
VMware Employee

That's the trick with disk sizing - there's no such thing as a boilerplate recommendation.

Your LUN sizing will most often be driven by the I/O characteristics you need from the device - IOps, disk throughput, etc. - and the number of VMs you can reasonably place on a LUN will also be limited by those numbers. What I try to do when sizing my environments is figure out a baseline of peak IOps (think worst-case application scenario here: what if all of the VMs' apps hammer the disk at the exact same time?), since that's usually the biggest disk bottleneck - physics is a pain here, as the head on a spindle can only be in one place at a time - and then figure out how many spindles I need to meet that number. Once I know how many spindles my LUN has to be spread across, I can look at the overall size of the datastore I'll need. Keep in mind not only virtual disk sizing here, but also whether VM snapshots will be put to use and for how long, plus the memory allocated to each VM, to plan for .vswp file sizing. Once I have sizing information, I can look at the device split size on my storage array, figure out how many splits I need to meet both the I/O and size requirements for the datastore, and then create the LUN, keeping RAID overhead in mind. A rule of thumb I still use is that one physical spindle gives me roughly 150 IOps to work with. That number is very conservative (and frankly a bit old - I haven't looked at disk in much depth in the past three years). Also keep in mind that your LUNs will be sharing spindles (and therefore sharing IOps) with other LUNs, so other activity on the storage array needs to be taken into account during the design process as well.
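To make that arithmetic concrete, here's a rough Python sketch of the process. The 150 IOps per spindle figure is my rule of thumb from above; the function names, the 10% snapshot allowance, and the flat 10GB overhead for VMFS metadata are just illustrative assumptions, not anything from VMware documentation, so swap in whatever you actually know about your own environment.

    import math

    IOPS_PER_SPINDLE = 150  # conservative per-spindle rule of thumb from the text

    def spindles_needed(peak_iops_per_vm, vm_count, iops_per_spindle=IOPS_PER_SPINDLE):
        # Spindles required to absorb the worst-case aggregate peak IOps.
        total_iops = peak_iops_per_vm * vm_count
        return math.ceil(total_iops / iops_per_spindle)

    def lun_capacity_gb(vdisk_gb, ram_gb, vm_count, snapshot_pct=0.10, overhead_gb=10):
        # Datastore capacity: virtual disks + snapshot headroom + .vswp files + overhead.
        vdisk_total = vdisk_gb * vm_count
        snapshot_headroom = vdisk_total * snapshot_pct  # assumed worst-case snapshot growth
        vswp_total = ram_gb * vm_count                  # .vswp roughly equals allocated VM memory
        return vdisk_total + snapshot_headroom + vswp_total + overhead_gb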

For example, if I have an environment with 10 applications, each with a peak of 500 IOps, I would ultimately want a LUN that supports 5,000 IOps. Working with my 150 IOps rule of thumb, I would be looking at a LUN spanning 34 spindles. If each of the VMs was provisioned with a 50GB virtual disk and 4GB of RAM, and I knew the snapshots would never grow beyond 10% of the original virtual disk size, I would plan for 550GB of virtual disk/snapshot space, 40GB for .vswp files, and some overhead for VMFS metadata, config and log files, etc. So I'd be looking at, say, a 600GB LUN. At that point, can my storage array support a 600GB LUN spanning 34 spindles? That's a 17.6GB split size. Some compromise may need to be made depending on what you know of your application performance (is peak load going to happen all at once, or is the worst aggregate peak lower? That could lead to a larger split size and fewer spindles) or what you know of your storage array (split sizes come both larger and smaller than what the calculations show - do you go with a smaller split and more spindles, wasting available I/O, or do you take the larger split size with fewer spindles, with the possibility of either starving yourself of IOps or wasting precious disk space?).
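Running those same hypothetical numbers through the sketch above reproduces the figures in this example:

    spindles = spindles_needed(peak_iops_per_vm=500, vm_count=10)    # -> 34
    capacity = lun_capacity_gb(vdisk_gb=50, ram_gb=4, vm_count=10)   # -> 600.0 GB
    split_gb = capacity / spindles                                   # -> ~17.6 GB per spindle
    print(spindles, capacity, round(split_gb, 1))                    # 34 600.0 17.6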

As for the maximum number of VMs placed on a given datastore, you wouldn't want many more than 32 on each, largely due to the way VMFS file locking is maintained. Each host locks files in the metadata space of the VMFS datastore, and those locks are updated periodically. Those updates, and any other metadata update activity, require a lock (a SCSI reservation) to be held on the entire disk device. Above roughly 32 heavily used VMs, metadata updates can easily keep the LUN in a locked state, blocking VM disk I/O.

It's nearly impossible to put together a sweeping document because recommendations made for your environment are specific to your environment. Sure, workloads at different shops can look similar, but the subtleties of each shop mean that any generalized recommendation like "build 500GB LUNs across 10 spindles and put 15 VMs on them" would be overkill for 45% of the shops looking, a performance disaster for another 45%, and leave only a smallish 10% of environments happy and in a good comfort zone.

Hope that makes some sense, and helps.

-jk
