VMware Cloud Community
massimo
Contributor

Iops calculation

Hi all, and sorry for my English,

I need to determine the IOPS of a virtual infrastructure in order to size the storage for the correct performance.

After many days spent going through thousands of opinions, suggestions and technical docs, I have put together this approach using perfmon and esxtop, merging everything in an "always fundamental" 🙂 Excel sheet:

  1. Perfmon in every VM with these counters: Physical Disk - Transfers/sec, Writes/sec and Reads/sec
  2. esxtop in batch mode on the ESX console. I retrieve Commands/sec, Writes/sec and Reads/sec for every VM (see the parsing sketch below)
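For step 2, here is a minimal sketch of how the esxtop batch-mode CSV export could be averaged per VM in Python. The column-name pattern ("Commands/sec" plus the VM name) and the file/VM names are assumptions on my side, so adjust them to match your own export:

# Sketch: average the Commands/sec columns for one VM from an esxtop
# batch-mode CSV export. The column-name pattern is an assumption and
# may need adjusting to your esxtop version and counter group names.
import pandas as pd

def avg_commands_per_sec(csv_path: str, vm_name: str) -> float:
    df = pd.read_csv(csv_path)
    cols = [c for c in df.columns
            if "Commands/sec" in c and vm_name in c]
    if not cols:
        raise ValueError(f"No Commands/sec columns found for {vm_name}")
    # Average across samples, then across the VM's devices/worlds.
    return df[cols].mean(axis=1).mean()

# Hypothetical file and VM name:
# print(avg_commands_per_sec("esxtop_batch.csv", "vm01"))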

I collected data for 12 working hours per day, for 3 days.

Next (for every VM):

  1. The average of ((the daily average of Transfers/sec from perfmon) and (the daily average of Commands/sec from esxtop)). The result is my VM IOPS
  2. The write percentage within that average
  3. The read percentage within that average
  4. Apply the formula (VM IOPS × read percentage) + ((VM IOPS × write percentage) × RAID penalty). The result is my real VM IOPS on the array

If, for instance, I obtain 50 IOPS (the sum of all real VM IOPS) from this calculation, my storage can safely be composed of two 7.2K RPM hard disks in RAID 10.

I apply a RAID penalty of 4 for RAID 5 and 2 for RAID 10.
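To make the steps concrete, here is a minimal sketch of the calculation in Python; the per-VM averages and the read/write mix are hypothetical placeholders:

# Sketch of the method above, with hypothetical per-VM averages.
# avg_transfers: daily average of perfmon Transfers/sec
# avg_commands:  daily average of esxtop Commands/sec
RAID_PENALTY = {"RAID10": 2, "RAID5": 4}

def frontend_iops(avg_transfers: float, avg_commands: float) -> float:
    # Step 1: average the two measurements of the same workload.
    return (avg_transfers + avg_commands) / 2

def backend_iops(vm_iops: float, read_pct: float, write_pct: float,
                 raid: str) -> float:
    # Step 4: reads pass through, writes are multiplied by the RAID penalty.
    return vm_iops * read_pct + vm_iops * write_pct * RAID_PENALTY[raid]

# Hypothetical VM: ~40 IOPS, 70% reads / 30% writes, on RAID 10.
vm_iops = frontend_iops(avg_transfers=42.0, avg_commands=38.0)   # 40.0
print(backend_iops(vm_iops, read_pct=0.7, write_pct=0.3, raid="RAID10"))
# -> 40*0.7 + 40*0.3*2 = 52.0 back-end IOPS

Summing the back-end IOPS of all VMs gives the figure to size the array against.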

Can someone give me an opinion on this method, which should be reasonably neutral with respect to the platform?

I hope this is clear.

Thanks in advance for any help or suggestions.

Kind regards.

Massimo.

gregschulz
Enthusiast

Hello Massimo and nice job putting the model together.

A couple of comments and thoughts to consider.

First, while IOPS play a role, there is also bandwidth/throughput (i.e. the size of those IOPS) to keep in mind, along with response time/latency and queue depth to factor in.

In your model, are you concerned with just the average, assuming that the workloads are constant, or do you need to account for spikes or dynamic activity? In the latter case you could add an adjusted actual, or an average peak. Likewise, if the workload is constant over a 24-hour day the assumptions hold; however, if there is a spike from, say, 8 AM to 5 PM and then little to no activity during the evening, the averages get skewed, which could lead to a lower-than-needed sizing estimate for performance. In your model you make a good assumption about a 12-hour day, so I also assume that eliminates 12 hours of little to no activity.
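To illustrate that averaging skew with made-up hourly samples (a sketch, not real data):

# Made-up hourly IOPS samples: busy 8AM-5PM, nearly idle otherwise.
hourly_iops = [5]*8 + [120]*9 + [5]*7          # 24 hourly averages
avg_24h  = sum(hourly_iops) / 24               # ~48 IOPS
avg_busy = sum(hourly_iops[8:17]) / 9          # 120 IOPS
peak     = max(hourly_iops)                    # 120 IOPS
print(avg_24h, avg_busy, peak)
# Sizing to ~48 IOPS would undersize a workload that really needs ~120
# during business hours; restricting the average to the busy window avoids that.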

In addition, what is the mix of IO sizes, or are they all constant? For a balanced workload, the formula should cover that.

Now let us talk about the RAID penalty; there are two pieces, one being performance and the other space capacity. The space-capacity RAID penalty is straightforward: with mirroring, an equal amount of space is used for protection as for actual data (e.g. N+N), which is also the case for RAID 10. For RAID 4 and 5 it is N+1; with N=4 data drives there would be 5 disks, one being parity (rotating for RAID 5, dedicated for RAID 4), thus 20% space overhead. RAID 6 and RAID-DP would be N+2. Note that the wider the stripe (the larger N) in RAID 4/5/6/DP, the lower the space overhead; fewer drives in a stripe/RAID set means higher space/protection overhead.
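As a quick way to see that space-overhead trade-off, a small sketch (the drive counts are just examples):

# Usable-capacity fraction per RAID layout, following the N+parity view above.
def usable_fraction(data_drives: int, parity_drives: int) -> float:
    return data_drives / (data_drives + parity_drives)

print(usable_fraction(4, 4))   # RAID 1/10 mirroring (N+N)         -> 0.50
print(usable_fraction(4, 1))   # RAID 4/5, 4 data + 1 parity       -> 0.80
print(usable_fraction(4, 2))   # RAID 6/DP, 4 data + 2 parity      -> ~0.67
print(usable_fraction(8, 1))   # wider RAID 5 stripe, less overhead -> ~0.89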

Now for the RAID performance penalty: on writes with RAID 1 there are at least two writes for each IO, while on reads only one is needed. In some implementations it is possible for concurrent reads to be satisfied by the controller/firmware/software. Likewise, some implementations support not only mirroring (e.g. N+N) but also triple or quad mirrors. For RAID 5 (again depending on implementation) there should not be a penalty on reads; in fact there could be a benefit (depending on implementation) due to the stripe and the width of the RAID set. On writes, however, there is a penalty for performing the parity calculation, which, again depending on implementation, can be helped with caching and other techniques. Of course, during a rebuild operation there would be a performance impact.

Now let us jump to the disk drives, which is where some interesting things happen.

On one hand, you can make an assumption about the number of IOPS an individual HDD, HHDD (Hybrid Hard Disk Drive) or SSD can do; however, that also assumes a given IO size, and often there are apples-to-oranges comparisons.

For example, HDD IOPS numbers are quoted at a relatively small IO size, and for SSDs the IO size is often 1/2 KB to produce the maximum IOPS rating for the spec sheets. That is fine if your environment can leverage those smaller sizes; however, be careful to look at what the IOPS are for reads, writes, or your particular workload, and how the device supports those. There is another piece: whether you can actually get the stated number of IOPS (or bandwidth, or latency, for that matter) from the device. The controller, adapter, or interface and its configuration can make a difference.
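To make the IO-size point concrete, a small sketch (illustrative numbers only, not any real device's spec) converting an IOPS rating into bandwidth at different IO sizes:

# Same IOPS rating translated into bandwidth at different IO sizes, to show
# why small-block spec-sheet IOPS can be misleading. Numbers are illustrative.
def mb_per_sec(iops: float, io_size_kb: float) -> float:
    return iops * io_size_kb / 1024.0

print(mb_per_sec(50_000, 0.5))   # 50k IOPS at 512 B -> ~24 MB/s
print(mb_per_sec(50_000, 4))     # 50k IOPS at 4 KB  -> ~195 MB/s
print(mb_per_sec(5_000, 64))     # 5k IOPS at 64 KB  -> ~312 MB/s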

Normally you would hope that an adapter, controller, or storage system would not introduce a bottleneck (besides the network/storage interface connection) between your server and storage; however, depending on the implementation, some can. Likewise, some adapters, controllers and storage systems can eliminate bottlenecks thanks to their optimizations, producing cached or accelerated performance versus a traditional device.

What all of the above means is: keep in mind the implementation behind the model, along with some of the other factors mentioned.

The specific implementation including products, protocols, etc. will have an impact.

Hope that helps or provides some additional thoughts/perspectives.

Cheers

gs

massimo
Contributor

Hi gregschulz,

and thanks for your extensive explanation.

I will start immediately to collect data based on your considerations and will come back soon with the results. 🙂

Thanks again.

Regards.

Massimo.
