StorageIO industry trends cloud, virtualization and big data

There are at least two different meanings for IOPs. For those not familiar with the information technology (IT) and data storage meaning, it is Input/Output Operations Per Second (e.g. data movement activity). Another meaning for IOPS is the International Organization for a Participatory Society (iopsociety.org), whose fundraising activity can be found here.

 

I recently came across a piece (here and here) talking about RAID and IOPs that had some interesting points; however, some of its generalizations could use more comment. One of the interesting assertions is that RAID writes increase with the number of drives in the parity scheme. Granted, the specific implementation and configuration could result in an "it depends" type of response.

Here are some more perspectives on the piece (here and here), as the site's comments seem to be restricted.

 

Keep in mind that with RAID 5 (or 6) performance, your IO size will have a bearing on whether you are doing those extra back-end IOs. For example, say you are writing a 32KB item that is accomplished by a single front-end IO from an application server, and your storage system, appliance, adapter or software implementing and performing the RAID (or erasure coding for that matter) has a chunk size of say 8KB (e.g. the amount of data written to each back-end drive). Then a 5 drive R5 (e.g. 4+1) would in fact have five back-end IOs (32KB / 8KB = 4 data chunks, plus 1 parity chunk of 8KB).

 


On the other hand, if the front-end IO were only 16KB (using whole numbers for simplicity, otherwise round up), in the case of a write, there would be three back-end writes with the R5 (e.g. 2 + 1). Keep in mind the controller/software managing the RAID would (or should) try to schedule back-end IO with cache, read-ahead, write-behind, write-back, other forms of optimization etc.
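To make the arithmetic in the two examples above concrete, here is a minimal sketch of the same whole-number model. The function name and parameters are hypothetical (not from the piece or any product), and it assumes the write starts on a stripe boundary. It also ignores any reads needed to update parity on a partial-stripe write, as well as cache and write coalescing, so treat it as a back-of-the-envelope count rather than how a given controller behaves.

```python
import math

def backend_writes(io_size_kb, chunk_kb, data_drives, parity_drives=1):
    """Count back-end writes for one stripe-aligned front-end write.

    Simplified model: one write per data chunk touched, plus one write
    per parity chunk for every stripe touched. Reads for parity updates
    and any caching are deliberately left out.
    """
    data_chunks = math.ceil(io_size_kb / chunk_kb)   # round up partial chunks
    stripes = math.ceil(data_chunks / data_drives)   # stripes the write spans
    return data_chunks + stripes * parity_drives

print(backend_writes(32, 8, data_drives=4))  # 4 data + 1 parity = 5
print(backend_writes(16, 8, data_drives=4))  # 2 data + 1 parity = 3
```

With a larger IO that spans two stripes (e.g. 64KB on the same 4+1 layout), the same model counts 8 data writes plus 2 parity writes.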

 

A good point in the piece (here and here) is that understanding and factoring in IOPS is important, as are latency or response time, bandwidth or throughput, and availability; they are all inter-related.

 

Also very important is to keep in mind the size of the IO, and whether it is a read or write, random or sequential, etc.
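As a rough illustration of how these metrics relate, the sketch below uses two general relationships: bandwidth is IOPS multiplied by IO size, and (per Little's law) IOPS is roughly the number of outstanding IOs divided by average latency. The function names and sample numbers are hypothetical, chosen only to show the arithmetic.

```python
def bandwidth_mb_per_sec(iops, io_size_kb):
    """Bandwidth implied by an IOPS rate at a given IO size (1 MB = 1024 KB)."""
    return iops * io_size_kb / 1024

def iops_from_latency(avg_latency_ms, outstanding_ios=1):
    """Little's law estimate: IOs in flight divided by average response time."""
    return outstanding_ios * 1000 / avg_latency_ms

# Same IOPS, very different bandwidth depending on IO size
print(bandwidth_mb_per_sec(1000, 4))    # 3.90625
print(bandwidth_mb_per_sec(1000, 64))   # 62.5

# 5 ms response time: one IO at a time vs. four in flight
print(iops_from_latency(5))             # 200.0
print(iops_from_latency(5, 4))          # 800.0
```

The point is that an IOPS number quoted without the IO size, queue depth and response time behind it says little about what an application will actually see.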

 

RAID along with erasure coding is a balancing act between  performance, availability, space capacity and economics aligned to different  application needs.

 

RAID 0 (R0) actually has a big impact on performance, with no penalty on writes; however, it has no availability protection benefit and in fact can be a single point of failure, as the loss of a single HDD or SSD impacts the entire R0 group. However, for static items, or items that are being journaled and protected on some other medium/RAID/protection scheme, R0 is used more than people realize for scratch/buffer/transient/read cache types of applications. Keep in mind that it is a balance of all performance and capacity with the exposure of no availability, as opposed to other approaches. Thus, do not be scared of R0; however, do not get burned or hurt by it either. Treat it with respect and it can be effective for some things.

 

Also mentioned in the piece was that SSD based servers will  perform vastly better than SATA or SAS based ones. I am assuming that the  authors meant to say better than SAS or SATA DAS based HDDs?

 


Keep in mind that unless you are using a PCIe NAND flash SSD card as a target, cache or RAID card, most SSD drives today are either SAS or SATA (the latter being more common), and those interfaces are moving from 3Gb to 6Gb SAS and SATA.

 

Also, while HDDs and SSDs can do a given number of reads or writes per second, those will vary based on the size of the IO and whether it is a read or write, random or sequential. However, what can have the biggest impact, and where I have seen too many people or environments get into a performance jam, is assuming that those IOP numbers per HDD or SSD are a given. For example, do not assume that 100-140 IOPs (regardless of size, type, etc.) can always be achieved, as a limiting factor is the type of interface and controller/adapter being used.
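A simple way to picture that limiting factor is to take the lowest of the device, link and controller ceilings. The sketch below is only an estimate under stated assumptions: it uses the 8b/10b encoding of 3Gb and 6Gb SAS/SATA links (10 bits on the wire per data byte, so roughly 300 MB/s and 600 MB/s), decimal units (1 KB = 1000 bytes), and ignores protocol overhead. The function names and the `controller_iops` parameter are hypothetical.

```python
def usable_mb_per_sec(line_rate_gbit):
    """Approximate payload rate of an 8b/10b encoded link, in decimal MB/s."""
    return line_rate_gbit * 1000 / 10   # 10 wire bits per data byte

def effective_iops(device_iops, io_size_kb, line_rate_gbit,
                   controller_iops=float("inf")):
    """Lowest of the device, link and (optional) controller IOPS ceilings."""
    link_iops = usable_mb_per_sec(line_rate_gbit) * 1000 / io_size_kb
    return min(device_iops, link_iops, controller_iops)

# An HDD doing 140 small IOs per second is nowhere near the link limit
print(effective_iops(140, 8, 3))              # 140

# A fast SSD doing 64KB IOs is capped by a 3Gb link, less so by a 6Gb one
print(effective_iops(50000, 64, 3))           # 4687.5
print(effective_iops(50000, 64, 6))           # 9375.0

# A slow adapter or controller can be the real bottleneck
print(effective_iops(50000, 64, 6, controller_iops=3000))  # 3000
```

In practice the controller or adapter ceiling is rarely published as a single number, so the last case is best read as a reminder to measure the whole path rather than rely on per-drive figures.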

 

I have seen fast HDDs and SSDs deliver sub-par performance, or not meet expectations, on fast interfaces such as iSCSI/SAS/SATA/FC/FCoE/IBA or others due to bottlenecks in the adapter card, storage system / appliance / controller / software. In some cases you may see more effective IOPs for reads, writes or both, while on other implementations you may see lower than expected results due to internal implementation bottlenecks or architectural designs. Hint: watch out for solutions where the vendor tries to blame poor performance on the access network (e.g. SAS, iSCSI, FC, etc.), particularly if you know that those are not bottlenecks.

 

Here is some related content:
Are Hard Disk Drives (HDDs) getting too big?
How can direct attached storage (DAS) make a comeback if it  never left?
EMC VFCache re spinning SSD and intelligent caching
SSD and Green IT moving beyond green washing
Optimize Data Storage for Performance and Capacity  Efficiency
Is SSD dead? No, however some vendors might be
RAID Relevance Revisited
Industry Trends and Perspectives: RAID Rebuild Rates
What is the best kind of IO? The one you do not have to do
More storage and IO metrics that matter
IBM buys flash solid state device (SSD) industry veteran TMS

 

In terms of fund-raising, if you feel so compelled, send a gift, donation, sponsorship, project, book purchase, piece of work, assignment, research project, speaking engagement, keynote, web cast, video or seminar event my way. Just like professional fund-raisers or IOPS vendors, StorageIO accepts Visa, MasterCard, American Express, PayPal, check and traditional POs.

 

As for this site and  comments, outside of those caught in the spam trap, courteous perspectives and  discussions are welcome.

 

Ok, nuff said.

 

Cheers gs