
IBM today announced that it is acquiring privately held, Dallas, Texas-based Softlayer, an Infrastructure as a Service (IaaS) provider.

IBM is referring to this as Cloud without Compromise (read more about clouds, conversations and confidence here).

 

It's about the management, flexibility, scale up, out and down, agility and valueware.

 

Is this IBM's new software defined data center (SDDC), software defined infrastructure (SDI), software defined management (SDM), software defined cloud (SDC) or software defined storage (SDS) play?

 

This is more than a software defined marketing or software defined buzzword announcement.



 

If your view of software defined ties into the theme of leveraging, unleashing resources, enablement, flexibility and agility of hardware, software or services, then you may see Softlayer as part of a software defined infrastructure.

 

On the other hand, if your views or opinions of what is or is not software defined align with a specific vendor, product, protocol, model or punditry, then you may not agree, particularly if it is in opposition to anything IBM.


 

During the announcement briefing call with analysts there was a noticeable absence of software defined buzz talk, which given its hype and usage lately was a refreshing, welcome relief. So with that, let's set the software defined conversation aside (for now).

 


Who is Softlayer, and why is IBM interested in them?

Softlayer provides software and services to support SMB, SME and other environments with bare metal (think traditional hosted servers), along with multi-tenant (shared) virtual public and private cloud service offerings.

 

Softlayer supports various applications and environments, from big data analytics to little data processing, from social to mobile to legacy. This includes those apps or environments that were born in the cloud, or legacy environments looking to leverage cloud in a complementary way.

Some more information about Softlayer includes:

  • Privately held IaaS firm founded in 2005
  • Estimated revenue run rate of around $400 million with 21,000 customers
  • Mix of SMB, SME and Web-based or born-in-the-cloud customers
  • Over 100,000 devices under management
  • Provides a common modularized management framework and set of tools
  • Mix of customers from Web startups to global enterprises
  • Presence in 13 data centers across the US, Asia and Europe
  • Automation, interoperability and a large number of APIs accessible and supported
  • Flexibility, control and agility for physical (bare metal) and cloud or virtual
  • Public, private and data center to data center
  • Designed for scale, durability and resiliency without complexity
  • Part of the OpenStack ecosystem, both leveraging and supporting it
  • Ability for customers to use OpenStack, CloudStack, Citrix, VMware, Microsoft and others
  • Can be white or private labeled for use as a service by VARs


What IBM is planning for Softlayer

Softlayer will report into IBM Global Technology Services (GTS), complementing existing capabilities which include ten cloud computing centers on five continents. IBM has created a new Cloud Services Division and expects cloud revenues could be $7 billion annually by the end of 2015. Amazon Web Services (AWS) is estimated to hit about $3.8 billion by the end of 2013. Note that in 2012 the AWS target available market was estimated to be about $11 billion, which should become larger moving forward. Rackspace by comparison had a recent earnings announcement on May 8 2013 of $362 million, with most of that being hosting vs. cloud services. That works out to an annualized estimated run rate of $1.448 billion (or better depending on growth).

 

I mention AWS and Rackspace to illustrate the growth potential for IBM and Softlayer in addressing the needs of both cloud services customers such as those who use AWS (among other providers), as well as bare metal, hosting or dedicated server customers such as those with Rackspace among others.


 

What is not clear at this time is if IBM is combining traditional hosting, managed services, and new offerings, products and services in that $7 billion number. In other words, if the $7 billion represents the revenues of the new Cloud Services Division independent of other GTS or legacy offerings, as well as excluding hardware and software products from STG (Systems Technology Group) among others, that would be impressive and a challenge to the likes of AWS.

 

IBM has indicated that it will leverage its existing Systems Technology Group (STG) portfolio of servers and storage to extend the capabilities of Softlayer. While currently x86 based, one could expect IBM to leverage and add support for its Power Systems line of processors and servers and PureSystems, as well as storage such as XIV or V7000 among others for tier 1 needs.

 

Some more  notes:

  • Ties into IBM Smart Cloud initiatives, model and paradigm
  • The deal is expected to close 3Q 2013; terms or price were not disclosed
  • Will enable Softlayer to be leveraged on a larger, broader basis by IBM
  • Gives IBM greater access to SMB, SME and web customers than in the past
  • Software and development to stay part of Softlayer
  • Provides IBM an extra jumpstart for supporting and leveraging OpenStack
  • Compatible with and supports CloudStack and Citrix, who are also IBM partners
  • Also compatible with and supports VMware, who is also an IBM partner


Some other thoughts and perspectives

This is a good and big move for IBM to add value and leverage their current portfolios of both services, as well as products and technologies. However it is more than just adding value or finding new routes to market for those goods and services; it's also about enablement. IBM has long been in the services business, including managed services, out or in sourcing and hosting. This can be seen as another incremental evolution of those offerings to both existing IBM enterprise customers, as well as to reach new and emerging SMBs or SMEs that tend to grow up and become larger consumers of information and data infrastructure services.

 

Further, this helps to add some product and meaning around the IBM Smart Cloud initiatives and programs (not that there was not any before), giving customers, partners and resellers something tangible to see, feel, touch and gain experience with, not to mention confidence in clouds.

 

On the other hand, is IBM signaling that they want more of the growing business that AWS has been realizing, not to mention Microsoft Azure, Rackspace, Centurylink/Savvis, Verizon/Terremark, CSC, HP Cloud, Cloudsigma and Bluehost among many others (if I missed you or your favorite provider, feel free to add it to the comments section)? This also gets IBM added DevOps exposure, something that Softlayer practices, as well as an OpenStack play, not to mention cloud, software defined, virtual, big data, little data, analytics and many other buzzword bingo terms.

 

Congratulations to both IBM and the Softlayer folks; now let's see some execution and watch how this unfolds.

 

Ok, nuff said.

Cheers gs


How many IOPS can a HDD, HHDD or SSD do?

This is the second post of a two-part series looking at storage performance, specifically in the context of drive or device (e.g. mediums) characteristics across HDD, HHDD and SSD. In the first post the focus was around putting some context around drive or device performance with the second part looking at some workload characteristics (e.g. benchmarks).

 

A common question is how many IOPS (IO Operations Per Second) can a storage device or system do?

 

The answer is, or should be, it depends.

 

Here are some examples to give you some more insight.

 

For example, the following shows how IOPS vary by changing the percent of reads, writes, random and sequential for a 4K (4,096 bytes or 4 KBytes) IO size with each test step (4 minutes each).

                                                                                                                                                                                                                                                                                                          

IO Size   Workload Pattern      Avg. Resp (R+W) ms   Avg. IOPS (R+W)   Bandwidth KB/sec (R+W)
4KB       100% Seq 100% Read    0.0                  29,736            118,944
4KB       60% Seq 100% Read     4.2                  236               947
4KB       30% Seq 100% Read     7.1                  140               563
4KB       0% Seq 100% Read      10.0                 100               400
4KB       100% Seq 60% Read     3.4                  293               1,174
4KB       60% Seq 60% Read      7.2                  138               554
4KB       30% Seq 60% Read      9.1                  109               439
4KB       0% Seq 60% Read       10.9                 91                366
4KB       100% Seq 30% Read     5.9                  168               675
4KB       60% Seq 30% Read      9.1                  109               439
4KB       30% Seq 30% Read      10.7                 93                373
4KB       0% Seq 30% Read       11.5                 86                346
4KB       100% Seq 0% Read      8.4                  118               474
4KB       60% Seq 0% Read       13.0                 76                307
4KB       30% Seq 0% Read       11.6                 86                344
4KB       0% Seq 0% Read        12.1                 82                330

Dell/Western Digital (WD) 1TB 7200 RPM SATA HDD (raw IO), thread count 1, 4K IO size

 

In the above example the drive is a 1TB 7200 RPM 3.5 inch Dell (Western Digital) 3Gb SATA device doing raw (non file system) IO. Note the high IOP rate with 100 percent sequential reads and a small IO size, which might be a result of locality of reference due to drive level cache or buffering.
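
As a quick way to see how the columns in these tables relate, the bandwidth is simply the IOP rate times the IO size, and with a single worker (one outstanding IO) the average response time is roughly the inverse of the IOP rate. Here is a minimal sketch checking a few rows from the 4K table above (Python used purely for illustration):

    # Sanity-check the relationships behind the 4K table (1 worker, raw IO):
    #   bandwidth (KB/sec) = IOPS x IO size (KB)
    #   avg response time (ms) ~= 1000 / IOPS with one IO outstanding
    io_size_kb = 4

    rows = [  # (workload pattern, measured avg resp ms, measured IOPS)
        ("100% Seq 100% Read", 0.0, 29736),
        ("0% Seq 100% Read", 10.0, 100),
        ("0% Seq 0% Read", 12.1, 82),
    ]

    for pattern, resp_ms, iops in rows:
        print(f"{pattern}: {iops * io_size_kb:,} KB/sec, "
              f"est. resp {1000.0 / iops:.2f} ms (measured {resp_ms} ms)")

The small differences vs. the table (e.g. 328 vs. 330 KB/sec) are just rounding in the reported IOP values.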

 

Some drives have larger buffers than others, from a couple of MB to 16MB (or more) of DRAM that can be used for read-ahead caching. Note that this level of cache is independent of a storage system, RAID adapter or controller or other forms and levels of buffering.

 

Does this mean you can expect or plan on getting those levels of performance?

 

I would not make that assumption, and thus this serves as an example of using metrics like these in the proper context.

 

Building off of the previous example, the following uses the same drive, however with a 16K IO size.

                                                                                                                                                                                                                                                                                                          

IO Size   Workload Pattern      Avg. Resp (R+W) ms   Avg. IOPS (R+W)   Bandwidth KB/sec (R+W)
16KB      100% Seq 100% Read    0.1                  7,658             122,537
16KB      60% Seq 100% Read     4.7                  210               3,370
16KB      30% Seq 100% Read     7.7                  130               2,080
16KB      0% Seq 100% Read      10.1                 98                1,580
16KB      100% Seq 60% Read     3.5                  282               4,522
16KB      60% Seq 60% Read      7.7                  130               2,090
16KB      30% Seq 60% Read      9.3                  107               1,715
16KB      0% Seq 60% Read       11.1                 90                1,443
16KB      100% Seq 30% Read     6.0                  165               2,644
16KB      60% Seq 30% Read      9.2                  109               1,745
16KB      30% Seq 30% Read      11.0                 90                1,450
16KB      0% Seq 30% Read       11.7                 85                1,364
16KB      100% Seq 0% Read      8.5                  117               1,874
16KB      60% Seq 0% Read       10.9                 92                1,472
16KB      30% Seq 0% Read       11.8                 84                1,353
16KB      0% Seq 0% Read        12.2                 81                1,310

Dell/Western Digital (WD) 1TB 7200 RPM SATA HDD (raw IO), thread count 1, 16K IO size

 

The previous two examples are excerpts of a series of workload simulation tests (ok, you can call them benchmarks) that I have done to collect information, as well as try some different things out.

 

The following is an example of the summary for each test output that includes the IO size, workload pattern (reads, writes, random, sequential), duration for each workload step, totals for reads and writes, along with averages including IOP's, bandwidth and latency or response time.

 

[Image: example test output summary]

Want to see more numbers, speeds and feeds? Check out the following table, which will be updated with extra results as they become available.

                                                                                                                                                                                                                                                                                                                                                                                                                     

    

Device   Vendor    Make           Model          Form Factor   Capacity   Interface   RPM Speed   Test Result
HDD      HGST      Desktop        HK250-160      2.5           160GB      SATA        5.4K
HDD      Fujitsu   Desktop        MHWZ160BH      2.5           160GB      SATA        7.2K
HDD      WD/Dell   Enterprise     WD1003FBYX     3.5           1TB        SATA        7.2K
HDD      Seagate   Momentus       ST9160823AS    2.5           160GB      SATA        7.2K
HDD      Seagate   MomentusXT     ST95005620AS   2.5           500GB      SATA        7.2K(1)     Soon
HDD      Seagate   Savvio 10K.3   ST9300603SS    2.5           300GB      SAS         10K
HDD      Seagate   Savvio 15K.2   ST9146852SS    2.5           146GB      SAS         15K
HDD      WD/Dell   Enterprise     WD1003FBYX     3.5           1TB        SATA        7.2K
HDD      Seagate   Barracuda      ST3000DM001    3.5           3TB        SATA        7.2K        Soon
HDD      Seagate   Barracuda      ST3500320AS    3.5           500GB      SATA        7.2K
SSD      Samsung   830                           2.5           256GB      SATA        SSD         Soon

Performance characteristics, 1 worker (thread count), raw IO (non-file system)

 

Note: (1) The Seagate Momentus XT is a Hybrid Hard Disk Drive (HHDD) based on a 7.2K 2.5 inch HDD with SLC nand flash integrated as a read buffer in addition to the normal DRAM buffer. This model is an XT I (4GB SLC nand flash); an XT II (8GB SLC nand flash) may be added at some future time.

 

As a starting point, these results are raw IO, with file system based information to be added soon along with more devices. These results are for tests with one worker or thread count; other results, such as with 16 workers or thread counts, will be added to show how those differ.
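
For a rough sense of why more workers (threads) should change the numbers, Little's Law relates concurrency, response time and throughput: IOPS is approximately the number of outstanding IOs divided by the response time. A minimal sketch follows; the 16-worker response time below is an assumed value for illustration, not a measured result:

    # Little's Law: throughput ~= concurrency / response time
    def estimated_iops(outstanding_ios, response_time_ms):
        return outstanding_ios / (response_time_ms / 1000.0)

    # One worker at ~10 ms per IO sustains about 100 IOPS,
    # in line with the 4K 100% random read row above.
    print(estimated_iops(1, 10.0))   # 100.0
    # If 16 workers drove the same drive and response time grew to,
    # say, 40 ms (an assumed number), throughput would still climb:
    print(estimated_iops(16, 40.0))  # 400.0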

 

The above results include all reads, all writes, a mix of reads and writes, along with all random, sequential and mixed for each IO size. IO sizes include 4K, 8K, 16K, 32K, 64K, 128K, 256K, 512K, 1024K and 2048K. As with any workload simulation, benchmark or comparison test, take these results with a grain of salt as your mileage can and will vary. For example, you will see what I consider very high IO rates with sequential reads even without file system buffering. These results might be due to locality of reference, with IO's being resolved out of the drive's DRAM cache (read ahead), which varies in size for different devices. Use the vendor model numbers in the table above to check the manufacturer's specs on drive DRAM and other attributes.
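
To see how much a read-ahead cache can inflate the numbers, note that the effective response time is just a weighted average of cache hits and media accesses. Here is a minimal sketch; the hit ratios and service times are assumptions for illustration, not measurements of the drives above:

    # Effective response time as a weighted average of hits and misses:
    #   t_effective = hit_ratio * t_cache + (1 - hit_ratio) * t_media
    def effective_resp_ms(hit_ratio, t_cache_ms=0.03, t_media_ms=12.0):
        return hit_ratio * t_cache_ms + (1.0 - hit_ratio) * t_media_ms

    for hit_ratio in (0.0, 0.5, 0.99):
        t = effective_resp_ms(hit_ratio)
        print(f"hit ratio {hit_ratio:.0%}: {t:.2f} ms ~= "
              f"{1000.0 / t:,.0f} IOPS with one IO outstanding")

With a 99 percent hit ratio the same drive looks like thousands of IOPS instead of under one hundred, which is why the sequential read rows stand out.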

 

If you are used to seeing 4K or 8K and wonder why anybody would be interested in some of the larger sizes, take a look at big fast data or cloud and object storage. For some of those applications 2048K may not seem all that big. Likewise if you are used to the larger sizes, there are still applications doing smaller sizes. Sorry for those who like 512 byte or smaller IO's, as they are not included. Note that for all of these, unless indicated, a 512 byte standard sector or drive format is used as opposed to the emerging Advanced Format (AF) 4KB sector or block size. Watch for some more drive and device types to be added to the above, along with results for more workers or thread counts, along with file system and other scenarios.

 

Using VMware as part of a Server, Storage and IO (aka StorageIO) test platform

The above performance results were generated on Ubuntu 12.04 (since upgraded to 13.04) hosted on a purchased VMware vSphere 5.1 system (you can get the ESXi free version here) with vCenter enabled. I also have VMware Workstation installed on some of my Windows-based laptops for doing preliminary testing of scripts and other activity prior to running them on the larger server-based VMware environment. Other VMware tools include vCenter Converter, vSphere Client and CLI. Note that other guest virtual machines (VMs) were idle during the tests (e.g. other guest VMs were quiet). You may experience different results if you ran Ubuntu native on a physical machine or with different adapters, processors and device configurations among many other variables (that was a disclaimer btw).


 

All of the devices (HDD, HHDD and SSD's, including those not shown or published yet) were Raw Device Mapped (RDM) to the Ubuntu VM, bypassing the VMware file system.

    

Example of creating an RDM for a local SAS or SATA direct attached device:

    vmkfstools -z /vmfs/devices/disks/naa.600605b0005f125018e923064cc17e7c /vmfs/volumes/datastore1/RDM_ST1500Z110S6M5.vmdk

The above uses the drive's address (found by doing a ls -l /dev/disks via the VMware shell command line) to create a vmdk container stored in a datastore. Note that the RDM being created does not actually store data in the .vmdk; it's there for VMware management operations.

 

If you are not familiar with how to create a RDM of a local SAS or SATA device, check out this post to learn how. This is important to note in that while VMware was used as a platform to support the guest operating systems (e.g. Ubuntu or Windows), the real devices are not being mapped through or via VMware virtual drives.

 

[Image: RDM SAS and SATA devices along with other VMware devices and datastores]

The above shows examples of RDM SAS and SATA devices along with other VMware devices and datastores. In the next figure is an example of a workload being run in the test environment.

 

[Image: a workload being run in the test environment]

 

One of the advantages of using VMware (or another hypervisor) with RDM's is that I can quickly define via software commands where a device gets attached to different operating systems (e.g. the other aspect of software defined storage). This means that after a test run, I can quickly shut down Ubuntu, remove the RDM device from that guest's settings, move the device just tested to a Windows guest if needed and restart those VMs. All of that from wherever I happen to be working, without physically changing things or dealing with multi-boot or cabling issues.

 

So how many IOP's can a device do?

That depends; however, have a look at the above information and results.

 

Check back from time to time here to see what is new or has been added including more drives, devices and other related themes.

 

Ok, nuff said (for now)

Cheers gs


How many IOPS can a HDD, HHDD or SSD do?

 

A common question I run across is how many IOPS (IO Operations Per Second) can a storage device or system do or give.

 

The answer is, or should be, it depends.

 

This is the first of a two-part series looking at storage performance, specifically in the context of drive or device (e.g. mediums) characteristics across HDD, HHDD and SSD that can be found in cloud, virtual and legacy environments. In this first part the focus is on putting some context around drive or device performance, with the second part looking at some workload characteristics (e.g. benchmarks).

 

What about cloud, tape, storage systems or appliances?

 

Let's leave those for a different discussion at another time.

Getting started

Part of my interest in tools, metrics that matter, measurements, analysis and forecasting ties back to having been a server, storage and IO performance and capacity planning analyst when I worked in IT. Another aspect ties back to also having been a sys admin as well as a business applications developer when on the IT customer side of things. This was followed by switching over to the vendor world, involved with among other things competitive positioning, customer design configuration, validation, simulation and benchmarking of HDD and SSD based solutions (e.g. life before becoming an analyst and advisory consultant).

 

Btw, if you happen to be interested in learning more about server, storage and IO performance and capacity planning, check out my first book Resilient Storage Networks (Elsevier), which has a bit of information on it. There is also coverage of metrics and planning in my two other books, The Green and Virtual Data Center (CRC Press) and Cloud and Virtual Data Storage Networking (CRC Press). I have some copies of Resilient Storage Networks available at a special reader or viewer rate (essentially shipping and handling). If interested, drop me a note and I can fill you in on the details.

 

There are many rules of thumb (RUT) when it comes to metrics that matter such as IOPS, some that are older, while others may be guessed or measured in different ways. However the answer is that it depends on many things, ranging from whether it is a standalone hard disk drive (HDD), Hybrid HDD (HHDD) or Solid State Device (SSD), to whether it is attached to a storage system, appliance or RAID adapter card among others.

 

Taking a step back, the big picture

[Image: Various HDD, HHDD and SSD's]

Server, storage and I/O performance and benchmark fundamentals

Even if just looking at a HDD, there are many variables, ranging from the rotational speed or Revolutions Per Minute (RPM) to the interface, including 1.5Gb, 3.0Gb, 6Gb or 12Gb SAS or SATA, or 4Gb Fibre Channel. Simply using a RUT or number based on RPM can cause issues, particularly with 2.5 vs. 3.5 inch or enterprise vs. desktop drives. For example, some current generation 10K 2.5 inch HDDs can deliver the same or better performance than an older generation 3.5 inch 15K. Other drive factors (see this link for HDD fundamentals) include physical size such as 3.5 inch or 2.5 inch small form factor (SFF), enterprise or desktop or consumer class, and the amount of drive level cache (DRAM). Space capacity of a drive can also have an impact, such as if all or just a portion of a large or small capacity device is used. Not to mention what the drive is attached to, ranging from an internal SAS or SATA drive bay, USB port, or a HBA or RAID adapter card, or in a storage system.

[Image: HDD fundamentals]

 

How about benchmark and performance marketing or comparison tricks, including delayed, deferred or asynchronous writes vs. synchronous or actually committed data to devices? Let's not forget about short stroking (only using a portion of a drive for better IOP's) or even long stroking (to get better bandwidth leveraging spiral transfers) among others.

 

Almost forgot, there are also thick, standard, thin and ultra thin drives in 2.5 and 3.5 inch form factors. What's the difference? The number of platters and read/write heads. Look at the following image showing various thickness 2.5 inch drives that have various numbers of platters to increase space capacity in a given footprint. Want to take a wild guess as to which one has the most space capacity in a given footprint? Also want to guess which type I use for removable disk based archives along with for onsite disk based backup targets (compliments my offsite cloud backups)?

 

[Image: Thick, thin and ultra thin devices]

 

Beyond physical and configuration items, there are logical configurations including the type of workload, large or small IOPS, random, sequential, reads, writes or mixed (various random, sequential, read, write, large and small IO). Other considerations include file system or raw device, the number of workers or concurrent IO threads, and the size of the target storage space area, which determines the impact of any locality of reference or buffering. Some other items include how long the test or workload simulation ran for, and whether the device was new or worn in before use, among other items; the sketch below enumerates one such workload matrix.
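
As a concrete example of such a logical configuration, the 4K and 16K workload tables in part II of this series sweep a matrix of read and sequential percentages. Here is a minimal sketch of enumerating those workload steps (the actual IO generation is left to whatever tool you use, such as Iorate or Iometer):

    # Enumerate the workload steps used in the part II tables:
    # every combination of read percentage and sequential percentage per IO size.
    io_sizes_kb = [4, 16]
    read_pcts = [100, 60, 30, 0]
    seq_pcts = [100, 60, 30, 0]

    steps = [
        (size, read, seq)
        for size in io_sizes_kb
        for read in read_pcts
        for seq in seq_pcts
    ]
    for size, read, seq in steps:
        print(f"{size}KB: {seq}% Seq {read}% Read")
    print(len(steps), "workload steps")  # 32 steps, 16 per IO size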

 

Tools and the performance toolbox

Then there are the various tools for generating IO's or workloads, along with recording metrics such as reads, writes, response time and other information. Some examples (a mix of free or for fee) include Bonnie, Iometer, Iorate, IOzone, Vdbench, TPC, SPC, Microsoft ESRP, SPEC and netmist, Swifttest, Vmark, DVDstore and PCmark 7 among many others. Some are focused just on the storage system and IO path, while others are application specific, thus exercising servers, storage and IO paths.

 

[Image: Server, storage and IO performance toolbox]

 

Having used Iometer since the late 90s, I find it has its place and is popular given its ease of use. Iometer is also long in the tooth and has its limits, including not much if any new development; nevertheless, I have it in the toolbox. I also have Futremark PCmark 7 (full version), which it turns out has some interesting abilities to do more than exercise an entire Windows PC. For example, PCmark can use a secondary drive for doing IO to.

 

PCmark can be handy for spinning up, with VMware (or other tools), lots of virtual Windows systems pointing to a NAS or other shared storage device and doing real world type activity. Something that could be handy for testing or stressing virtual desktop infrastructures (VDI) along with other storage systems, servers and solutions. I also have Vdbench among other tools in the toolbox, including Iorate, which was used to drive the workloads shown below.

 

What I look for in a tool is how extensible the scripting capabilities are for defining various workloads, along with the capabilities of the test engine. A nice GUI is handy, which makes Iometer popular, and yes there are script capabilities with Iometer. That is also where Iometer is long in the tooth compared to some of the newer generation of tools that place more emphasis on extensibility vs. ease of use interfaces. This also assumes knowing what workloads to generate vs. simply kicking off some IOPS using default settings to see what happens.

 

Another handy tool is one for recording what's going on with a running system, including IO's, reads, writes, bandwidth or transfers, random and sequential among other things. This is where, when needed, I turn to something like HiMon from HyperIO; if you have not tried it, get in touch with Tom West over at HyperIO and tell him StorageIO sent you to get a demo or trial. HiMon is what I used for doing start, stop and boot among other testing, being able to see IO's at the Windows file system level (or below) including very early in the boot or shutdown phase.

 

Here is a link to some other things I did awhile back with HiMon to profile some Windows and VDI activity.

 

What's the best tool or benchmark or workload generator?

The one that meets your needs, usually your applications or something as close as possible to it.

 

[Image: Various 2.5 and 3.5 inch HDD, HHDD, SSD with different performance]

So how many IOP's can a device do?

That depends; however, continue reading part II of this series to see some results for various types of drives and workloads.

 

Ok, nuff said (for now)

 

Cheers gs