
Server and Storage I/O Benchmarking and Performance Resources

 

server storage I/O trends

 

The following is a list of various articles, tips, posts and other resources about server storage I/O performance benchmarking for legacy, virtual, cloud and software defined environments, along with associated tools.

server storage I/O performance

The best server and storage I/O (input/output operation) is the one that you do not have to do, the second best is the one with the least impact.

 

server storage I/O locality of reference

 

This is where the idea of locality of reference (e.g. how close is the data to where your application is running) comes into play which is implemented via tiered memory, storage and caching shown in the figure above.

 

Cloud virtual software defined storage I/O
Server storage I/O performance applies to cloud, virtual, software defined and legacy environments

 

What this has to do with server storage I/O (and networking) performance benchmarking is keeping the idea of locality of reference, context and the application workload in perspective regardless of if cloud, virtual, software defined or legacy physical environments.

 

StorageIOblog: I/O, I/O how well do you know about good or bad server and storage I/Os?
StorageIOblog: Server and Storage I/O benchmarking 101 for smarties
StorageIOblog: How to test your HDD, SSD or all flash array (AFA) storage fundamentals
BizTech: 4 Ways to Performance-Test Your New HDD or SSD
EnterpriseStorageForum: Data Storage Benchmarking Guide
StorageSearch.com: How fast can your SSD run backwards?
OpenStack: How to calculate IOPS for Cinder Storage?
StorageAcceleration: Tips for Measuring Your Storage Acceleration

 

server storage I/O STI and SUT


Spiceworks: Determining HDD SSD SSHD IOP Performance
Spiceworks: Calculating IOPS from Perfmon data
Spiceworks: profiling IOPs

 

vdbench server storage I/O benchmark
Vdbench example via StorageIOblog.com

 

StorageIOblog: What does server storage I/O scaling mean to you?
StorageIOblog: What is the best kind of IO? The one you do not have to do
StorageAcceleration: What, When, Why & How to Accelerate Storage
Filesystems.org: Various tools and links
StorageIOblog: Can we get a side of context with them IOPS and other storage metrics?

 

flash ssd and hdd

 

BrightTalk Webinar: Data Center Monitoring - Metrics that Matter for Effective Management
StorageIOblog: Enterprise SSHD and Flash SSD Part of an Enterprise Tiered Storage Strategy
StorageIOblog: Has SSD put Hard Disk Drives (HDD’s) On Endangered Species List?

 

server storage I/O bottlenecks and I/O blender

 

Microsoft TechNet: Measuring Disk Latency with Windows Performance Monitor (Perfmon)
Microsoft MSDN: List of Perfmon counters for sql server
Microsoft TechNet: Taking Your Server's Pulse
StorageIOblog: Part II: How many IOPS can a HDD, HHDD or SSD do with VMware?
CMG: I/O Performance Issues and Impacts on Time-Sensitive Applications

 

flash ssd and hdd

 

Virtualization Practice: IO IO it is off to Storage and IO metrics we go
InfoStor: Is HP Short Stroking for Performance and Capacity Gains?
StorageIOblog: Is Computer Data Storage Complex? It Depends
StorageIOblog: More storage and IO metrics that matter
StorageIOblog: Moving Beyond the Benchmark Brouhaha
Yellow-Bricks: VSAN VDI Benchmarking and Beta refresh!
 
   server storage I/O benchmark example
 
Yellow-Bricks: VSAN performance: many SAS low capacity VS some SATA high capacity?
StorageIOblog: Seagate 1200 12Gbs Enterprise SAS SSD StorageIO lab review
StorageIOblog: Part II: Seagate 1200 12Gbs Enterprise SAS SSD StorageIO lab review
StorageIOblog: Server Storage I/O Network Benchmark Winter Olympic Games

 

flash ssd and hdd

 

VMware VDImark aka View Planner (also here, here and here) as well as  VMmark here
StorageIOblog: SPC and Storage Benchmarking Games
StorageIOblog: Speaking of speeding up business with SSD storage
StorageIOblog: SSD and Storage System Performance
 
  Hadoop server storage I/O performance
  Various Server Storage I/O tools in a hadoop environment
 
  Michael-noll.com: Benchmarking and Stress Testing an Hadoop Cluster With TeraSort, TestDFSIO
  Virtualization Practice: SSD options for Virtual (and Physical) Environments Part I: Spinning up to speed on SSD
  StorageIOblog: Storage and IO metrics that matter
  InfoStor: Storage Metrics and Measurements That Matter: Getting Started
  SilvertonConsulting: Storage throughput vs. IO response time and why it matters
  Splunk: The percentage of Read / Write utilization to get to 800 IOPS?
 
  flash ssd and hdd
  Various server storage I/O benchmarking tools
 
  Spiceworks: What is the best IO IOPs testing tool out there
  StorageIOblog: How many IOPS can a HDD, HHDD or SSD do?
  StorageIOblog: Some Windows Server Storage I/O related commands
  Openmaniak: Iperf overview and Iperf.fr: Iperf overview
  StorageIOblog: Server and Storage I/O Benchmark Tools: Microsoft Diskspd (Part I and Part II)
  Dell/Quest: SQL Server Perfmon Poster (PDF)
  Server and Storage I/O Networking Performance Management (webinar)
  Data Center Monitoring - Metrics that Matter for Effective Management (webinar)
  Flash back to reality – Flash SSD Myths and Realities (Industry trends & benchmarking tips), (MSP CMG presentation)
  DBAstackexchange: How can I determine how many IOPs I need for my AWS RDS database?
  ITToolbox: Benchmarking the Performance of SANs
 
  server storage IO labs
 
  StorageIOblog: Dell Inspiron 660 i660, Virtual Server Diamond in the rough (Server review)
  StorageIOblog: Part II: Lenovo TS140 Server and Storage I/O Review (Server review)
  StorageIOblog: DIY converged server software defined storage on a budget using Lenovo TS140
  StorageIOblog: Server storage I/O Intel NUC nick knack notes First impressions (Server review)
  StorageIOblog & ITKE: Storage performance needs availability, availability needs performance
  StorageIOblog: Why SSD based arrays and storage appliances can be a good idea (Part I)
  StorageIOblog: Revisiting RAID storage remains relevant and resources

 

Interested in cloud and object storage? Visit our objectstoragecenter.com page; for flash SSD check out the storageio.com/ssd page, along with data protection, RAID, various industry links and more here.

Watch for additional links to be added above in addition to those that appear via comments.

 

Ok, nuff said (for now)

Cheers gs

Server and Storage I/O Benchmark Tools: Microsoft Diskspd (Part I)

server storage I/O trends

 

This is part one of a two-part post pertaining to Microsoft Diskspd that is also part of a broader series focused on server storage I/O benchmarking, performance, capacity planning, tools and related technologies.

 

You can view part-two of this post here, along with companion links here.

Background

Many people use Iometer for creating synthetic (artificial) workloads to support benchmarking for testing, validation and other activities. While Iometer with its GUI is relatively easy to use and available across many operating system (OS) environments, the tool also has its limits. One of the bigger limits is that Iometer has become dated, with little to no new development for a long time, while other tools, including some new ones, continue to evolve in functionality and extensibility. Some of these tools have an optional GUI for ease of use or configuration, while others simply have extensive scripting and command parameter capabilities. Many tools are supported across different OS environments including physical, virtual and cloud, while others such as Microsoft Diskspd are OS specific.

 

Instead of focusing on Iometer and other tools as well as benchmarking techniques (we cover those elsewhere), let's focus on Microsoft Diskspd.

 

server storage I/O performance

What is Microsoft Diskspd?

 

Microsoft Diskspd is a synthetic workload generation (e.g. benchmark) tool that runs on various Windows systems as an alternative to Iometer, vdbench, iozone, iorate, fio and sqlio among other tools. Diskspd is a command line tool, which means it can easily be scripted to do reads and writes of various I/O sizes, including random as well as sequential activity. Server and storage I/O can be buffered file system as well as non-buffered, across different types of storage and interfaces. Various performance and CPU usage information is provided to gauge the impact on a system when doing a given number of IOPs and amount of bandwidth, along with response time latency.
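As a quick aside on those metrics: IOPs, I/O size, bandwidth and latency are related, and a couple of lines of math help keep them in context. A minimal sketch (the numbers are illustrative, not from any particular device):

```python
# Relate IOPs, I/O size, bandwidth and latency (Little's Law).
# The example numbers are illustrative only.

def bandwidth_mb_s(iops, io_size_kb):
    """Bandwidth = IOPs x I/O size."""
    return iops * io_size_kb / 1024.0

def avg_latency_ms(outstanding_io, iops):
    """Little's Law: average latency = queue depth / throughput."""
    return outstanding_io / iops * 1000.0

# Example: 8 KB reads at 20,000 IOPs with 32 outstanding I/Os
print(bandwidth_mb_s(20000, 8))    # 156.25 MB/s
print(avg_latency_ms(32, 20000))   # 1.6 ms
```

This is why context matters: the same device can report very different IOPs depending on I/O size and queue depth.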

What can Diskspd do?

 

Microsoft Diskspd creates synthetic benchmark workload activity with the ability to define various options to simulate different application characteristics. This includes specifying reads and writes, random or sequential access, and I/O size, along with the number of threads to simulate concurrent activity. Diskspd can be used for testing or validating server and storage I/O systems along with associated software, tools and components. In addition to being able to specify different workloads, Diskspd can also be told which processors to use (e.g. CPU affinity) and whether to use buffered or non-buffered I/O, among other things.

What type of storage does Diskspd work with?

 

Diskspd works with physical and virtual storage including hard disk drives (HDDs), solid state devices (SSDs) and solid state hybrid drives (SSHDs) in various systems or solutions. Storage can be physical devices as well as partitions or file systems. As with any workload tool that does writes, exercise caution to prevent accidental deletion or destruction of your data.

What information does Diskspd produce?

 

Diskspd provides output in text as well as XML formats. See an example of Diskspd output further down in this post.

Where to get Diskspd?

 

You can download your free copy of Diskspd from the Microsoft site here.

 

The download and installation are quick and easy, just remember to select the proper version for your Windows system and type of processor.

 

Another tip is to remember to set your path environment variable to point to where you put the Diskspd image.

 

Also, stating what should be obvious: if you are going to be doing any benchmark or workload generation activity on a system where there is the potential for data to be overwritten or deleted, make sure you have a good backup and a tested restore before you begin, in case something goes wrong.

New to server storage I/O benchmarking or tools?

 

If you are not familiar with server storage I/O performance benchmarking or using various workload generation tools (e.g. benchmark tools), Drew Robb (@robbdrew) has a Data Storage Benchmarking Guide article over at Enterprise Storage Forum that provides a good framework and summary quick guide to server storage I/O benchmarking.

   

Via Drew:

   

Data storage benchmarking can be quite esoteric in that vast complexity awaits anyone attempting to get to the heart of a particular benchmark.

   

Case in point: The Storage Networking Industry Association (SNIA) has developed the Emerald benchmark to measure power consumption. This invaluable benchmark has a vast amount of supporting literature. That so much could be written about one benchmark test tells you just how technical a subject this is. And in SNIA’s defense, it is creating a Quick Reference Guide for Emerald (coming soon).

 

But rather than getting into the nitty-gritty nuances of the tests, the purpose of this article is to provide a high-level overview of a few basic storage benchmarks, what value they might have and where you can find out more.

 

Read more here including some of my comments, tips and recommendations.

 

In addition to Drew's benchmarking quick reference guide, along with the server storage I/O benchmarking tools, technologies and techniques resource page (here), check out this companion post as a primer for benchmarking and associated topics titled Server and Storage I/O Benchmarking 101 for Smarties.

How do you use Diskspd?


Tip: When you run Microsoft Diskspd it will create a file or data set on the device or volume being tested that it will do its I/O to. Make sure that you have enough disk space for what will be tested (e.g. if you are going to test 1TB you need to have more than 1TB of disk space free for use). Another tip: to speed up initializing (e.g. when Diskspd creates the file that I/Os will be done to), run Diskspd as administrator.
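Putting the above together, here is a sketch of assembling a Diskspd command line from a script. The flags used (-c, -b, -d, -o, -t, -r, -w, -Sh, -L) are per the Diskspd documentation, however verify them against your Diskspd version; the file name and sizes are just examples:

```python
# Sketch: assemble a Diskspd command line for a 60-second, 8 KB random
# 70/30 read/write test with 4 threads and 32 outstanding I/Os per thread.
# Flag meanings are from the Diskspd documentation; check your version.

args = [
    "diskspd.exe",
    "-c1G",    # create a 1 GB test file (needs more than 1 GB free space)
    "-b8K",    # 8 KB I/O size
    "-d60",    # run for 60 seconds
    "-o32",    # 32 outstanding I/Os per thread
    "-t4",     # 4 worker threads
    "-r",      # random I/O
    "-w30",    # 30 percent writes (70 percent reads)
    "-Sh",     # disable software and hardware caching
    "-L",      # collect latency statistics
    "testfile.dat",
]
cmd = " ".join(args)
print(cmd)
# On an actual Windows test system you would then run, for example:
# subprocess.run(args, check=True)
```

Because Diskspd is a command line tool, sweeps over I/O size, thread count or queue depth are just loops around a list like this.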

 

Ok, nuff said (for now)

 

Cheers gs

Server Storage I/O Benchmarking 101 for Smarties or dummies

 

server storage I/O trends

 

This is the first of a series of posts and links to resources on server storage I/O performance and benchmarking (view more and follow-up posts here).

 

The best I/O is the I/O that you do not have to do, the second best is the one with the least impact as well as low overhead.

 

server storage I/O performance

 

Drew Robb (@robbdrew) has a Data Storage Benchmarking Guide article over at Enterprise Storage Forum that provides a good framework and summary quick guide to server storage I/O benchmarking.

   

Via Drew:

   

Data storage benchmarking can be quite esoteric in that vast complexity awaits anyone attempting to get to the heart of a particular benchmark.

   

Case in point: The Storage Networking Industry Association (SNIA) has developed the Emerald benchmark to measure power consumption. This invaluable benchmark has a vast amount of supporting literature. That so much could be written about one benchmark test tells you just how technical a subject this is. And in SNIA’s defense, it is creating a Quick Reference Guide for Emerald (coming soon).

 

But rather than getting into the nitty-gritty nuances of the tests, the purpose of this article is to provide a high-level overview of a few basic storage benchmarks, what value they might have and where you can find out more.

 

Read more here including some of my comments, tips and recommendations.

 

Drew provides a good summary and overview in his article, which is a great opener for this first post in a series on server storage I/O benchmarking and related resources.

 

You can think of this series (along with Drew's article) as server storage I/O benchmarking fundamentals (e.g. 101) for smarties (e.g. non-dummies).

 

Note that even if you are not a server, storage or I/O expert, you can still be considered a smarty vs. a dummy if you found the need or interest to read as well as learn more about benchmarking, metrics that matter, tools, technology and related topics.

Server and Storage I/O benchmarking 101

 

There are different reasons for benchmarking. For example, you might be asked or want to know how many IOPs a disk, Solid State Device (SSD) or storage system can do, such as for a 15K RPM (revolutions per minute) 146GB SAS Hard Disk Drive (HDD). Sure, you can go to a manufacturer's website and look at the speeds and feeds (technical performance numbers), however are those metrics applicable to your environment's applications or workloads?
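As a back-of-the-envelope example, a single HDD's theoretical random IOP capability can be estimated from its average seek time plus rotational latency. A minimal sketch (the 3.5ms seek figure is a typical published number used here purely for illustration):

```python
# Estimate theoretical random IOPs for a single HDD.
# Average rotational latency is half a revolution; seek time comes
# from the drive's spec sheet (the value below is illustrative).

def hdd_iops(rpm, avg_seek_ms):
    rotational_latency_ms = 0.5 * 60000.0 / rpm  # half a revolution, in ms
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1000.0 / service_time_ms

# 15K RPM drive with a 3.5 ms average seek:
# ~2 ms rotational latency + 3.5 ms seek -> roughly 180 IOPs
print(round(hdd_iops(15000, 3.5)))
```

Remember this is a theoretical single-drive number; caching, RAID, controllers and I/O size will move the real result up or down, which is exactly why you benchmark.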

 

You might get higher IOPs with smaller I/O sizes on sequential reads vs. random writes, which will also depend on what the HDD is attached to. For example, are you going to attach the HDD to a storage system or appliance with RAID and caching? Are you going to attach the HDD to a PCIe RAID card, or will it be part of a server or storage system? Or are you simply going to put the HDD into a server or workstation and use it as a drive without any RAID or performance acceleration?

 

What this all means is understanding what it is that you want to benchmark or test in order to learn what the system, solution, service or specific device can do under different workload conditions.

 

Some benchmark and related topics include

    • What are you trying to benchmark
    • Why do you need to benchmark something
    • What are some server storage I/O benchmark tools
    • What is the best benchmark tool
    • What to benchmark, how to use tools
    • What are the metrics that matter
    • What is benchmark context why does it matter
    • What are marketing hero benchmark results
    • What to do with your benchmark results

 

server storage I/O benchmark step test
Example of a step test results with various workers and workload

 

  • What do the various metrics mean (can we get a side of context with them metrics?)
  • Why look at server CPU if doing storage and I/O networking tests
  • Where and how to profile your application workloads
  • What about physical vs. virtual vs. cloud and software defined benchmarking
  • How to benchmark block DAS or SAN, file NAS, object, cloud, databases and other things
  • Avoiding common benchmark mistakes
  • Tips, recommendations, things to watch out for
  • What to do next
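The step test shown above (ramping up workers step by step and recording results at each step) can be sketched as a simple sweep. Note that run_workload() here is a hypothetical stand-in for whatever benchmark tool or I/O routine you would actually drive:

```python
# Sketch of a step test: increase workers step by step, record results.
# run_workload() is a hypothetical placeholder for your real workload.
import concurrent.futures
import time

def run_workload(duration_s=0.05):
    """Placeholder worker: spin for a fixed time, return ops completed."""
    ops, deadline = 0, time.monotonic() + duration_s
    while time.monotonic() < deadline:
        ops += 1
    return ops

def step_test(worker_counts):
    results = {}
    for n in worker_counts:
        with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
            ops = sum(pool.map(lambda _: run_workload(), range(n)))
        results[n] = ops  # total ops completed at this step
    return results

print(step_test([1, 2, 4, 8]))
```

Plotting the per-step results shows where throughput stops scaling and response time starts climbing, which is the point of a step test.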

 

server storage I/O trends

Where to learn more

 

The following are related links to read more about server (cloud, virtual and physical) storage I/O benchmarking tools, technologies and techniques.

 

Drew Robb's benchmarking quick reference guide
Server storage I/O benchmarking tools, technologies and techniques resource page
Server and Storage I/O Benchmarking 101 for Smarties.
Microsoft Diskspd download and Microsoft Diskspd overview (via Technet)
I/O, I/O how well do you know about good or bad server and storage I/Os?
Server and Storage I/O Benchmark Tools: Microsoft Diskspd (Part I and Part II)

Wrap up and summary

 

We have just scratched the surface when it comes to benchmarking cloud, virtual and physical server storage I/O and networking hardware, software along with associated tools, techniques and technologies. However hopefully this and the links for more reading mentioned above give a basis for connecting the dots of what you already know or enable learning more about workloads, synthetic generation and real-world workloads, benchmarks and associated topics. Needless to say there are many more things that we will cover in future posts (e.g. keep an eye on and bookmark the server storage I/O benchmark tools and resources page here).

 

Ok, nuff said, for now...

 

Cheers gs

VMware announces vSphere V6 and associated virtualization technologies

server storage I/O trends

 

VMware has announced version 6 (V6) of its software defined data center (SDDC) server virtualization hypervisor called vSphere, aka ESXi, along with companion software defined management and convergence tools.

 

VMware

VMware vSphere Refresh

 

As a refresh for those whose world does not revolve around VMware, vSphere and software defined data centers (believe it or not, there are some who exist ;), ESXi is the hypervisor that virtualizes underlying physical machines (PMs) known as hosts.

 

software defined data center convergence
The path to software defined data center convergence

 

Guest operating systems (or other hypervisors using nesting) run as virtual machines (VMs) on top of the vSphere hypervisor host (e.g. ESXi software). Various VMware management tools (or third-party tools) are used for managing the virtualized data center, from initial setup and configuration, to conversion from physical to virtual (P2V) or virtual to virtual (V2V), along with data protection, performance and capacity planning across servers, storage and networks.

 

virtual machines

 

VMware vSphere is flexible and can adapt to different sized environments, from small office home office (SOHO) or small SMB, to large SMB, SME, enterprise or cloud service provider. There is a free version of ESXi along with paid versions that include support and added management tool features. Besides the ESXi vSphere hypervisor, other commonly deployed modules include vCenter administration along with the Infrastructure Controller services platform among others. In addition, there are optional solution bundles to add support for virtual networking, cloud (public and private), data protection (backup/restore, replication, HA, BC, DR) and big data among other capabilities.

 

What is new with vSphere V6

 

VMware has streamlined the installation, configuration and deployment of vSphere along with associated tools, which for smaller environments makes things simply easier. For larger environments, having to do less means being able to do more in the same amount of time, which results in cost savings. In addition to being easier to use, deploy and configure, VMware has extended the scaling capabilities of vSphere in terms of scaling-out (larger clusters), scaling-up (more and larger servers), as well as scaling-down (smaller environments and ease of use).

 

cloud virtual software defined servers

  • Compute: Expanded support for new hardware, guest  operating systems and general scalability in terms of physical, and virtual  resources. For example increasing the number of virtual CPU (vCPUs), number of  cluster nodes among other speeds and feeds enhancements.

server storage I/O vsan

  • Storage: This is an area where several enhancements were made including updates for Storage I/O controls (Storage QoS and  performance optimizations) with per VM reservations, NFS v4.1 with Kerberos  client, Virtual SAN (VSAN) improvements (new back-end underlying file system) as  well as new Virtual Volumes (vVOLs) for Storage Policy Based Management.
  • Availability: Improvements for vMotion (the ability to live migrate virtual machines between physical servers, e.g. VMware hosts) including long distance fault-tolerance. Other improvements include faster replication, vMotion across vCenter servers, and long distance vMotion (up to 100ms round trip time latency).
  • Network: Network I/O Control (NIOC) provides per VM and  datastore (VM and data repository) bandwidth reservations for quality of  service (QoS) performance optimization.
  • Management: Improvements for multi-site, virtual data  centers, content-library (storage and versioning of files and objects including  ISOs and OVFs (Open Virtualization Format files) that can be on a VMFS (VMware  File System) datastore or NFS volume, policy-based management and web-client  performance enhancements.

What is vVOL?

 

A quick synopsis of VMware vVOLs:

 

  • Higher level of abstraction of storage vs. traditional SCSI LUN’s or NAS NFS mount points
  • Tighter level of integration and awareness between VMware hypervisors and storage systems
  • Simplified management for storage and virtualization administrators
  • Removing complexity to support increased scaling
  • Enable automation and service managed storage aka software defined storage management

 

server storage I/O volumes

  How data storage is accessed and managed via VMware today (read more here)

 

vVOLs are not LUNs like regular block (e.g. DAS or SAN) storage that use SAS, iSCSI, FC, FCoE or IBA/SRP, nor are they NAS volumes like NFS mount points. Likewise vVOLs are not accessed using any of the various object storage access methods mentioned above (e.g. AWS S3, Rest, CDMI, etc); instead they are an application-specific implementation. For some of you this approach of an application-specific or unique storage access method may be new, perhaps revolutionary; otoh, some of you might be having a DejaVu moment right about now.

 

A vVOL is not a LUN in the context of what you may know and like (or hate, even if you have never worked with them); likewise it is not a NAS volume like you know (or have heard of); neither is it an object in the context of what you might have seen or heard, such as S3 among others.

 

Keep in mind that what makes up a VMware virtual machine are the VMK, VMDK and some other files (shown in the figure below), and if enough information is known about where those blocks of data are or can be found, they can be worked upon. Also keep in mind that, at least near-term, block is the lowest common denominator upon which all file systems and object repositories get built.

 

server storage I/O vVOL basics
How VMware data storage is accessed and managed with vVOLs (read more here)

 

Here is the thing: vVOLs will be accessible via a block interface such as iSCSI, FC or FCoE, or for that matter over Ethernet-based IP using NFS. Think of these storage interfaces and access mechanisms as the general transport for how vSphere ESXi will communicate with the storage system (e.g. their data path) under vCenter management.

 

What is happening inside the storage system, and what will be presented back to ESXi, will be different from normal SCSI LUN contents and only understood by the VMware hypervisor. ESXi will still tell the storage system what it wants to do, including moving blocks of data. The storage system, however, will have more insight and awareness into the context of what those blocks of data mean. This is how storage systems will be able to more closely integrate snapshots, replication, cloning and other functions, by having awareness of which data to move, as opposed to moving or working with an entire LUN where a VMDK may live.

 

Keep in mind that the storage system will still function as it normally would, just think of vVOL as another or new personality and access mechanism used for VMware to communicate and manage storage. Watch for vVOL storage provider support from the who's who of existing and startup storage system providers including Cisco, Dell, EMC, Fujitsu, HDS, HP, IBM, NetApp, Nimble  and many others. Read more about Storage I/O fundamentals here and vVOLs here and here.

What this announcement means

 

Depending on your experiences, you might use revolutionary to describe some of the VMware vSphere V6 features and functionalities. Otoh, if you have some DejaVu moments looking pragmatically at what VMware is delivering with V6 of vSphere, executing on their vision, evolutionary might be more applicable. I will leave it up to you to decide if you are having a DejaVu moment and what that might pertain to, or if this is all new and revolutionary, or something more along the lines of technolutionary.

 

VMware continues to execute delivering on the Virtual  Data Center aka Software Defined Data Center paradigm by increasing functionality,  as well as enhancing existing capabilities with performance along with resiliency  improvements. These abilities enable the aggregation of compute, storage,  networking, management and policies for enabling a global virtual data center  while supporting existing along with new emerging applications.

Where to learn more

 

If you were not part of the beta to gain early hands-on  experience with VMware vSphere V6 and associated technologies, download a copy  to check it out as part of making your upgrade or migration plans.

 

VMware communities

 

Check out the various VMware resources including communities  links here
VMware vSphere Hypervisor getting started and general vSphere information (including download)
VMware vSphere data sheet, compatibility guide along with speeds and feeds (size and other limits)

VMware Blogs and VMware vExpert page

Various fellow VMware vExpert blogs including vsphere-land, scott lowe, virtuallyghetto and yellow-bricks, among many others, found at the vpad here.
StorageIO Out and About Update – VMworld 2014 (with  Video)
VMware vVOL’s and storage I/O fundamentals (Storage I/O overview and vVOL,  details Part I and Part II)
How many IOPs can a HDD or SSD do in a VMware environment (Part I and Part II)
VMware VSAN overview and primer, DIY converged software defined storage on a budget

 

Wrap up and summary

 

Overall VMware vSphere V6 has a great set of features that support both ease of management for small environments as well as the scaling needs of larger organizations.

 

Ok, nuff said, for now...

Cheers gs

How to test your HDD, SSD or all flash array (AFA) storage fundamentals

Storage I/O trends

 

Over at BizTech Magazine I have a new article, 4 Ways to Performance-Test Your New HDD or SSD, that provides a quick guide to verifying or learning what the speed characteristics of your new storage device are.

 

BizTech Magazine

 

An out-take from the article used by BizTech as a "tease" is:

   

These four steps will help you evaluate new storage drives. And … psst … we included the metrics that matter.

Building off the basics, server storage I/O benchmark fundamentals

 

The four basic steps in the article are:

  • Plan what and how you are going to test (what's applicable for you)
  • Decide on a benchmarking tool (learn about various tools here)
  • Test the test (find bugs, errors before a long running test)
  • Focus on metrics that matter (what's important for your environment)

 

Server Storage I/O performance

What this means and where to learn more

 

To some, the above (read the full article here) may seem like common sense tips and things everybody should know. Otoh, there are many people who are new to server, storage, I/O and networking hardware and software, cloud and virtual environments, along with various applications, not to mention different tools.

 

Thus the above is a refresher for some (e.g. DejaVu), while for others it might be new and revolutionary, or simply helpful. Interested in HDDs, SSDs as well as other server storage I/O performance along with benchmarking tools, techniques and trends? Check out the collection of links here (Server and Storage I/O Benchmarking and Performance Resources).

 

Ok, nuff said, for now...

Cheers gs

Storage I/O trends

If you are focused only on cost you might miss other cloud storage benefits

 

Drew Robb (@robbdrew) has a good piece (e.g. article) over at InfoStor titled Eight Ways to Avoid Cloud Storage Pricing Surprises that you can read here.

Drew starts his piece out with this nice analogy or story:

   

Let’s begin with a cautionary tale about pricing: a friend hired a moving company as they quoted a very attractive price for a complex move. They lured her in with a low-ball price then added more and more “extras” to the point where their price ended up higher than many of the other bids she passed up. And to make matters worse, they are already two weeks late with delivery of the furniture and are saying it might take another two weeks.

 

Drew extends his example in his piece to compare how some cloud providers may start with pricing as low as some amount only for the customer to be surprised when they did not do their homework to learn about the various fees.

 

Note that most reputable cloud providers do not hide their fees, even though there are myths that all cloud vendors have hidden fees; instead they list what those costs are on their sites. However, that means the smart shopper or person procuring cloud services needs to go look for those fees and what they mean to avoid surprises. On the other hand, if you cannot find what the extra fees would be, along with what is or is not included in a cloud service price, then to quote Jenny's line in the movie Forrest Gump, "...Run, Forrest! Run!...".

 

In Drew's piece he mentions five general areas to keep an eye on pertaining to cloud storage costs, including:

  • Be Duly Diligent
  • Trace Out Application Interaction
  • Avoid Fixed Usage Rates
  • Beware Lowballing
  • Demand Enterprise Visibility

Beware Lowballing

 

In Drew's piece, he includes a comment from myself shown below.

   

Just as in the moving business, lowballing is alive and well in cloud pricing. Greg Schulz, an analyst with StorageIO Group, warned users to pay attention to services that have a very low cost per GByte/TByte yet have extra fees and charges for use or activity, or place service caps. Compare those with other services that have higher base fees, and attempt to price it based on your real storage and usage patterns. “Watch out for usage and activity fees with lower cost services where you may get charged for looking at or visiting your data, not to mention for when you actually need to use it,” said Schulz. “Also be aware of limits or caps on performance that may apply to a particular class of service.”
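To make the lowballing point concrete, a simple cost model comparing a low per-GB rate with activity fees against a higher flat rate can be sketched as follows. All of the prices here are made up for illustration; check your provider's actual price sheet:

```python
# Illustrative only: compare a low per-GB rate plus activity fees against
# a higher flat rate. All prices are hypothetical, not any real provider's.

def monthly_cost(gb_stored, gb_retrieved, requests,
                 per_gb, per_gb_retrieval, per_10k_requests):
    return (gb_stored * per_gb
            + gb_retrieved * per_gb_retrieval
            + requests / 10000.0 * per_10k_requests)

# "Cheap" tier: $0.01/GB stored, but $0.09/GB retrieved plus request fees
cheap = monthly_cost(1000, 500, 2_000_000, 0.01, 0.09, 0.05)
# "Flat" tier: $0.05/GB stored, no activity fees
flat = monthly_cost(1000, 500, 2_000_000, 0.05, 0.0, 0.0)

print(f"cheap tier: ${cheap:.2f}, flat tier: ${flat:.2f}")
```

With this particular usage pattern the "cheap" tier comes out more expensive, which is exactly why you price against your real storage and activity patterns rather than the headline per-GB rate.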

 

 

As a follow-up to Drew's good article, I put together the following thoughts, which appeared earlier this year at InfoStor in a piece titled Cloud Storage: Is It All About Cost? that you can read here. In that article I start out with the basic question of:

   

So what is your take on cloud storage, and in what context? Is cloud storage all about removing cost, cost cutting, or free storage?

 

Or perhaps even getting something else in addition to free storage?

 

I routinely talk with different people from various backgrounds and environments from around the world, and the one consistency I hear when it comes to cloud services, including storage, is that there is no consistency.

 

What I mean by this is that there are the cloud crowd cheerleaders who cheer for anything cloud related; some of them actually use the cloud vs. simply cheering for it.

What does this have to do with cloud costs?

 

Simple: how do you know if cloud is cheaper or more expensive if you do not know your own costs?

 

How do you know if cloud storage is available, reliable and durable if you do not have a handle on your own environment?

 

Are you making apples-to-oranges comparisons, or simply trading in or leveraging hype and FUD for or against?

   

Similar to regular storage, the best practices for how you choose to use and configure on-site traditional storage for high availability, performance and security, among others, should also be applied to cloud solutions. After all, only you can prevent cloud (or on-premises) data loss, granted it is a shared responsibility. Shared responsibility means your service provider or system vendor needs to deliver a quality, robust solution that you can then take responsibility for configuring and using with resiliency.

 

For some of you perhaps cloud might be about lowering, reducing  or cutting storage costs, perhaps even getting some other service(s) in  addition to free storage.

 

On the other hand, some of you might be using another class of cloud storage (e.g. AWS EBS): storage intended or optimized to be accessed from within a cloud via cloud servers or compute instances (e.g. AWS EC2 among others), vs. storage optimized for access both inside and outside the cloud (e.g. AWS S3 or Glacier, with costs shown here). I am using AWS examples; however, you could use Microsoft Azure (pricing shown here), Google (including their new Nearline service, with costs shown here), Rackspace (calculator here, other cloud files pricing here), HP Cloud (costs shown here), IBM Softlayer (object storage costs here) and many others.

 

Not all types of cloud storage are the same, which is similar to traditional storage you may be using or have used in your environment in the past. For example, there is high-capacity, low-cost storage, including magnetic tape for data protection and archiving of inactive data, along with near-line hard disk drives (HDD). There are different types of HDDs, as well as fast solid-state devices (SSD) along with hybrid or SSHD storage used for different purposes. This is where some would say the topic of cloud storage is highly complex.

Where to learn more

 

Data Protection Diaries
Cloud Conversations: AWS overview and primer
Only you can prevent cloud data loss
Is Computer Data Storage Complex? It Depends
Eight Ways to Avoid Cloud Storage Pricing Surprises
Cloud and Object Storage Center
Cloud Storage: Is It All About Cost?
Cloud conversations: Gaining cloud confidence from insights into AWS outages (Part II)
Given outages, are you concerned with the security of the cloud?
Is the cost of cloud storage really cheaper than traditional storage?
Are more than five nines of availability really possible?
What should I look for in an enterprise file sync-and-share app?
How do primary storage clouds and cloud for backup differ?
What should I consider when using SSD cloud?
What's most important to know about my cloud privacy policy?
Data Archiving: Life Beyond Compliance
My copies were corrupted: The 3-2-1 rule
Take a 4-3-2-1 approach to backing up data

What this means

 

In my opinion there are cheap clouds (products, services, solutions), there are low-cost options, and there are value and premium offerings. Avoid confusing value with cheap or low-cost: something might have a higher cost yet include more capabilities that, if useful to you, make it a better value. Look beyond the up-front cost aspects of clouds, also considering the ongoing recurring fees for actually using a service or solution.

 

If you can find low-cost storage at or below a penny per GByte per month, that could be a good value if it also includes free access and retrieval (GETs), HEAD and LIST operations for management or reporting. On the other hand, if you find a service at or below a penny per GByte per month that charges for any access, including retrieval, as well as network bandwidth fees along with reporting, that might not be as good of a value.
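To put that in perspective, here is a quick sketch in Python of how activity fees can make the "cheaper" per-GByte service cost more per month. All prices and usage figures below are hypothetical placeholders, not actual vendor rates; plug in real numbers from provider pricing pages.

```python
# Hypothetical monthly cost comparison of two cloud storage services.
# All rates and usage figures are made-up examples, not real vendor prices.

def monthly_cost(gb_stored, price_per_gb, gets, price_per_get, gb_out, price_per_gb_out):
    """Total monthly cost = storage + request (GET) fees + outbound transfer fees."""
    return (gb_stored * price_per_gb
            + gets * price_per_get
            + gb_out * price_per_gb_out)

# Example usage profile: 1 TByte stored, half a million GETs, 200 GBytes read out.
usage = dict(gb_stored=1000, gets=500_000, gb_out=200)

# Service A: below a penny per GByte, but charges for every GET and transfer.
cost_a = monthly_cost(usage["gb_stored"], 0.009, usage["gets"], 0.00001, usage["gb_out"], 0.09)
# Service B: higher base rate, with requests and transfer included.
cost_b = monthly_cost(usage["gb_stored"], 0.024, usage["gets"], 0.0, usage["gb_out"], 0.0)

print(f"Service A: ${cost_a:.2f}/month")  # low base rate plus activity fees
print(f"Service B: ${cost_b:.2f}/month")  # higher base rate, fees included
```

With this particular usage pattern the "cheaper" service A ends up costing more; with cold, rarely touched data the comparison could flip, which is exactly why you need to price against your own real usage.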

 

Look beyond the basic price and watch out for statements like "...as low as..." to understand what is required to get that "...as low as..." price. Also understand what the extra fees are; most of the reputable providers list these on their sites, granted you have to look for them. If you are already using cloud services, pay attention to your monthly invoices and track what you are paying for to avoid surprises.

   

From my InfoStor piece:

   

For cloud storage, instead of simply focusing on the lowest cost per capacity, look for value, along with the ability to configure or use the service with as much resiliency as you need. Value will mean different things depending on your needs and cloud storage services, yet the solution should be cost-effective with availability (including durability), security and applicable performance.

 

Shopping for cloud servers and storage is similar to acquiring regular servers and storage in that you need to understand what you are acquiring, along with the up-front and recurring fees, to understand the total cost of ownership and cost of operations, not to mention making apples-to-apples vs. apples-to-oranges comparisons.

 

Btw, instead of simply using lower cost cloud services to cut cost, why not also use those capabilities to create or park another copy of your important data somewhere else just to be safe...

 

What say you about cloud costs?

 

Ok, nuff said, for now...

Cheers gs

Storage I/O trends

Gathering Transaction Per Minute Metrics from SQL Server and HammerDB

 

When using benchmark or workload generation tools such as HammerDB, I needed a way to capture and log performance activity metrics such as transactions per minute. For example, using HammerDB to simulate an application making database requests, performing various transactions as part of testing an overall system solution, including server and storage I/O activity. This post takes a look at the problem or challenge I was looking to address, as well as the solution I created after spending time searching for an existing one (still searching, btw).

 

The Problem, Issue, Challenge, Opportunity and Need

 

The challenge is to collect application performance metrics, such as transactions per minute, from a workload using a database. The workload or benchmark tool (in this case HammerDB) is the System Test Initiator (STI) that drives the activity (e.g. database requests) to a System Under Test (SUT). In this example the SUT is a Microsoft SQL Server running on a Windows Server 2012 R2 system. What I need is to collect, and log to a file for later analysis, the transaction rate per minute while the STI is generating a particular workload.

 

Server Storage I/O performance

Understanding the challenge and designing a strategy

 

If you have ever used benchmark or workload generation tools such as Dell/Quest Benchmark Factory (part of the Toad tools collection), you might be spoiled by how it can not only generate the workload, but also collect, process, present and even store the results for database workloads such as TPC simulations. In this situation, Transaction Processing Council (TPC)-like workloads need to be run and performance metrics collected. Let's leave Benchmark Factory for a future discussion and focus instead on a free tool called HammerDB, and more specifically how to collect transactions per minute metrics from Microsoft SQL Server. While the focus is SQL Server, you can easily adapt the approach for MySQL among others, not to mention there are tools such as Sysbench and Aerospike, among other tools.

 

The following image (created using my Livescribe Echo digital pen) outlines the problem, as well as sketches out a possible solution design. For my solution, I'm going to show how to grab, every minute for a given amount of time, the count of transactions that have occurred. Later, in post-processing (you could also do this in the SQL script), I take the new transaction count (which is cumulative) and subtract the earlier interval, which yields the transactions per minute (see examples later in this post).

collect TPM metrics from SQL Server with hammerdb
The problem and challenge, a way to collect Transactions Per Minute (TPM)

Finding a solution

 

HammerDB displays results via its GUI, and perhaps there is a way or some trick to get it to log results to a file or by some other means; however, after searching the web, I found it was quicker to come up with my own solution. That solution was to decide how to collect and report the transactions per minute (you could also do this by second or another interval) from Microsoft SQL Server. The approach was to find what performance counters and metrics are available from SQL Server, and how to collect and log them to a file for processing. What this means is that a SQL Server script file would need to be created that runs in a loop, collecting for a given amount of time at a specified interval, for example once a minute for several hours.

 

Taking action

 

The following is a script that I came up with that is far from optimal; however, it gets the job done and is a starting point for adding more capabilities or optimizations.

 

In the following example, set loopcount to the number of minutes to collect samples for. Note that if you are running a workload test for eight (8) hours with a 30-minute ramp-up time, you would want to use a loopcount (e.g. number of minutes to collect for) of 480 + 30 + 10. The extra 10 minutes allow for some samples before the ramp and start of the workload, as well as a pronounced end-of-test set of samples. Add or subtract however many minutes to collect for as needed; however, keep this in mind: it is better to collect a few extra minutes vs. not have them and wish you did.

-- Note and disclaimer:
-- 
-- Use of this code sample is at your own risk with Server StorageIO and UnlimitedIO LLC
-- assuming no responsibility for its use or consequences. You are free to use this as is
-- for non-commercial scenarios with no warranty implied. However feel free to enhance and
-- share those enhancements with others e.g. pay it forward.
-- 
DECLARE @cntr_value bigint;
DECLARE @loopcount bigint; -- how many minutes to take samples for

set @loopcount = 240

SELECT @cntr_value = cntr_value
 FROM sys.dm_os_performance_counters
 WHERE counter_name = 'transactions/sec'
 AND object_name = 'MSSQL$DBIO:Databases'
 AND instance_name = 'tpcc' ; print @cntr_value;
 WAITFOR DELAY '00:00:01'
-- 
-- Start loop to collect TPM every minute
-- 

while @loopcount <> 0
begin
SELECT @cntr_value = cntr_value
 FROM sys.dm_os_performance_counters
 WHERE counter_name = 'transactions/sec'
 AND object_name = 'MSSQL$DBIO:Databases'
 AND instance_name = 'tpcc' ; print @cntr_value;
 WAITFOR DELAY '00:01:00'
 set @loopcount = @loopcount - 1
end
-- 
-- All done with loop, write out the last value
-- 
SELECT @cntr_value = cntr_value
 FROM sys.dm_os_performance_counters
 WHERE counter_name = 'transactions/sec'
 AND object_name = 'MSSQL$DBIO:Databases'
 AND instance_name = 'tpcc' ; print @cntr_value;
-- 
-- End of script
-- 

 

The above example has loopcount set to 240 for a 200-minute test with a 30-minute ramp and 10 extra minutes of samples. I use a couple of those minutes to make sure that the system test initiator (STI), such as HammerDB, is configured and ready to start executing transactions. You could also put this along with your HammerDB items into a script file for further automation; however, I will leave that exercise up to you.
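The loopcount arithmetic can be captured in a small helper; here is a sketch (the function name and defaults are my own, not part of the script above):

```python
def sample_loopcount(test_minutes, rampup_minutes, padding_minutes=10):
    """Number of one-minute samples to collect: test duration plus ramp-up
    plus extra padding, so samples exist before and after the workload."""
    return test_minutes + rampup_minutes + padding_minutes

print(sample_loopcount(480, 30))  # eight-hour test with 30-minute ramp: 520
print(sample_loopcount(200, 30))  # the 200-minute test above: 240
```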

 

For those of you familiar with SQL and SQL Server, you probably already see some things to improve, stylize or simply adapt to your own preference, which is great; go for it. Also note that I'm only selecting a certain variable from the performance counters; there are many others which you can easily discover with a couple of SQL commands (e.g. a SELECT specifying your database instance and object name). Also note that the key is accessing the items in sys.dm_os_performance_counters of your SQL Server database instance.

 

The results

 

The output from the above is a list of cumulative numbers, as shown below, which you will need to post-process (or add a calculation to the above script). Note that part of running the script is specifying an output file, which I show later.

785
785
785
785
37142
1259026
2453479
3635138

Implementing the solution

 

You can set up the above script to run as part of a larger automation shell or batch script; however, for simplicity I'm showing it here using Microsoft SQL Server Management Studio.

 

SQL Server script to collect TPM
Microsoft SQL Server Management Studio with script to collect Transactions Per Minute (TPM)

 

The following image shows how to specify an output file for the results to be logged to when using Microsoft SQL Server Management Studio to run the TPM collection script.

 

Specify SQL Server tpm output file
Microsoft SQL Server Management Studio, specifying an output file

 

With the SQL Server script running to collect results, and the HammerDB workload running to generate activity, the following shows Dell Spotlight on Windows (SoW) displaying Windows Server 2012 R2 operating system level performance, including CPU, memory, paging and other activity. Note that this example had both the system test initiator (STI), which is HammerDB, and the system under test (SUT), which is Microsoft SQL Server, on the same server.

 

Spotlight on Windows while SQL Server doing tpc
Dell Spotlight on Windows showing Windows Server performance activity

Results and post-processing

 

As part of post-processing, simply use your favorite tool or script; what I often do is pull the numbers into an Excel spreadsheet and create a new column of numbers that computes and shows the difference between each step (see below). While in Excel I then plot the numbers as needed, which can also be done via a shell script and other plotting tools such as R.

 

In the following example, the results are imported into Excel (or your favorite tool or script), where I then add a column (B) that simply computes the difference between the current and earlier counter. For example, cell B2 = A2-A1, B3 = A3-A2 and so forth for the rest of the numbers in column A. I then plot the numbers in column B to show the transaction rates over time, which can then be used for various things.
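The same differencing can also be scripted instead of done in Excel; here is a sketch using the cumulative counts from the sample output shown earlier:

```python
# Turn cumulative transaction counts (one sample per minute) into
# transactions per minute by differencing consecutive samples,
# just like the B2 = A2 - A1 column in the spreadsheet.

cumulative = [785, 785, 785, 785, 37142, 1259026, 2453479, 3635138]

tpm = [curr - prev for prev, curr in zip(cumulative, cumulative[1:])]
print(tpm)  # [0, 0, 0, 36357, 1221884, 1194453, 1181659]
```

The leading zeros are the idle samples before the workload ramp, and the later values are the per-minute transaction rates you would plot.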

 

Hammerdb TPM results from SQL Server processed in Excel
Results processed in Excel and plotted

 

Note that if the above results seem too good to be true, they are: these were cached results, shown to demonstrate the tools and data collection process as opposed to the real work being done, at least for now...

Where to learn more

 

Here are some extra links to have a look at:

How to test your HDD, SSD or all flash array (AFA) storage fundamentals
Server and Storage I/O Benchmarking 101 for Smarties
Server and Storage I/O Benchmark Tools: Microsoft Diskspd (Part I)
The SSD Place (collection of flash and SSD resources)
Server and Storage I/O Benchmarking and Performance Resources
I/O, I/O how well do you know about good or bad server and storage I/Os?

What this all means and wrap-up

 

There are probably many ways to fine-tune and optimize the above script; likewise there may even be an existing tool, plug-in, add-on module or configuration setting that allows HammerDB to log the transaction activity rates to a file vs. simply showing them on screen. However, for now, this is a workaround I have found for when needing to collect transaction activity performance data with HammerDB and SQL Server.

 

Ok, nuff said, for now...

 

Cheers gs

Storage I/O trends

Cloud Conversations: AWS S3 Cross Region Replication storage enhancements

Amazon Web Services (AWS) recently, among other enhancements, announced new Simple Storage Service (S3) cross-region replication of objects from a bucket (e.g. container) in one region to a bucket in another region. AWS also recently enhanced Elastic Block Store (EBS), increasing the maximum performance and size of Provisioned IOPS (SSD) and General Purpose (SSD) volumes. EBS enhancements include the ability to store up to 16 TBytes of data in a single volume and do 20,000 input/output operations per second (IOPS). Read more about EBS and other recent AWS server, storage I/O and application enhancements here.

 

Amazon Web Services AWS

The Problem, Issue, Challenge, Opportunity and Need

The challenge is being able to move data (e.g. objects) stored in AWS buckets in one region to another in a safe, secure, timely, automated, cost-effective way.

Even though AWS has a global name-space, buckets and their objects (e.g. files, data, videos, images, bit and byte streams) are stored in a specific region designated by the customer or user (AWS S3, EBS, EC2, Glacier, Regions and Availability Zone primer can be found here).

 

aws regions architecture

Understanding the challenge and designing a strategy

The following diagram shows the challenge and how to copy or replicate objects in an S3 bucket in one region to a destination bucket in a different region. While objects can be copied or replicated without S3 cross-region replication, that involves essentially reading your objects, pulling the data out via the internet, and then writing it to another place. The catch is that this can add extra costs, take time, consume network bandwidth and require extra tools (Cloudberry, Cyberduck, S3fuse, S3motion, S3browser, S3 tools (not AWS) and a long list of others).
  aws cross region replication

What is AWS S3 Cross-region replication

 

Highlights of AWS S3 cross-region replication include:

  • AWS S3 cross-region replication is, as its name implies, replication of S3 objects from a bucket in one region to a destination bucket in another region
  • S3 replication of new objects added to an existing or new bucket (note: only new objects get replicated)
  • Policy-based replication tied into S3 versioning and life-cycle rules
  • Quick and easy to set up in a matter of minutes via the S3 dashboard or other interfaces
  • Keeps region-to-region data replication and movement within AWS networks (potential cost advantage)

 

To activate, you simply enable versioning on a bucket, enable cross-region replication, indicate the source bucket (or a prefix of objects in the bucket), specify the destination region and target bucket name (or create one), then create or select an IAM (Identity and Access Management) role, and objects should be replicated.
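The same dashboard steps can be scripted. The following Python sketch builds the two request payloads and shows (commented out) roughly how they would be applied with boto3; the bucket names, region choices and IAM role ARN are hypothetical placeholders, and the replication rule schema shown is the simple prefix-based form:

```python
# Sketch of the S3 cross-region replication setup steps.
# Bucket names, regions and the role ARN below are placeholders, not real resources.

SOURCE_BUCKET = "my-source-bucket"                    # e.g. in US Standard
DEST_BUCKET_ARN = "arn:aws:s3:::my-dest-bucket"       # e.g. in EU (Ireland)
ROLE_ARN = "arn:aws:iam::123456789012:role/s3-crr-role"

# Step 1: versioning must be enabled on the source bucket.
versioning_config = {"Status": "Enabled"}

# Step 2: a replication rule; an empty prefix replicates all new objects.
replication_config = {
    "Role": ROLE_ARN,
    "Rules": [{
        "Prefix": "",          # or a prefix to replicate only a subset
        "Status": "Enabled",
        "Destination": {"Bucket": DEST_BUCKET_ARN},
    }],
}

# With boto3 this would be applied roughly as:
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_bucket_versioning(Bucket=SOURCE_BUCKET,
#                            VersioningConfiguration=versioning_config)
#   s3.put_bucket_replication(Bucket=SOURCE_BUCKET,
#                             ReplicationConfiguration=replication_config)

print(replication_config["Rules"][0]["Destination"]["Bucket"])
```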

 

Some AWS S3 cross-region replication things to keep in mind (e.g. considerations):

  • As with other forms of mirroring and replication, if you add something on one side it gets replicated to the other side
  • As with other forms of mirroring and replication, if you delete something on one side it can be deleted on both (be careful and do some testing)
  • Keep costs in perspective, as you still need to pay for your S3 storage at both locations, as well as applicable internal data transfer and GET fees
  • Click here to see current AWS S3 fees for various regions

S3 Cross-region replication and alternative approaches

There are several regions around the world, and up until today AWS customers could copy, sync or replicate S3 bucket contents between AWS regions manually (or via automation) using various tools such as Cloudberry, Cyberduck, S3browser and S3motion to name just a few, as well as via various gateways and other technologies. Some of those tools and technologies are open source or free, some are freemium and some are premium, and they also vary by interface (some with a GUI, others with a CLI or APIs), including the ability to mount an S3 bucket as a local network drive and use tools to sync or copy.

 

However, a catch with the above-mentioned tools (among others) and approaches is that replicating your data (e.g. objects in a bucket) can involve other AWS S3 fees. For example, reading data (e.g. a GET, which has a fee) from one AWS region and then copying it out over the internet incurs fees. Likewise, when copying data into another AWS S3 region (e.g. a PUT, which is free) there is also the cost of storage at the destination.
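A rough back-of-envelope sketch of why the manual out-and-back approach can cost more; all per-GByte and per-request rates below are hypothetical placeholders, so check the current AWS pricing pages for real numbers:

```python
# Estimate the extra fees for manually copying objects between regions
# (read out over the internet, then write back in) vs. letting S3
# replicate within the AWS network.
# All rates are hypothetical placeholders, not actual AWS prices.

gb_to_copy = 500
objects = 100_000

GET_FEE_PER_10K = 0.004       # request fee for reading objects
TRANSFER_OUT_PER_GB = 0.09    # pulling data out over the internet

manual_extra = (objects / 10_000) * GET_FEE_PER_10K + gb_to_copy * TRANSFER_OUT_PER_GB
print(f"Extra fees for a manual out-and-back copy: ~${manual_extra:.2f}")

# Cross-region replication keeps the traffic inside AWS networks, avoiding
# the internet transfer-out portion (inter-region transfer fees may still
# apply); PUTs into the destination are free either way, and storage at
# the destination is paid for in both cases.
```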

 

Storage I/O trends

AWS S3 cross-region hands on experience (first look)

For my first hands-on (first look) experience with AWS cross-region replication today, I enabled a bucket in the US Standard region (e.g. Northern Virginia) and created a new target destination bucket in the EU (Ireland) region. Setup and configuration was very quick, literally just a few minutes, with most of the time spent reading the text on the new AWS S3 dashboard properties configuration displays.

 

I selected an existing test bucket to replicate and noticed that nothing had replicated over to the other bucket, until I realized that only new objects would be replicated. Once some new objects were added to the source bucket, within a matter of moments (e.g. a few minutes) they appeared across the pond in my EU (Ireland) bucket. When I deleted those replicated objects from my EU (Ireland) bucket and switched back to my view of the source bucket in the US, those new objects were already deleted from the source. Yes, just like regular mirroring or replication, pay attention to how you have things configured (e.g. synchronized vs. contribute vs. echo of changes, etc.).

 

While I was not able to do a solid, quantifiable performance test, based simply on some quick copies and my network speed, moving via S3 cross-region replication was faster than using something like s3motion with my server in the middle.

 

It also appears from some initial testing today that a benefit of AWS S3 cross-region replication (besides being bundled and part of AWS) is that some fees to pull data out of AWS and transfer out via the internet can be avoided.

 

Amazon Web Services AWS

Where to learn more

 

Here are some links to learn more about AWS S3 and related topics

What this all means and wrap-up

For those looking for a way to streamline replicating data (e.g. objects) from an AWS bucket in one region to a bucket in a different region, you now have a new option. There are potential cost savings, if that is your goal, along with performance benefits, in addition to using whatever might already be working in your environment. Replicating objects provides a way of expanding your business continuance (BC), business resiliency (BR) and disaster recovery (DR) involving S3 across regions, as well as a means for content caching or distribution, among other possible uses.

 

Overall, I like this ability for moving S3 objects within AWS; however, I will continue to use other tools such as S3motion and s3sfs for moving data in and out of AWS, as well as among other public cloud services and local resources.

 

Ok, nuff said, for now..

Cheers gs

Storage I/O trends

Data Protection Diaries: Are your restores ready for World Backup Day 2015?

This is part of an ongoing Data Protection Diaries series of posts about, well, data protection, and what I'm doing pertaining to World Backup Day 2015.

 

In case you forgot or did not know, World Backup Day is March 31 2015 (@worldbackupday), so now is a good time to get ready. The only challenge I have with World Backup Day (view their site here), which has gone on for a few years now, is that while it is a good way to call out the importance of backing up or protecting data, it is time to also put more emphasis and focus on making sure those backups or protection copies actually work.

 

By this I mean doing more than making sure that your data can be read from tape, disk, SSD or a cloud service: actually going a step further and verifying that restored data can actually be used (read, written, etc.).

 

The Problem, Issue, Challenge, Opportunity and Need

The problem, issue and challenges are simple: are your applications, systems and data protected, and can you use those protection copies (e.g. backups, snapshots, replicas or archives) when as well as where needed?

storage I/O data protection

The opportunity is simple, avoiding downtime or impact to your business or organization by being proactive.

Understanding the challenge and designing a strategy

The following is my preparation checklist for World Backup Day 2015 (e.g. March 31 2015), which includes what I need or want to protect, as well as some other things to be done, including testing, verification and addressing (remediating or fixing) known issues while identifying other areas for future enhancements. Thus, perhaps like yours, data protection for my environment, which spans physical, virtual and cloud from servers to mobile devices, is constantly evolving.

data protection preparation checklist and to-do list
My data protection preparation, checklist and to do list

Finding a solution

While I already have a strategy, plan and solution encompassing different tools, technologies and techniques, they are also evolving. Part of that evolving is to improve, while also exploring options to use new and old things in new ways, as well as eat my own dog food, or walk the talk vs. simply talking the talk. The following figure provides a representation of my environment, which spans physical, virtual and clouds (more than one), and how different applications along with systems are protected against various threats or risks. Key is that not all applications and data are the same, thus enabling them to be protected in different ways as well as over various intervals. Needless to say, there is more to how, when, where and with what different applications and systems are protected in my environment than shown; perhaps more on that in the future.

server storageio and unlimitedio data protection
Some of what my data protection involves for Server StorageIO

Taking action

What I'm doing is going through my checklist to verify and confirm the various items, as well as finding areas for improvement, which is actually an ongoing process.

 

Do I find things that need to be corrected?

 

Yup; in fact I found something that, while not a problem, identified a way to improve a process that will, once fully implemented, enable more flexibility both if a restoration is needed and for general everyday use, not to mention removing some complexity and cost.

 

Speaking of lessons learned, check this out, which ties into why you want 4 3 2 1 based data protection strategies.

 

Storage I/O trends

Where to learn more

 

Here are some extra links to have a look at:

Data Protection Diaries
Cloud conversations: If focused on cost you might miss other cloud storage benefits
5 Tips for Factoring Software into Disaster Recovery Plans
Remote office backup, archiving and disaster recovery for networking pros
Cloud conversations: Gaining cloud confidence from insights into AWS outages (Part II)
Given outages, are you concerned with the security of the cloud?
Data Archiving: Life Beyond Compliance
My copies were corrupted: The 3-2-1 rule
Take a 4-3-2-1 approach to backing up data
Cloud and Virtual Data Storage Networks- Chapter 8 (CRC/Taylor and Francis)

What this all means and wrap-up

Be prepared and be proactive when it comes to data protection and business resiliency, vs. simply reacting and recovering while hoping that all will be OK (or work).

 

Take a few minutes (or longer) and test your data protection including backup to make sure that you can:

a) Verify that they are in fact working, protecting applications and data in the way expected

b) Restore data to an alternate place (to verify functionality as well as prevent a problem)

c) Actually use the data, meaning it is decrypted, inflated (un-compressed, un-deduped) and that security certificates along with ownership properties are properly applied

d) Look at different versions or generations of protection copies in case you need to go back further in time

e) Identify areas of improvement, or find and isolate problem issues in advance vs. finding out after the fact
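Part of verifying that restored data is actually usable, not just readable, can be automated. A minimal sketch: confirm a restored copy opens and is byte-for-byte identical to the original by comparing SHA-256 digests (the file paths here are illustrative; point them at a real original/restored pair, and note this checks content only, not certificates or ownership properties):

```python
import hashlib
import os
import tempfile

def sha256_of(path):
    """Stream the file through SHA-256 so large restores need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_restore(original, restored):
    """True if the restored file opens and matches the original exactly."""
    return sha256_of(original) == sha256_of(restored)

# Demo with temporary files standing in for an original and its restored copy:
with tempfile.TemporaryDirectory() as d:
    orig = os.path.join(d, "orig.dat")
    rest = os.path.join(d, "rest.dat")
    data = b"important data" * 1000
    with open(orig, "wb") as f:
        f.write(data)
    with open(rest, "wb") as f:
        f.write(data)
    print(verify_restore(orig, rest))  # True
```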

 

Time to get back to work checking and verifying things as well as attending to some other items.

 

Ok, nuff said, for now...

Cheers gs