Everything Is Not The Same Application Data Value Characteristics

 

This is part one of a five-part mini-series, Application Data Value Characteristics: Everything Is Not The Same, a companion excerpt from chapter 2 of my new book Software Defined Data Infrastructure Essentials – Cloud, Converged and Virtual Fundamental Server Storage I/O Tradecraft (CRC Press 2017), available at Amazon.com and other global venues. In this post, we start things off by looking at general application server storage I/O characteristics that have an impact on data value as well as access.

 

Application Data Value Software Defined Data Infrastructure Essentials Book SDDC

 

Everything is not the same across different organizations, including Information Technology (IT) data centers and data infrastructures, along with the applications and data they support. For example, there is so-called big data that can be many small files, objects, blobs, or data and bit streams representing telemetry, click stream analytics, and logs, among other information.

Keep in mind that applications impact how data is accessed, used, processed, moved and stored. What this means is that a focus on data value, access patterns, and other related topics also needs to consider application performance, availability, capacity, and economic (PACE) attributes.

 

If everything is not the same, why is so much data along with many applications treated the same from a PACE perspective?

 

Data infrastructure resources including servers, storage, and networks might be inexpensive; however, there is a cost to managing them along with the data.

 

Managing includes data protection (backup, restore, BC, DR, HA, security) along with other activities. Likewise, there is a cost to the software along with cloud services among others. By understanding how applications use and interact with data, smarter, more informed data management decisions can be made.

 

IT Applications and Data Infrastructure Layers

 

Keep in mind that everything is not the same across various organizations, data centers, data infrastructures, data and the applications that use them. Also keep in mind that programs (e.g. applications) = algorithms (code) + data structures (how data is defined and organized, structured or unstructured).

 

There  are traditional applications, along with those tied to Internet of Things  (IoT), Artificial Intelligence (AI) and Machine Learning (ML), Big Data and  other analytics including real-time click stream, media and entertainment,  security and surveillance, log and telemetry processing among many others.

 

What this means is that there are many different applications with various characteristics and attributes, along with resource requirements (server compute, I/O network, memory, and storage) as well as service requirements.

 

Common Applications Characteristics

Different applications will have various attributes, in general, as well as in how they are used: for example, database transaction activity vs. reporting or analytics, logs and journals vs. redo logs, tables, indices, import/export, scratch and temp space. Performance, availability, capacity, and economics (PACE) describe the application and data characteristics and needs shown in the following figure.

 

Application PACE attributes (via Software Defined Data Infrastructure  Essentials)

 

All applications have PACE attributes, however:

  • PACE attributes vary by  application and usage
  • Some applications and their data  are more active than others
  • PACE characteristics may vary within different parts of an application

 

Think of the PACE of an application and its associated data as its personality: how it behaves, what it does, how it does it, and when, along with value, benefit, or cost as well as quality-of-service (QoS) attributes.

 

Understanding applications in different environments, including data values and associated PACE attributes, is essential for making informed server, storage, I/O, and data infrastructure decisions. Data infrastructure decisions range from configuration to acquisitions or upgrades; when, where, why, and how to protect; and how to optimize performance, including capacity planning, reporting, and troubleshooting, not to mention addressing budget concerns.

 

Primary PACE attributes for active and inactive applications and data are:

P - Performance  and activity (how things get used)
A - Availability and durability (resiliency and data protection)
C - Capacity and space (what things use or occupy)
E - Economics  and Energy (people, budgets, and other  barriers)

 

Some applications need more performance (server compute, or storage and network I/O), while others need space capacity (storage, memory, network, or I/O connectivity). Likewise, some applications have different availability needs (data protection, durability, security, resiliency, backup, business continuity, disaster recovery) that determine the tools, technologies, and techniques to use.

 

Budgets are also nearly always a concern, which for some applications means enabling more performance per cost while others are focused on maximizing space capacity and protection level per cost. PACE attributes also  define or influence policies for QoS (performance, availability, capacity), as well as thresholds, limits, quotas,  retention, and disposition, among others.
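
To make the idea of PACE varying by application component more concrete, here is a minimal Python sketch. The `PaceProfile` class, its field names, and all of the numbers are hypothetical and for illustration only; they are not from the book or from any particular tool. The sketch simply records PACE attributes per component and compares cost per unit of performance vs. cost per unit of capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaceProfile:
    """Hypothetical PACE description of one application component."""
    name: str
    iops: int                 # P: expected activity rate (I/Os per second)
    latency_ms: float         # P: target response time
    availability_pct: float   # A: availability target
    capacity_gb: int          # C: space the component occupies
    monthly_budget: float     # E: illustrative cost ceiling, any currency

    def cost_per_iops(self) -> float:
        # Performance per cost view
        return self.monthly_budget / self.iops

    def cost_per_gb(self) -> float:
        # Space capacity per cost view
        return self.monthly_budget / self.capacity_gb


# Two parts of the same (made-up) database application
profiles = [
    PaceProfile("redo-log", iops=20000, latency_ms=1.0,
                availability_pct=99.99, capacity_gb=200, monthly_budget=400.0),
    PaceProfile("reporting-tables", iops=2000, latency_ms=20.0,
                availability_pct=99.9, capacity_gb=10000, monthly_budget=500.0),
]

for p in profiles:
    print(f"{p.name}: {p.cost_per_iops():.4f} per IOPS, "
          f"{p.cost_per_gb():.4f} per GB")
```

With these made-up numbers, the redo log comes out cheaper per IOPS while the reporting tables come out cheaper per GB, which is the kind of difference that would drive different policies for each.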

 

Performance and Activity (How Resources Get Used)

Some applications or components that comprise a larger solution will have more performance demands than others. Likewise,  the performance characteristics of applications along with their associated data will also vary. Performance applies to the server,  storage, and I/O networking hardware along with associated software and applications.

 

For servers, performance is focused on how much CPU or processor time is used, along with memory and I/O operations. Metrics for I/O operations that create, read, update, or delete (CRUD) data include the activity rate (frequency or data velocity), measured in I/O operations per second (IOPS). Other considerations include the volume or amount of data being moved (bandwidth, throughput, transfer), response time or latency, along with queue depths.

 

Activity is the amount of work to do or being done in a given amount of time (seconds, minutes, hours, days, weeks), which can be transactions, rates, or IOPS. Additional performance considerations include latency, bandwidth, throughput, response time, queues, reads or writes, gets or puts, updates, lists, directories, searches, page views, files opened, videos viewed, or downloads.
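
As a rough illustration of activity as work done per unit of time, the following Python sketch summarizes a handful of I/O samples collected over an interval. The `IoSample` record and the `summarize` function are hypothetical names, and the sample values are invented; real tools collect and report these metrics in their own ways.

```python
from dataclasses import dataclass


@dataclass
class IoSample:
    """One observed I/O (hypothetical record layout)."""
    op: str            # "read" or "write"
    size_bytes: int    # amount of data moved by this I/O
    latency_ms: float  # response time of this I/O


def summarize(samples, interval_s):
    """Return activity rate, bandwidth, read mix and average latency."""
    if not samples or interval_s <= 0:
        raise ValueError("need at least one sample and a positive interval")
    total_bytes = sum(s.size_bytes for s in samples)
    reads = sum(1 for s in samples if s.op == "read")
    return {
        "iops": len(samples) / interval_s,
        "bandwidth_bytes_per_s": total_bytes / interval_s,
        "read_pct": 100.0 * reads / len(samples),
        "avg_latency_ms": sum(s.latency_ms for s in samples) / len(samples),
    }


# Invented samples: three small reads and one larger write over 2 seconds
samples = [
    IoSample("read", 4096, 0.5),
    IoSample("read", 4096, 0.5),
    IoSample("read", 4096, 0.5),
    IoSample("write", 65536, 2.5),
]
summary = summarize(samples, interval_s=2.0)
print(summary)
```

Note how a single larger write dominates the bandwidth figure even though reads dominate the activity count, one reason to look at more than one metric.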
  
Server, storage, and I/O network performance considerations include:

  • Processor CPU usage time and  queues (user and system overhead)
  • Memory usage effectiveness  including page and swap
  • I/O activity including between  servers and storage
  • Errors, retransmission, retries, and rebuilds

 

The following figure shows a generic performance example of data being accessed (mixed reads, writes, random, sequential, big, small, low and high latency) on a local and a remote basis. The example shows how, for a given time interval (see lower right), applications are accessing and working with data via different data streams in the larger image left center. Also shown are queues and I/O handling along with end-to-end (E2E) response time.

 

Server I/O performance  fundamentals (via Software Defined  Data Infrastructure Essentials)

 


 

Also shown on the left in the above figure is an example of E2E response time from the application through the various data infrastructure layers, and in the lower center, the response time from the server to the memory or storage devices.

 

Various queues are shown in the middle of the above figure. They are indicators of how much work is occurring and whether the processing is keeping up with the work or causing backlogs. Context is needed for queues, as they exist in the server, I/O networking devices, and software drivers, as well as in storage among other locations.

 

Some  basic server, storage, I/O metrics that matter include:

  • Queue depth of I/Os waiting to be processed and concurrency
  • CPU and memory usage to process  I/Os
  • I/O size, or how much data can be moved in a given operation
  • I/O activity rate (IOPS) = (amount of data moved / I/O size) per unit of time
  • Bandwidth = data moved per unit  of time = I/O size × I/O rate
  • Latency usually increases with  larger I/O sizes, decreases with smaller requests
  • I/O rates usually increase with  smaller I/O sizes and vice versa
  • Bandwidth increases with larger  I/O sizes and vice versa
  • Sequential stream access data  may have better performance than some random access data
  • Not all data is conducive to  being sequential stream, or random
  • Lower response  time is better, higher activity rates and bandwidth are better
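
The relationship between I/O size, I/O rate, and bandwidth in the list above can be worked through with a few lines of Python. The function names and the two workloads are made up for illustration; only the arithmetic (bandwidth = I/O size × I/O rate) comes from the list.

```python
KIB = 1024
MIB = 1024 * KIB


def bandwidth_bytes_per_s(io_size_bytes, iops):
    """Bandwidth = I/O size x I/O rate."""
    return io_size_bytes * iops


def iops_for_bandwidth(bandwidth, io_size_bytes):
    """I/O rate = data moved per unit of time / I/O size."""
    return bandwidth / io_size_bytes


# Hypothetical small-I/O workload: many 4 KiB I/Os
small = bandwidth_bytes_per_s(4 * KIB, 50000)
# Hypothetical large-I/O workload: few 1 MiB I/Os
large = bandwidth_bytes_per_s(1 * MIB, 500)

print(f"4 KiB x 50,000 IOPS = {small / MIB:.1f} MiB/s")
print(f"1 MiB x 500 IOPS    = {large / MIB:.1f} MiB/s")
```

In this made-up comparison the small-I/O workload has 100 times the activity rate yet moves less data per second (about 195 MiB/s vs. 500 MiB/s), which is why neither IOPS nor bandwidth alone tells the whole story.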

 

Queues with high latency and small I/O sizes or low I/O rates could indicate a performance bottleneck. Queues with low latency and high I/O rates with good bandwidth or data being moved could be a good thing. An important note is to look at several metrics together, not just IOPS or activity, or bandwidth, queues, or response time on their own. Also, keep in mind that metrics that matter for your environment may be different from those for somebody else.
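
One way to see why a queue depth needs context is Little's Law, a general queueing result (not something specific to this post or the book) stating that the average number of items in a system equals the arrival rate times the average time each item spends in it. The sketch below applies it to I/O under the assumption of steady-state averages; the function name and the numbers are hypothetical.

```python
def avg_outstanding_ios(iops, latency_ms):
    """Little's Law: average I/Os in flight = rate x time in system."""
    return (iops * latency_ms) / 1000.0


# Two invented workloads that show the same average queue depth
busy_and_fast = avg_outstanding_ios(iops=10000, latency_ms=2.0)
slow_and_stuck = avg_outstanding_ios(iops=500, latency_ms=40.0)

print(busy_and_fast, slow_and_stuck)
```

Both cases average about 20 outstanding I/Os, yet one is moving a lot of work quickly and the other is backed up behind slow responses, so the queue depth by itself does not say which situation you are in.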

 

Something to keep in perspective is that there can be a large amount of data with low performance, or a small amount of data with high performance, not to mention many other variations. The important concept is that as space capacity scales, performance does not necessarily improve, or vice versa. After all, everything is not the same.

Where to learn more

Learn more about Application Data Value, application characteristics, PACE along with data protection, software defined data center (SDDC), software defined data infrastructures (SDDI)  and related topics via the following links:

https://storageioblog.com/data-infrastructure-primer-overview/

 

SDDC Data Infrastructure

 

Additional learning experiences along with common questions (and answers), as well as tips, can be found in the Software Defined Data Infrastructure Essentials book.

 


What this all means and wrap-up

Keep in mind that with Application Data Value Characteristics, everything is not the same across various organizations, data centers, and data infrastructures spanning legacy, cloud and other software defined data center (SDDC) environments. However, all applications have some element (high or low) of performance, availability, capacity, and economic (PACE) attributes, along with various similarities. Likewise, data has different value at various times. Continue reading the next post (Part II Application Data Availability Everything Is Not The Same) in this five-part mini-series here.

 

Ok, nuff said, for now.

Gs