This is part two of a five-part mini-series looking at Application Data Value Characteristics everything is not the same as a companion excerpt from  chapter 2 of my new book Software Defined Data Infrastructure  Essentials – Cloud, Converged and Virtual Fundamental Server Storage I/O  Tradecraft (CRC Press 2017). available at Amazon.com and other global venues. In  this post, we continue looking at application performance, availability, capacity, economic (PACE) attributes that have an impact on data value as well as availability.

 

4 3 2 1 data protection  Book SDDC

Availability (Accessibility, Durability, Consistency)

Just as  there are many different aspects and focus areas for performance, there are also several facets to availability. Note that applications  performance requires availability and availability relies  on some level of performance.

 

Availability is a broad and encompassing area that includes data protection to  protect, preserve, and serve (backup/restore, archive, BC, BR, DR, HA) data and  applications. There are logical and physical aspects of availability including  data protection as well as security including key management (manage your keys  or authentication and certificates) and permissions, among other things.

 

Availability  = accessibility (can you get to your application and data) + durability (is the  data intact and consistent). This includes  basic Reliability, Availability, Serviceability (RAS), as well as high  availability, accessibility, and durability. “Durable”  has multiple meanings, so context is  important. Durable means how data infrastructure resources hold up to, survive,  and tolerate wear and tear from use (i.e., endurance), for example, Flash SSD or mechanical devices such as Hard Disk  Drives (HDDs). Another context for durable refers to data, meaning how many  copies in various places.

 

Server,  storage, and I/O network availability topics include:

  • Resiliency and self-healing to tolerate  failure or disruption
  • Hardware, software, and services  configured for resiliency
  • Accessibility to reach or be reached for handling work
  • Durability and consistency of  data to be available for access
  • Protection of data, applications, and assets including security

 

Additional server  I/O and data infrastructure along with storage topics include:

  • Backup/restore, replication,  snapshots, sync, and copies
  • Basic Reliability, Availability, Serviceability, HA, fail over, BC,  BR, and DR
  • Alternative paths, redundant components, and associated software
  • Applications that are fault-tolerant,  resilient, and self-healing
  • Non disruptive upgrades, code (application  or software) loads, and activation
  • Immediate data consistency and  integrity vs. eventual consistency
  • Virus, malware, and other data corruption or loss prevention

 

From a data protection standpoint, the fundamental rule or guideline is 4 3 2 1, which means  having at least four copies consisting of at least three versions (different  points in time), at least two of which are on different systems or storage  devices and at least one of those is off-site (on-line, off-line, cloud, or  other). There are  many variations of the 4 3 2 1 rule shown  in the following figure along with approaches on how to manage technology to  use. We will go into deeper this subject in later chapters. For now, remember the following.

 

large version application server storage I/O
4 3 2 1 data protection (via Software Defined Data Infrastructure  Essentials)

 

1    At  least four copies of data (or more), Enables durability in case a copy goes  bad, deleted, corrupted, failed device, or site.
2    The  number (or more) versions of the data to retain, Enables various recovery  points in time to restore, resume, restart from.
3    Data  located on two or more systems (devices or media/mediums), Enables protection  against device, system, server, file  system, or other fault/failure.

4    With  at least one of those copies being off-premise and not live (isolated from  active primary copy), Enables resiliency across sites, as well as space, time,  distance gap for protection.

Capacity and Space (What Gets Consumed and Occupied)

In  addition to being available and accessible in a timely manner (performance),  data (and applications) occupy space. That space is memory in servers, as well as using available consumable processor  CPU time along with I/O (performance) including over networks.

 

Data  and applications also consume storage space where they are stored. In addition to basic data space, there is also space  consumed for metadata as well as protection copies (and overhead), application  settings, logs, and other items. Another aspect of capacity includes network IP  ports and addresses, software licenses, server, storage, and network bandwidth  or service time.

 

Server,  storage, and I/O network capacity topics include:

  • Consumable time-expiring  resources (processor time, I/O, network bandwidth)
  • Network IP and other addresses
  • Physical resources of servers,  storage, and I/O networking devices
  • Software licenses based on  consumption or number of users
  • Primary and protection copies of  data and applications
  • Active and standby data infrastructure  resources and sites
  • Data  footprint reduction (DFR) tools and techniques for space optimization
  • Policies, quotas, thresholds,  limits, and capacity QoS
  • Application and database  optimization

 

DFR includes various techniques,  technologies, and tools to reduce the impact or overhead of protecting, preserving,  and serving more data for longer periods of time. There are many different  approaches to implementing a DFR strategy,  since there are various applications and data.

 

Common DFR  techniques and technologies include archiving, backup modernization, copy data management  (CDM), clean up, compress, and consolidate, data  management, deletion and dedupe, storage tiering, RAID (including parity-based, erasure codes , local reconstruction codes [LRC] , and Reed-Solomon , Ceph Shingled Erasure Code (SHEC ), among  others), along with protection configurations along with thin-provisioning,  among others.

 

DFR can be implemented in various  complementary locations from row-level compression in database or email to  normalized databases, to file systems, operating systems, appliances, and  storage systems using various techniques.

 

Also, keep in mind that not all data is the same; some is sparse, some is dense, some can be compressed  or deduped while others cannot. Likewise,  some data may not be compressible or dedupable.  However, identical copies can be  identified with links created to a common copy.

Economics (People, Budgets, Energy and other Constraints)

If one thing in life and  technology that is constant is change, then  the other constant is concern about economics  or costs. There is a cost to enable and maintain a data infrastructure on  premise or in the cloud, which exists to protect, preserve, and serve data and  information applications.

 

However, there should also be a benefit to having the data infrastructure  to house data and support applications that provide information to users of the  services. A common economic focus is what something costs, either as up-front  capital expenditure (CapEx) or as an operating expenditure (OpEx) expense,  along with recurring fees.

 

In general, economic considerations  include:

  • Budgets (CapEx and  OpEx), both up front and in recurring fees
  • Whether you buy,  lease, rent, subscribe, or use free and open sources
  • People time needed to integrate  and support even free open-source software
  • Costs including hardware,  software, services, power, cooling, facilities, tools
  • People time includes  base salary, benefits, training and education

Where to learn more

Learn more about Application Data Value, application characteristics, PACE along with data protection, software defined data center (SDDC), software defined data infrastructures (SDDI)  and related topics via the following links:

 

https://storageioblog.com/data-infrastructure-primer-overview/

https://storageioblog.com/data-infrastructure-primer-overview/

https://storageioblog.com/data-infrastructure-primer-overview/

SDDC Data Infrastructure

 

Additional learning experiences along with common questions (and answers), as well as tips can be found in Software Defined Data Infrastructure Essentials book.

 

https://storageio.com/book4.html

Software Defined Data Infrastructure Essentials Book SDDC

What this all means and wrap-up

Keep in mind that with Application Data Value Characteristics Everything Is Not The Same across various  organizations, data centers, data infrastructures spanning legacy, cloud and other software defined data center (SDDC) environments. All applications have some element of performance, availability, capacity, economic (PACE) needs as well as resource demands. There is often a focus around data storage about storage efficiency and utilization which is where data footprint reduction (DFR) techniques, tools, trends and as well as technologies address capacity requirements. However with data storage there is also an expanding focus around storage effectiveness also known as productivity tied to performance, along with availability including 4 3 2 1 data protection. Continue reading the next post (Part III Application Data Characteristics Types Everything Is Not The Same) in this series here.

 

Ok, nuff said, for now.

 

Gs