# How many degrees separate you and your information?

Posted by gregschulz Oct 28, 2012

In case you are not familiar, degrees of separation refer to  how you are connected to other people.

When you know somebody directly, you are a first-degree connection, and you are a second degree of separation from the people they are directly connected to. The theory goes that through a mix of how many people you are directly connected to, and how well they are connected to others, you are only so many degrees of separation from many (if not millions of) people, and if you go out seven degrees, that could be billions.

If you are familiar with or use LinkedIn and are directly connected to somebody like myself, that is a first-degree connection. For example, in the following image, person A is a first-degree (1 degree) connection to person B, person B is a direct or first-degree connection to person C, who in turn is a direct connection to person D. Person A is two degrees from person C and three degrees from person D.
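The chain described above can be sketched as a small graph walk. This is a minimal illustrative sketch, not anything LinkedIn actually does; the names and contact lists are made up to match the A-B-C-D example:

```python
from collections import deque

def degrees_of_separation(connections, start, target):
    """Breadth-first search over a contact graph; returns the number
    of hops (degrees) between two people, or None if unconnected."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, degree = queue.popleft()
        if person == target:
            return degree
        for contact in connections.get(person, ()):
            if contact not in seen:
                seen.add(contact)
                queue.append((contact, degree + 1))
    return None

# The chain from the example: A - B - C - D
connections = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"],
}
print(degrees_of_separation(connections, "A", "B"))  # 1 (direct)
print(degrees_of_separation(connections, "A", "C"))  # 2
print(degrees_of_separation(connections, "A", "D"))  # 3
```

The same walk over a richer contact graph is how "2nd" and "3rd" degree labels can be computed in practice.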

The reason I bring this up is not to play games around who is connected to whom, or to compare contacts or the number of them, but rather to use the idea of degrees of separation in the context of where and how you get your information. For example, you may get your information, insight or experience directly from what you do. On the other hand, you may get information or knowledge directly from the source or person involved with it, which would be 1 degree of separation.

You could also get the information from somebody else such  as a friend, coworker, blogger, analyst, consultant, media journalist,  reporter, vendor, VAR or other  person who got it directly from the source, which would be 2 degrees of  separation. Another example would be you get your information from somebody who  cites a report, study, survey or some research that came from another source  that involved another party who collected and analyzed the data.

At each point, there is the potential for the information to be changed, adjusted, reinterpreted, misunderstood, or simply adapted to meet particular needs. What if person A gets their information from person B, who in turn got their information from source C, which came from person D, who got it directly from person E? Assuming that the information was collected and passed along as is, person A should get what person E gave to person D. However, along the way, various interpretations, additional material and views can be applied, resulting in a different message.

There is also another variation, which is your sphere of influence or circle of contacts. For example, I get to talk with lots of IT pros around the world live in person, virtually and via different venues; those would be direct, or no separation. When I hear from a vendor, PR person or some pundit telling me what they heard directly, that is 1 degree. However, if they heard it from their marketing, who heard it from a sales rep or other source, then it is at least two.

Another example of degrees of separation is where you are in relation to technology timelines, evolution, revolution, and industry adoption vs. customer deployment. For example, if you are a researcher or development engineer, you are further along on a technology evolution curve than others are. Somebody then takes the researcher's work and productizes it, including making it manufacturable on a cost-effective basis. Along the way there are also different degrees of separation between the researcher, the initial publicity of a technology breakthrough, general industry adoption, and later customer deployment and subsequent success stories. For example, to a researcher, something they did many years ago, along with those who followed at that point, may make what is just now emerging for real customer deployment look like old and yesterday's news.

On the other hand, for customers getting ready to deploy a new technology, product or service, some breaking research may be interesting to hear about; however, it may be several years out at best from actual customer use. Also on that theme, the customer of a component can be a manufacturer that in turn tests, qualifies and sells a finished solution to its customers. Thus, there are different degrees of separation between industry adoption (e.g. talking about and awareness) and customer deployment (actually buying and using on a mainstream basis) in the technology supply chain.

Yet another degree of separation is between you and your information or data. Some of that data is very close, in your own memory (e.g. brain); other data is written on note pads (physical or digital), with a copy local or remote, including in the cloud. Depending on how your data and information are backed up or protected, there can be added degrees of separation between you and your information.

Thus, there are different degrees of separation between you  and your various forms of information.

Your ability to learn and share information, and to meet and interact with various people across different environments, is bounded by how much you are willing to engage via various mediums, including social media involvement.

If you are comfortable with where you are at, or what you know, then stay in your comfort zone or sphere of influence. Otherwise, take a chance, venture out, learn what you do not know, meet whom you do not know, interact and see new things, or have some déjà vu and share what you have seen or experienced before.

After all, knowledge is useless if kept only to yourself and not shared with others. Of course, for NDA material, what is not generally known about or understood is not discussed; let sleeping dogs lie.

How good or reliable is the information or G2 you are using for forming opinions or making informed decisions?

Feel free to expand your network, getting closer by a degree or two, if not directly to different sources. You can connect with me via Twitter (@storageio), Google+, LinkedIn and Facebook among other means here. Likewise, check out the StorageIO events calendar here for upcoming virtual and live activities. These activities include seminars, webcasts, and video chats, along with in-person events while out and about in North America as well as Europe.

Ok, nuff said.

Cheers gs

# Little data, big data and very big data (VBD) or big BS?

Posted by gregschulz Oct 28, 2012

This is an industry trends and perspective piece about big data and little data, industry adoption and customer deployment.

If you are in any way associated with information technology (IT), business, scientific, media and entertainment computing or related areas, you may have heard big data mentioned. Big data has been a popular buzzword bingo topic and term for a couple of years now. Big data is being used to describe new and emerging along with existing types of applications and information processing tools and techniques.

I routinely hear from different people or  groups trying to define what is or is not big data and all too often those are  based on a particular product, technology, service or application focus. Thus  it should be no surprise that those trying to police what is or is not big data  will often do so based on what their interest, sphere of influence, knowledge  or experience and jobs depend on.

Not long ago while out traveling, I ran into a person who told me that big data is new data that did not exist just a few years ago. It turns out this person was involved in geology, so I was surprised that somebody in that field was not aware of or working with geophysical, mapping, seismic and other legacy or traditional big data. It turns out this person was basing his statements on what he knew, heard or was told about, or on his sphere of influence around a particular technology, tool or approach.

Fwiw, if you have not figured it out already, like cloud, virtualization and other technology enabling tools and techniques, I tend to take a pragmatic approach vs. becoming latched on to a particular bandwagon (for or against) per se.

Not surprisingly, there is confusion and debate about what is or is not big data, including whether it applies only to new vs. existing and old data. As with any new technology, technique or buzzword bingo topic theme, various parties will try to place what is or is not under the definition to align with their needs, goals and preferences. This is the case with big data, where you can routinely find proponents of Hadoop and MapReduce positioning big data as aligning with the capabilities and usage scenarios of those related technologies for business and other forms of analytics.

Not surprisingly, the granddaddy of all business analytics, data science and statistical analysis number crunching is the Statistical Analysis System (SAS) from the SAS Institute. If these types of technology solutions and their peers define what is big data, then SAS (not to be confused with Serial Attached SCSI, which can be found on the back-end of big data storage solutions) can be considered first-generation big data analytics, or Big Data 1.0 (BD1). That means Hadoop MapReduce is Big Data 2.0 (BD2) if you like, or dislike for that matter.

A funny thing about some fans, proponents or surrogates of BD2 is that they may have heard of BD1 tools like SAS while having a limited understanding of what they are or how they are or can be used. When I worked in IT as a performance and capacity planning analyst focused on servers, storage, network hardware, software and applications, I used SAS to crunch various data streams of event, activity and other data from diverse sources. This involved correlating data and running various analytic algorithms on the data to determine response times, availability, usage and other things in support of modeling, forecasting, tuning and troubleshooting. Hmm, sound like first-generation big data analytics, or Data Center Infrastructure Management (DCIM) and IT Service Management (ITSM), to anybody?

Now to be fair, comparing SAS, SPSS or any number of other BD1-generation tools to Hadoop and MapReduce or BD2 second-generation tools is like comparing apples to oranges, or apples to pears. Let's move on, as there is much more to big data than simply a focus on SAS or Hadoop.

Another type of big data is the information generated, processed, stored and used by applications that result in large files, data sets or objects. Large files, objects or data sets include low-resolution and high-definition photos, videos, audio, security and surveillance, geophysical mapping and seismic exploration among others. Then there are data warehouses, where transactional data from databases gets moved for analysis in systems such as those from Oracle, Teradata, Vertica or FX among others. Some of those tools even play (or work) in both traditional (e.g. BD1) and new or emerging BD2 worlds.

This is where some interesting discussions, debates or disagreements can occur between those who latch onto or want to keep big data associated with being something new and usually focused around their preferred tool or technology. What results from these types of debates or disagreements is a missed opportunity for organizations to realize that they might already be doing or using a form of big data and thus have a familiarity and comfort zone with it.

By having a familiarity or comfort zone, vs. seeing big data as something new, different, hype or full of FUD (or BS), an organization can be comfortable with the term big data. Often, after taking a step back and looking at big data beyond the hype or FUD, the reaction is along the lines of: oh yeah, now we get it; sure, we are already doing something like that, so let's take a look at some of the new tools and techniques to see how we can extend what we are doing.

Likewise, many organizations are doing big bandwidth already and may not realize it, thinking that is only what media and entertainment, government, technical or scientific computing, high-performance computing or high-productivity computing (HPC) does. I'm assuming that some of the big data and big bandwidth pundits will disagree; however, if your environment does many large backups, archives, or content distributions, or copies large amounts of data for different purposes, then you are consuming big bandwidth and need big bandwidth solutions.

Yes I know, that's apples to oranges and perhaps stretching the limits of what is or can be called big bandwidth based on somebody's definition, taxonomy or preference. Hopefully you get the point that there is diversity across various environments as well as types of data and applications, technologies, tools and techniques.

I often say that if big data is getting all the marketing dollars to generate industry adoption, then little data is generating all the revenue (and profit or margin) dollars via customer deployment. While tools and technologies related to Hadoop (or Haydoop if you are from HDS) are getting industry adoption attention (e.g. marketing dollars being spent), revenues from customer deployment are growing.

Where big data revenues are strongest for most vendors today is centered around solutions for hosting, storing, managing and protecting big files and big objects. These include scale-out NAS solutions for large unstructured data like those from Amplidata, Cray, Dell, Data Direct Networks (DDN), EMC (e.g. Isilon), HP X9000 (IBRIX), IBM SONAS, NetApp, Oracle and Xyratex among others. Then there are flexible converged compute-storage platforms optimized for analytics and running different software tools, such as those from EMC (Greenplum), IBM (Netezza), NetApp (via partnerships) or Oracle among others, that can be used for different purposes in addition to supporting Hadoop and MapReduce.

If little data is databases and things not generally lumped into the big data bucket, and if you think or perceive big data only to be Hadoop MapReduce-based data, then does that mean all the large unstructured non-little data is very big data, or VBD?

Of course the virtualization folks might want to, if they have not already, corner the V for Virtual Big Data. In that case, instead of Very Big Data, how about very very Big Data (vvBD)? How about Ultra-Large Big Data (ULBD), or High-Revenue Big Data (HRBD)? Granted, the HR might cause some to think it is meant for Health Records or Human Resources, both of which, btw, leverage different forms of big data regardless of what you see or think big data is.

Does that then mean we should really be calling videos, audio, PACS, seismic, security surveillance video and related data VBD? Would this further confuse the market or the industry, or help elevate it to a grander status in terms of size (data file or object capacity, bandwidth, market size and application usage, market revenue and so forth)?

Do we need various industry consortiums, lobbyists or trade groups to go off and create models, taxonomies, standards and dictionaries based on their constituents' needs, and would those align with the needs of customers? After all, there are big dollars flowing around big data industry adoption (marketing).

What does this all mean?

Is Big Data BS?

First let me be clear: big data is not BS; however, there is a lot of marketing BS by some, along with hype and FUD, adding to the confusion and chaos, perhaps even missed opportunities. Keep in mind that in chaos and confusion there can be opportunity for some.

IMHO big data is real.

There are different variations, use cases and types of products, technologies and services that fall under the big data umbrella. That does not mean everything can or should fall under the big data umbrella as there is also little data.

What this all means is that there are different types of  applications for various industries that have big and little data, virtual and  very big data from videos, photos, images, audio, documents and more.

Big data is a big buzzword bingo term these days, with big vendor marketing dollars being applied, so no surprise about the buzz, hype, FUD and more.

Ok, nuff said, for now.

Cheers gs

# RAID and IOPS and IO observations

Posted by gregschulz Oct 15, 2012

There are at least two different meanings for IOPS. For those not familiar, the information technology (IT) and data storage meaning is Input/output Operations Per Second (e.g. data movement activity). Another meaning is the International Organization for a Participatory Society (iopsociety.org), and their fundraising activity found here.

I recently came across a piece (here and here) talking about RAID and IOPS that had some interesting points; however, some generalizations could use some more comments. One of the interesting comments and assertions is that RAID writes increase with the number of drives in the parity scheme. Granted, the specific implementation and configuration could result in an it-depends type of response.

Here are some more perspectives on the piece (here and here), as the site's comments seem to be restricted.

Keep in mind that, as with RAID 5 (or 6) performance, your IO size will have a bearing on whether you are doing those extra back-end IOs. For example, say you are writing a 32KB item that is accomplished by a single front-end IO from an application server, and your storage system, appliance, adapter, or software implementing and performing the RAID (or erasure coding for that matter) has a chunk size of 8KB (e.g. the amount of data written to each back-end drive). Then a 5-drive R5 (e.g. 4+1) would in fact have five back-end IOs (32KB / 8KB = 4, plus 1 for the 8KB parity).

On the other hand, if the front-end IO were only 16KB (using whole numbers for simplicity, otherwise round up), in the case of a write there would be three back-end writes with the R5 (e.g. 2 + 1). Keep in mind the controller/software managing the RAID would (or should) try to optimize back-end IO with cache, read-ahead, write-behind, write-back and other forms of optimization.
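The whole-number examples above can be captured in a few lines. This is a simplified sketch of the aligned, full-chunk-write case only; small random writes on R5 typically incur a read-modify-write cycle instead (read old data and old parity, write new data and new parity), which is part of the "it depends" caveat:

```python
import math

def r5_backend_writes(io_size_kb, chunk_kb):
    """Back-end writes for an aligned RAID 5 front-end write:
    data chunks touched (rounding up partial chunks) plus one parity chunk.
    Simplified sketch; ignores read-modify-write and caching optimizations."""
    data_chunks = math.ceil(io_size_kb / chunk_kb)
    return data_chunks + 1  # + 1 parity write

print(r5_backend_writes(32, 8))  # 4 data + 1 parity = 5
print(r5_backend_writes(16, 8))  # 2 data + 1 parity = 3
```

Note that in this aligned case the back-end write count depends on the IO size relative to the chunk size, not on how many drives are in the RAID group.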

In the piece (here and here), a good point is that understanding and factoring in IOPS is important, as are latency or response time, bandwidth or throughput, and availability; they are all inter-related.

Also very important is to keep in mind the size of the IO, read and write, random, sequential, etc.

RAID along with erasure coding is a balancing act between  performance, availability, space capacity and economics aligned to different  application needs.

RAID 0 (R0) actually has a big impact on performance, with no penalty on writes; however, it has no availability protection benefit, and in fact can be a single point of failure (e.g. loss of an HDD or SSD impacts the entire R0 group). However, for static items, or items that are being journaled and protected on some other medium/RAID/protection scheme, R0 is used more than people realize for scratch/buffer/transient/read-cache types of applications. Keep in mind that it is a balance of performance and capacity against the exposure of no availability, as opposed to other approaches. Thus, do not be scared of R0; however, also do not get burned or hurt with it either. Treat it with respect and it can be effective for some things.

Also mentioned in the piece was that SSD based servers will  perform vastly better than SATA or SAS based ones. I am assuming that the  authors meant to say better than SAS or SATA DAS based HDDs?

Keep in mind that unless you are using a PCIe nand flash SSD  card as a target or cache or RAID card, most SSD drives today are either SAS or  SATA (being the more common) along with moving from 3Gb SAS or SATA to 6Gb SAS & SATA.

Also, while HDDs and SSDs can do a given number of reads or writes per second, those will vary based on the size of the IO and whether it is read, write, random or sequential. However, what can have the biggest impact, and where I have seen too many people or environments get into a performance jam, is assuming that those IOPS numbers per HDD or SSD are a given. For example, assuming that 100-140 IOPS (regardless of size, type, etc.) can always be achieved ignores limiting factors such as the type of interface and controller/adapter being used.
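The relationship between IOPS, IO size and bandwidth is simple arithmetic worth keeping handy. A minimal sketch, with the 140 IOPS figure taken from the example above and the IO sizes purely illustrative:

```python
def bandwidth_mb_s(iops, io_size_kb):
    """Throughput implied by a given IOPS rate at a given IO size."""
    return iops * io_size_kb / 1024

# The same nominal IOPS rating moves very different amounts of data
# depending on IO size, which is why a single IOPS number is not a given:
print(bandwidth_mb_s(140, 4))    # small 4KB IOs: under 1 MB/s
print(bandwidth_mb_s(140, 256))  # large 256KB IOs: 35 MB/s
```

This is also why a drive, controller or interface can hit an IOPS ceiling at small IO sizes yet hit a bandwidth ceiling at large ones.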

I have seen fast HDDs and SSDs deliver sub-par performance, or not meet expectations, on fast interfaces such as iSCSI/SAS/SATA/FC/FCoE/IBA due to bottlenecks in the adapter card or the storage system / appliance / controller / software. In some cases you may see more effective IOPS for reads, writes or both, while in other implementations you may see lower than expected results due to internal implementation bottlenecks or architectural designs. Hint: watch out for solutions where the vendor tries to blame poor performance on the access network (e.g. SAS, iSCSI, FC, etc.), particularly if you know those are not bottlenecks.

In terms of fund-raising, if you feel so compelled, send a gift, donation, sponsorship, project, some book purchases, a piece of work, an assignment, research project, speaking engagement, keynote, webcast, video or seminar event my way, and just like professional fund-raisers, or IOPS vendors, StorageIO accepts Visa, MasterCard, American Express, PayPal, checks and traditional POs.

As for this site and  comments, outside of those caught in the spam trap, courteous perspectives and  discussions are welcome.

Ok, nuff said.

Cheers gs

# Does Dell have a cloudy cloud strategy story (Part I)?

Posted by gregschulz Oct 8, 2012

This is the first of a two-part post (click here for the second post) that is part of an ongoing industry trends and perspective cloud conversations series looking at Dell and their cloud strategy story. For background, some previous Dell posts are found here, here, here and here. Here is a link with video of the live Dell Storage Customer Advisory Panel (CAP) that Dell asked me to moderate back in June, which touches on some related themes and topics. Btw, fwiw and for disclosure, Dell AppAssure is a site advertiser on storageioblog.com ;).

Depending on your view of what is or is not a cloud service, product or solution, naturally you will then have various opinions of where Dell is at with their cloud strategy and story.

If you consider  object based storage to be part of or a component of private clouds or at least  for medical, healthcare and related focus, then Dell is already there with  their DX object storage solutions (Caringo based).

From a scale out, clustered or grid file system, Dell bought Exanet in a post holiday shopping sale a few years back and has invested in its development having renamed it Fluid File System and initially available as the FS7000 series (EqualLogic) and more recently expanded systems such as the FS8600 (Compellent based), EqualLogic and NX3500 (MD3000 based).

If you view clouds as being part of services provided, including via hosting or similar, Dell is already there via their Perot Systems acquisition.

If you view cloud  as being part of VDI, or VDI being part of cloud, Dell is there with their  tools including various acquisitions and solution bundles.

On the other hand  if you view clouds as reference architectures across VMware vSphere, Microsoft  Hyper-V and Citrix Xen among others, guess what, Dell is also there with their  VIS.

Or, if you view  private clouds as being a bundled solution (server, storage, hardware,  software) such as EMC vBlock or NetApp FlexPod, then Dell vStart (not to be  confused as being a service) is on the list with other infrastructure stack solutions.

How about being a technology supplier to what you may consider true cloud providers or enablers, including those who use OpenStack or other APIs and cloud tools? Guess what, Dell is also there, including at Rackspace (per public web info).

So the above all comes back to the point that Dell, like many vendors who offer services, solutions and related items for data and information infrastructures, has diverse offerings including servers, storage, networking, hardware, software and support.

Dell, like others similar to them, has to find a balance between providing services that compete with their customers and being a supplier to those same customers, such as Rackspace. In this case Dell is no different from EMC, who happened to move their Mozy backup service off to their VMware subsidiary and has managed to help define where VCE (and here) and ATMOS fit as products while being services-capable. IBM has figured this out, having a mix of old-school services such as SmartCloud Services (or here), IBM Global Services and BCRS (business continuity recovery services), not to mention newer backup and storage cloud services, products and solutions they have acquired, OEM, or have reseller agreements with.

HP has expanded their traditionally focused EDS as well as other HP services, with products being joined by their Amazon-like Cloud Services including compute, storage and content distribution network (CDN) capabilities. NetApp is taking the partnering route, along with Cisco, staying focused for at least now on being a partner supplier. Oracle, well, Oracle is Oracle, and they have a mix of products and services. In fact, some might say Oracle is late to the cloud game; however, they have been in the game since the late 90s when they came out with Oracle Online, granted the cloud purists will call that an application service provider (e.g. ASP) vs. today's applications as a service (AaaS) models.

Continue with the second post here, ok, nuff said (for now).

Cheers gs

# StorageIO going Dutch and Deutsch fall 2012

Posted by gregschulz Oct 8, 2012

Following a busy spring and summer schedule, the  fall 2012 StorageIO out and about  activities are underway including events on both the European and North American continents.

In addition to  in person events, there are also some virtual  activities including live and recorded video and audio sessions, as well as  webcast on the fall schedule with more in the works.

Some of the fall events include SNW (past SNW posts here, here, and here) in Santa Clara, as well as SNW Europe and the Power the Cloud event (Frankfurt, Deutschland, aka Germany) October 30 and 31st, where I will be doing some meetings and briefings, along with attending sessions and the expo activities.

On November 1st it's off to Storage Expo Holland in Utrecht (here and here), where I will be presenting two sessions. One is on SSD industry trends and tips on deployment, with a theme of not if, rather when, where, why and with what to use SSD. In addition, I will be doing a general industry trends and perspective session on gaining confidence with clouds, virtualization, data and storage networking, including object storage and backup (e.g. data protection modernization).

European travel tools and technologies

In addition to the  above activities, following successful past events in Nijkerk Holland including  the most recent May 2012 sessions, a new  seminar has been announced focused on backup,  restore, BC, DR and archiving hosted by Brouwer  Consultancy on November 5th and 6th 2012. These  workshop format seminars are very interactive providing independent perspectives  on technology, tools, trends and what to do to address various challenges  including more informed and effective IT decision-making.

In addition to the  new seminar that you can learn  more about here, two other sessions will also be offered in Holland. These  include a one  day storage technology deep dive, speeds and feeds, who is doing what workshop  seminar on November 7th. The other session is a two-day workshop seminar on November 8th and 9th covering  storage and networking industry trends covering clouds, virtualization and other  broad topics.

Examples of Dutch refreshments

Watch for more  events, seminars, live video, webinars and virtual trade shows by visiting the StorageIO events page.

Drop me a note if you would like to schedule or arrange a meeting, webinar, seminar or other activity at an event near you. If you are planning to be in or near Holland in early November and are interested in scheduling a meeting or session, send me a note or contact Brouwer Consultancy (here) to make arrangements.

Time to get ready  for these and other events, ok, nuff said.

Cheers gs
