
Greg's Blog

February 11, 2012

Cloud and travel fun

Posted by gregschulz Feb 11, 2012

Warning: if you are a cloud purist who does not take kindly to fun in and around all types of clouds, try to have some fun anyway. Otherwise, enjoy this post about fun in and around clouds.


On a recent trip to a video recording studio in the Boston (BOS) area, I took a few photos with my iPhone of traveling above, in and around clouds. In addition, during the trip I also used cloud based services from the airplane (e.g. Gogo WiFi) for cloud backup and other functions.


Above the clouds, the engine (a GE/CFM56) enables this journey to and above the clouds
View of a GE CFM56 powering a Delta A320 journey to the clouds

Easy to understand Disaster Recovery (DR) plan for planes traveling through and above clouds
Easy to understand cloud emergency and contingency procedures

On board above the cloud marketing
Example of cloud marketing and value add services

Nearing Boston
Clouds clearing while nearing destination Boston, aka IATA: BOS

Easy to understand above the cloud networking
Example of easy to understand converged cloud networking

A GE/CFM56 jet engine flying over the GE Lynn MA jet engine facility
GE Aviation plant in Lynn MA below GE CFM56 jet engine

On the ramp, or waiting area, before returning to above the clouds
Back at Logan, long day of travel, video shoot, time for a nap.

Clear sky at sunset as moon rises over Cloud Expo 2011 in Santa Clara
From a different trip, wrapping up a cloud focused day, at Cloud Expo in Santa Clara CA in November.



Here are some additional links about out and about, clouds, travel, technology, trends and fun:
Commentary on Clouds, Storage, Networking, Green IT and other topics
Cloud, virtualization and storage networking conversations
What am I hearing and seeing while out and about


Oh, what was recorded in the video studios on that day?


Why, something about IT clouds, virtualization, storage, networking and other related topics, of course, which will be appearing at some venue in the not so distant future.

Ok, nuff fun for now, let's get back to work.


Cheers gs


Greg Schulz - Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio

Amazon Web Services (AWS) today announced the beta of its new storage gateway functionality that enables access to Amazon S3 (Simple Storage Service) from your different applications using an appliance installed at your data center site. With this beta launch, Amazon joins other startup vendors who provide standalone gateway appliance products (e.g. Nasuni) along with those who have disappeared from the market (e.g. Cirtas). In addition to gateway vendors, there are also those who have added cloud access to their software tools, such as Jungle Disk (which accesses both Rackspace and Amazon S3) and the Commvault Simpana Cloud connector, among others. There are also vendors that have added cloud access gateways as part of their storage systems, such as TwinStrata among others. Even EMC (and here) has gotten into the game, adding qualified cloud access support to some of its products.


What is a cloud storage gateway?

Before going further, let's take a step back and address what for some may be a fundamental question: what is a cloud storage gateway?

Cloud services such as storage are accessed via some type of network, either the public Internet or a private connection. The type of cloud service being accessed (Figure 1) determines what is needed. For example, some services can be accessed using a standard Web browser, while others require plug-in or add-on modules. Some cloud services may require downloading an application, agent, or other tool for accessing the cloud service or resources, while others provide an on-site or on-premises appliance or gateway.

Generic cloud access example via Cloud and Virtual Data Storage Networking (CRC Press)
Figure 1: Accessing and using clouds (From Cloud and Virtual Data Storage Networking (CRC Press))


Cloud access software and gateways or appliances are used to make cloud storage accessible to local applications. The gateways, in addition to enabling cloud access, provide replication, snapshots, and other storage services functionality. Cloud access gateways or server-based software include tools from BAE, Citrix, Gladinet, Mezeo, Nasuni, OpenStack, TwinStrata and Zadara among others. In addition to cloud gateway appliances or cloud points of presence (cpops), access to public services is also supported via various software tools. Many data protection tools including backup/restore, archiving, replication, and other applications have added (or are planning to add) support for access to various public services such as Amazon, Google, Iron Mountain, Microsoft, Nirvanix, or Rackspace among several others.


Some of the tools have added native support for one or more of the cloud services by leveraging various application programming interfaces (APIs), while other tools or applications rely on third-party access gateway appliances or a combination of native support and appliances. Another option for accessing cloud resources is to use tools (Figure 2) supplied by the service provider, which may be their own, from a third-party partner, or open source, as well as using their APIs to customize your own tools.


Generic cloud access example via Cloud and Virtual Data Storage Networking (CRC Press)
Figure 2: Cloud access tools (From Cloud and Virtual Data Storage Networking (CRC Press))


For example, I can use my Amazon S3 or Rackspace storage accounts using their web and other provided tools for basic functionality. However, for doing backups and restores, I use the tools provided by the service provider, which then deal with two different cloud storage services. The tool presents an interface for defining what to back up, protect, and restore, as well as enabling shared (public or private) storage devices and network drives. In addition to providing an interface (Figure 2), the tool also speaks the specific APIs and protocols of the different services, including PUT (create or update a container), POST (update header or metadata), LIST (retrieve information), HEAD (metadata information access), GET (retrieve data from a container), and DELETE (remove container) functions. Note that the real behavior and API functionality will vary by service provider. The point of mentioning this example is that when you look at some cloud storage service providers, you will see mention of PUT, POST, LIST, HEAD, GET, and DELETE operations as well as services such as capacity and availability. Some services will include an unlimited number of operations, while others will have fees for doing updates, listing, or retrieving your data in addition to basic storage fees. By being aware of cloud primitive functions such as PUT or POST and GET or LIST, you can have a better idea of what they are used for as well as how they play into evaluating different services, pricing, and service plans.
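Those primitives map loosely onto plain HTTP verbs. As an illustration (the container and object names are hypothetical, and a real client would also handle authentication and request signing), the mapping might be sketched as:

```python
# Sketch of how generic object-storage primitives map to HTTP verbs.
# The bucket and object names below are made up, not a real account.

def build_request(operation, container, key=None):
    """Return (HTTP verb, path) for a generic object-storage primitive."""
    verbs = {
        "PUT": "PUT",        # create or update a container/object
        "POST": "POST",      # update header or metadata
        "LIST": "GET",       # retrieve information about a container
        "HEAD": "HEAD",      # metadata access without the data itself
        "GET": "GET",        # retrieve data from a container
        "DELETE": "DELETE",  # remove a container or object
    }
    path = f"/{container}" + (f"/{key}" if key else "")
    return verbs[operation], path

print(build_request("PUT", "backups", "server1.img"))  # ('PUT', '/backups/server1.img')
print(build_request("LIST", "backups"))                # ('GET', '/backups')
```

Providers that charge per operation are metering exactly these request types, which is why the primitives show up in the pricing plans.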


Depending on the type of cloud service, various protocols or interfaces may be used, including iSCSI, NAS NFS, HTTP or HTTPS, FTP, REST, SOAP, and BitTorrent, plus APIs and PaaS mechanisms including .NET or SQL database commands, in addition to XML, JSON, or other formatted data. VMs can be moved to a cloud service using file transfer tools or the upload capabilities of the provider. For example, a VM such as a VMDK or VHD is prepared locally in your environment and then uploaded to a cloud provider for execution. Cloud services may provide an access program or utility that allows you to configure when, where, and how data will be protected, similar to other backup or archive tools.


Some traditional backup or archive tools have added support, directly or via third parties, for accessing IaaS cloud storage services such as Amazon, Rackspace, and others. Third-party access appliances or gateways enable existing tools to read and write data to a cloud environment by presenting a standard interface such as NAS (NFS and/or CIFS) or iSCSI (block) that gets mapped to the back-end cloud service format. For example, if you subscribe to Amazon S3, storage is allocated as objects and various tools are used to access it. The cloud access software or appliance understands how to communicate with the IaaS storage APIs and abstracts those details from how they are used. Access software tools or gateways, in addition to translating or mapping between cloud APIs, provide functionality for your applications including security with encryption, bandwidth optimization, and data footprint reduction such as compression and de-duplication. Other functionality includes reporting and management tools that support various interfaces, protocols and standards including SNMP, the SNIA Storage Management Initiative Specification (SMI-S), and the Cloud Data Management Interface (CDMI).


First impression: Interesting, good move Amazon, I was ready to install and start testing it today

The good news here is that Amazon is taking steps to make it easier for your existing applications and IT environments to use and leverage clouds for private and hybrid adoption models, with Amazon branded and managed services, technology and associated tools.


This means leveraging your existing Amazon accounts to simplify procurement, management and ongoing billing, as well as leveraging their infrastructure. As a standalone gateway appliance (e.g. it does not have to be bundled as part of a specific backup, archive, replication or other data management tool), the idea is that you can insert the technology into your existing data center between your servers and storage to begin sending a copy of data off to Amazon S3. In addition to sending data to S3, the integrated functionality with other AWS services should make it easier to integrate with Elastic Compute Cloud (EC2) and Elastic Block Store (EBS) capabilities, including snapshots for data protection.


Thus my first impression of the AWS storage gateway at a high level is good and interesting, which prompted me to look a bit deeper, resulting in a second impression.


Second impression: Hmm, what does it really do and require? Time to slow down and do more homework

Digging deeper and going through the various publicly available material (note: I can only comment on or discuss what is announced or publicly available) results in a second impression of wanting and needing to dig deeper based on some of the caveats. Now, granted, and in fairness to Amazon, this is of course a beta release; on first impression it can be easy to miss the notice that it is in fact a beta, so keep in mind things can and hopefully will change.


Pricing aside (meaning, as with any cloud or managed storage service, you will want to do a cost analysis model just as you would for procuring physical storage), look into the cost of the monthly gateway fee along with the associated physical server running the VMware ESXi configuration that you will need to supply. Chances are that if you are an average sized SMB, you have a physical machine (PM) laying around that you can throw a copy of ESXi onto, if you don't already have room for some more VMs on an existing one.


You will also need to assess the costs of using the S3 storage, including space capacity charges, access and other fees, as well as charges for doing snapshots or using other functionality. Again, these considerations are not unique to Amazon or its cloud gateway and should be best practices for any service or solution that you are considering. Amazon, by the way, makes it easy to see its base pricing for different tiers of availability, geographic locations and optional fees.
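To make the cost-model point concrete, here is a back-of-the-envelope sketch; every dollar figure and rate below is a made-up placeholder, not actual AWS pricing, so substitute the provider's published numbers:

```python
# Hypothetical monthly cost model for a cloud storage gateway service.
# All rates here are placeholders for illustration only.

def monthly_cost(tb_stored, gb_retrieved, requests,
                 gateway_fee=125.0,        # flat monthly appliance fee (assumed)
                 per_gb_month=0.10,        # storage capacity charge per GB (assumed)
                 per_gb_retrieval=0.12,    # retrieval/egress charge per GB (assumed)
                 per_1k_requests=0.01):    # PUT/GET/LIST request fees (assumed)
    storage = tb_stored * 1024 * per_gb_month
    retrieval = gb_retrieved * per_gb_retrieval
    ops = requests / 1000 * per_1k_requests
    return gateway_fee + storage + retrieval + ops

# 5TB stored, 50GB restored, 200k operations in a month
print(round(monthly_cost(5, 50, 200_000), 2))  # 645.0
```

The same function works for comparing any two services or tiers: plug in each provider's rates and your expected activity, and compare totals rather than headline per-GB prices.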


Speaking of accessing the cloud, and of cloud conversations, you will also want to keep in mind what your network bandwidth service requirements will be for moving data to Amazon, if you are not already doing so.


Another thing to consider with the AWS storage gateway is that it does not replace your local storage (that is, unless you move your applications to Amazon EC2 and EBS); rather, it makes a copy of whatever you save locally to a remote Amazon S3 storage pool. This can be good for high availability (HA), business continuance (BC), disaster recovery (DR) and compliance among other data management needs. However, in your cost model you also need to keep in mind that you are not replacing your local storage; you are adding to it via the cloud, which should be seen as complementing and enhancing your private, now to be hybrid, environment.


Walking the cloud data protection talk

FWIW, I leverage a similar model where I use a service (Jungle Disk) to which critical copies of my data get sent, which in turn places copies at Rackspace (Jungle Disk's parent) and Amazon S3. What data goes where depends on different policies that I have established. I also have local backup copies as well as a master gold disaster copy stored in a secure offsite location. The idea is that when needed, I can get a good copy restored from my cloud providers quickly, regardless of where I am, if the local copy is not good. On the other hand, experience has already demonstrated that without sufficient network bandwidth services, if I need to bring back 100s of GBytes or TBytes of data quickly, I'm going to be better off bringing my master gold copy back onsite, then applying the fewer, smaller updates from the cloud service. In other words, the technologies complement each other.

By the way, a lesson learned here is that once my first copy is made, which has data footprint reduction (DFR) techniques applied (e.g. compression, de-dupe, optimization, etc.), later copies occur very fast. However, subsequent restores of those large files or volumes also take longer to retrieve from the cloud vs. sending up changed versions. Thus, be aware of backup vs. restore times, something which will apply to any cloud provider and can be mitigated by appliances that do local caching. However, also keep in mind that if a disaster occurs, will your local appliance be affected and its cache rendered useless?
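To put rough numbers on that restore-time tradeoff, assume a hypothetical 20 Mbit/sec download link (your bandwidth will differ); a full terabyte restore over the wire takes days, while applying only the changed data is quick:

```python
# Rough restore-time math: hours to pull a given amount of data back
# from a cloud service over a given link. The 20 Mbit/sec link speed
# used below is an assumption for illustration.

def restore_hours(gbytes, mbits_per_sec):
    bits = gbytes * 8 * 1e9                  # decimal GB to bits
    return bits / (mbits_per_sec * 1e6) / 3600

print(round(restore_hours(1000, 20), 1))  # full 1TB restore: 111.1 hours
print(round(restore_hours(10, 20), 1))    # just the changed 10GB: 1.1 hours
```

This is why shipping the master gold copy back onsite and applying incremental changes from the cloud beats a full WAN restore for large data sets.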


Getting back to the AWS storage gateway, my second impression is that at first it sounded great.


However, then I realized it only supports iSCSI. FWIW, nothing wrong with iSCSI; I like it and recommend using it where applicable, even though I'm not using it. I would like to have seen NAS (either NFS and/or CIFS) support for the gateway, making it easier in my scenario for different applications, servers and systems to use and leverage the AWS services, something that I can do with my other gateways provided via different software tools. Granted, for those environments already using iSCSI on the servers that will use the AWS storage gateway, this is a non-issue, while for others it is a consideration, including the cost (time) to factor in to prepare your environment to use the capability.


Depending on the amount of storage you have in your environment, the next item that caught my eye may or may not be an issue: the iSCSI gateway supports volumes up to 1TB, and up to 12 of them, hence a maximum capacity of 12TB under management. This can be worked around by using multiple gateways; however, the increased complexity balanced against the benefit of the functionality is something to consider.


Third impression: Dig deeper, learn more, address various questions

This leads up to my third impression: the need to dig deeper into what the AWS storage gateway can and cannot do for various environments. I can see where it can be a fit for some environments, while for others, at least in its beta version, it will be a non-starter. In the meantime, do your homework and look around at other options; ironically, by launching a gateway service Amazon may reinvigorate the marketplace of some of the standalone or embedded cloud gateway solution providers.


What is needed for using AWS storage gateway

In addition to having an S3 account, you will need to acquire, for a monthly fee, the storage gateway appliance, which is software installed into a VMware ESXi hypervisor virtual machine (VM). The requirements are a VMware ESXi hypervisor (v4.1) on a physical machine (PM) with at least 7.5GB of RAM and four (4) virtual processors assigned to the appliance VM, along with 75GB of disk space for the Open Virtual Appliance (OVA) image installation and data. You will also need a properly sized network connection to Amazon, and iSCSI initiators on Windows Server 2008, Windows 7 or Red Hat Enterprise Linux.


Note that the AWS storage gateway beta is optimized for block write sizes greater than 4KBytes, and Amazon warns that smaller IO sizes can cause overhead resulting in lost storage space. This is a consideration for systems that have not yet changed their file systems and volumes to use the larger allocation sizes.
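As a rough illustration of why small writes lose space, assume each write lands in its own 4KByte allocation unit (i.e. writes are not coalesced; the 512 byte write size is a hypothetical worst case):

```python
# Space overhead when every write consumes a full allocation unit.
# The 4KB unit comes from the AWS small-write note; the assumption
# that writes are not coalesced is a worst-case illustration.
import math

def space_used(write_size, n_writes, alloc_unit=4096):
    units_per_write = math.ceil(write_size / alloc_unit)
    return units_per_write * alloc_unit * n_writes

logical = 512 * 1000                  # 1000 writes of 512 bytes each
physical = space_used(512, 1000)      # each write still consumes a 4KB unit
print(physical / logical)             # 8.0x space amplification
```

In other words, a stream of 512 byte writes could consume up to 8x the logical data size under this worst-case assumption, which is the kind of overhead the beta documentation is warning about.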


Some closing thoughts, tips and comments:

  • Congratulations to Amazon for introducing and launching an AWS branded storage gateway.
  • Amazon brings the value of trust to a cloud relationship.
  • Initially I was excited about the idea of a gateway that any of my systems could use to reach my S3 storage pools, vs. using gateway access functions that are part of different tools such as my backup software, or via Amazon web tools. Likewise, I was excited by the idea of having an easy to install and use gateway that would allow me to grow in a cost effective way.
  • Keep in mind that this solution, at least in its beta version, DOES NOT replace your existing iSCSI based storage; instead it complements what you already have.
  • I hope Amazon listens carefully to what their customers and prospects want vs. need in order to evolve the functionality.
  • This announcement should reinvigorate some of the cloud appliance vendors as well as those who have embedded functionality for Amazon and other providers.
  • Keep bandwidth services and optimization in mind, both for sending data as well as for retrieving it during a disaster or a small file restore.
  • In concept, the AWS storage gateway is not all that different from appliances that do snapshots and other local and remote data protection, such as those from Actifio, EMC (RecoverPoint), or dedicated gateways such as those from Nasuni among others.
  • Here is a link to additional AWS storage gateway frequently asked questions (FAQs).
  • If the AWS storage gateway were available with a NAS interface, I would probably be activating it this afternoon, even with some of their other requirements, cost aside.
  • I'm still formulating my fourth impression, which is going to take some time. Perhaps if I can get Amazon to help sell more of my books so that I can afford to test the entire solution leveraging my existing S3, EC2 and EBS accounts, I might do so in the future; otherwise, for now, I will continue to research.
  • To learn more about the AWS storage gateway beta, check out this free Amazon web cast on February 23, 2012.


To learn more about cloud based data protection, data footprint reduction, cloud gateways, access and management, check out my book Cloud and Virtual Data Storage Networking (CRC Press), which is of course available on Amazon in Kindle as well as hard cover print copy.


Ok, nuff said for now, I need to get back to some other things while thinking about this all some more.


Cheers gs


Greg Schulz - Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)


twitter @storageio

In my 2012 (and 2013) industry trends and perspectives predictions, I mentioned that some storage systems vendors who manage their costs could benefit from the current Hard Disk Drive (HDD) shortage. Most in the industry would say that is simply repeating what they have said; however, I have an alternate scenario. My scenario is that vendors who already make good (or great) margins on their HDD sales, and who can manage their costs including inventories, stand to make even more margin. There is a popular myth that there is no money or margin in HDDs, or for those who sell them, which might be true for some.


Without going into any details, let's just say it is a popular myth, just like saying that there is no money in hardware or that all software and people services are pure profit. Ok, let's let sleeping dogs lie where they rest (at least for now).


Why will some storage vendors make more margin off of HDDs when everybody is supposed to be adopting or deploying solid state devices (SSD), or Hybrid Hard Disk Drives (HHDD) in the case of workstations, desktops or laptops? Simple: SSD adoption (and deployment) is still growing, and there are a lot of demand generation incentives available. Likewise, HDD demand continues to be strong, and with supplies affected, economics 101 says that some will raise their prices, manage their expenses, and make more profit, which can be used to help fund or stimulate increased SSD or other initiatives.


Storage, IT and general Economics 101


Economics 101, or the basics, introduces the concept of supply and demand along with revenue minus costs = profit or margin. If there is no demand yet a supply of a product exists, then techniques such as discounting, bundling or other forms of adding value are used to incentivize customers to make a purchase. Bundling can include offering some other product, service or extra, which could be as simple as an extended warranty, to motivate buyers. Beyond discounts, coupons, two for one deals, future buying credits, gift cards or memberships for frequent buyers (or flyers) are other forms of stimulating sales activity.
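The revenue minus costs arithmetic can be sketched in a few lines; all of the figures below are hypothetical, chosen only to show how discounting to move oversupplied product trades margin percentage for units shipped:

```python
# Economics 101: margin = revenue - costs. Hypothetical figures showing
# how a discount used to move oversupplied product lowers margin percent
# even as more units ship.

def margin(units, asp, unit_cost, discount=0.0):
    revenue = units * asp * (1 - discount)
    costs = units * unit_cost
    profit = revenue - costs
    return profit, round(profit / revenue * 100, 1)

# Baseline: 1000 units at a $100 ASP, $70 unit cost
print(margin(1000, 100.0, 70.0))        # (30000.0, 30.0)
# Oversupply: discount 15% to move 1200 units instead
print(margin(1200, 100.0, 70.0, 0.15))  # (18000.0, 17.6)
```

The same arithmetic run in reverse explains the shortage scenario below: reduce the discount (or raise the ASP) while holding costs in line, and margin percentage climbs.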


Likewise, if there is a large supply of, or competition in, a given market for a product or alternative, vendors or those selling the products, including value added resellers (VARs), may sacrifice margin (profit) to meet revenue as well as units shipped (e.g. expanding their customer and installed base footprint) goals.


Currently in the IT industry, and specifically around data storage, even with increased and growing adoption, demand and deployment around SSD, there is also a large supply in different categories. For example, there are several fabrication facilities (FABs) that produce the silicon dies (e.g. chips) that form nand flash SSD memories, including Intel, Micron, the joint Intel and Micron fab (IMF) and Samsung. Even with continued strong demand growth, the various FABs seem to have enough capacity, at least for now. Likewise, manufacturers of SSD drive form factor products with SAS or SATA interfaces for attaching to existing servers, storage or appliances, including Intel, Micron, Samsung, Seagate, STEC and SanDisk among others, seem to be able to meet demand. Even PCIe SSD card vendors have come under supply and demand pressure. For example, the high flying startup FusionIO recently saw its margins affected due to competition, which includes Adaptec, LSI, Texas Memory Systems (TMS) and soon EMC among others. In the SSD appliance and storage system space there are even more vendors, with what amounts to about one every month or so coming out of stealth. Needless to say, there will be some shakeout in the not so distant future.


On the other hand, if there is demand but limited supply, and assuming that the market will support it, prices can be increased from where discounts had applied. Assuming that costs are kept in line, any subsequent increase in average selling price (ASP) minus costs should result in higher margins.


Another variation is strong demand and a shortage of supply, such as what is occurring with hard disk drives (HDD) due to the recent flooding in Thailand: not only can prices increase, there can also be changes to warranties or other services and incentives. Note that some HDD manufacturers, such as Western Digital, were more affected by the flooding than Seagate. Likewise, the Thailand flooding was not limited to just HDDs, having also affected other electronic chip and component suppliers. Even though HDDs have been declared dead by many in the SSD camp along with their supporters, record numbers of HDDs are produced every year. Note that economics 101 also tells us that even though more devices are produced and sold, that may not show a profit depending on their cost and price. Like the CPU processor chips produced by AMD, Broadcom, IBM and Intel among others that are high volume with varying margins, the HDD and nand flash SSD markets are also high volume with different margins.


As an example, Seagate recently announced strong profits due to a number of factors, even though enterprise drive supply and shipments were down while desktop drives were up. Given that many industry pundits have proclaimed a disaster for those involved with HDDs due to the shortage, they forgot about economics 101 (supply and demand). Sure, marketing 101 says that HDDs are dead and that if there is a shortage then more people will buy SSDs; however, that also assumes that a) people are ready to buy more SSDs (e.g. demand), b) vendors or manufacturers have supply, and c) those same vendors or manufacturers are willing to give up margin while reducing costs to boost profits.


Note that costs typically include selling, general and administrative expenses, cost of goods, manufacturing, transportation and shipping, insurance, and research and development, among others. If it has been a while since you looked at one, take a few minutes sometime to look at public companies and their quarterly Securities and Exchange Commission (SEC) financial filings. Those public filing documents are a treasure trove of information for those who sift through them, and are where many reporters, analysts and researchers find information for what they are working on or speculating about. These documents show total sales, costs, profits and losses among other things. Something that vendors may not show in these public filings, which means you have to read between the lines or get the information elsewhere, is how many units were actually shipped, or the ASP, to get an idea of the amount of discounting that is occurring. Likewise, sales and marketing expenses often get lumped under selling, general and administrative (SGA) expenses. A fun or interesting metric is to look at the percentage of SGA dollars spent per revenue and profit.


What I find interesting is to get an estimate of what it is costing an organization to do or sustain a given level of revenue and margin. For example, while some larger vendors may seem to spend more on selling and marketing, on a percentage basis they can easily be outspent by smaller startups. Granted, the larger vendor may be spending more actual dollars; however, those are spread out over a larger sales and revenue base.
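That percentage comparison is simple to compute from the filings; the dollar figures below are invented purely for illustration of how a startup can outspend a large vendor on a percentage basis while spending far fewer absolute dollars:

```python
# SGA spend as a percentage of revenue. Figures are hypothetical:
# a large vendor spending more absolute dollars can still be outspent
# on a percentage basis by a startup.

def sga_pct(sga_dollars, revenue):
    return round(sga_dollars / revenue * 100, 1)

big_vendor = sga_pct(900, 5000)  # e.g. $900M SGA against $5B revenue
startup = sga_pct(30, 80)        # e.g. $30M SGA against $80M revenue
print(big_vendor, startup)       # 18.0 37.5
```

Run against real quarterly filings, the same ratio over several quarters also shows whether a vendor's cost of sustaining revenue is trending up or down.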


What does this all mean?


Look at multiple metrics that have both a future trend or forecast as well as a trailing or historical perspective. Look at percentages as well as dollar amounts, and at both revenue and margin, while also keeping units or the number of devices (or copies) sold in perspective. For example, it is interesting to know if a vendor's sales were down 10% (or up) quarter over quarter, versus the same quarter a year ago, or year over year. It is also interesting to keep the margin in perspective along with SGA costs in addition to the cost of product acquired for sale. Also important, if sales were down yet margins are up, is to gauge how many devices or copies were sold to get a sense of expanding footprint, which could also be a sign of future annuity (follow up sales opportunities). What I'm watching over the next couple of quarters is how some vendors leverage the Thailand flooding and HDD as well as other electronic component supply shortages to meet demand by managing discounts, costs and other items that contribute to enhanced margins.


Rest assured there is a lot more to IT and storage economics, including advanced topics such as Return on Investment (ROI) or Return on Innovation (The new ROI) and Total Cost of Ownership (TCO) among others that maybe we will discuss in the future.


Ok, nuff fun for now, let's get back to work.


Cheers gs


Greg Schulz - Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)


twitter @storageio

This is the second of a two part series pertaining to EMC VFCache, you can read the first part here.


In this part of the series, let's look at some common questions along with comments and perspectives.


Common questions, answers, comments and perspectives:


Why would EMC not just go into the same market space and mode as FusionIO, a model that many other vendors seem eager to follow? IMHO, many vendors are following or chasing FusionIO, thus most are selling in the same way, perhaps to the same customers. Some of those vendors could very easily, if they have not already, make a quick change to their playbook, adding some new moves to reach a broader audience. Another smart move here is that by taking a companion or complementary approach, EMC can continue selling existing storage systems to customers and preserve those investments, while also supporting competitors' products. In addition, for those customers who are slow to adopt SSD based techniques, this is a relatively easy and low risk way to gain confidence. Granted, the disk drive was declared dead several years (and yes, also several decades) ago; however, it is and will stay alive for many years, with SSD helping to close the IO storage and performance gap.


Storage IO performance and capacity gap
Data center and storage IO performance capacity gap (Courtesy of Cloud and Virtual Data Storage Networking (CRC Press))


Has this been done before? There have been other vendors who have done LUN caching appliances in the past, going back over a decade. Likewise, there are PCIe RAID cards that support flash SSD as well as DRAM based caching. Even NetApp has had similar products and functionality with their PAM cards.


Does VFCache work with other PCIe SSD cards such as FusionIO? No. VFCache is a combination of a software IO intercept and intelligent cache driver along with a PCIe SSD flash card (which, as EMC has indicated, could be supplied by different manufacturers). Thus, for VFCache to be VFCache, it requires the EMC IO intercept and intelligent cache software driver.


Does VFCache work with other vendors' storage? Yes. Refer to the EMC support matrix; however, the product has been architected and designed to install into and coexist with a customer's existing environment, which means supporting different EMC block storage systems as well as those from other vendors. Keep in mind that a main theme of VFCache is to complement, coexist with, enhance and protect customers' investments in storage systems to improve their effectiveness and productivity, as opposed to replacing them.


Does VFCache introduce a new point of vendor lock-in or stickiness? Some will see or position this as a new form of vendor lock-in; others, assuming that EMC supports different vendors' storage systems downstream, offers options for different PCIe flash cards and keeps the solution affordable, will assert it is no more lock-in than other solutions. In fact, by supporting third party storage systems as opposed to replacing them, smart sales people and marketeers will position VFCache as being more open and interoperable than some other PCIe flash card vendors' approaches. Keep in mind that avoiding vendor lock-in is a shared responsibility (read more here).


Does VFCache work with NAS? VFCache does not work with NAS (NFS or CIFS) attached storage.


Does VFCache work with databases? Yes. VFCache is well suited for little data (e.g. databases) and traditional OLTP or general business application processing that may not be covered or supported by other so called big data focused or optimized solutions. Refer to this EMC document (and this document here) for more information.


Does VFCache only work with little data? While VFCache is well suited for little data (e.g. databases, SharePoint, file and web servers, traditional business systems), it is also able to work with other forms of unstructured data.


Does VFCache need VMware? No. While VFCache works with VMware vSphere, including a vCenter plug-in, it does not need a hypervisor and is as practical in a physical machine (PM) as it is in a virtual machine (VM).


Does VFCache work with Microsoft Windows? Yes. Refer to the EMC support matrix for specific server operating system and hypervisor version support.


Does VFCache work with other UNIX platforms? Refer to the EMC support matrix for specific server operating system and hypervisor version support.


How are reads handled with VFCache? The VFCache software (driver if you prefer) intercepts IO requests to LUNs that are being cached, performing a quick lookup to see if there is a valid cache entry on the physical VFCache PCIe card. If there is a cache hit, the IO is resolved from the closer or local PCIe card cache, making for a lower latency or faster response time IO. In the case of a cache miss, the VFCache driver simply passes the IO request onto the normal SCSI or block (e.g. iSCSI, SAS, FC, FCoE) stack for processing by the downstream storage system (or appliance). Note that when the requested data is retrieved from the storage system, the VFCache driver will, based on its caching algorithm's determination, place a copy of the data in the PCIe read cache. Thus the real power of VFCache is the software implementing the cache lookup and cache management functions to leverage the PCIe card that complements the underlying block storage systems.


How are writes handled with VFCache? Unless put into a write cache mode, which is not the default, the VFCache software simply passes the IO operation onto the IO stack for downstream processing by the storage system or appliance attached via a block interface (e.g. iSCSI, SAS, FC, FCoE). Note that as part of its caching algorithms, the VFCache software will make determinations of what to keep in cache based on IO activity requests, similar to how cache management results in better cache effectiveness in a storage system. Given EMC's long history of working with intelligent cache algorithms, one would expect some of that DNA exists or will be leveraged further in future versions of the software. Ironically this is where other vendors with long cache effectiveness histories such as IBM, HDS and NetApp among others should also be scratching their collective heads saying wow, we can or should be doing that as well (or better).
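
The read and write paths described above follow a classic read-through, write-through caching pattern. The sketch below is purely illustrative: it is not EMC's driver code, and the class and method names are my own invention. It simply shows the generic intercept, lookup, pass-through and cache admission flow in a few lines of Python:

```python
from collections import OrderedDict

class ReadCache:
    """Illustrative read-through / write-through block cache sketch.

    Not VFCache itself -- just the generic pattern: intercept reads,
    serve hits locally, pass misses (and all writes) down to the
    backing storage, admitting read data into the cache.
    """

    def __init__(self, backing_store, capacity_blocks):
        self.backing = backing_store          # dict-like stand-in for the LUN
        self.capacity = capacity_blocks
        self.cache = OrderedDict()            # LRU eviction as a simple policy

    def read(self, lba):
        if lba in self.cache:                 # cache hit: fast local IO
            self.cache.move_to_end(lba)
            return self.cache[lba]
        data = self.backing[lba]              # cache miss: downstream IO
        self._admit(lba, data)                # populate for future reads
        return data

    def write(self, lba, data):
        self.backing[lba] = data              # write-through: the storage
        if lba in self.cache:                 # system stays authoritative;
            self.cache[lba] = data            # keep any cached copy coherent

    def _admit(self, lba, data):
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)    # evict least recently used
        self.cache[lba] = data
```

The key design point matches the FAQ answers above: writes always reach the downstream storage system, while the read path populates the local cache so repeat reads are resolved close to the application.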


Can VFCache be used as a write cache? Yes, while its default mode is to be used as a persistent read cache to complement server and application buffers in DRAM along with enhancing the effectiveness of downstream storage system (or appliance) caches, VFCache can also be configured as a persistent write cache.


Does VFCache include FAST automated tiering between different storage systems? The first version is only a caching tool. However, think about it a bit: where the software sits, what storage systems it can work with, its ability to learn and understand IO paths and patterns, and you can get an idea of where EMC could evolve it to, similar to what they have done with RecoverPoint among other tools.


Changing data access patterns and lifecycles
Evolving data access patterns and life cycles (more retention and reads)


Does VFCache mean an all or nothing approach with EMC? While the complete VFCache solution comes from EMC (e.g. PCIe card and software), the solution will work with other block attached storage as well as existing EMC storage systems for investment protection.


Does VFCache support NAS based storage systems? The first release of VFCache only supports block based access; however, the server that VFCache is installed in could certainly be functioning as a general purpose NAS (NFS or CIFS) server (see supported operating systems in EMC interoperability notes) in addition to being a database or other application server.


Does VFCache require that all LUNs be cached? No, you can select which LUNs are cached and which ones are not.


Does VFCache run in an active / active mode? In the first release it is active / passive; refer to EMC release notes for details.


Can VFCache be installed in multiple physical servers accessing the same shared storage system? Yes, however refer to EMC release notes on details about active / active vs. active / passive configuration rules for ensuring data integrity.


Who else is doing things like this? There are caching appliance vendors as well as others such as NetApp and IBM who have used SSD flash caching cards in their storage systems or virtualization appliances. However keep in mind that VFCache is placing the caching function closer to the application that is accessing it, thereby improving on locality of reference (e.g. storage and IO effectiveness).


Does VFCache work with SSD drives installed in EMC or other storage systems? Check the EMC product support matrix for specific tested and certified solutions; however, in general, if the SSD drive is installed in a storage system that is supported as a block LUN (e.g. iSCSI, SAS, FC, FCoE), in theory it should be possible for it to work with VFCache. Emphasis: visit the EMC support matrix.

What type of nand flash SSD memory is EMC using in the PCIe card? The first release of VFCache is leveraging enterprise class SLC (Single Level Cell) nand flash, which has been used in other EMC products for its endurance and long duty cycle to minimize or eliminate concerns of wear and tear while meeting read and write performance. EMC has indicated that, as part of an industry trend, they will also leverage MLC along with Enterprise MLC (EMLC) technologies on a go forward basis.


Doesn't nand flash SSD cache wear out? While nand flash SSD can wear out over time due to extensive write use, the VFCache approach mitigates this by being primarily a read cache, reducing the number of program / erase cycles (P/E cycles) that occur with write operations, as well as by initially leveraging longer duty cycle SLC flash. EMC also has several years of experience implementing wear leveling algorithms in their storage system controllers to increase duty cycle and reduce wear on SLC flash, which will carry forward as MLC or Enterprise MLC (EMLC) techniques are leveraged. This differs from vendors who are positioning their SLC or MLC based flash PCIe SSD cards mainly for write operations, which will cause more P/E cycles to occur at a faster rate, reducing the duty or useful life of the device.
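
A rough back-of-the-envelope model shows why a read-mostly workload plus SLC flash matters for endurance. The figures below (P/E cycle ratings, daily write rates, write amplification) are illustrative assumptions for the sake of the arithmetic, not EMC specifications:

```python
def flash_lifetime_years(capacity_gb, rated_pe_cycles, host_writes_gb_per_day,
                         write_amplification=2.0):
    # Total data the device can absorb before its rated P/E cycles are
    # exhausted, divided by the daily write load. Deliberately simplified:
    # ignores over-provisioning and assumes ideal wear leveling.
    device_writes_gb = capacity_gb * rated_pe_cycles / write_amplification
    return device_writes_gb / host_writes_gb_per_day / 365

# Hypothetical 300GB card absorbing 500GB of writes per day:
slc = flash_lifetime_years(300, 100_000, 500)  # SLC, ~100k P/E cycles
mlc = flash_lifetime_years(300, 3_000, 500)    # consumer MLC, ~3k P/E cycles
```

With a mostly-read cache the daily write load drops further, stretching either figure; that is the endurance argument for using the flash primarily as a read cache.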


How much capacity does the VFCache PCIe card contain? The first release supports a 300GB card and EMC has indicated that added capacity and configuration options are in their plans.


Does this mean disks are dead? Contrary to popular industry folklore (or wish), the hard disk drive (HDD) has plenty of life left, part of which has been extended by being complemented by VFCache.


Various options and locations for SSD along with different usage scenarios
Various SSD locations, types, packaging and usage scenario options


Can VFCache work in blade servers? The VFCache software is transparent to blade, rack mount, tower or other types of servers. The hardware part of VFCache is a PCIe card, which means the blade server or system will need to be able to accommodate a PCIe card to complement the PCIe based mezzanine IO card (e.g. iSCSI, SAS, FC, FCoE) used for accessing storage. What this means is that for blade systems from server vendors such as IBM, who have a PCIe expansion module for their H series blade systems (it consumes a slot normally used by a server blade), PCIe cache cards like those being initially released by EMC could work; however, check with the EMC interoperability matrix, as well as your specific blade server vendor, for PCIe expansion capabilities. Given that EMC leverages Cisco UCS for their vBlocks, one would assume that those systems will also see VFCache modules. NetApp partners with Cisco using UCS in their FlexPods, so you can see where that could go as well, along with potential other server vendor support including Dell, HP, IBM and Oracle among others.


What about benchmarks? EMC has released some technical documents that show performance improvements in Oracle environments, such as this one here. Hopefully we will see EMC also release other workloads for different applications, including the Microsoft Exchange Solution Reviewed Program (ESRP) along with SPC, similar to what IBM recently did with their systems among others.


How do the first EMC supplied workload simulations compare vs. other PCIe cards? This is tough to gauge, as many SSD solutions, and in particular PCIe cards, are doing apples to oranges comparisons. For example, to generate a high IOPS rating for marketing purposes, most SSD solutions are stress performance tested at 512 bytes, or 1/2 of a KByte, just 1/8 of a small 4KByte IO. Note that operating systems such as Windows are moving to a 4KByte page allocation size to align with growing IO sizes, with databases moving from the old average of 4KBytes to 8KBytes and larger. What is important to consider is the average IO size and activity profile (e.g. reads vs. writes, random vs. sequential) for your applications. If your application is doing ultra small 1/2 KByte IOs, or even smaller 64 byte IOs (which should be handled by better application or file system caching in DRAM), then the smaller IO size record setting examples will apply. However, if your applications are more mainstream or larger, then those smaller IO size tests should be taken with a grain of salt. Also keep latency in mind: many target or opportunity applications for VFCache are response time sensitive or can benefit from the improved productivity they enable.
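
The apples to oranges point is easy to see with simple arithmetic: an IOPS figure only means something alongside the IO size that produced it. A quick sketch, using hypothetical marketing-style numbers rather than measurements of any particular card:

```python
def bandwidth_mb_s(iops, io_size_bytes):
    # Throughput implied by an IOPS rating at a given IO size
    return iops * io_size_bytes / 1_000_000

# A headline "1 million IOPS" measured at 512 byte IOs moves less data
# than a far more modest IOPS figure at a realistic 8KB database IO size.
marketing = bandwidth_mb_s(1_000_000, 512)   # 512 MB/s
realistic = bandwidth_mb_s(200_000, 8_192)   # ~1638 MB/s
```

The takeaway: never compare raw IOPS numbers measured at different IO sizes; normalize to bandwidth, or better, look at latency under your own application's IO profile.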


What is locality of reference? Locality of reference refers to how close data is to where it is being requested or accessed from. The closer the data is to the application requesting it, the faster the response time, or the quicker the work gets done. For example, in the figure below, L1/L2/L3 on board processor caches are the fastest, yet smallest, while closest to the application running on the server. At the other extreme, further down the stack, storage becomes larger capacity and lower cost, however lower performing.


Locality of reference data and storage memory
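
Locality translates directly into average latency via the standard effective access time formula. The latencies below are illustrative order-of-magnitude values for local PCIe flash vs. a networked disk array, not VFCache measurements:

```python
def effective_latency_us(hit_rate, cache_latency_us, storage_latency_us):
    # Weighted average: hits are served from the close, fast tier,
    # misses from the distant, slower tier.
    return hit_rate * cache_latency_us + (1 - hit_rate) * storage_latency_us

# e.g. ~50us local PCIe flash vs ~5000us networked disk storage:
avg = effective_latency_us(0.8, 50, 5000)  # 80% hit rate -> ~1040us average
```

Even a moderate hit rate on a close, fast tier pulls the average IO latency far below that of the downstream storage alone, which is the whole point of improving locality of reference.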


What does cache effectiveness vs. cache utilization mean? Cache utilization is an indicator of how much of the available cache capacity is being used; however, it does not indicate whether the cache is being well used or not. For example, cache could be 100 percent used, yet there could be a low hit rate. Thus cache effectiveness is a gauge of how well the available cache is being used to improve performance in terms of more work being done (IOPS or bandwidth) or lower latency and response time.
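
The distinction is easy to make concrete with counters: utilization asks how full the cache is, effectiveness asks how often it actually serves a request. A minimal sketch (the function and counter names are my own, not from any product's instrumentation):

```python
def cache_stats(hits, misses, blocks_used, blocks_total):
    # Utilization: fraction of cache capacity occupied.
    # Hit rate: fraction of requests actually served from cache.
    utilization = blocks_used / blocks_total
    total = hits + misses
    hit_rate = hits / total if total else 0.0
    return utilization, hit_rate

# A cache can be completely full yet rarely useful:
util, hr = cache_stats(hits=100, misses=900, blocks_used=1000, blocks_total=1000)
```

Here the cache is 100 percent utilized but only resolves 10 percent of requests, so effectiveness (hit rate, and the latency improvement it drives) is the number worth watching.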


Isn't more cache better? More cache is not necessarily better; it is how the cache is being used that matters. This is a message that I would be disappointed in HDS if they were not to bring up as a point of messaging (or rebuttal) given their history of emphasizing cache effectiveness vs. size or quantity (Hu, that is a hint btw ;).


What is the performance impact of VFCache on the host server? EMC is saying 5 percent or less CPU consumption, which they claim is several times less than the competition's worst scenario, as well as claiming 512MB to 1GB of DRAM on the server vs. several times that of their competitors. The difference could be expected to come via more offload functions, including the flash translation layer (FTL), wear leveling and other optimizations being handled by the PCIe card vs. being handled in the server's memory and using host server CPU cycles.


How does this compare to what NetApp or IBM does? NetApp, IBM and others have done caching with SSD in their storage systems, or leveraged third party PCIe SSD cards from different vendors installed in servers to be used as a storage target. Some vendors such as LSI have done caching on the PCIe cards themselves (e.g. CacheCade, which in theory has a similar software caching concept to VFCache) to improve performance and effectiveness across JBOD and SAS devices.


What about stale (old or invalid) reads, how does VFCache handle or protect against those? Stale reads are handled via the VFCache management software tool or driver which leverages caching algorithms to decide what is valid or invalid data.


How much does VFCache cost? Refer to EMC announcement pricing, however EMC has indicated that they will be competitive with the market (supply and demand).


If a server shuts down or reboots, what happens to the data in the VFCache? Being that the data is in non volatile SLC nand flash memory, information is not lost when the server reboots or loses power in the case of a shutdown; thus it is persistent. While exact details are not known as of this time, it is expected that the VFCache driver and software do some form of cache coherency and validity check to guard against stale reads or discard any other invalid cache entries.


Industry trends and perspectives


What will EMC do with VFCache in the future and on a larger scale such as an appliance? EMC, via its own internal development and via acquisitions, has demonstrated the ability to use various clustered techniques, such as RapidIO for VMAX nodes and InfiniBand for connecting Isilon nodes. Given an industry trend with several startups using PCIe flash cards installed in a server that then functions as an IO storage system, it seems likely, given EMC's history and experience with different storage systems, caching, and interconnects, that they could do something interesting. Perhaps Oracle Exadata III (Exadata I was HP, Exadata II was Sun/Oracle) could be an EMC based appliance (that is pure speculation btw)?


EMC has already shown how it can use SSD drives as a cache extension in VNX and CLARiiON systems (FAST Cache), in addition to as a target or storage tier combined with FAST for tiering. Given their history with caching algorithms, it would not be surprising to see other instantiations of the technology deployed in complementary ways.


Finally, EMC is showing that it can use nand flash SSD in different ways and in various packaging forms to apply to diverse applications or customer environments. The companion or complementary approach EMC is currently taking contrasts with some other vendors who are taking an all or nothing, "it's all SSD as disk is dead" approach. Given the large installed base of disk based systems EMC as well as other vendors have in place, not to mention the investment by those customers, it makes sense to allow those customers the option of when, where and how they can leverage SSD technologies to coexist with and complement their environments. Thus with VFCache, EMC is using SSD as a cache enabler to address the decades old and growing storage IO to capacity performance gap in a force multiplier model that spreads the cost over more TBytes, PBytes or EBytes while increasing the overall benefit, in other words effectiveness and productivity.
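
The force multiplier economics can be sketched with simple division: the cache's cost is spread across all the capacity it accelerates rather than only its own. All dollar and capacity figures below are hypothetical, purely to illustrate the model:

```python
def cost_per_accelerated_tb(cache_cost, array_cost, array_tb):
    # A complementary cache amortizes its cost over the full array
    # capacity whose effective performance it improves, not just the
    # cache's own gigabytes.
    return (cache_cost + array_cost) / array_tb

# Hypothetical: a $10k PCIe cache card in front of a $200k, 100TB array
# adds only $100/TB to the array's $2000/TB cost while boosting its
# effective performance.
blended = cost_per_accelerated_tb(10_000, 200_000, 100)  # 2100.0
```

Contrast that with the all-SSD, "disk is dead" approach, where every TByte must be bought at flash prices; the complementary model spreads a small flash spend over a much larger capacity footprint.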


Additional related material:
Part I: EMC VFCache respinning SSD and intelligent caching
IT and storage economics 101, supply and demand
2012 industry trends perspectives and commentary (predictions)
Speaking of speeding up business with SSD storage
New Seagate Momentus XT Hybrid drive (SSD and HDD)
Are Hard Disk Drives (HDDs) getting too big?
Unified storage systems showdown: NetApp FAS vs. EMC VNX
Industry adoption vs. industry deployment, is there a difference?
Two companies on parallel tracks moving like trains offset by time: EMC and NetApp
Data Center I/O Bottlenecks Performance Issues and Impacts
From bits to bytes: Decoding Encoding
Who is responsible for vendor lockin
EMC VPLEX: Virtual Storage Redefined or Respun?
SSD options for Virtual (and Physical) Environments: Part I Spinning up to speed on SSD
EMC interoperability support matrix


Ok, nuff said for now, I think I see some storm clouds rolling in...


Cheers gs


Greg Schulz - Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)


twitter @storageio

This is the first part of a two part series covering EMC VFCache; you can read the second part here.


EMC formally announced VFCache (aka Project Lightning), an IO accelerator product that comprises a PCIe nand flash card (aka Solid State Device or SSD) and intelligent cache management software. In addition, EMC is also talking about the next phase of its flash business unit and Project Thunder. The approach EMC is taking with VFCache should not be a surprise given their history of starting out with memory and SSD and evolving it into an intelligent cache optimized storage solution.


Storage IO performance and capacity gap
Data center and storage IO performance capacity gap (Courtesy of Cloud and Virtual Data Storage Networking (CRC Press))


Could we see the future of where EMC will take VFCache, along with other possible solutions already being hinted at by the EMC flash business unit, by looking at where they have been already?


Likewise, by looking at the past, can we see the future, or how VFCache and sibling product solutions could evolve?


After all, EMC is no stranger to caching, with both nand flash SSD (e.g. FAST Cache, FAST and SSD drives) along with DRAM based caching across their product portfolio, not to mention memory being a core part of the company's founding products, which evolved into HDDs and more recently nand flash SSDs among others.


Industry trends and perspectives


Unlike others who also offer PCIe SSD cards, such as FusionIO with its focus on eliminating SANs or other storage (read their marketing), EMC not surprisingly is marching to a different beat. The beat EMC is marching to, or perhaps leading by example for others to follow, is that of going mainstream and using PCIe SSD cards as a cache to complement their own as well as other vendors' storage systems vs. replacing them. This is similar to what EMC and other mainstream storage vendors have done in the past, such as with SSD drives being used as a flash cache extension on CLARiiON or VNX based systems as well as a target or storage tier.


Various options and locations for SSD along with different usage scenarios
Various SSD locations, types, packaging and usage scenario options


Other vendors including IBM, NetApp and Oracle among others have also leveraged various packaging options of Single Level Cell (SLC) or Multi Level Cell (MLC) flash as caches in the past. A different example of SSD being used as a cache is the Seagate Momentus XT, which is a desktop, workstation, consumer type device. Seagate has shipped over a million of the Momentus XT, which uses SLC flash as a cache to complement and enhance the integrated HDD's performance (a 750GB model with 8GB of SLC memory is in the laptop I'm using to type this).


One of the premises of caching solutions such as those mentioned above is to address changing data access patterns and life cycles, shown in the figure below.


Changing data access patterns and lifecycles
Evolving data access patterns and life cycles (more retention and reads)


Put a different way, instead of focusing on just big data or corner cases (granted some of those are quite large) or ultra large cloud scale out solutions, EMC with VFCache is also addressing their core business, which includes little data. What will be interesting to watch and listen to is how some vendors will start to jump up and down saying that they have done or enabled what EMC is announcing for some time. In some cases those vendors will rightfully be making noise about something that they should have made noise about before.


EMC is bringing the SSD message to the mainstream business and storage marketplace, showing how it is a complement to, vs. a replacement of, existing storage systems. By doing so, they will show how to spread the cost of SSD out across a larger storage capacity footprint, boosting the effectiveness and productivity of those systems. This means that customers who install the VFCache product can accelerate the performance of both their existing EMC and other vendors' storage systems, preserving their technology along with people skills investments.


Key points of VFCache

  • Combines PCIe SLC nand flash card (300GB) with intelligent caching management software driver for use in virtualized and traditional servers

  • Making SSD complementary to existing installed block based disk (and or SSD) storage systems to increase their effectiveness

  • Providing investment protection while boosting productivity of existing EMC and third party storage in customer sites

  • Brings caching closer to the application where the data is accessed while leveraging larger scale direct attached and SAN block storage

  • Focusing message for SSD back on to little data as well as big data for mainstream broad customer adoption scenarios

  • Leveraging the benefit and strength of SSD as a read cache and the scalability of underlying downstream disk for data storage

  • Reducing concerns around SSD endurance or duty cycle wear and tear by using as a read cache

  • Offloads underlying storage systems from some read requests, enabling them to do more work for other servers


Additional related material:
Part II: EMC VFCache respinning SSD and intelligent caching
IT and storage economics 101, supply and demand
2012 industry trends perspectives and commentary (predictions)
Speaking of speeding up business with SSD storage
New Seagate Momentus XT Hybrid drive (SSD and HDD)
Are Hard Disk Drives (HDDs) getting too big?
Unified storage systems showdown: NetApp FAS vs. EMC VNX
Industry adoption vs. industry deployment, is there a difference?
Two companies on parallel tracks moving like trains offset by time: EMC and NetApp
Data Center I/O Bottlenecks Performance Issues and Impacts
From bits to bytes: Decoding Encoding
Who is responsible for vendor lockin
EMC VPLEX: Virtual Storage Redefined or Respun?
SSD options for Virtual (and Physical) Environments: Part I Spinning up to speed on SSD

EMC interoperability support matrix


Ok, nuff said for now, I think I see some storm clouds rolling in...


Cheers gs


Greg Schulz - Author Cloud and Virtual Data Storage Networking (CRC Press, 2011), The Green and Virtual Data Center (CRC Press, 2009), and Resilient Storage Networks (Elsevier, 2004)

twitter @storageio