Data storage: Sink or swim

The weather is changing. So is data storage. What is the connection? Weather inspired tech industry metaphors, for one. Starting with the cloud itself! And some more spawned by the accelerated pace of data growth. Consider “deluge” or “flood” of big data. How about being “inundated” by data?

Of course the sheer scale of data growth is lending itself to the dramatization. Here are some oft cited stats. 90% of the world’s data has been generated over the last two years is a quote from 2013. Every day, we create 2.5 quintillion bytes of data is another one. An aside: it appears that Americans and British have their own definitions of a quintillion — 10^18 in the US and 10^30 in Great Britain (per Dictionary.com). Unstructured data accounts for most of this data growth:  An analyst firm graph that depicted storage usage over time showed 70 exabytes of the 2014 projection of 80EB of storage coming from unstructured data (an exabyte (EB) is 10^ 18, which seems to match the US definition of a quintillion).

With all this action happening on the data growth front, something’s gotta give when it comes to storing it. That something includes the legacy SAN and NAS solutions based on block and file storage that are slowly but surely giving way to newer storage technologies. At this time it may be instructive to take a quick detour into the history of storage: DAS, SAN and NAS. In the beginning, there was DAS directly attaching drives to the server. Got to attach the storage somewhere after all. Then the storage moved away from the servers onto the network. NAS was the first network storage option with remote file access, arriving during the 1980s. Shifting to block storage, Fibre Channel came in during the 1990s and iSCSI in the 2000s. Accelerating Ethernet speeds have led to a decline in Fibre Channel adoption. FC folks introduced Fibre Channel Over Ethernet (FCOE) but that turned out to be somewhat of a non-starter. One of the reasons appears to have been that there were as many FCOE versions as there were vendors. RAID arrays too are falling by the wayside. Given the rate of data growth, the time to rebuild disks started to become unmanageable.

Today there are lots of terms at play in what can seem like a hyperactive, somewhat confusing market. We have hyperconverged, with tightly integrated compute and storage, and hyperscale where compute and storage scale out independently, indefinitely in a cloud-like fashion. Indeed hyperscale is what the web giants, the Googles, Amazons, and Facebooks use. To add to the confusion, a term that sounds similar is Server SAN that is not to be mixed up with the traditional SAN, this represents software-defined storage on commodity servers with directly attached storage (DAS) aggregating all the locally connected storage. Going back to the future with the humble DAS.

But hyperscale storage is where things appear to be headed. Object storage is the technology underlying hyperscale storage, which is already deployed in the public cloud. It is for storing unstructured data, which is clearly where the need is. Not suitable though for structured, transactional data, which is the realm of block storage. Hence block storage will likely not go away anytime soon. On the other hand, legacy NAS is quite another thing; Object storage is chipping away at its share since it does a much better job with the millions and billions of files.

Toward reining in the runaway growth in data, object storage seems just what is needed. However as in most things in life, there is a catch. For getting data in and out, object storage uses a RESTful API. Yet there does not appear to be a widely accepted industry standard specification for this API. As the 800 pound gorilla in the room, Amazon gets to position the S3 RESTful API as the defacto standard. Cloud Data Management Interface (CDMI) is the actual industry standard created by the SNIA. This should have been the standard that everyone aligned with, but it seems to have few takers. OpenStack Swift appears to have had more luck with its RESTful API. Therefore there seem to be at least three different standards and vendors get to pick and choose. The question is what happens when Amazon decides to change the S3 standard, which they are perfectly entitled to do given that the standard belongs to them. Presumably there will be a scramble among vendors to adapt and comply. Amazon literally seems to have the industry on a leash. Since they were pretty much the first cloud based infrastructure services provider, they could say with some justification they invented the space.

Perhaps users need to step up and use the power of the purse to straighten things out.  Unlikely we can solve this via regulatory fiat! When the data torrents come rushing in and the time comes to start swimmin’, hopefully Amazon is not the only game in town to stop storage systems from sinking like a stone.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s