There is big data. And then there is the more familiar type; let’s call it small data for now. The former goes the Hadoop/MapReduce way, whereas the latter is what the likes of Oracle, SQL Server, and DB2 have built their reputations on.
Coming to Hadoop, there appears to be broad agreement that it is complex, and that taming its complexity requires specialized skills that fall into the realm of hard-to-find data scientists. Folks who have a handle on traditional data stores are clearly much easier to locate.
But today we are beginning to see tools that make Hadoop easier to use by masking some of its complexity. Opening up SQL access to data in Hadoop is one example. Another is the addition of RDBMS features to the Hadoop platform.
Is a convergence of big data and small data platforms on the horizon?
At this point, it might be a good idea to step back and review what it is that we do with data: storage, master data management, running transactions, and performing analysis, to name a few. What if it did not matter whether the data was structured, semi-structured, or unstructured? How about a unified approach to data processing that treats all data types equally efficiently? That would mean storage products, MDM systems, OLTP systems, and BI/analytics applications handling all types within the same platform, while also taking on, to a large extent, the size and complexity of big data.
Evidently the results would be profound. For example, consider the prospect of analyzing sensor data and feeding the results to operational processes for generating real-time responses.
Basically, we are talking about a framework that brings together all types of data, and that does seem to be the direction in which we are headed. Fast forward a few years and it appears likely that all data, big and small, could converge in one common platform.
Assuming that is the case, there will be just “data” – with the systems that process it being indifferent to whether it is structured or unstructured, while managing its size and complexity.
It will be interesting to see which data platforms predominate at that time. Would they include a version of Hadoop? Or would it be an entirely new range of platforms?
The amount of hidden value in unstructured data quite likely surpasses that in structured data. After all, it has been the least understood type of data, yet it accounts for the largest volumes. There is just so much to be discovered.
Such a convergence should also mark the end of the big data hype.