Data Warehousing is close to my heart as I’ve spent a significant part of my career building, modelling or architecting large and complex data warehouses and analytics systems. If I am not building one, I am certainly using one. Finding the data sources, cleaning and loading them into structured star schemas and deriving intelligence has been great fun, besides solving some unique problems along the way.
Comparing the data we handled then (a terabyte a month) with the data we handle now (a terabyte a day) tells the story of the data explosion. 12 TB of tweets and about 500 million call records every day are just a few of the figures I keep reading about from time to time.
The more I look back, however, the more I feel that the data warehouse approach was (and still is) too expensive, too complicated and too slow at getting the data to where it’s needed. Waiting for the data to get into a structured database before you can even begin acting on it is like waiting for the vegetables to be carved up intricately before they are served. The business simply accepted a certain amount of latency as a necessary part of having a data warehouse.
And the worst part is that some data (e.g. call recordings or unstructured email messages) that doesn’t lend itself to traditional relational databases, and cannot be dealt with in traditional ways, is simply brushed aside, or in some cases mined separately using expensive technologies. A certain amount of constraint (e.g. storage space, structured data only) is again accepted as necessary to operate a data warehouse. Some would argue that these things can easily be fixed, for example by throwing more hardware at them. Well, I agree, if you have deep pockets.
For the last two decades, data warehousing, and relational databases in particular, has been seen as the only solution for pulling together disparate data sources to make business sense of them, until of course Google and Facebook came along and disrupted things a bit with Big Data technologies. They have led the way to a new form of storing and searching data that doesn’t use traditional databases and runs on cheap commodity hardware. The best part is that it doesn’t cost half as much and is not constrained by size, volume, where the data lives or whether the data is structured or unstructured.
Well then, is this the end of the road for big Nellie? Well, with some reservation, we think it is. We say this with much excitement, of course, as it will make way for new technologies and will force the traditional database vendors to innovate (to be fair, they already are!). We don’t think data warehouses or relational databases will be going away any time soon, and there will always be a place for them, standing shoulder to shoulder with Big Data technologies. In future, however, data warehousing or relational databases may not be the default choice or the only choice available to data crunchers.