I've been digesting all the activity from the 7th Annual Hadoop Summit, and there has been a lot of it, both during the Summit itself and over the past year.


My experience at the Summit left me thinking that the current state of Apache Hadoop resembles the early days of the web, right at the point where the LAMP stack emerged and web development went mainstream.

Since its creation in 2005, Hadoop has followed a pattern similar to the evolution of the LAMP stack, and has been referred to as a SMAQ stack – Storage, MapReduce, and Query. The core Hadoop stack, running on the Java virtual machine with HDFS for storage and MapReduce for programming, has been the mainstay of the platform. Together, these components have been able to solve a large class of problems in the Big Data space. Programming tools and database functionality have evolved on top of them, there are systems in production, and entire businesses are built on this model.
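
To make that programming model concrete, here is a minimal sketch of the canonical word-count job against the core stack, using the standard org.apache.hadoop.mapreduce API; the input and output paths are hypothetical and passed in as arguments.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Map step: emit (word, 1) for every word in the input split stored on HDFS.
      public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reduce step: sum the counts emitted for each word.
      public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an input directory on HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // e.g. an output directory on HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Even this trivial job carries a fair amount of boilerplate, which is part of why the tooling that has grown up around the core stack matters so much.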

But after 9 years of extensive development, Hadoop has not yet seen the same mainstream adoption curve as the LAMP stack. I believe there's a good reason for this delay. In hindsight, the bar for mainstream adoption of the LAMP model was not that high – it was obviously better, and not all that much harder, once the early innovators showed the way. With Hadoop, the requirements bar has been much higher – the industry has 35 or so years of investment in existing data infrastructure, and Hadoop has had to match or integrate with much of it.

Based on the Summit and the activity of the past year, I'm convinced that we're crossing the chasm, and 2014 is the turning point where Hadoop goes mainstream. I also think it will happen quickly.

There are a few key reasons why:

  • YARN now enables Hadoop as a true multipurpose parallel data platform.
  • Almost 10 years of development in other systems is coalescing around the Hadoop ecosystem.
  • There is a huge amount of diversity in the ecosystem already.
  • Security has matured, and the community is doing it the right way.
  • SQL Query on Hadoop is here, now.
  • New, interactive environments like Spark are changing the way Hadoop is used (see the sketch after this list).
  • The economics are compelling.
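
On that last Spark point, here is a minimal sketch of the kind of exploratory analysis an in-memory engine makes cheap, using Spark's Java API against a hypothetical log file on HDFS; the path and the ERROR/WARN markers are illustrative only. The data is loaded once, cached in memory, and then queried repeatedly without launching a new batch job for each question.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class LogExploration {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("LogExploration");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Load the log once and keep it in memory so follow-up questions are fast.
        JavaRDD<String> lines = sc.textFile("hdfs:///data/app.log").cache();

        // Ask a question, look at the answer, ask the next one -- each call below
        // reuses the cached data instead of re-reading it from disk.
        long total    = lines.count();
        long errors   = lines.filter(l -> l.contains("ERROR")).count();
        long warnings = lines.filter(l -> l.contains("WARN")).count();

        System.out.printf("lines=%d errors=%d warnings=%d%n", total, errors, warnings);
        sc.stop();
      }
    }

The same session could keep going: filter further, join against another dataset, or feed the results into a machine learning step, all against the cached data.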

What does this all mean for Hadoop adopters?

  • Expect the rate of change in the Hadoop ecosystem to accelerate as independent projects start to coalesce around YARN and the Hadoop platform.
  • Start looking at the options for all the layers of the Big Data stack. Whether you are looking for storage, stream processing, query, search, graph processing, in-memory processing, or machine learning, there are multiple options to choose from, and many of them are mature.
  • Watch what the Hadoop distro vendors include, since they are all deeply involved in the core projects, and have strong opinions on their stacks.
  • Don't be afraid to try some alternatives, even if they're not in the distros yet. Diversity is the key to innovation.

When the elephant finally crosses that chasm, it's going to be a stampede.