I had the good fortune of attending another Strata+Hadoop World, the 2015 edition held in beautiful NYC at the end of September, and although the weather wasn’t as great as it had been at past years’ conferences in the city, the show itself was a great success with many new announcements. Big Data events are always leading-edge and interesting because the technology is changing at such a rapid pace, so for me it’s always exciting to see the new companies that are constantly popping up, pointing to the vibrancy and general health of the Big Data ecosystem.
As someone who has followed this space quite closely for many years, I always like to get a feel for where the technology might be heading. For instance, Strata+Hadoop World 2014 in NYC was the first Hadoop conference where I felt that Spark was clearly becoming a mature product ready for primetime, so much so that there was more activity focused on Spark than on MapReduce, even though it was a Hadoop conference. While Spark is not the answer for every use case, the interest in and applicability of the technology clearly signify a win for Spark, a win for Hadoop and, more importantly, a win for the entire Big Data community. I’ve always been a proponent of Spark (in-memory processing matters so much in many use cases that, although it will not replace today’s MapReduce and HDFS, having the option is increasingly important), especially now that Streaming Analytics and Fast Data are quickly becoming part of the Big Data vernacular.
And although I agree that we don’t need more terms and catchwords in a space that is already chock-full of them and prone to misinformation, I’m more interested in understanding why these terms are gaining popularity – and to me, that is the biggest insight into not only where the technology is heading but ultimately how folks in this space are actually using it. To the naysayers of Big Data, let me say one word: IoT, or the Internet of Things, where literally any object on the planet can generate and transmit data. A temperature sensor is a perfect example of this, as are the tire pressure sensors in a vehicle or, for that matter, a cow fitted with a biometric sensor. And if I were to pick a theme from this year’s conference, IoT would be it.
Some of you may be wondering: why does IoT need to be treated separately? Isn’t it just a variant of the traditional Big Data problem? Well, mostly yes, but with a few crucial differences.
Big Data, especially the volume aspect of it, can mean very different things to different people; it’s a very relative term. For some folks, 100GB of data is Big Data, and for others, 10PB! But the definition I’ve always gone with is: if you don’t know what to do with your data, irrespective of its volume, you have a Big Data problem! With IoT, I think we can all agree that there will be more data than we know what to do with, and this data is going to be coming at you fast, so you had better have an idea of how to do something useful with it quickly, because it’s likely you will be able to store only a small portion of it.
As with all new technologies, IoT data management and analytics bring many challenges with them, but aside from the scale of the data (both volume and velocity), what’s really been lacking is a flexible platform that helps bring all these big data technologies together. And this is where I was pleased to see our good partner Intel release TAP, or the Trusted Analytics Platform, at Strata+Hadoop.
You can find more details on TAP here (https://trustedanalytics.org/). I’m not going to go into all the details, but let’s just say it’s going to make it much easier for Data Scientists the world over to build an entire packaged data workflow. For starters, it’s open source, works seamlessly with Cloud-based applications (both public and private), includes a series of algorithms and tools to analyze and collaborate, supports an open analytics layer that can support predictive APIs and, last but not least, includes a data layer that’s been optimized for Apache Hadoop and Spark. And since this is being released by Intel, it runs even better on their hardware.
We’ve heard these promises before, and I’m as skeptical as you when it comes to tall claims, but I really do think this one has much merit. So how does it do all of this? Well, by streamlining and automating the process of assembling the tools and steps required to build applications and publish them. The devil is in the details, but so far I’ve only heard positive things about it.
So if previous years’ conferences were more about improving the underlying technology and making it more efficient, I think we’re finally getting to the point where facilitating the deployment of solutions is gaining more traction, and to me, it doesn’t get much better than that. Better yet, Dell World 2015 was just held in beautiful Austin, TX this week (http://dellworld.com/), and aside from it being my first ever Dell World, I was even more excited to see the number of sessions focused on both Big Data and IoT. These topics are clearly going to dominate many discussions in the Enterprise space in the coming months.
My biggest challenge at most conferences is – how do I pack in as many sessions as I possibly can? Not a big data problem, but a problem nevertheless. I’d love to hear your thoughts, so feel free to send me your comments at firstname.lastname@example.org or via Twitter @AdnanKhaleel.