Originally published at HPCatDell on Mar. 3, 2014
by Joey Jablonski
The High Performance Computing market and its associated technologies trace their roots back to the 1970s, when performance was the only metric that mattered and pioneers began to rethink how complex systems were designed. The HPC space has continued to drive innovation through performance improvements and has produced many technologies now commonly used in enterprise computing.
The Big Data market began to evolve in 2003, when Google published a series of papers on the internal methods it used to store, process, and analyze massive data sets. This set the tone for others to build open source solutions around these ideas, enabling them to scale to Google's level without starting from scratch. Most of this discussion will focus on Hadoop and the capabilities it has brought to the Big Data space, but Big Data is more than just Hadoop; the technologies and products that make up the Big Data landscape are numerous and growing.
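The processing model described in those papers, MapReduce, is often illustrated with a word count. The sketch below is a toy, single-machine version of the idea (the function names are illustrative, not Hadoop's or Google's actual API): a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase combines each group.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every input record
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group all intermediate values by their key
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: combine the list of values for each key into one result
    return {key: sum(values) for key, values in grouped.items()}

docs = ["big data big compute", "big ideas"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["big"])  # 3
```

The appeal of the model is that the map and reduce steps are independent per record and per key, so a framework like Hadoop can run them in parallel across many machines without the programmer managing that distribution.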
Both the HPC and Big Data worlds are characterized by the widespread availability and use of open source software. This open, collaborative nature has enabled a rapid pace of innovation and enhancements that can quickly be consumed by a broad audience. Both also allow data to be used more effectively to make better, data-driven decisions.
There are two primary differences between HPC and Big Data: one technical, one organizational.
The biggest separation between how HPC and Big Data technologies are used in any environment is the type of workload and use case targeted by each model. HPC and Big Data each provide capabilities for specific types of data analysis, and each delivers large performance gains for certain classes of workloads.
Most importantly, HPC and Big Data are solutions to different problems. While there are inevitably computational challenges that both can solve, they do not inherently compete with one another, nor will one displace the other for the foreseeable future. HPC will continue to be the dominant method for analyzing data in the sciences, such as chemistry and physics, and their implications for manufacturing and the environment. Big Data will continue to grow as an easy-to-adopt method for analyzing the ever-growing volume of human-generated content.
In future postings we look forward to exploring how HPC and Big Data complement one another through sharing of technologies and best practices. We will explore what technologies from the HPC space have enabled Big Data to become as popular as it has, as well as what technologies from the Big Data space are making their way to traditional HPC environments.