Originally published at HPCatDell on Mar. 3, 2014

by Joey Jablonski

The High Performance Computing market and associated technologies can trace their roots back to the 1970s when performance was the only metric, and pioneers began to rethink how complex systems were defined.  The HPC space has continued to drive innovation through performance improvements and has led to many technologies now commonly used in enterprise computing.

The Big Data market began to evolve in 2003 with Google publishing a series of papers on the internal methods they used to store, process and analyze massively large data sets.  This set the tone for others to build open source solutions around these ideas, enabling them to scale to the level of Google without starting from scratch.  Most of this discussion will focus on Hadoop and the capabilities it has brought to the Big Data space, but Big Data is more than just Hadoop; the advanced technologies and products that make up the Big Data landscape are endless. 

Both the HPC and Big Data worlds are characterized by the common availability and use of open source software.  This open nature and common collaboration have enabled a rapid pace of innovation and enhancements that can quickly be consumed by a broad audience.  Both also allow data to be used more effectively to make better, data-driven decisions.

There are two primary differences between HPC and Big Data, one technical and one organizational:

  • Technical – While both HPC and Big Data make use of commodity hardware and distributed computing paradigms, the most common difference is the method used for distribution, coordination and communication between the distributed workloads.  In HPC, communication is commonly implemented via MPI.  In Big Data, it varies depending on which of the three Vs (volume, velocity or variety) is being solved; MapReduce, as implemented in Hadoop, is the most common.  MapReduce, coupled with a distributed file system, provides an HPC-similar model for distributed computing.
  • Organizational – The dynamics behind HPC and Big Data have affected which types of organizations have adopted the different technologies.  HPC has been dominant in the government, oil & gas, automotive and education spaces.  This is mostly a result of the level of expertise, in both computer science and the core sciences, needed to deploy and effectively use the capabilities of HPC platforms.  Big Data, on the other hand, has been adopted by a much broader audience because of interfaces that open it up to a much wider group of computer science professionals.
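To make the technical distinction above concrete, here is a minimal sketch of the MapReduce model in plain Python.  It is not Hadoop's actual API; the function names and the tiny word-count example are illustrative only.  The point is the programming model: the developer writes only a map function and a reduce function, while the framework handles distribution and the "shuffle" step that groups intermediate values by key, in contrast to MPI, where the programmer manages communication between processes explicitly.

```python
from collections import defaultdict
from itertools import chain

# Map phase: each "mapper" turns its slice of the input into (key, value) pairs.
# In Hadoop these calls would run in parallel across many nodes.
def map_words(line):
    return [(word.lower(), 1) for word in line.split()]

# Shuffle phase: group all intermediate values by key, as the framework
# does between the map and reduce stages.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: each "reducer" combines the values for one key.
def reduce_counts(key, values):
    return key, sum(values)

lines = ["the quick brown fox", "the lazy dog"]
mapped = chain.from_iterable(map_words(line) for line in lines)
grouped = shuffle(mapped)
counts = dict(reduce_counts(k, v) for k, v in grouped.items())
print(counts)
```

Nothing in the user-written code mentions nodes, messages or synchronization; that separation of concerns is what lowers the barrier to entry relative to MPI-style programming.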

The biggest separation between how HPC and Big Data technologies are used in any environment is the types of workloads and use cases targeted for each model.  HPC and Big Data each provide capabilities for specific types of data analysis, with each delivering large performance gains for certain types of workloads.

  • HPC traditionally has been dominant in the government, research, oil & gas and financial services markets.  The use cases for HPC are endless and include research around physics and chemistry, modeling financial markets and modeling oil reservoirs.
  • Big Data has penetrated new markets that previously were not of the right size to adopt HPC capabilities, including marketing analytics, customer analytics, search optimization and fraud analysis.

Most importantly, HPC and Big Data are solutions for different problems.  While there are inevitably computational challenges that both can solve, they inherently do not compete with one another, nor will one displace the other for the foreseeable future.  HPC will continue to be the dominant method for analysis of data related to the sciences – chemistry, physics and their implications for manufacturing and the environment.  Big Data will continue to grow as an easy-to-adopt method for analyzing the continually growing volumes of human-generated content available.

In future postings we look forward to exploring how HPC and Big Data complement one another through sharing of technologies and best practices.  We will explore what technologies from the HPC space have enabled Big Data to become as popular as it has, as well as what technologies from the Big Data space are making their way to traditional HPC environments.