Originally published at HPCatDell on Mar. 7, 2014
by Armando Acosta
Big Data and High Performance Computing (HPC) are two disruptive technologies that have changed the game for large-scale data analysis. They were built for different purposes and fill different needs: HPC grew out of the demand for computational speed and high performance in scientific research, while Hadoop was born in the Web 2.0 space from the need to process large volumes of data. My colleague, Joey Jablonski, does a fantastic job of going into more detail on these differences.
These differences have created barriers to entry and make it inefficient for HPC customers to experiment with big data. In this second installment, we'll look at how Dell and Intel are working to break down those barriers with a solution that lets customers use most of the infrastructure they already have. This reduces the up-front expense of dedicated hardware and of skill sets reserved for Hadoop-only jobs, while preserving existing high performance file systems, management tools and resource schedulers.
The file system is the logical starting point for addressing this pain. Many HPC environments today use Lustre, a parallel distributed file system that provides higher performance than the Hadoop Distributed File System (HDFS). Lustre provides high I/O throughput in clusters and shared-data environments, makes data access independent of where the data sits on physical storage, eliminates single points of failure, and recovers quickly from cluster reconfiguration and from server or network outages. The Intel Apache Hadoop Distribution (IDH) 3.1 will support a Lustre plug-in that lets customers run Hadoop jobs with Lustre as the file system instead of HDFS. Customers can then keep the file system and the expertise they have in place today without modification, saving time and money. Dell will also provide a reference architecture for IDH 3.1, giving customers a how-to guide for building Hadoop environments.
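The post doesn't show how the IDH Lustre plug-in is configured, so the following is only a generic sketch of the underlying idea, not the plug-in's actual settings. It assumes Lustre is mounted at the same (hypothetical) path `/mnt/lustre` on every node and uses Hadoop's standard `fs.defaultFS` property with the `file:///` scheme, so job paths resolve against the shared POSIX mount instead of `hdfs://`. The helper `lustre_uri` and the example data path are illustrative names only.

```python
import xml.etree.ElementTree as ET

# Hypothetical mount point: assumed to expose the same Lustre namespace on every node.
LUSTRE_MOUNT = "/mnt/lustre"


def build_core_site(props: dict[str, str]) -> str:
    """Render name/value pairs as a Hadoop core-site.xml <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def lustre_uri(relative_path: str) -> str:
    """Hypothetical helper: fully qualified Hadoop URI for data already on the mount."""
    return f"file://{LUSTRE_MOUNT}/{relative_path.lstrip('/')}"


core_site = build_core_site({
    # fs.defaultFS is Hadoop's default-file-system property; "file:///" makes
    # unqualified paths resolve against the POSIX mount rather than HDFS.
    "fs.defaultFS": "file:///",
})

if __name__ == "__main__":
    print(core_site)
    # Job input can point straight at existing Lustre data, with no copy into HDFS.
    print(lustre_uri("genomics/run42/input"))  # illustrative path
```

The design point is that Hadoop reaches storage through a file-system URI, so changing the scheme changes where jobs read and write; a dedicated plug-in such as the one in IDH 3.1 would replace the plain `file:///` setting shown here.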
Resource schedulers are another key cog in the equation. SLURM is a prime example of a widely used resource manager designed for Linux clusters. It provides three key functions: allocating access to resources (compute nodes) to users so they can perform work; providing a framework for starting, executing, and monitoring work (typically a parallel job); and arbitrating contention for resources by managing a queue of pending work. HPC customers have invested time and resources in designing dozens of optional plug-ins specific to their environments, so having to run a separate resource manager for Hadoop disrupts their processes and wastes that investment. In IDH 3.1, Intel Hadoop Manager will support SLURM integration, giving customers the benefit of a single job scheduler for both HPC and Hadoop. This work will bring the two technologies together in the same environment, enabling more varieties of data analysis delivered through the tool set most HPC customers are already accustomed to using.
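To make those three functions concrete, here is a toy resource manager in Python. It is a simplified model, not SLURM's implementation (real SLURM adds job priorities, backfill scheduling and its plug-in architecture), and the class and job names are hypothetical. It allocates nodes from a fixed pool, tracks running jobs, and arbitrates contention with a first-in, first-out queue, mixing an MPI job and a Hadoop job in the same queue to illustrate one scheduler serving both workloads.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int  # number of compute nodes requested


class ToyScheduler:
    """Illustrative FIFO resource manager; not SLURM's actual algorithm."""

    def __init__(self, total_nodes: int):
        self.free = total_nodes   # nodes not currently allocated
        self.pending = deque()    # queue of jobs waiting for resources
        self.running = {}         # job name -> Job, for monitoring

    def submit(self, job: Job) -> None:
        self.pending.append(job)
        self._dispatch()

    def finish(self, name: str) -> None:
        # Release the finished job's nodes, then try to start queued work.
        job = self.running.pop(name)
        self.free += job.nodes
        self._dispatch()

    def _dispatch(self) -> None:
        # Strict FIFO: start jobs from the head of the queue while they fit and
        # stop at the first one that doesn't, so later jobs cannot jump ahead.
        while self.pending and self.pending[0].nodes <= self.free:
            job = self.pending.popleft()
            self.free -= job.nodes
            self.running[job.name] = job


if __name__ == "__main__":
    sched = ToyScheduler(total_nodes=8)
    sched.submit(Job("mpi_sim", 6))           # HPC job starts immediately
    sched.submit(Job("hadoop_wordcount", 4))  # waits: only 2 nodes free
    print(list(sched.running), [j.name for j in sched.pending])
    sched.finish("mpi_sim")                   # frees nodes; Hadoop job starts
    print(list(sched.running), sched.free)
```

Strict FIFO keeps the example short but lets a large job at the head of the queue block smaller ones behind it; production schedulers such as SLURM use backfill to fill those gaps without delaying the head job.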
Please stay tuned for additional blogs as we continue our discussion about the evolution of Big Data and HPC.