When it comes to big data processing, Hadoop has become the go-to platform. It allows vast amounts of data, especially unstructured or very diverse data, to be processed quickly. As the de facto open source parallel file system for HPC environments, Lustre provides compute clusters with efficient storage and fast access to large data sets. Together these technologies help to solve big data problems. However, the combination also presents some disadvantages, including a need for HTTP calls, added overhead, reduced efficiency, slower speed, and a requirement for fairly large local storage on each Hadoop node.
There is a way to overcome those obstacles. Intel Enterprise Edition for Lustre (IEEL) includes a Hadoop software adaptor that provides direct access to Lustre during MapReduce computations, improving performance.
A presentation by J. Mario Gallegos at the recent LUG 15 conference highlighted some of the advantages gained and some of the best practices to follow when adding IEEL.
Among the advantages observed:
You can read about Mario's other findings and see his LUG presentation here.