Over the past several years, Larry Smarr, director of the California Institute for Telecommunications and Information Technology (Calit2) at the University of California San Diego, has been exploring how advanced data analytics tools can find patterns in microbial distribution data. These discoveries can lead to newer and better clinical applications. However, with 100 trillion microorganisms in the human body, more than 10 times the number of human cells, carrying some 100 times as many genes as the human genome, microbes are the epitome of big data! To unlock their secrets, Dr. Smarr employs high performance computing with impressive results.
His interest in this topic arose from a desire to improve his own health. To obtain the data he needed, he began using himself as a "guinea pig." Working with his doctors, and quantifying a wide range of his personal microbial data, Smarr was finally able to obtain a diagnosis of Crohn's disease, which several traditional examinations had missed.
From this discovery, Dr. Smarr is now working to unlock what role microorganisms may play in other intestinal diseases.
You can hear his presentation on this fascinating topic at insideHPC.
by Quy Ta
This blog will explore a hybrid computing environment that takes Lustre®, a high performance parallel file system, and integrates it with Hadoop®, a framework for processing and storing big data in a distributed environment. We will explore some reasons and benefits of such a hybrid approach and provide a foundation on how to easily and quickly implement the solution using Bright Cluster Manager® (BCM) to deploy and configure the hybrid cluster.
First, let’s establish some definitions and technologies for our discussion. Hadoop is a software framework for distributed storage and processing of typically very large data sets on compute clusters. The Lustre file system is a parallel distributed file system that is often the choice for large scale computing clusters. In the context of this blog, we define a hybrid cluster as taking a traditional HPC cluster and integrating a Hadoop computing environment capable of processing MapReduce jobs using the Lustre File System. The hybrid solution that we will use as an example in this blog was jointly developed and consists of components from Dell, Intel, Cloudera and Bright Computing.
Why would you want to use the Lustre file system with Hadoop? Why not just use the native Hadoop file system, HDFS? Scientists and researchers have been looking for ways to use both Lustre and Hadoop from within a shared HPC infrastructure. This hybrid approach allows them to use Lustre as the file system both for Hadoop analytics work and for their general HPC workloads. They can also avoid standing up two separate clusters (HPC and Hadoop), and the resources each requires, by re-purposing existing HPC cluster resources into a small to medium sized self-contained Hadoop cluster. This solution typically targets HPC users who need to run periodic Hadoop-specific jobs.
A key component connecting the Hadoop and Lustre ecosystems is the Intel Hadoop Adapter for Lustre plug-in, or Intel HAL for short. Intel HAL is bundled with the Intel Enterprise Edition for Lustre software. It allows users to run MapReduce jobs directly on a Lustre file system. The immediate benefit is that Lustre can deliver faster, more stable, and more easily managed storage directly to MapReduce applications. A potential long-term benefit of using Lustre as the underlying Hadoop storage is higher usable capacity compared to HDFS, which by default replicates every block three times, as well as the performance benefit of running Lustre over InfiniBand. The architectural diagram below illustrates a typical topology for the hybrid solution.
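The capacity point can be made concrete with a back-of-the-envelope comparison. The raw-disk figure and Lustre's redundancy overhead below are illustrative assumptions, not measurements from this solution:

```shell
# HDFS's default 3x replication divides raw capacity by three, while Lustre's
# RAID-based redundancy typically costs far less (~20% is assumed here).
awk 'BEGIN {
  raw = 300                                            # TB of raw disk (example)
  printf "HDFS usable:   %.0f TB\n", raw / 3
  printf "Lustre usable: %.0f TB\n", raw * 0.8
}'
```

With the same 300 TB of disk, the sketch yields roughly 100 TB usable under HDFS versus roughly 240 TB under Lustre.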
What follows is a high-level account of how we implemented the solution, using BCM to deploy and configure it.
The first thing we did was to establish an optimized and fully functional Lustre environment. For this solution, we used the Dell Storage for HPC with Intel Enterprise Edition (EE) for Lustre software as the Lustre solution, Cloudera CDH as the Hadoop distribution and Bright Cluster Manager (BCM) as the imaging and cluster deployment tool.
Using the Intel Manager for Lustre (IML) GUI, we verified that the MDT and OST objects were healthy and in an optimal state. We also verified that the LNet interface and the Lustre kernel modules were loaded and that the Lustre NIDs were accessible.
Verify contents of /etc/modprobe.d/iml_lnet_module_parameters.conf are correct for each MDS and OSS server. Example below.
[root@boulder_mds1 ~]# cat /etc/modprobe.d/iml_lnet_module_parameters.conf
# This file is auto-generated for Lustre NID configuration by IML
# Do not overwrite this file or edit its contents directly
options lnet networks=o2ib0(ib0)
### LNet Configuration Data
## "state": "lnet_unloaded",
## "modprobe_entries": [
## "network_interfaces": [
Using the IML GUI, verify status of MDT and OST objects. There should be no file system alerts and all MDT and OST objects should have green status.
Configuration > File Systems > Current File Systems > “lustrefs”
Verify that UIDs and GIDs are consistent on Lustre clients. This must be done before installing Hadoop software. In particular, the following users and groups should be checked:
users: hdfs, mapred, yarn, hbase, zookeeper
groups: hadoop, zookeeper, hbase
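One simple way to perform this check is to print each account's UID/GID (or flag it as missing) so the output can be diffed across Lustre clients. This is a small sketch of our checklist above, not a script from the solution itself:

```shell
# Report UID/GID for each Hadoop user and group, flagging any that are absent.
for u in hdfs mapred yarn hbase zookeeper; do
  id "$u" 2>/dev/null || echo "MISSING user: $u"
done
for g in hadoop zookeeper hbase; do
  getent group "$g" 2>/dev/null || echo "MISSING group: $g"
done
```

Run it on every client and compare the outputs; any mismatch in IDs must be fixed before installing Hadoop.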
We used the following script to set up our Hadoop users prior to installing Hadoop:
VALUE=10000   # base ID: hive..yarn receive UIDs/GIDs 10001-10005, hadoop gets 10006
for i in hive hbase hdfs mapred yarn; do
    VALUE=$(expr $VALUE + 1);
    groupadd -g $VALUE $i;
    adduser -u $VALUE -g $VALUE $i;
done
groupadd -g 10006 hadoop;
groupmems -g hadoop -a yarn;
groupmems -g hadoop -a mapred;
groupmems -g hadoop -a hdfs;
usermod -d /var/lib/hive -s /sbin/nologin hive;
usermod -d /var/run/hbase -s /sbin/nologin hbase;
usermod -d /var/lib/hadoop-yarn -s /sbin/nologin yarn;
usermod -d /var/lib/hadoop-mapreduce -s /sbin/nologin mapred;
usermod -d /var/lib/hadoop-hdfs -s /bin/bash hdfs
As a sanity check, we verified that the nodes we wanted to re-provision as Hadoop nodes were able to read/write to the Lustre file system.
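A minimal version of that sanity check can be scripted as below. The mount point matches this solution's fstab example; override MOUNT to test a different location:

```shell
# Write a small file to the Lustre mount, read it back, and report the result.
MOUNT="${MOUNT:-/mnt/lustre/hadoop}"
f="$MOUNT/.sanity_check.$$"
if echo "lustre-ok" > "$f" 2>/dev/null && [ "$(cat "$f")" = "lustre-ok" ]; then
  echo "read/write OK on $MOUNT"
else
  echo "read/write FAILED on $MOUNT"
fi
rm -f "$f" 2>/dev/null
```

Running it as each Hadoop user (for example via `su - hdfs`) also exercises the ACLs set up on the Hadoop directory.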
Once we had verified all the prerequisite items above and established a working Lustre environment, we proceeded with the following steps to build, configure and deploy the Hadoop nodes that mount and use the Lustre file system.
Steps we took to build the hybrid solution:
1) Created a software image ‘ieel-hadoop’ using BCM. You can clone an existing software image.
2) Created a node category ‘ieel-hadoop’ using BCM. You can clone an existing node category.
3) Assigned the nodes selected for provisioning as Hadoop nodes to the ieel-hadoop node category.
4) Installed Cloudera CDH 5.1.2 and the Intel Hadoop Adapter for Lustre (HAL) plug-in into the ieel-hadoop software image.
5) Installed the Intel EE for Lustre client software onto the ieel-hadoop software image.
6) Prepared the Lustre directory for Hadoop on the ieel-hadoop software image:
# chmod 0777 /mnt/lustre/hadoop
# setfacl -R -m group:hadoop:rwx /mnt/lustre/hadoop
# setfacl -R -d -m group:hadoop:rwx /mnt/lustre/hadoop
7) Added the Lustre file system mount point to the ieel-hadoop node category for automatic mounting upon bootup.
Example: 192.168.4.140@tcp0:192.168.4.141@tcp0:/lustre /mnt/lustre lustre defaults,_netdev 0 0
8) Added the following Lustre client tunings to the ieel-hadoop software image so that they are applied at boot:
lctl set_param osc.*.max_dirty_mb=512
lctl set_param osc.*.max_rpcs_in_flight=32
To further optimize the solution, you can edit core-site.xml and mapred-site.xml with the Hadoop configuration properties for Lustre described in the Intel HAL documentation.
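As a sketch only: the property names below are recalled from Intel EE for Lustre / HAL documentation and every name and value here should be treated as an assumption to be verified against your release notes before use.

```xml
<!-- core-site.xml (values are assumptions; verify against the HAL docs) -->
<property>
  <name>fs.defaultFS</name>
  <value>lustre:///</value>          <!-- route Hadoop file I/O through HAL -->
</property>
<property>
  <name>fs.root.dir</name>
  <value>/mnt/lustre/hadoop</value>  <!-- the Lustre directory prepared above -->
</property>
```

Because Lustre is a shared file system, mapred-site.xml is typically also pointed at the shuffle classes supplied by the HAL jar, so reducers read map output directly from Lustre instead of pulling it over HTTP.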
As high performance computing becomes more prevalent, the challenges posed increase as well. Take, as just one example, the major challenges the biomedical field faces as researchers integrate and analyze an ever-growing amount of diverse data. Now consider the vast array of industries and academic departments increasingly reliant on big data and high performance computing.
Recently, Jay Etchings, the director of operations and research computing, and senior HPC architect at Arizona State University, began a series of thought pieces for HPCWire addressing how ASU is meeting those challenges by building a Next Generation Cyber Capability (NGCC). The university is forging a path quite different from the traditional academic HPC model.
Their model is sustainable, collaborative, elastic, and distributed. Promising to overcome legacy barriers while opening new avenues into research sciences, it embraces 7 foundational components.
The NGCC provides a novel synergism that integrates big data platforms and traditional supercomputing technologies with software-defined networking, inter-intra-university high-speed interconnects, workload virtualization, coprocessors, and cloud-bursting capacity. The result heralds the next important wave of analytics innovation for life sciences research.
You can read all of Jay's article in HPCWire here.
Congratulations to Merle Giles on co-editing Industrial Applications of High Performance Computing: Best Global Practices. The new book offers the insights of 11 experts from around the world, each addressing high performance computing and the various other issues surrounding HPC today. You can buy the book at CRC Press.
Giles, who is the director of Business and Economic Development at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign, was also named one of the People to Watch 2015 by HPCWire.
Under his leadership, the NCSA's Private Sector Program is helping companies across every industry to discover competitive advantages through better application performance.
Again, many congratulations to a well deserving leader in HPC.
This week developers, researchers, technologists and academics from around the world are gathering in Silicon Valley for the GPU Technology Conference. It offers some impressive keynotes, over 500 sessions covering a wide range of topics, and the opportunity to network with other professionals.
At the Dell booth (Booth #319), attendees can view on-site demonstrations, meet with industry experts, and learn more about virtualization. Be sure to stop by!
The GPU Technology Conference runs from Tuesday, March 17 through Thursday, March 19. You can learn more here.
by Jimmy Pike
Is there any better news than knowing that progress is being made in the treatment of pediatric cancers in America? How about learning that those efforts are expanding to a global scale? Thanks to the efforts of Translational Genomics Research Institute (TGen) and the Neuroblastoma and Medulloblastoma Translational Research Consortium (NMTRC), with help from Dell and others, valuable genomic information is being unlocked faster than ever before. This, in turn, is helping researchers in their quest for better treatments, allowing doctors to offer safer, more targeted care to patients halfway around the world.
A new TGen and NMTRC collaborative international trial is under way. It is the third precision medical trial stemming from the collaboration since TGen, NMTRC, Dell and others first began to partner on pediatric cancer research in 2011. The goal is to treat an additional 150 patients, approximately 5 of whom live in Lebanon.
This goal to help even more children with pediatric cancers is possible thanks to the ability of researchers around the world to share data through Spectrum Health, quickly analyze that data, and collaborate on the findings to design personalized care plans for treating each individual patient's tumor.
Naturally, the computing power needed to study the 3 billion bases of a human genome is massive. A single experiment at TGen can easily produce 30 terabytes of data. However, since the introduction of high performance computing four years ago, TGen's gene sequencing capacity has grown by an impressive 1,200 percent.
Additionally, the time needed for gene sequencing has been trimmed in half, while the analytical process needed to customize treatments has been slashed from seven days to approximately four hours. In turn, these improvements have boosted the efficiency of researchers and staff, allowing them to dedicate more time and resources to treatment.
It is, after all, treatment that saves lives.
You can read more about the fight against pediatric cancer in our earlier blog, and learn more about TGen and Dell's partnership, in this whitepaper.
by Saeed Iqbal and Deepthi Cherlopalle
NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD can benefit greatly from the Intel Xeon Phi family of coprocessors. Intel's MIC (Many Integrated Core) architecture is based on x86 technology and runs Linux. Among the several coprocessors in this family, the 7120P, with 61 cores, has one of the highest performance ratings at 1.22 TFLOPS. The 7120P also has the highest memory per card at 16 GB and a maximum memory bandwidth of 352 GB/s.
In this blog we evaluate NAMD performance with 7120P coprocessors on the PowerEdge C4130 server. Three molecular systems, ApoA1, F1ATPase and STMV, are used in our study because of their relatively large problem sizes. ApoA1 is a high-density lipoprotein found in plasma, which helps extract cholesterol from tissues for transport to the liver. F1ATPase is one of the most abundant enzymes, responsible for synthesizing the molecule adenosine triphosphate. STMV stands for satellite tobacco mosaic virus, which worsens the symptoms of infection by the tobacco mosaic virus. The performance measure is "days/ns," the number of days required to simulate 1 nanosecond of real time. Table 1 shows the problem size for the different systems.
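The days/ns metric follows directly from the measured time per MD step. As a sketch, assuming NAMD's common 2 fs timestep (500,000 steps per simulated nanosecond); the seconds-per-step input is purely illustrative, not a number from our runs:

```shell
# Convert wall-clock seconds per MD step into the days/ns metric.
awk -v sec_per_step=0.0864 'BEGIN {
  steps_per_ns = 500000              # 1 ns / 2 fs timestep (assumption)
  printf "%.2f days/ns\n", sec_per_step * steps_per_ns / 86400
}'
```

With 0.0864 s per step, the conversion gives 0.50 days/ns; a lower days/ns value means faster simulation.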
NAMD tests are performed on Dell PowerEdge C4130, which can accommodate up to four 7120P Phis in a 1U form factor. The PowerEdge C4130 also offers a configurable system design (“A” through “E”), potentially making it a better fit for MD codes in general. The three configurations “C” (a four Phi configuration), “D” (a two Phi configuration) and “E” (a two Phi configuration) are compared. Part of the goal is to see performance variations with different configurations for NAMD. Table 2 gives more information about the hardware configuration and application details used for the tests.
Figure 1 illustrates the performance of NAMD on the PowerEdge C4130 server. NAMD gains a performance boost from adding coprocessors to the server. Configuration C has two CPUs with four coprocessors, two coprocessors connected to each CPU. Configuration D has two CPUs with two coprocessors, each coprocessor connected to a single CPU, whereas configuration E has two coprocessors connected to one CPU. In all of these configurations the Phis are connected directly to the CPU. Configurations C and D are the most balanced, with an equal distribution of Phis per CPU. The CPU-only configuration is shown for reference. ECC and turbo mode are disabled for all the coprocessors across all the runs.
F1ATPase and STMV show a large performance advantage with the Phi. ApoA1 shows no significant performance gain compared to F1ATPase and STMV because of its smaller dataset. Results show an additional gain of 2.6X for the F1ATPase dataset and 2.5X for the STMV dataset with configuration C. For configuration D, an additional gain of 2.4X for F1ATPase and 2.3X for STMV is observed. Configuration E, which has only one CPU, does not show much performance gain because NAMD is CPU-sensitive; its two additional coprocessors do not boost performance significantly.
As shown in Figure 2, the power consumption of configuration C is about 2.4X that of the CPU-only configuration, whereas configuration D draws 1.7X. Power consumption is higher in configuration C than in configuration D because of the two additional coprocessors in the server. Power consumption for configuration E is 1.4X, lower than configuration D because "E" does not have a second CPU. Configurations "C" and "D" offer good performance per watt.
All of the student teams attending this summer's HPC Advisory Council's International Supercomputing Conference (HPCAC-ISC) have the same goal in mind: win the student cluster competition. But the new team from South Africa may feel some added pressure when they arrive in Frankfurt this July. They hope to become the third consecutive champions from their country.
Team "Wits-A" from the University of Witwatersrand in Johannesburg won the right to defend South Africa's title at ISC '15 during the South African Center for High Performance Computing's (CHPC) Ninth National Meeting held in December at Kruger National Park. The students bested 7 other teams from around South Africa.
As part of their victory, the South Africans recently traveled to the United States. On their itinerary was a tour of the Texas Advanced Computing Center (TACC) where they had the opportunity to see the Visualization Laboratory (Vislab) and Stampede Supercomputer, while gaining insights about how to best compete at the ISC '15 Student Cluster Challenge in July. Also on the itinerary was a Texas tradition - sampling some down home BBQ!
Hoping for that three-peat win are Ari Croock, James Allingham, Sasha Naidoo, Robert Clucas, Paul Osel Sekyere, and Jenalea Miller, with Vyacheslav Schevchenko and Nabeel Rajab as reserve team members.
You can learn more about Team South Africa here.
by Suzanne Tracy
Some 4,100 genetic diseases affect humans. Tragically, they are also the primary cause of death in infants, but identifying which specific genetic disease is affecting an afflicted child is a monumental task. Increasingly, however, medical teams are turning to high performance computing and big data to uncover the genetic cause of pediatric illnesses.
Through the adoption of HPC and big data, clinicians are now able to accelerate the delivery of new diagnostic and personalized medical treatment options. Successful personalized medicine is the result of analyzing genetic and molecular data from both patient and research databases. The usage of high performance computing allows clinicians to quickly run the complex algorithms needed to analyze the terabytes of associated data.
The marriage of personalized medicine and high performance computing is now helping to save the lives of pediatric cancer patients thanks to a collaboration between Translational Genomics Research Institute (TGen) and the Neuroblastoma and Medulloblastoma Translational Research Consortium (NMTRC).
The NMTRC conducts various medical trials, generating literally hundreds of measurements per patient, which then must be analyzed and stored. Through a ground-breaking collaboration between TGen, Dell and Intel, NMTRC is now using TGen’s highly-specialized software and tools, which include Dell’s Genomic Data Analysis Platform and cloud technology, to decrease the data analysis time from 10 days to as little as six hours. With this information, clinicians are able to quickly treat their patients, and dramatically improve the efficacy of their trials.
Thanks to the collaboration, NMTRC has launched personalized pediatric cancer medical trials to provide near real-time information on individual patients' tumors. This allows clinicians to make faster and more accurate diagnoses, while determining the most effective medications to treat each young patient. Clinicians are now able to target the exact malignant tumor, while limiting any potential residual harm to the patient.
You can read more about this inspiring collaboration here.
by Saeed Iqbal and Deepthi Cherlopalle
The Intel Xeon Phi series can be used to accelerate HPC applications in the C4130. The highly parallel architecture of the Phi coprocessors can boost parallel applications, and the coprocessors work seamlessly with the standard Xeon E5 processor series to provide additional parallel hardware. A key benefit of the Xeon Phi series is that it doesn't require redesigning the application; only compiler directives are needed to use the coprocessor.
Fundamentally, the Intel Xeon Phi series are many-core parallel processors, with each core having a dedicated L2 cache. The cores are connected through a bidirectional ring interconnect. Intel offers a complete set of development, performance monitoring and tuning tools through its Parallel Studio and VTune. The goal is to enable HPC users to take advantage of the parallel hardware with minimal changes to their code.
The Xeon Phi has two modes of operation, offload mode and native mode. In offload mode, designated parts of the application are "offloaded" to the Xeon Phi, if one is available in the server: the required code and data are copied from the host to the coprocessor, processing is done in parallel on the Phi, and results are moved back to the host. There are two kinds of offload modes, non-shared and virtual-shared memory; each offers a different level of user control over data movement to and from the coprocessor and incurs different types of overhead. In native mode, the application runs directly on the Xeon Phi (and, in a symmetric setup, on both host and Phi simultaneously, communicating required data as needed). A good reference on the Xeon Phi and its modes can be found here.
The Intel Xeon Phi 7120P coprocessor has the highest performance among the Phi series: it has 61 cores, is rated at 1.22 TFLOPS, and can handle 244 threads. The 7120P also supports Intel Turbo Boost technology. The bulk of the compute-intensive calculations are done on the coprocessors.
The PowerEdge C4130 offers five configurations, "A" through "E", two of which are balanced. The two balanced configurations, "C" and "D", are considered for acceleration in this blog. Configuration "C" is the balanced four-coprocessor option with two coprocessors attached to each host processor, and configuration "D" has a single Xeon Phi attached to each host processor. The details of the two configurations are shown in Table 1 and the block diagram (Figure 1) below.
This blog shows the results of acceleration observed on the C4130 with the Intel Xeon Phi 7120P in configurations "C" and "D".
Table 1: Two Balanced C4130 Configurations C and D
Figure 1: PE C4130 Configuration Block Diagram
Table 2 gives more information about the hardware configuration used for the tests.
Table 2: Hardware Configuration
Figure 2: HPL Acceleration (FLOPS compared to CPU only) and Efficiency on the C4130 Configurations
Figure 2 illustrates the HPL performance on the PowerEdge C4130 server. The offload execution mode was used for all the runs. In this mode the application splits the workload: highly parallel code is offloaded to the coprocessor, and the Xeon host processors primarily run serial code. Configuration C has two Phis connected to each CPU, and configuration D has a single Phi connected to each CPU. ECC is enabled and turbo mode is disabled across all the runs.
The Intel Xeon Phi coprocessor delivers a substantial performance boost for highly parallel applications like HPL. In the graphs above, the CPU-only performance is shown for reference. The compute efficiency of the CPU-only configuration is 91.6%, whereas configuration C achieves 75.6% and configuration D 81.2%; CPU-only configurations generally have higher efficiency than CPU-plus-Phi configurations, and configuration D is more efficient than C. Compared to the CPU-only configuration, the HPL acceleration for configuration C with four Xeon Phis is 5.3X, and for configuration D with two Xeon Phis it is 3.3X.
Figure 3: Total power and performance/watt on the C4130 configurations
Figure 3 shows the associated power consumption data of the HPL runs for the CPU-only configuration and configurations C and D. In general, accelerators can consume substantial power when loaded with compute-intensive workloads. The CPU-only configuration draws 520 W, and each Intel Xeon Phi 7120P coprocessor can consume up to 300 W, so power consumption rises for configurations C and D: 3.3X and 2.1X, respectively, compared to the CPU-only configuration.
The Intel Xeon Phi 7120P coprocessor provides high performance, large memory capacity and good performance-per-watt metrics. Configuration C shows a performance per watt of 2.44 GFLOPS/W and configuration D 2.34 GFLOPS/W, whereas the CPU-only configuration gives 1.56 GFLOPS/W.
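The stated figures can be roughly cross-checked against each other. Deriving configuration C's performance per watt from the CPU-only baseline (520 W at 1.56 GFLOPS/W), the 5.3X HPL acceleration and the 3.3X power draw gives about 2.5 GFLOPS/W, in line with the measured 2.44 (the small difference comes from the rounded multipliers):

```shell
# Sanity-check config C's GFLOPS/W from the other reported numbers.
awk 'BEGIN {
  cpu_w = 520; cpu_gflops_per_w = 1.56
  cpu_rmax = cpu_w * cpu_gflops_per_w      # ~811 GFLOPS CPU-only baseline
  c_rmax   = cpu_rmax * 5.3                # config C HPL result (5.3X)
  c_w      = cpu_w * 3.3                   # config C power draw (3.3X)
  printf "config C: %.2f GFLOPS/W\n", c_rmax / c_w
}'
```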