By Garima Kochhar and Kihoon Yoon. Dell EMC HPC Innovation Lab. October 2016
This blog presents performance results for the 2D alignment and 2D classification phases of the Cryo-electron microscopy (Cryo-EM) data processing workflow using the new Intel Knights Landing architecture, and compares these results to the performance of the Intel Xeon E5-2600 v4 family. A quick description of Cryo-EM and the different phases in the process of reconstructing 3D molecular structures with electron microscopy is provided below, followed by the specific tests conducted in this study and the performance results.
Cryo-EM allows molecular samples to be studied in near-native states and down to nearly atomic resolutions. Studying the 3D structure of these biological specimens can lead to new insights into their functioning and interactions, especially with proteins and nucleic acids, and allows structural biologists to examine how alterations in their structures affect their functions. This information can be used in system biology research to understand the cell signaling network which is part of a complex communication system. This communication system controls fundamental cell activities and actions to maintain normal cell homeostasis. Errors in the cellular signaling process can lead to diseases such as cancer, autoimmune disorders, and diabetes. Studying the functioning of the proteins responsible for an illness enables a biologist to develop specific drugs that can interact with the protein effectively, thus improving the efficacy of treatment.
The workflow from the time a molecular sample is created to the creation of a 3D model of its molecular structure involves multiple steps. These steps are briefly (and simplistically!) described below.
As is now clear, the Cryo-EM processing workflow must comprehend a lot of data, requires rich compute algorithms and considerable compute power for the 2D and 3D phases, and must move data efficiently across the multiple phases in the workflow. Our goal is to design a complete HPC system that can support the Cryo-EM workflow from start to finish and is optimized for performance, energy efficiency and data efficiency.
Performance Tests and Configuration
Focusing for now on the 2D phases of the workflow, this blog presents results for the steps #7 and #8 listed above - the 2D alignment and 2D classification phases. Two software packages in this domain, ROME and RELION were benchmarked on the Knights Landing (KNL, code name for the Intel Xeon Phi 7200 family) and Broadwell (BDW, code name for Intel Xeon E5-2600 v4 family) processors.
The tests were run on systems with the following configuration.
12 * Dell PowerEdge C6320
Intel Xeon E5-2697 v4. 18 cores per socket, 2.3 GHz
128 GB at 2400 MT/s
Intel Omni-Path fabric
12 * Dell PowerEdge C6320p
Intel Xeon Phi 7230. 64 cores, 1.3 GHz
96 GB at 2400 MT/s
Red Hat Enterprise Linux 7.2
Intel 2017, 17.0.0.098 Build 20160721
Intel MPI 5.1.3
Set1. Inflammasome data: 16306 images of NLRC4/NAIP2 inflammasome with a size of 2502 pixels
Set4. RP-a: 57001 images of proteasome regulatory particles (RP) with a size of 1602 pixels
Set2. RP-b: 35407 images of proteasome regulatory particles (RP) with a size of 1602 pixels
ROME performs the 2D alignment (step #7 above) and the 2D classification (step #8 above) in two separate phases called the MAP phase and the SML phase respectively. For our tests we used “-k” for MAP equal to 50 (i.e. 50 initial classes) and “-k” for SML equal to 1000 (i.e. 1000 final 2D classes).
The first set of graphs below, Figure 1 and Figure 2 show the performance of the SML phase on KNL. The compute portion of the SML phase scales linearly as more KNL systems are added into the test bed, from 1 to 12 servers as shown in Figure 1. The total time to run shown in Figure 2 is slightly lower than linear, and includes an I/O component as well as the compute component. The test bed used in this study did not have a parallel file system and used just local disks on the KNL servers. Future work for this project includes evaluating the impact of adding a Lustre parallel file system to this test bed and its effect on total time for SML.
Figure 1 - ROME SML scaling on KNL, compute time
Figure 2 - ROME SML scaling on KNL, total time
The next set of graphs compare the ROME SML performance on KNL and Broadwell. Figure 3, Figure 4 and Figure 5 plot the compute time for SML on 1 to 12 servers. The black circle on the graph shows the improvement in KNL runtime when compared to BDW. For all three datasets that were benchmarked, KNL is about 3x faster than BDW. Note we’re comparing one single-socket KNL server to a dual-socket Broadwell server, so this is a server to server comparison (not socket to socket). KNL is 3x faster than BDW across different numbers of servers, showing that ROME SML scales well on Omni-Path on both KNL and BDW, but the absolute compute time on KNL is 3x faster irrespective of the number of servers in test.
Considering total time to run on KNL versus BDW, we measured KNL to be 2.4x to 3.3x faster than BDW at all node counts. Specifically, DATA6 is ~ 2.4x faster on KNL, DATA8 is 3x faster on KNL and RING11_ALL is 3.4x faster on KNL when considering total time to run. As mentioned before, the total time includes an I/O component and one of the next step in this study is to evaluate the performance improvement if adding a parallel file system to the test bed.
Figure 3 - DATA8 ROME SML on KNL and BDW
Figure 4 - DATA6 ROME SML on KNL and BDW.
Figure 5 - RING11_ALL ROME SML on KNL and BDW
RELION accomplishes the 2D alignment and classification steps mentioned above in one phase. Figure 6 shows our preliminary results on RELION on KNL across 12 servers and on two of the test datasets. The “--K” parameter for RELION was set to 300, i.e., 300 classes for 2D classification. There are several things to be still tried here – the impact of a parallel file system on RELION (as we discussed for ROME earlier) and dataset sensitivity to the parallel file system. Additionally we plan to benchmark RELION on Broadwell, across different node counts and with different input parameters.
Figure 6 - RELION 2D alignment and classification on KNL
The next steps in this project include adding a parallel file system to measure the impact on the workflow, tuning the test parameters for ROME MAP, SML and RELION, and testing on more datasets. We also plan to measure the power consumption of the cluster when running Cryo-EM workloads to analyze performance per watt and performance per dollar metrics for KNL.