by Saeed Iqbal and Deepthi Cherlopalle
NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems, and it can benefit greatly from the Intel Xeon Phi family of coprocessors. Intel’s Many Integrated Core (MIC) architecture is based on x86 technology and runs a Linux OS. Among the coprocessors in this family, the 7120P, with 61 cores, has one of the highest performance ratings at 1.22 TFLOPS. The 7120P also has the highest memory per card, 16 GB, and a maximum memory bandwidth of 352 GB/s.
In this blog we evaluate NAMD performance with 7120P coprocessors on the PowerEdge C4130 server. Three systems, ApoA1, F1ATPase and STMV, are used in our study because of their relatively large problem sizes. ApoA1 is a high-density lipoprotein found in plasma that helps extract cholesterol from tissues and carry it to the liver. F1ATPase is one of the most abundant enzymes and is responsible for synthesizing adenosine triphosphate (ATP). STMV stands for satellite tobacco mosaic virus, which worsens the symptoms of infections by tobacco mosaic virus. The performance measure is “days/ns,” the number of wall-clock days required to simulate 1 nanosecond, so lower values are better. Table 1 shows the problem size for each system.
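To make the “days/ns” metric concrete, here is a minimal sketch of how it relates to the wall-clock time per simulation step and the integration timestep. The function names and the sample numbers are hypothetical and chosen only to illustrate the arithmetic; they are not measurements from this study.

```python
SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000  # femtoseconds in one nanosecond


def days_per_ns(seconds_per_step: float, timestep_fs: float) -> float:
    """Wall-clock days needed to simulate 1 ns of model time."""
    if seconds_per_step <= 0 or timestep_fs <= 0:
        raise ValueError("seconds_per_step and timestep_fs must be positive")
    steps_per_ns = FS_PER_NS / timestep_fs
    return seconds_per_step * steps_per_ns / SECONDS_PER_DAY


def ns_per_day(seconds_per_step: float, timestep_fs: float) -> float:
    """Inverse view of the same metric: simulated ns per wall-clock day."""
    return 1.0 / days_per_ns(seconds_per_step, timestep_fs)


if __name__ == "__main__":
    # Hypothetical inputs: 0.0864 s of wall-clock time per step, 2 fs timestep.
    d = days_per_ns(seconds_per_step=0.0864, timestep_fs=2.0)
    print(f"{d:.3f} days/ns, {1.0 / d:.3f} ns/day")
```

Because the metric is a time per unit of simulated time, a smaller days/ns value means a faster run.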
The NAMD tests are performed on a Dell PowerEdge C4130, which can accommodate up to four 7120P Phis in a 1U form factor. The PowerEdge C4130 also offers a configurable system design (“A” through “E”), potentially making it a better fit for MD codes in general. Three configurations are compared: “C” (four Phis), “D” (two Phis) and “E” (two Phis). Part of the goal is to see how NAMD performance varies across these configurations. Table 2 gives more information about the hardware configuration and application details used for the tests.
Figure 1 illustrates the performance of NAMD on the PowerEdge C4130 server. NAMD gets a performance boost when coprocessors are added to the server. Configuration C has 2 CPUs and 4 coprocessors, with 2 coprocessors connected to each CPU. Configuration D has 2 CPUs and 2 coprocessors, with one coprocessor connected to each CPU, whereas Configuration E has 2 coprocessors connected to 1 CPU. In all these configurations the Phis are connected directly to the CPU. Configurations C and D are the most balanced, with an equal distribution of Phis across the CPUs. The CPU-only configuration is shown for reference. ECC and Turbo mode are disabled on all coprocessors across all runs.
F1ATPase and STMV show a large performance advantage with the Phis. ApoA1 shows no significant gain compared to F1ATPase and STMV because of its smaller dataset. With Configuration C, results show an additional gain of 2.6X for the F1ATPase dataset and 2.5X for the STMV dataset. With Configuration D, the additional gain is 2.4X for F1ATPase and 2.3X for STMV. Configuration E, which has only 1 CPU, does not show much performance gain because NAMD is CPU-sensitive: using two coprocessors in Configuration E does not boost performance significantly.
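Gain figures like these are ratios of two days/ns values. A minimal sketch of that calculation follows, assuming the gain is the reference run's days/ns divided by the accelerated run's days/ns (the post does not spell out the formula). The function name and the sample values are hypothetical, not results from Figure 1.

```python
def speedup(reference_days_per_ns: float, accelerated_days_per_ns: float) -> float:
    """Ratio of reference to accelerated days/ns; values above 1 mean a faster run."""
    if reference_days_per_ns <= 0 or accelerated_days_per_ns <= 0:
        raise ValueError("days/ns values must be positive")
    return reference_days_per_ns / accelerated_days_per_ns


if __name__ == "__main__":
    # Hypothetical numbers: reference run at 3.0 days/ns,
    # accelerated run at 1.5 days/ns.
    print(f"{speedup(3.0, 1.5):.1f}X")
```

Since days/ns is a lower-is-better metric, the reference value goes in the numerator so that a faster accelerated run yields a ratio above 1.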
As shown in Figure 2, the power consumption for Configuration C is about 2.4X, whereas for Configuration D it is 1.7X. Power consumption is higher in Configuration C than in Configuration D because of the two additional coprocessors in the server. Power consumption for Configuration E is 1.4X, which is lower than for Configuration D because “E” does not have an additional CPU. Configurations “C” and “D” offer good performance per watt.
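One rough way to look at performance per watt is to divide each relative performance gain by the relative power consumption of the same configuration. The sketch below does this with the figures quoted in this post. It assumes that the gain and power ratios are measured against the same reference run, which the post does not state explicitly, so treat the output as illustrative only. The function and variable names are hypothetical.

```python
# Relative performance gains quoted in the text for Figure 1.
GAIN = {
    "C": {"F1ATPase": 2.6, "STMV": 2.5},
    "D": {"F1ATPase": 2.4, "STMV": 2.3},
}

# Relative power consumption quoted in the text for Figure 2.
POWER = {"C": 2.4, "D": 1.7}


def relative_perf_per_watt(gain: float, power: float) -> float:
    """Relative gain divided by relative power (assumes a shared reference run)."""
    if power <= 0:
        raise ValueError("power ratio must be positive")
    return gain / power


def perf_per_watt_table() -> dict:
    """Map (configuration, dataset) to its relative performance per watt."""
    return {
        (config, dataset): relative_perf_per_watt(gain, POWER[config])
        for config, datasets in GAIN.items()
        for dataset, gain in datasets.items()
    }


if __name__ == "__main__":
    for (config, dataset), ratio in perf_per_watt_table().items():
        print(f"Configuration {config}, {dataset}: {ratio:.2f}")
```

Under that assumption every ratio comes out above 1, meaning the relative gain exceeds the relative increase in power for both configurations. Configuration E is left out because the text gives no gain figure for it.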