by Shawn Gao and Saeed Iqbal
General-purpose Graphics Processing Units (GPUs) have proven suitable for accelerating compute-intensive applications such as Computational Fluid Dynamics (CFD), Molecular Dynamics (MD), Quantum Chemistry (QC), Computational Finance (CF), and Oil & Gas applications. Among these areas, Molecular Dynamics and N-body simulation have benefited tremendously from GPU acceleration and, equally important, sophisticated GPU-enabled simulators are freely available for both. NAMD and NBODY are two such GPU-accelerated simulators. More detailed information about NAMD, NBODY, and GPUs can be found in the reference section [1, 2].
In this blog we evaluate the performance improvement that GPU-accelerated compute nodes bring to NAMD and NBODY. For NAMD, the STMV benchmark, which consists of 1066K atoms, is chosen for its relatively large problem size. The performance measure is “days/ns”, the number of days required to simulate 1 nanosecond of simulated time, so lower values are better.
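Because days/ns is a lower-is-better metric, a speedup is the baseline days/ns divided by the accelerated days/ns. The following is a minimal sketch of that arithmetic; the function names are hypothetical and the sample values are placeholders for illustration, not measurements from this study.

```python
def ns_per_day(days_per_ns: float) -> float:
    """Convert days/ns (lower is better) to ns/day (higher is better)."""
    if days_per_ns <= 0:
        raise ValueError("days_per_ns must be positive")
    return 1.0 / days_per_ns


def speedup(baseline_days_per_ns: float, accelerated_days_per_ns: float) -> float:
    """Speedup of the accelerated run relative to the baseline run."""
    return ns_per_day(accelerated_days_per_ns) / ns_per_day(baseline_days_per_ns)


# Placeholder values for illustration only, not measured results.
cpu_only = 2.0  # days/ns on a hypothetical CPU-only node
with_gpu = 0.5  # days/ns on a hypothetical GPU-accelerated node
print(f"speedup: {speedup(cpu_only, with_gpu):.1f}X")
```

Reporting the ratio rather than raw days/ns is what makes the results comparable across GPU models in the figures that follow.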
Figure 1: Relative performance of NAMD benchmarks on single-node R720.
Figure 1 shows the relative performance of the NAMD benchmarks on the single-node R720. STMV is accelerated about 3.4X on Tesla K40 and 2.9X on Tesla K20. Figure 2 shows the additional power required for GPUs; total power consumption increases by about 1.5X. From the power efficiency point of view, running STMV with dual internal GPUs is beneficial: with K40, the performance gain is 3.4X for about 1.5X the power. We also observe that the problem size is a key factor in determining how much a particular simulation gets accelerated. For more detailed analysis, please refer to our previous studies [2, 3].
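The power-efficiency argument can be made concrete by dividing the relative performance by the relative power. This is a minimal sketch that reads the "1.5X" figure as the ratio of total node power with GPUs to total node power without them, which is an assumption about how the number was reported; the function name is hypothetical.

```python
def relative_perf_per_watt(perf_ratio: float, power_ratio: float) -> float:
    """Performance-per-watt gain relative to the CPU-only baseline."""
    if power_ratio <= 0:
        raise ValueError("power_ratio must be positive")
    return perf_ratio / power_ratio


# Ratios quoted in the text for STMV with Tesla K40 (approximate values).
k40_perf_ratio = 3.4
k40_power_ratio = 1.5  # assumed to mean total power is 1.5X the baseline

gain = relative_perf_per_watt(k40_perf_ratio, k40_power_ratio)
print(f"K40 performance per watt vs. CPU-only: {gain:.2f}X")
```

Under that reading, the K40 configuration delivers roughly 2.3X the performance per watt of the CPU-only node.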
Figure 2: Relative Power Consumption of NAMD benchmarks on single-node R720.
The NBODY simulation comes from the CUDA Sample Pack. To minimize the effect of software overhead, we ran the benchmark multiple times and set the number of bodies to 1 million. Figure 3 shows that the average performance improvement from K20 to K40 is about 20% in both cases (1 GPU per node or 2 GPUs per node). Also, a dual-GPU configuration can significantly increase total performance, by up to 4X.
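One way to see why a large body count helps amortize fixed overhead is to count the work per time step. The sketch below assumes a brute-force all-pairs scheme in which every body is evaluated against every body, so the work grows as N squared; the function name is hypothetical and the body counts other than 1 million are chosen only for comparison.

```python
def pairwise_interactions(num_bodies: int) -> int:
    """Body-body interactions evaluated per time step in an all-pairs scheme."""
    if num_bodies < 0:
        raise ValueError("num_bodies must be non-negative")
    return num_bodies * num_bodies


# Compare smaller problem sizes with the 1 million bodies used in the text.
for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9,} bodies -> {pairwise_interactions(n):.1e} interactions per step")
```

With 1 million bodies each step involves on the order of 10^12 interactions, so any fixed per-launch cost becomes a negligible fraction of the run time.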
Figure 3: Relative Performance of NBODY benchmarks (double precision) on single-node R720.
The PowerEdge R720 has dual Intel Xeon E5-2600 series processors. The details of the hardware and software components are given below:
Number of Compute Nodes
Compute Node processor: Two Intel Xeon E5-2697 v2 @ 2.7 GHz, 95W
Memory: 128 GB 1600 MHz
GPUs: NVIDIA Tesla K20/K40
Number of GPUs
Number of cores
Peak Performance(SP): Single Precision
Peak Performance(DP): Double Precision