Authors: Saeed Iqbal and Shawn Gao

High Performance Linpack (HPL) is a commonly used reference benchmark for HPC systems. HPL stresses the compute and memory subsystems of the systems under test and gives insight into their performance. Today, general-purpose graphics processing units (GPUs) are widely used to accelerate compute-intensive HPC applications across many disciplines in the HPC community. Several research centers around the world are investigating GPUs to accelerate such applications and so enable faster research and discovery. When comparing these alternatives, HPL performance is a key metric.
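HPL times the solution of a dense N x N linear system and converts the elapsed time into a GFLOPS rate using a fixed operation count of (2/3)N^3 + (3/2)N^2, the formula used by HPL's own test driver. The Python sketch below shows that conversion, plus the ratio of measured (Rmax) to theoretical peak (Rpeak) performance, assuming that ratio is what "efficiency" means in Figure 1. The function names, and the problem size and run time in the usage line, are hypothetical and are not taken from the runs reported in this blog.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """Return the GFLOPS rate HPL reports for an order-n solve taking `seconds`."""
    # HPL credits a fixed operation count of (2/3) n^3 + (3/2) n^2 to the
    # LU factorization and triangular solves together, whatever the run did.
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


def hpl_efficiency(rmax_gflops: float, rpeak_gflops: float) -> float:
    """Fraction of theoretical peak (Rpeak) achieved by the measured rate (Rmax)."""
    return rmax_gflops / rpeak_gflops


if __name__ == "__main__":
    # Hypothetical run: N = 100,000 solved in 1000 s (not a measured result).
    rate = hpl_gflops(100_000, 1000.0)
    print(f"{rate:.1f} GFLOPS")
```

Because the operation count depends only on N, two runs with the same problem size can be compared directly by their wall-clock times.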

GPUs are installed inside the servers to provide the extra compute horsepower needed for application acceleration. Dell now offers a full-featured GPU solution based on the PowerEdge R720 server (Figure 3 shows two GPUs installed in an R720). Two of the latest NVIDIA Tesla M2090 GPUs can be added to each PowerEdge R720 server. In this blog, we present the performance and power results of GPU-accelerated HPL on an eight-node PowerEdge R720 cluster.

Figure 1: HPL performance and efficiency on an eight-node cluster. Results are presented for different numbers of GPUs per node.

Figure 1 shows HPL performance on the eight-node R720 cluster with different numbers of GPUs per node. Compared with a CPU-only configuration, HPL runs 2X faster with one GPU per node and 3.5X faster with two GPUs per node. Figure 2 shows the power consumption results. Power efficiency, i.e. the useful work delivered for every watt of power consumed, improves as GPUs are added: with two M2090 GPUs per node, it is almost 1.5X that of the CPU-only configuration.

Figure 2: Total Power and Power Efficiency of the eight-node cluster.

In conclusion, first, GPUs can substantially accelerate HPL. As shown in Figure 1, each compute node delivers about 250 GFLOPS of sustained performance using CPUs only; adding two GPUs per node raises this to about 875 GFLOPS per node. Second, GPUs also improve the performance/watt ratio: power consumption increases with GPUs, but by less than the corresponding performance gain. As shown in Figure 2, the CPU-only cluster consumes about 3000 W and operates at 0.72 GFLOPS/W; with GPUs, power consumption rises to about 6600 W, but the cluster now operates at 1.07 GFLOPS/W, an increase of about 48% in performance/watt.
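The headline ratios above follow from simple arithmetic on the quoted figures. This Python sketch recomputes them from the approximate per-node GFLOPS and cluster GFLOPS/W values given in the text; the constant and function names are illustrative only and are not part of any Dell or HPL tooling.

```python
# Approximate values quoted in the text for the eight-node R720 cluster.
CPU_ONLY_GFLOPS_PER_NODE = 250.0
TWO_GPU_GFLOPS_PER_NODE = 875.0
CPU_ONLY_GFLOPS_PER_WATT = 0.72
TWO_GPU_GFLOPS_PER_WATT = 1.07


def speedup(accelerated: float, baseline: float) -> float:
    """How many times faster the accelerated configuration is."""
    return accelerated / baseline


def percent_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


node_speedup = speedup(TWO_GPU_GFLOPS_PER_NODE, CPU_ONLY_GFLOPS_PER_NODE)
efficiency_gain = percent_gain(TWO_GPU_GFLOPS_PER_WATT, CPU_ONLY_GFLOPS_PER_WATT)

print(f"Per-node HPL speedup with two GPUs: {node_speedup:.1f}X")
print(f"Performance/watt improvement: {efficiency_gain:.1f}%")
```

With these rounded inputs the script reports a 3.5X per-node speedup, matching Figure 1, and a performance/watt gain of about 48.6%, in line with the roughly 48% (almost 1.5X) improvement quoted above.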

Configuration and Installation

Each PowerEdge R720 has two Intel Xeon E5-2600 series processors. Please note that installing two NVIDIA Tesla M2090 GPUs requires a GPU enablement kit, the x16 option on the third riser, and dual redundant 1100 W power supplies, as shown in Figure 3. The details of the hardware and software components are given below:

Figure 3: Two M2090 GPUs can be attached inside the R720 using a riser and associated power cables.