By Nishanth Dandapanthula and Garima Kochhar
Over the past six months, a variety of new servers have become available in the market. These servers differ architecturally and support different amounts of memory, numbers of PCI-E slots, hard disks, and so on. All of these models are good candidates for High Performance Computing clusters, but certain questions remain unanswered: Which server model is best suited for a specific application? What features of its architecture make it the ideal choice?
This study evaluates the performance of several HPC applications across three server models, the Dell PowerEdge M620, M420, and R820, and provides recommendations. These three servers were chosen as representatives of three system architectures: the Intel Sandy Bridge EP, Sandy Bridge EN, and four-socket Sandy Bridge EP platforms, respectively. The configuration details of each platform's test bed are shown in Table 1. A cluster-level, quantitative study was undertaken analyzing both performance and energy efficiency across the different server types. The applications evaluated in this study, a mix of open source and commercial codes, include High Performance Linpack (HPL), LU from the NAS Parallel Benchmark Suite, ANSYS Fluent, MILC, NAMD, and WRF.
Table 1. Test bed details
| | PowerEdge M420 Cluster | PowerEdge M620 Cluster | PowerEdge R820 Cluster |
|---|---|---|---|
| Servers | PowerEdge M420 blade servers (32) in a PowerEdge M1000e chassis | PowerEdge M620 blade servers (16) in a PowerEdge M1000e chassis | PowerEdge R820 rack servers (4) |
| Platform | Sandy Bridge EN | Sandy Bridge EP | Sandy Bridge EP – 4S |
| Processors | Dual Intel Xeon E5-2470 @ 2.3 GHz | Dual Intel Xeon E5-2680 @ 2.7 GHz | Quad Intel Xeon E5-4650 @ 2.7 GHz |
| Memory | 6 × 8 GB @ 1600 MT/s | 8 × 8 GB @ 1600 MT/s | 16 × 8 GB @ 1600 MT/s |
| Memory configuration | 1 DIMM per channel @ 1600 MT/s | 1 DIMM per channel @ 1600 MT/s | 1 DIMM per channel @ 1600 MT/s |
| Interconnect | Mellanox ConnectX-3 FDR10; two Mellanox M4001T FDR10 IO modules for the PowerEdge M1000e blade chassis | Mellanox ConnectX-3 FDR; Mellanox M4001F FDR IO module for the PowerEdge M1000e blade chassis | Mellanox FDR rack switch SX6036 |
| Cluster size | 32 servers, 512 cores | 16 servers, 256 cores | 4 servers, 128 cores |
| Local disk | 1 × 146 GB 15K SAS | 1 × 146 GB 15K SAS | 1 × 146 GB 15K SAS |
| InfiniBand stack | Mellanox OFED 1.5.3-3.0.0 | Mellanox OFED 1.5.3-3.0.0 | Mellanox OFED 1.5.3-3.0.0 |
| Operating system | RHEL 6.2 (2.6.32-220.el6.x86_64) | RHEL 6.2 (2.6.32-220.el6.x86_64) | RHEL 6.2 (2.6.32-220.el6.x86_64) |
From Table 1 it is apparent that the three clusters are not identical in terms of number of servers, core speed, and other attributes. Details of the test bed and an explanation of the choices made are provided in the associated white paper available at this link.
A sample of the results is shown below. Figure 1 shows the performance comparison using High Performance Linpack (HPL), a popular computation-intensive benchmark. From the figure it is clear that the Dell PowerEdge M620 and the PowerEdge R820 perform similarly at equal core counts. This is attributed to the similar core frequency and memory frequency of these two configurations. HPL scales well, and the interconnect is not a bottleneck at these core counts: the number of PowerEdge R820 servers needed to reach a given core count is half that of the PowerEdge M620, yet the performance of both clusters is similar.
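The similarity at equal core counts follows from the theoretical peak arithmetic. A node's double-precision peak (Rpeak) is sockets × cores per socket × frequency × FLOPs per cycle, where Sandy Bridge cores retire 8 DP FLOPs per cycle with AVX. A minimal sketch using the processor details from Table 1 (HPL's measured Rmax is then reported as a fraction of this peak; no measured values are assumed here):

```python
# Theoretical double-precision peak (Rpeak) per node for each platform.
# Sandy Bridge cores retire 8 DP FLOPs/cycle with AVX.
FLOPS_PER_CYCLE = 8

def rpeak_gflops(sockets, cores_per_socket, ghz):
    """Theoretical peak in GFLOPS for one node."""
    return sockets * cores_per_socket * ghz * FLOPS_PER_CYCLE

nodes = {
    "M420 (2x E5-2470 @ 2.3 GHz)": rpeak_gflops(2, 8, 2.3),  # 294.4
    "M620 (2x E5-2680 @ 2.7 GHz)": rpeak_gflops(2, 8, 2.7),  # 345.6
    "R820 (4x E5-4650 @ 2.7 GHz)": rpeak_gflops(4, 8, 2.7),  # 691.2
}
for name, peak in nodes.items():
    print(f"{name}: {peak:.1f} GFLOPS")
```

Note that one R820 node has exactly twice the peak of one M620 node, which is why half as many R820 servers reach the same core count and a similar HPL result.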
The PowerEdge M420s perform consistently lower than the M620s, by roughly 15 to 19 percent, irrespective of core count. The difference in core frequency between the PowerEdge M420 (2.3 GHz) and the PowerEdge M620 (2.7 GHz) is about 15 percent. The PowerEdge M420 also has a smaller total memory configuration and uses InfiniBand FDR10, which is slower than the InfiniBand FDR used in the PowerEdge M620s. Together these factors explain the consistently lower performance of the PowerEdge M420s.
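The frequency gap alone accounts for most of the observed delta; a quick check using the clock speeds from Table 1:

```python
# Expected HPL slowdown from core frequency alone: M420 at 2.3 GHz
# versus M620 at 2.7 GHz (both 8-core Sandy Bridge parts, Table 1).
m420_ghz, m620_ghz = 2.3, 2.7
freq_penalty = 1 - m420_ghz / m620_ghz
print(f"Frequency-only penalty: {freq_penalty:.1%}")  # ~14.8%
```

The remaining few percent of the 15 to 19 percent gap is consistent with the M420's smaller memory configuration and slower FDR10 interconnect.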
Figure 1. HPL Performance
A similar performance comparison and analysis for the other applications that were studied can be found in the white paper at this link.
Table 2 summarizes the characteristics of each platform with respect to performance and energy efficiency. For the applications studied, and within the limits of the test bed, it was found that the PowerEdge M620 provides the best performance in most cases; the PowerEdge R820 has the best overall memory density; and the PowerEdge M420 has the best performance per watt and is the densest configuration of the three.
Table 2. Characteristics of each platform
| PowerEdge M620 @ 2.7 GHz | PowerEdge R820 @ 2.7 GHz | PowerEdge M420 @ 2.3 GHz |
|---|---|---|
| + Dense configuration (16 servers / 10U) | + Large memory capacity, can support up to 1.5 TB per server | + Best energy efficiency, or performance/watt, for most cases |
| + Scalable and efficient | + Large core counts in a single server | + Most dense configuration (32 servers / 10U) |
| + Best energy efficiency for some cases | + Up to 7 PCI-E slots per server for large-enterprise use cases | |
*Energy efficiency is computed as performance obtained for each watt of power consumed.
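That metric is a simple ratio; the sketch below illustrates the calculation with placeholder numbers (the GFLOPS and wattage figures are hypothetical, not measured results from this study):

```python
# Energy efficiency = performance obtained per watt of power consumed.
# The figures below are illustrative placeholders, not measured data.
def perf_per_watt(gflops, watts):
    """Energy efficiency in GFLOPS per watt."""
    return gflops / watts

# e.g. a hypothetical node delivering 280 GFLOPS while drawing 350 W:
print(f"{perf_per_watt(280.0, 350.0):.2f} GFLOPS/W")  # 0.80
```

A lower-clocked part such as the M420's E5-2470 can lose some raw performance yet still win on this ratio if its power draw drops faster than its throughput.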
Details of the performance and energy efficiency of each application, along with further analysis, are available in the white paper, which can be accessed here.
Additional resources on performance tuning for HPC: