By Nishanth Dandapanthula and Garima Kochhar

Over the last six months, many new servers have become available on the market. These servers differ in architecture and in how much memory, how many PCI-E slots, how many hard disks, and so on they support. All of these models are good candidates for High Performance Computing (HPC) clusters, but certain questions remain unanswered: Which server model is best suited for a specific application? Which architectural features make it the ideal choice?

This study evaluates the performance of several HPC applications across three server models, the Dell PowerEdge M620, M420, and R820, and provides recommendations based on the results. These three servers were chosen as representatives of three system architectures: the Intel Sandy Bridge EP, Sandy Bridge EN, and 4-socket Sandy Bridge EP platforms, respectively. The configuration details of each platform's test bed are shown in Table 1. The study is a quantitative, cluster-level analysis of both performance and energy efficiency across the different server types. The applications evaluated include High Performance Linpack (HPL), LU from the NAS Parallel Benchmark Suite, ANSYS Fluent, MILC, NAMD, and WRF, a mix of open source and commercial applications.

Table 1. Test bed details

| Component | PowerEdge M420 Cluster | PowerEdge M620 Cluster | PowerEdge R820 Cluster |
|---|---|---|---|
| Server configuration | PowerEdge M420 blade servers (32) in a PowerEdge M1000e chassis | PowerEdge M620 blade servers (16) in a PowerEdge M1000e chassis | PowerEdge R820 rack servers (4) |
| Architecture | Sandy Bridge EN | Sandy Bridge EP | Sandy Bridge EP (4S) |
| Processor | Dual Intel Xeon E5-2470 @ 2.3 GHz | Dual Intel Xeon E5-2680 @ 2.7 GHz | Quad Intel Xeon E5-4650 @ 2.7 GHz |
| Memory (per server) | 6 × 8 GB @ 1600 MT/s | 8 × 8 GB @ 1600 MT/s | 16 × 8 GB @ 1600 MT/s |
| Memory configuration | 1 DIMM per channel at 1600 MT/s | 1 DIMM per channel at 1600 MT/s | 1 DIMM per channel at 1600 MT/s |
| InfiniBand | Mellanox ConnectX-3 FDR10; two Mellanox M4001T FDR10 IO modules for the PowerEdge M1000e blade chassis | Mellanox ConnectX-3 FDR; Mellanox M4001F FDR IO module for the PowerEdge M1000e blade chassis | Mellanox ConnectX-3 FDR; Mellanox SX6036 FDR rack switch |
| Cluster size | 32 servers, 512 cores | 16 servers, 256 cores | 4 servers, 128 cores |
| Disk | 1 × 50 GB SSD | 1 × 146 GB 15K SAS | 1 × 146 GB 15K SAS |
| Disk controller | PERC H310 | PERC H310 | PERC H310 |
| OFED | Mellanox OFED 1.5.3-3.0.0 | Mellanox OFED 1.5.3-3.0.0 | Mellanox OFED 1.5.3-3.0.0 |
| OS | RHEL 6.2 (kernel 2.6.32-220.el6.x86_64) | RHEL 6.2 (kernel 2.6.32-220.el6.x86_64) | RHEL 6.2 (kernel 2.6.32-220.el6.x86_64) |

As Table 1 shows, the three clusters are not identical: they differ in number of servers, core frequency, memory per server, and InfiniBand speed. Details of the test bed, and an explanation of the choices made, are provided in the associated white paper.
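To make these differences concrete, the Python sketch below derives per-server and per-cluster figures from Table 1. It assumes 8 cores per processor (true for the Xeon E5-2470, E5-2680, and E5-4650) and, given the 1 DIMM per channel configuration, treats DIMMs per socket as the number of populated memory channels. The `Platform` class is hypothetical, written only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """Hypothetical record of one Table 1 column."""
    name: str
    servers: int           # servers in the cluster
    sockets: int           # processors per server
    cores_per_socket: int  # assumption: 8 for E5-2470, E5-2680, E5-4650
    dimms: int             # DIMMs per server
    dimm_gb: int           # capacity of each DIMM in GB

    @property
    def cores_per_server(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def cluster_cores(self) -> int:
        return self.servers * self.cores_per_server

    @property
    def memory_gb(self) -> int:
        return self.dimms * self.dimm_gb

    @property
    def memory_per_core_gb(self) -> float:
        return self.memory_gb / self.cores_per_server

    @property
    def channels_per_socket(self) -> int:
        # With 1 DIMM per channel, populated channels = DIMMs / sockets.
        return self.dimms // self.sockets


PLATFORMS = [
    Platform("M420", servers=32, sockets=2, cores_per_socket=8, dimms=6, dimm_gb=8),
    Platform("M620", servers=16, sockets=2, cores_per_socket=8, dimms=8, dimm_gb=8),
    Platform("R820", servers=4, sockets=4, cores_per_socket=8, dimms=16, dimm_gb=8),
]

if __name__ == "__main__":
    for p in PLATFORMS:
        print(f"{p.name}: {p.cores_per_server} cores/server, "
              f"{p.cluster_cores} cluster cores, {p.memory_gb} GB/server, "
              f"{p.memory_per_core_gb:.0f} GB/core, "
              f"{p.channels_per_socket} channels/socket")
```

Under these assumptions the derived cluster core counts (512, 256, and 128) match the Cluster size row of Table 1, and the M420 ends up with 3 GB of memory per core against 4 GB on the other two platforms. The three versus four populated channels per socket are consistent with the three-channel memory controller of Sandy Bridge EN and the four-channel controller of Sandy Bridge EP.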

A sample of the results is shown below. Figure 1 compares performance using High Performance Linpack (HPL), a popular compute-intensive benchmark. The figure shows that the Dell PowerEdge M620 and the PowerEdge R820 perform similarly at the same core count. This is attributed to the identical core frequency (2.7 GHz) and memory frequency (1600 MT/s) of these two configurations. HPL scales well, and the interconnect is not a bottleneck at these core counts. The graph supports this: because each PowerEdge R820 has twice as many cores as a PowerEdge M620, reaching a given core count takes only half as many R820 servers, so more of the communication stays within a server, yet both clusters deliver similar performance.
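The server-count argument above is simple arithmetic, sketched here in Python. For a given core count, it computes how many M620 and R820 servers are needed and the theoretical peak (Rpeak) at nominal clock. The 8 double-precision FLOPs per cycle per core (AVX on Sandy Bridge) is an assumption of this sketch, Turbo Boost is ignored, and the numbers are theoretical limits, not the measured HPL results in Figure 1.

```python
# Per-server core counts from Table 1; nominal clocks (Turbo Boost ignored).
CORES_PER_SERVER = {"M620": 16, "R820": 32}
CLOCK_GHZ = {"M620": 2.7, "R820": 2.7}
# Assumption: Sandy Bridge AVX retires 8 double-precision FLOPs per cycle per core.
FLOPS_PER_CYCLE = 8


def servers_needed(platform: str, cores: int) -> int:
    """Number of servers required to reach `cores` total cores."""
    per_server = CORES_PER_SERVER[platform]
    if cores % per_server:
        raise ValueError(f"{cores} cores is not a whole number of {platform} servers")
    return cores // per_server


def rpeak_gflops(platform: str, cores: int) -> float:
    """Theoretical peak in GFLOPS: cores x clock (GHz) x FLOPs per cycle."""
    return cores * CLOCK_GHZ[platform] * FLOPS_PER_CYCLE


if __name__ == "__main__":
    for cores in (32, 64, 128):
        print(f"{cores:>4} cores: M620 x{servers_needed('M620', cores)}, "
              f"R820 x{servers_needed('R820', cores)}, "
              f"Rpeak {rpeak_gflops('M620', cores):.1f} vs "
              f"{rpeak_gflops('R820', cores):.1f} GFLOPS")
```

At every core count the R820 cluster needs half as many servers as the M620 cluster while the theoretical peak is identical, so similar measured performance suggests that inter-node communication is not the limiting factor for HPL at these scales.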

The PowerEdge M420s consistently perform 15 to 19 percent lower than the M620s, irrespective of core count. The core frequency of the PowerEdge M420 (2.3 GHz) is about 15 percent lower than that of the PowerEdge M620 (2.7 GHz). The PowerEdge M420 also has less memory per server (48 GB versus 64 GB) and uses InfiniBand FDR10, which is slower than the InfiniBand FDR used in the PowerEdge M620s. Together, these differences explain the consistently lower performance of the PowerEdge M420s.
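The following Python sketch quantifies the three differences named above. The clock and memory figures come from Table 1; the InfiniBand per-lane signaling rates (10.3125 Gb/s for FDR10 and 14.0625 Gb/s for FDR, both with 64b/66b encoding over a 4x link) are assumptions supplied for this illustration rather than figures from the study.

```python
# Figures from Table 1.
M420_GHZ, M620_GHZ = 2.3, 2.7
M420_MEM_GB, M620_MEM_GB = 6 * 8, 8 * 8   # DIMMs x 8 GB per server


def relative_gap(slow: float, fast: float) -> float:
    """Fractional shortfall of `slow` relative to `fast`."""
    return 1.0 - slow / fast


def ib_4x_data_rate_gbps(lane_gbaud: float) -> float:
    """Effective data rate of a 4x InfiniBand link using 64b/66b encoding."""
    return 4 * lane_gbaud * 64 / 66


# Assumed per-lane signaling rates (not from the study).
FDR10_GBPS = ib_4x_data_rate_gbps(10.3125)
FDR_GBPS = ib_4x_data_rate_gbps(14.0625)

CLOCK_GAP = relative_gap(M420_GHZ, M620_GHZ)
MEMORY_GAP = relative_gap(M420_MEM_GB, M620_MEM_GB)

if __name__ == "__main__":
    print(f"clock gap:  {CLOCK_GAP:.1%}")    # 14.8%
    print(f"memory gap: {MEMORY_GAP:.0%}")   # 25%
    print(f"FDR10 {FDR10_GBPS:.1f} Gb/s vs FDR {FDR_GBPS:.1f} Gb/s")  # 40.0 vs 54.5
```

The clock ratio alone accounts for about 15 percent, matching the lower end of the observed 15 to 19 percent gap; the smaller memory per server and the lower FDR10 bandwidth are plausible contributors to the remainder.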

Figure 1. HPL Performance

Similar performance comparisons and analyses for the other applications studied can be found in the white paper.

Table 2 summarizes the characteristics of each platform with respect to performance and energy efficiency. For the applications studied, and within the limits of the test bed, the PowerEdge M620 provided the best performance in most cases, the PowerEdge R820 offered the largest memory capacity per server, and the PowerEdge M420 delivered the best performance per watt in most cases while being the densest configuration of the three.

Table 2. Characteristics of each platform

| Platform | Characteristics |
|---|---|
| PowerEdge M620 @ 2.7 GHz | + Best performance; + dense configuration (16 servers / 10U); + scalable and efficient |
| PowerEdge R820 @ 2.7 GHz | + Best energy efficiency* for some cases; + large memory capacity, up to 1.5 TB per server; + large core count in a single server; + up to 7 PCI-E slots per server for large-enterprise use cases |
| PowerEdge M420 @ 2.3 GHz | + Best energy efficiency* (performance/watt) for most cases; + most dense configuration (32 servers / 10U) |

*Energy efficiency is computed as the performance obtained per watt of power consumed.
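The density and energy-efficiency entries in Table 2 reduce to simple ratios, sketched below in Python. Cores per rack unit assumes a 10U PowerEdge M1000e chassis (the 10U figure in Table 2) and a 2U form factor for the PowerEdge R820, and ignores switches and other rack overhead; `energy_efficiency` follows the footnote's definition of performance per watt. Both function names are hypothetical.

```python
# Rack units per enclosure. Assumptions: 10U PowerEdge M1000e chassis (Table 2)
# and a 2U PowerEdge R820; switches and other rack overhead are ignored.
RACK_UNITS = {"M420": 10, "M620": 10, "R820": 2}
SERVERS_PER_ENCLOSURE = {"M420": 32, "M620": 16, "R820": 1}
CORES_PER_SERVER = {"M420": 16, "M620": 16, "R820": 32}   # from Table 1


def cores_per_rack_unit(platform: str) -> float:
    """Cores per rack unit for one fully populated enclosure."""
    cores = SERVERS_PER_ENCLOSURE[platform] * CORES_PER_SERVER[platform]
    return cores / RACK_UNITS[platform]


def energy_efficiency(performance: float, avg_power_watts: float) -> float:
    """Performance obtained per watt of power consumed (Table 2 footnote)."""
    if avg_power_watts <= 0:
        raise ValueError("average power must be positive")
    return performance / avg_power_watts


if __name__ == "__main__":
    for name in RACK_UNITS:
        print(f"{name}: {cores_per_rack_unit(name):.1f} cores per U")
```

Under these assumptions the M420 packs twice as many cores per rack unit as the M620. As a hypothetical example, not a measured result, `energy_efficiency(0.90, 0.75)` returns about 1.2: a platform that delivers 90 percent of another's performance while drawing 75 percent of its power has roughly 1.2 times its performance per watt, which is how a lower-clocked platform can still lead on energy efficiency.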

Per-application details of performance and energy efficiency, along with further analysis, are available in the white paper.

Additional resources on performance tuning for HPC:

  1. http://www.dellhpcsolutions.com/assets/pdfs/Optimal_BIOS_HPC_Dell_12G.v1.0.pdf
  2. http://en.community.dell.com/techcenter/high-performance-computing/b/hpc_storage_and_file_systems/archive/2012/10/09/unbalanced-memory-performance.aspx
  3. http://i.dell.com/sites/content/business/solutions/whitepapers/en/Documents/11g-memory-selection-guidelines.pdf