This blog explores HPL (High Performance LINPACK) performance and power consumption on a cluster of current-generation PowerEdge R730 servers equipped with Intel Xeon Phi 7120P coprocessors. All runs were carried out with Hyper-Threading (logical processors) disabled.
Test Cluster Configuration:
The test cluster consisted of four PowerEdge R730 servers, each with two Intel Xeon Phi 7120P coprocessors. Each PowerEdge R730 had two Intel Xeon E5-2695 v3 CPUs @ 2.3GHz and eight 16GB DIMMs at 2133MHz, for a total of 128GB of memory. Each server also had one Mellanox FDR InfiniBand HCA in the low-profile x8 PCIe Gen3 slot (linked to CPU2).
Compute node configuration
The BIOS options selected for this blog were as below:
High Performance LINPACK is a benchmark that solves a (random) dense linear system in double-precision (64-bit) arithmetic on distributed-memory systems. HPL was run with a block size of NB=192 for the CPU-only configuration and NB=1280 for the Intel Xeon Phi (offload) configuration. The problem sizes were N=118272 for a single node, N=172032 for two nodes and N=215040 for the four-node cluster (NB=1280 in each case).
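To put these problem sizes in context, HPL factors a dense N x N matrix of 8-byte doubles, so the matrix alone needs about 8 x N^2 bytes. The short sketch below estimates that footprint for each run and compares it with the total host memory. It assumes the 128GB per node quoted above means 128 GiB, and it ignores HPL workspace and the coprocessors' own memory; the function names are illustrative, not part of HPL.

```python
# Rough estimate of the HPL matrix footprint for the problem sizes in this post.
BYTES_PER_DOUBLE = 8  # HPL runs in 64-bit double precision


def hpl_matrix_gib(n: int) -> float:
    """Size in GiB of the dense N x N coefficient matrix."""
    return n * n * BYTES_PER_DOUBLE / 2**30


def memory_fraction(n: int, nodes: int, gib_per_node: float = 128.0) -> float:
    """Fraction of total host memory occupied by the matrix.

    gib_per_node=128 is an assumption: the post's "128GB" read as 128 GiB.
    """
    return hpl_matrix_gib(n) / (nodes * gib_per_node)


# nodes -> N, taken from the runs described in this post
runs = {1: 118272, 2: 172032, 4: 215040}

for nodes, n in runs.items():
    print(f"{nodes} node(s): N={n}, matrix ~{hpl_matrix_gib(n):.1f} GiB, "
          f"{memory_fraction(n, nodes):.0%} of host memory")
```

A common rule of thumb is to size N so that the matrix fills most, but not all, of the available memory, which leaves room for the operating system and HPL's own buffers.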
Compared with the CPU-only configuration, the Intel Xeon Phi 7120Ps accelerated HPL by about 3 times.
On a single node, the PowerEdge R730 achieved 802.09 GFLOPS with CPUs only and 2.553 TFLOPS with two 7120Ps, so the 7120Ps provide a 3.26X performance increase. Similarly, the two-node and four-node runs demonstrated a performance increase of 3.25X.
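The speedup is simply the accelerated HPL result divided by the CPU-only result. The sketch below recomputes it from the single-node figures quoted above. Because those figures are rounded, the ratio it prints may differ slightly from the quoted 3.26X. The function name is illustrative.

```python
# Recompute the single-node speedup from the HPL results quoted in this post.

def speedup(accelerated_gflops: float, baseline_gflops: float) -> float:
    """Ratio of accelerated performance to baseline performance."""
    if baseline_gflops <= 0:
        raise ValueError("baseline performance must be positive")
    return accelerated_gflops / baseline_gflops


cpu_only_gflops = 802.09   # single node, CPUs only
two_phi_gflops = 2553.0    # single node, CPUs + two 7120Ps (2.553 TFLOPS)

ratio = speedup(two_phi_gflops, cpu_only_gflops)
print(f"Single-node speedup with two 7120Ps: {ratio:.2f}X")
```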
HPL power consumption was compared across three configurations: CPUs only, CPUs with one Intel Xeon Phi, and CPUs with two Intel Xeon Phi.
The power consumption of a single node with CPUs only was about 398.72 watts. With two 7120Ps plus the CPUs, it increased to 983.5 watts. The CPU-only configuration therefore drew less power than the system with Intel Xeon Phi, but the performance per watt of the Intel Xeon Phi configuration was 1.31 times that of the CPU-only configuration.
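Performance per watt is the HPL result divided by the measured power draw, and the efficiency gain is the ratio of the two configurations. The sketch below works this out from the single-node numbers quoted in this post. Since the inputs are rounded, the ratio it prints can land slightly off the quoted 1.31. The helper name is illustrative.

```python
# Performance per watt for the single-node runs, from the figures in this post.

def gflops_per_watt(gflops: float, watts: float) -> float:
    """Energy efficiency as sustained GFLOPS per watt of system power."""
    if watts <= 0:
        raise ValueError("power draw must be positive")
    return gflops / watts


# CPUs only: 802.09 GFLOPS at 398.72 W
cpu_only = gflops_per_watt(802.09, 398.72)
# CPUs + two 7120Ps: 2.553 TFLOPS at 983.5 W
two_phi = gflops_per_watt(2553.0, 983.5)

print(f"CPUs only:        {cpu_only:.2f} GFLOPS/W")
print(f"CPUs + two 7120P: {two_phi:.2f} GFLOPS/W")
print(f"Efficiency ratio: {two_phi / cpu_only:.2f}x")
```

Even though the accelerated node draws roughly 2.5 times the power, its HPL throughput grows faster than that, which is why the performance-per-watt ratio stays above 1.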
The Intel Xeon Phi 7120P showed sustained performance and power-efficiency gains in comparison to CPUs only. With two Intel Xeon Phi 7120Ps, the HPL benchmark showed roughly a threefold performance increase over CPUs only, and performance per watt improved by a factor of about 1.3, resulting in a powerful and energy-efficient HPC platform.