by Ashish Kumar Singh

This blog explores performance analysis of WRF (Weather Research and Forecasting) model on a cluster of PowerEdge R730 servers with Intel Xeon Phi 7120Ps Coprocessors. All the runs were carried out with Hyper Threading (logical Processors) disabled.

The WRF (Weather Research and Forecasting) model is a next-generation mesoscale numerical weather prediction system designed to serve both atmospheric research and operational forecasting needs. The model serves a wide range of metrological applications across scales from tens of meters to thousands of kilometers. WRF allows for atmospheric simulations based on real data (observations, analysis) or idealized conditions to be generated.

Test Cluster Configuration:

The test cluster consisted of four PowerEdge R730 servers with two Intel Xeon Phi 7120P co-processors each. Each PowerEdge R730 had two Intel Xeon E5-2695v3 @ 2.3GHz CPU and eight 16GB DIMMS of 2133MHz making it a total of 128GB of memory. Each PowerEdge R730 consisted of one Mellanox FDR Infiniband HCA card in the low-profile x8 PCIe Gen3 slot (Linked with CPU2).

               Compute node configuration


The BIOS options selected for this blog were as below:

WRF performance analysis was run for Conus-2.5km data. The Conus-2.5km data set was a single domain,the large size 2.5KM is equal to the continental US, which had the final 3hr simulation for hours 3-6, starting from a provided restart file. It may also be performed for the full 6hrs starting from a cold start.


All the runs on CPU with Intel Xeon Phi configuration were performed in symmetric mode. For single node CPUs-only configuration, the average time was 7.425 seconds. However on CPUs and two Intel Xeon Phi configurations, the average time taken was 6.093 seconds, which showed improvement of 1.2 times. With a two node cluster of CPUs and Intel Xeon Phi, the average time was 2.309 seconds, an improvement of 3.2 times. For a four node cluster of CPUs and Intel Xeon Phi configuration, a performance improvement was increased to 5.7 times.

The power consumption analysis for WRF with Conus-2.5KM benchmark is shown below. On single node, with CPU only configuration, the power consumption was 395.4 watts. On CPUs with one Intel Xeon Phi configuration, power consumption was at 526.3 watts, while on CPUs with two Intel Xeon Phi configuration, the power consumption was 688.2 watts. 

Results showed power consumption increase in addition of Intel Xeon Phi. However, results also showed increase in performance per watt to the order of 2.6 times on a CPUs with two Intel Xeon Phi configuration.

Conclusion:

The configuration of CPUs with Intel Xeon Phi 7120P showed sustained performance and power-efficiency gains in comparison to CPUs-only configuration. With two Intel Xeon Phi 7120Ps WRF with Conus-2.5KM benchmark showed 1.2 fold increase and performance per watt improved by more than 2.6 times too, resulting in a powerful, easy-to-use and energy efficient HPC platform.