Dell Community

Blog Group Posts
Application Performance Monitoring Blog Foglight APM 105
Blueprint for HPC - Blog Blueprint for High Performance Computing 0
CommAutoTestGroup - Blog CommAutoTestGroup 1
Custom Solutions Engineering Blog Custom Solutions Engineering 6
Data Security Data Security 8
Dell Big Data - Blog Dell Big Data 68
Dell Cloud Blog Cloud 42
Dell Cloud OpenStack Solutions - Blog Dell Cloud OpenStack Solutions 0
Dell Lifecycle Controller Integration for SCVMM - Blog Dell Lifecycle Controller Integration for SCVMM 0
Dell Premier - Blog Dell Premier 3
Dell TechCenter TechCenter 1,853
Desktop Authority Desktop Authority 25
Featured Content - Blog Featured Content 0
Foglight for Databases Foglight for Databases 35
Foglight for Virtualization and Storage Management Virtualization Infrastructure Management 256
General HPC High Performance Computing 226
High Performance Computing - Blog High Performance Computing 35
Hotfixes vWorkspace 57
HPC Community Blogs High Performance Computing 27
HPC GPU Computing High Performance Computing 18
HPC Power and Cooling High Performance Computing 4
HPC Storage and File Systems High Performance Computing 21
Information Management Welcome to the Dell Software Information Management blog! Our top experts discuss big data, predictive analytics, database management, data replication, and more. Information Management 229
KACE Blog KACE 143
Life Sciences High Performance Computing 6
OMIMSSC - Blogs OMIMSSC 0
On Demand Services Dell On-Demand 3
Open Networking: The Whale that swallowed SDN TechCenter 0
Product Releases vWorkspace 13
Security - Blog Security 3
SharePoint for All SharePoint for All 388
Statistica Statistica 24
Systems Developed by and for Developers Dell Big Data 1
TechCenter News TechCenter Extras 47
The NFV Cloud Community Blog The NFV Cloud Community 0
Thought Leadership Service Provider Solutions 0
vWorkspace - Blog vWorkspace 510
Windows 10 IoT Enterprise (WIE10) - Blog Wyse Thin Clients running Windows 10 IoT Enterprise Windows 10 IoT Enterprise (WIE10) 3
Latest Blog Posts
  • Dell TechCenter

    iSM 2.1 for Ubuntu and Debian

    The Dell OpenManage team has published iDRAC Service Module version 2.1 for Ubuntu and Debian. Builds are available for Ubuntu 12.04 and 14.04 as well as Debian Wheezy and Jessie. It is available via the apt repositories at <http://linux.dell.com/repo/community/ubuntu/>. Please refer to the whitepaper at <http://en.community.dell.com/techcenter/extras/m/white_papers/20441098> for details.

  • General HPC

    WRF benchmarking on a 4-node cluster with Intel Xeon Phi 7120P Coprocessors

    by Ashish Kumar Singh

    This blog explores the performance of the WRF (Weather Research and Forecasting) model on a cluster of PowerEdge R730 servers with Intel Xeon Phi 7120P coprocessors. All runs were carried out with Hyper-Threading (logical processors) disabled.

    The WRF (Weather Research and Forecasting) model is a next-generation mesoscale numerical weather prediction system designed to serve both atmospheric research and operational forecasting needs. The model serves a wide range of meteorological applications across scales from tens of meters to thousands of kilometers. WRF can generate atmospheric simulations based on real data (observations, analyses) or on idealized conditions.

    Test Cluster Configuration:

    The test cluster consisted of four PowerEdge R730 servers, each with two Intel Xeon Phi 7120P coprocessors. Each PowerEdge R730 had two Intel Xeon E5-2695 v3 CPUs @ 2.3GHz and eight 16GB 2133MHz DIMMs, for a total of 128GB of memory per server. Each PowerEdge R730 also had one Mellanox FDR InfiniBand HCA in the low-profile x8 PCIe Gen3 slot (linked to CPU2).

                   Compute node configuration


    The BIOS options selected for this blog were as below:

    The WRF performance analysis used the Conus-2.5km data set: a single-domain, large-size case at 2.5km resolution covering the continental US. The benchmark is the final 3-hour simulation for hours 3-6, starting from a provided restart file; it may also be run for the full 6 hours starting from a cold start.


    All runs with the CPUs plus Intel Xeon Phi configuration were performed in symmetric mode. For the single-node CPUs-only configuration, the average time was 7.425 seconds. With CPUs and two Intel Xeon Phis, the average time was 6.093 seconds, a 1.2-times improvement. On a two-node cluster of CPUs and Intel Xeon Phis, the average time was 2.309 seconds, a 3.2-times improvement. On a four-node cluster of CPUs and Intel Xeon Phis, the improvement increased to 5.7 times.
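
    As a quick sanity check, the reported speed-ups follow directly from the average times quoted above; the short Python sketch below reproduces them (the configuration labels are mine, and the four-node time is not quoted in the post, so only the 1.2x and 3.2x figures can be recomputed here).

```python
# Reproduce the WRF Conus-2.5km speed-ups from the average times quoted above.
# Configuration labels are illustrative; the four-node time is not quoted in the post.
avg_time_s = {
    "1 node, CPUs only":            7.425,
    "1 node, CPUs + 2x Phi 7120P":  6.093,
    "2 nodes, CPUs + Phi 7120P":    2.309,
}

baseline = avg_time_s["1 node, CPUs only"]
for config, t in avg_time_s.items():
    print(f"{config}: {t:.3f} s, speed-up {baseline / t:.1f}x")
# Prints ~1.2x for one node with two coprocessors and ~3.2x for two nodes,
# matching the figures above; the 5.7x four-node result is from the original study.
```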

    The power consumption analysis for WRF with the Conus-2.5km benchmark is shown below. On a single node with the CPUs-only configuration, power consumption was 395.4 watts. With CPUs and one Intel Xeon Phi, power consumption was 526.3 watts, while with CPUs and two Intel Xeon Phis it was 688.2 watts.

    The results show that power consumption increases as Intel Xeon Phi coprocessors are added. However, they also show that performance per watt improves by about 2.6 times with the CPUs plus two Intel Xeon Phi configuration.

    Conclusion:

    The CPUs plus Intel Xeon Phi 7120P configuration showed sustained performance and power-efficiency gains compared to the CPUs-only configuration. With two Intel Xeon Phi 7120Ps, the WRF Conus-2.5km benchmark showed a 1.2-fold performance increase and more than a 2.6-fold improvement in performance per watt, resulting in a powerful, easy-to-use, and energy-efficient HPC platform.

     

  • General HPC

    NAMD benchmarking on a 4-node cluster with Intel Xeon Phi 7120P Coprocessors

    by Ashish Kumar Singh

    This blog explores the application performance of NAMD (NAnoscale Molecular Dynamics) for large data sets on a cluster of PowerEdge R730 servers with Intel Xeon Phi 7120Ps. All runs were carried out with Hyper-Threading (logical processors) disabled, and the ibverbs version of NAMD was used throughout.

    Test Cluster Configuration:

    The test cluster consisted of four PowerEdge R730 servers, each with two Intel Xeon Phi 7120P coprocessors. Each PowerEdge R730 had two Intel Xeon E5-2695 v3 CPUs @ 2.3GHz and eight 16GB 2133MHz DIMMs, for a total of 128GB of memory per server. Each PowerEdge R730 also had one Mellanox FDR InfiniBand HCA in the low-profile x8 PCIe Gen3 slot (linked to CPU2).

                     Compute node configuration

    The BIOS options selected for this blog are as below:

    NAMD (NAnoscale Molecular Dynamics) is a parallel, object-oriented simulation package written using the Charm++ parallel programming model, designed for high-performance simulation of large biomolecular systems. Charm++ simplifies parallel programming and provides automatic load balancing, which is crucial to the performance of NAMD.

    All runs with the STMV (virus) benchmark used the ibverbs version of NAMD. The performance analysis with the STMV benchmark is shown below. STMV (Satellite Tobacco Mosaic Virus) is a small, icosahedral plant virus. On a single node, we observed a 2.5-times performance improvement with the CPUs plus Intel Xeon Phi configuration compared to the CPUs-only configuration.

     

    STMV achieved 0.2 ns/day with the CPUs-only configuration. With CPUs and two Intel Xeon Phis, performance was 0.5 ns/day, a 2.5-times increase. On a four-node cluster with CPUs and Intel Xeon Phi 7120Ps, the performance increase was 8.5 times. Scaling from one node to four nodes resulted in almost a 3.5-times speed-up.
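
    The scaling figure follows from the single-node and four-node gains quoted above; a minimal Python sketch of that arithmetic:

```python
# Reproduce the STMV (NAMD) ratios from the ns/day figures quoted above.
ns_per_day_cpu_only = 0.2   # 1 node, CPUs only
ns_per_day_cpu_2phi = 0.5   # 1 node, CPUs + 2x Xeon Phi 7120P

single_node_gain = ns_per_day_cpu_2phi / ns_per_day_cpu_only
print(f"Single-node gain with two coprocessors: {single_node_gain:.1f}x")  # 2.5x

# The post reports an 8.5x gain on four nodes relative to single-node CPUs only,
# so the implied scale-up of the accelerated configuration from 1 to 4 nodes is:
four_node_gain = 8.5
print(f"Scale-up from 1 to 4 nodes: {four_node_gain / single_node_gain:.1f}x")  # ~3.4x
```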

    The power analysis was done on a single node for the CPUs-only configuration, the CPUs with one Intel Xeon Phi 7120P configuration, and the CPUs with two Intel Xeon Phi 7120P configuration. With CPUs and two Intel Xeon Phis, power consumption increased, but so did performance per watt, which was 2.4 times that of the CPUs-only configuration. The power-efficiency increase is shown in the picture below.

    Conclusion:

    With CPUs and two Intel Xeon Phi 7120Ps, the STMV benchmark demonstrated a 2.5-times increase in performance and a 2.4-times increase in power efficiency compared to the CPUs-only configuration, resulting in a powerful and energy-efficient HPC platform.

     

  • General HPC

    LINPACK benchmarking on a 4-node cluster with Intel Xeon Phi 7120P Coprocessors


    This blog explores HPL (High Performance LINPACK) performance and power on an Intel Xeon Phi 7120P cluster built from current-generation PowerEdge R730 servers. All runs were carried out with Hyper-Threading (logical processors) disabled.

    Test Cluster Configuration:

    The test cluster consisted of four PowerEdge R730 servers, each with two Intel Xeon Phi 7120P coprocessors. Each PowerEdge R730 had two Intel Xeon E5-2695 v3 CPUs @ 2.3GHz and eight 16GB 2133MHz DIMMs, for a total of 128GB of memory per server. Each PowerEdge R730 also had one Mellanox FDR InfiniBand HCA in the low-profile x8 PCIe Gen3 slot (linked to CPU2).

                                       Compute node configuration

    The BIOS options selected for this blog were as below:

    High Performance LINPACK (HPL) is a benchmark that solves a (random) dense linear system in double-precision (64-bit) arithmetic on distributed-memory systems. HPL was run with a block size of NB=192 for the CPU-only configuration and NB=1280 for the Intel Xeon Phi (offload) configuration, with problem sizes of N=118272 for single-node, N=172032 for two-node, and N=215040 for four-node cluster runs.
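
    For context on these problem sizes, a rough back-of-the-envelope check (a sketch, assuming the standard 8 bytes per double-precision element for the N x N HPL matrix and the 128GB per node listed above) shows how each N relates to the available memory:

```python
# Rough memory-footprint check for the HPL problem sizes above: an N x N
# double-precision matrix needs 8 * N**2 bytes; each node has 128 GB of RAM.
GIB = 1024**3
node_mem_gib = 128e9 / GIB  # 128 GB expressed in GiB (~119.2 GiB)

for nodes, N in [(1, 118272), (2, 172032), (4, 215040)]:
    matrix_gib = 8 * N**2 / GIB
    frac = matrix_gib / (nodes * node_mem_gib)
    print(f"{nodes} node(s): N={N}, matrix ~{matrix_gib:.0f} GiB "
          f"({frac:.0%} of total memory)")
# ~104 GiB (87%), ~220 GiB (92%) and ~345 GiB (72%) respectively, i.e. the
# problem sizes are chosen to use a large fraction of the available memory.
```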

    Compared to the CPU-only configuration, the acceleration with Intel Xeon Phi 7120Ps was about 3 times.

    On a single node with CPUs only, the PowerEdge R730 achieved 802.09 GFLOPS, while with two 7120Ps it achieved 2.553 TFLOPS, a 3.26X performance increase. Similarly, the two-node and four-node runs demonstrated a performance increase of 3.25X.
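
    For additional context, the CPUs-only result can be compared to the node's theoretical peak. This is a rough estimate of my own, assuming the E5-2695 v3's 14 cores at the nominal 2.3GHz base clock and 16 double-precision FLOPs per cycle per core for Haswell's AVX2 FMA units; real AVX clocks differ, so treat it only as an approximate bound.

```python
# Rough HPL efficiency estimate for the single-node CPUs-only run quoted above.
cores_per_cpu = 14        # Intel Xeon E5-2695 v3
cpus_per_node = 2
base_ghz = 2.3            # nominal base clock; AVX clocks are typically lower
dp_flops_per_cycle = 16   # Haswell: two 256-bit FMA units, double precision

peak_gflops = cpus_per_node * cores_per_cpu * base_ghz * dp_flops_per_cycle
achieved_gflops = 802.09
print(f"Peak ~{peak_gflops:.0f} GFLOPS, achieved {achieved_gflops} GFLOPS "
      f"({achieved_gflops / peak_gflops:.0%} efficiency)")  # ~1030 GFLOPS, ~78%
```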

    The HPL power consumption analysis below compares the CPUs-only, CPUs with one Intel Xeon Phi, and CPUs with two Intel Xeon Phi configurations.

    The power consumption of the single-node CPUs-only configuration was about 398.72 watts; with CPUs and two 7120Ps it increased to 983.5 watts. The CPUs-only configuration therefore consumed less power than the system with Intel Xeon Phis, while the performance per watt of the configurations with Intel Xeon Phi was 1.31 times that of the CPUs-only configuration.
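
    Using the single-node figures above, the performance-per-watt comparison can be recomputed in a few lines of Python (a sketch; the small difference from the reported 1.31x is rounding in the quoted numbers):

```python
# Performance per watt from the single-node HPL figures quoted above.
gflops_cpu_only = 802.09
watts_cpu_only = 398.72
gflops_two_phi = 2553.0   # 2.553 TFLOPS
watts_two_phi = 983.5

eff_cpu = gflops_cpu_only / watts_cpu_only   # ~2.01 GFLOPS/W
eff_phi = gflops_two_phi / watts_two_phi     # ~2.60 GFLOPS/W
print(f"CPUs only:       {eff_cpu:.2f} GFLOPS/W")
print(f"CPUs + 2x 7120P: {eff_phi:.2f} GFLOPS/W "
      f"({eff_phi / eff_cpu:.2f}x the CPUs-only efficiency)")  # ~1.3x
```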

    Conclusion:

    The Intel Xeon Phi 7120P showed sustained performance and power-efficiency gains in comparison to CPUs only. With two Intel Xeon Phi 7120Ps, the HPL benchmark showed about a three-fold performance increase compared to CPUs only, and performance per watt improved by about 1.3 times, resulting in a powerful and energy-efficient HPC platform.

  • General HPC

    LAMMPS benchmarking on a 4-node cluster with Intel Xeon Phi 7120P Coprocessors

    by Ashish Kumar Singh

    This blog explores the application performance of LAMMPS on a cluster of PowerEdge R730 servers with Intel Xeon Phi 7120Ps. All runs were carried out with Hyper-Threading (logical processors) disabled.

    LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) is a classical molecular dynamics code capable of simulating solid-state materials (metals, semiconductors), soft matter (biomolecules, polymers), and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale.

    Test Cluster Configuration:

    The test cluster consisted of four PowerEdge R730 servers, each with two Intel Xeon Phi 7120P coprocessors. Each PowerEdge R730 had two Intel Xeon E5-2695 v3 CPUs @ 2.3GHz and eight 16GB 2133MHz DIMMs, for a total of 128GB of memory per server. Each PowerEdge R730 also had one Mellanox FDR InfiniBand HCA in the low-profile x8 PCIe Gen3 slot (linked to CPU2).

                            Compute node configuration

    The BIOS options selected for this blog were as below:


    LAMMPS was run with the Rhodopsin benchmark, which simulates the movement of rhodopsin, a protein in the retina that plays an important role in the perception of light. The protein is simulated in a solvated lipid bilayer using the CHARMM force field with particle-particle particle-mesh long-range electrostatics and SHAKE constraints. The simulation was performed with 2,048,000 atoms at a temperature of 300K and a pressure of 1 atm. The results for one, two, and four nodes are shown below. On one node with the CPUs-only configuration, the loop time was 66.5 seconds, while the configuration with CPUs and two Intel Xeon Phi 7120Ps had a loop time of 34.8 seconds, a 1.9X performance increase. Compared to CPUs only, the CPUs plus coprocessors configuration showed a 5.2X performance increase when scaled from one node to four nodes.
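
    A minimal Python sketch using the loop times quoted above reproduces the single-node speed-up:

```python
# LAMMPS Rhodopsin (2,048,000 atoms): single-node speed-up from the loop times above.
loop_time_cpu_only = 66.5   # seconds, 1 node, CPUs only
loop_time_cpu_2phi = 34.8   # seconds, 1 node, CPUs + 2x Xeon Phi 7120P

speedup = loop_time_cpu_only / loop_time_cpu_2phi
print(f"Single-node speed-up with two coprocessors: {speedup:.1f}x")  # ~1.9x
```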

    The LAMMPS power consumption analysis with the Rhodopsin benchmark is shown below. On a single node, the CPUs-only configuration consumed 442.4 watts, the configuration with CPUs and one coprocessor consumed around 423 watts, and the configuration with CPUs and two coprocessors consumed 450.8 watts.



    All LAMMPS runs on coprocessors used auto-balance mode. Performance per watt showed a two-fold increase with CPUs plus two coprocessors compared to CPUs only.
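
    Combining the loop times with the single-node power figures above gives the performance-per-watt ratio (a sketch that simply takes throughput as the inverse of loop time):

```python
# Performance per watt for LAMMPS Rhodopsin on one node, from the figures above.
# Throughput is taken as 1 / loop_time (runs per second of wall-clock time).
configs = {
    "CPUs only":           {"loop_time_s": 66.5, "watts": 442.4},
    "CPUs + 2x Phi 7120P": {"loop_time_s": 34.8, "watts": 450.8},
}

perf_per_watt = {name: (1.0 / c["loop_time_s"]) / c["watts"]
                 for name, c in configs.items()}
ratio = perf_per_watt["CPUs + 2x Phi 7120P"] / perf_per_watt["CPUs only"]
print(f"Performance-per-watt gain with two coprocessors: {ratio:.1f}x")
# ~1.9x, consistent with the roughly two-fold increase noted above.
```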

    Conclusion:

    The Intel Xeon Phi 7120P cluster with Dell PowerEdge R730 servers showed a sustained two-fold performance increase. Power efficiency also increased by 2X with two Intel Xeon Phi 7120Ps compared to CPUs only, resulting in a powerful, energy-efficient HPC platform.

     

  • General HPC

    The Student Cluster Competition - An Integral Part of SC15

    One of the highlights of the Supercomputing Conference is always the Student Cluster Competition. It's an opportunity to see some of the future superstars of our industry demonstrate their skill under some pretty intense (but fun) pressure!

    The student cluster competition also provides the competitors with a variety of opportunities. For some undergrads it is their first chance to focus exclusively on HPC. For other students the competition affords them the opportunity to make important networking and mentoring connections.

    College teams from around the world are already gearing up for the Student Cluster Competition at SC15 this November in Austin. The hometown favorites are no doubt feeling the pressure to four-peat! It's an exciting time for everyone involved - especially the students.

    Three-peat winners from SC14 in New Orleans.

    You can learn more about the Student Cluster Competition and its positive impact on the lives of participants in this video.