High Performance Computing - Blog - High Performance Computing - Dell Community

A discussion venue for all things high performance computing (HPC), supercomputing, and the technologies that enable scientific research and discovery.
  • Cambridge University's Wilkes Air-Cooled Supercomputer Among "Greenest"

    According to a story in HPCwire, the number two supercomputer on the Green500 list is a little different from all the other honorees - it's the only one that is air-cooled.  

    Cambridge University's Wilkes - as the supercomputer has been named - is based on 128 Dell T620 servers, which provide greater energy efficiency.

    Capable of 240 teraflops LINPACK, Wilkes is part of Cambridge's "Square Kilometre Array (SKA) Open Architecture Lab," which is participating in a multinational collaboration to build the world's largest radio telescope. As Paul Calleja mentioned in his SC13 video, it's his work with the SKA that has his son convinced dad is looking for aliens in outer space!

    You can learn more about Cambridge's air-cooled supercomputer at HPCwire.

  • Accelerating High Performance LINPACK (HPL) with Kepler K20X GPUs

    by Saeed Iqbal and Shawn Gao 

    The NVIDIA Tesla K20 GPU has proven performance and power efficiency across many HPC applications in the industry. The K20 is based on NVIDIA's latest Kepler GK110 architecture and incorporates several innovative features and micro-architectural enhancements of the Kepler design. Since the K20 release, NVIDIA has launched an upgrade to the K20 called the K20X. The K20X has more processing units, more memory and higher memory bandwidth. This blog quantifies the performance and power efficiency improvements of the K20X compared to the K20; the information presented here should help in making an informed decision between these two powerful GPU options.

    High Performance LINPACK (HPL) is an industry-standard, compute-intensive benchmark, traditionally used to stress the compute and memory subsystems. With the increasingly common use of GPUs, a GPU-enabled version of HPL has been developed and is maintained by NVIDIA; it utilizes both the CPUs and the GPU accelerators. We used the Kepler GPU-enabled HPL version 2.0 for this study.

    We use the Dell PowerEdge R720 for the performance comparisons. The PowerEdge R720 is a dual-socket server that can have up to two internal GPUs installed; we keep the standard test configuration of two GPUs per server. The PowerEdge R720 is a versatile, full-featured server with a large memory capacity.

    Hardware Configuration and Results

    The Server and GPU configuration details are compared in the tables below.

    Table 1: Server Configuration

    Server                      PowerEdge R720
    Processors                  Two Intel Xeon E5-2670 @ 2.6 GHz
    Memory                      128 GB (16 x 8 GB), 1600 MHz, 2 DPC
    GPUs                        NVIDIA Tesla K20 and K20X
    Number of GPUs installed    2
    Benchmark                   GPU-accelerated HPL, version 2.1
    CUDA / driver               5.0 / 304.54
    Operating system            RHEL 6.4

    Table 2: K20 and K20X: Relevant parameter comparison

    GPU Model                K20X           K20            Improvement (K20X)
    Number of cores          2688           2496           ~8%
    Memory (VRAM)            6 GB           5 GB           20%
    Memory bandwidth         250 GB/s       208 GB/s       ~20%
    Peak performance (SP)    3.95 TFLOPS    3.52 TFLOPS    ~12%
    Peak performance (DP)    1.31 TFLOPS    1.17 TFLOPS    ~12%

    Figure 1: HPL performance and efficiency on R720 for K20X and K20 GPUs. 

    Figure 1 illustrates the HPL performance on the PowerEdge R720, with the CPU-only performance shown for reference. The K20X delivers about an 11.2% improvement in HPL GFLOPS over the K20. Compared to the CPU-only configuration, the HPL acceleration with K20X GPUs is 7.7X; with K20 GPUs it is 6.9X. In addition to improved performance, the compute efficiency of the K20X is slightly better than that of the K20: 82.6% versus 82.1%, as shown in Figure 1. It is typical for CPU-only configurations to have higher efficiency than heterogeneous CPU+GPU configurations, as seen in Figure 1: the CPU-only configuration reaches 94.6%, while the CPU+GPU configurations are in the low 80s.
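    As a sanity check, the reported speedup can be reproduced from theoretical peak numbers. The sketch below is not from the blog: it assumes each E5-2670 core sustains 8 double-precision FLOPs per cycle (AVX); the efficiency figures come from Figure 1.

```python
# Back-of-envelope check of the ~7.7x speedup (assumed, not from the blog:
# 8 DP FLOPs/cycle/core for the E5-2670).

CPU_PEAK = 2 * 8 * 2.6 * 8            # GFLOPS: 2 sockets x 8 cores x 2.6 GHz x 8
GPU_PEAK = 2 * 1310.0                 # GFLOPS: 2 x K20X at 1.31 TFLOPS DP peak

hpl_rmax = 0.826 * (CPU_PEAK + GPU_PEAK)   # 82.6% compute efficiency (Figure 1)
cpu_only_rmax = 0.946 * CPU_PEAK           # 94.6% efficiency, CPU-only run

speedup = hpl_rmax / cpu_only_rmax
print(f"CPU+GPU Rmax ~{hpl_rmax:.0f} GFLOPS, speedup ~{speedup:.1f}x over CPU-only")
```

    The result lands at roughly 7.7x, consistent with the measured acceleration above.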

    Figure 2: Total Power and Power Efficiency on PowerEdge R720 for K20 and K20X GPUs. 

    Figure 2 illustrates the total system power consumption of the different configurations of the PowerEdge R720 server. The first thing to note from Figure 2 is that GPUs consume substantial power. The CPU-only configuration draws about 450W, which increases to above 800W when K20/K20X GPUs are installed in the server, an increase of up to 80%. This should be taken into account in power budgeting and power supply sizing for large installations. However, once power is delivered to the GPUs, they are much better than CPUs alone at converting energy into useful work, as the performance-per-watt numbers in Figure 2 show. The K20X achieves 2.79 GFLOPS/W, about 4X better than the CPU-only configuration; similarly, the K20 achieves 2.68 GFLOPS/W, about 3.8X better. It is also interesting to note that the K20X is about 4% more power efficient than its predecessor, the K20.
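    Those performance-per-watt ratios can be checked the same way. The CPU-only Rmax of ~315 GFLOPS used below is derived from the same peak/efficiency assumptions as the previous sketch, not a number stated in the blog.

```python
# Ratio check for Figure 2's performance-per-watt claims (assumed
# CPU-only Rmax of ~315 GFLOPS at the stated ~450 W draw).

cpu_only_eff = 314.8 / 450.0      # ~0.70 GFLOPS/W for the CPU-only config
k20x_eff, k20_eff = 2.79, 2.68    # GFLOPS/W, from Figure 2

print(f"K20X vs CPU-only: {k20x_eff / cpu_only_eff:.1f}x")  # ~4.0x
print(f"K20  vs CPU-only: {k20_eff / cpu_only_eff:.1f}x")   # ~3.8x
```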


    The K20X delivers about 11% higher performance and consumes 7% more power than the K20 for the HPL benchmark. These results are in line with the improvements expected from comparing the theoretical parameters of the two GPUs.


  • Dell HPC Solution Refresh: Intel Xeon Ivy Bridge-EP, 1866 DDR3 memory and RHEL 6.4

    by Calvin Jacob and Ishan Singh

    Support for Intel Xeon Ivy Bridge-EP processors, 1866 MHz DDR3 memory and RHEL 6.4 has been added to the current Dell HPC Solution. This solution is based on Bright Cluster Manager (BCM) 6.1, with RHEL 6.4 as the base OS supported on Ivy Bridge processors. BCM is a complete HPC solution that can automate the deployment and management of an HPC cluster. Recommended BIOS settings for the supported platforms, along with BMC/iDRAC settings, are scripted and made available for users to apply if they choose. Dell system management tools are bundled with Bright Cluster Manager and used to set, configure and manage Dell hardware.

    The highlights of this release are additional support for:

    1. Intel Xeon Ivy Bridge-EP (E5-26xx v2) processors.
    2. Red Hat Enterprise Linux 6.4 (kernel-2.6.32-358.el6.x86_64).
    3. Mellanox OFED 2.0-3.
    4. CUDA 5.5.
    5. PEC Tools for systems management of PE-C servers.
    6. Hardware Match Check by BCM.

    Ivy Bridge-EP processors

    Support for Intel Xeon Ivy Bridge-EP (E5-26xx v2) processors has been added to the refreshed and existing servers: the R620, R720, M620, C8000 series and C6220 II. Intel Ivy Bridge-EP processors have Tri-Gate transistors, which use a 3-D (non-planar) architecture to pack more transistors into less space. These processors have up to 12 cores, 30MB of Last-Level Cache (LLC), DDR3 memory speeds up to 1866 MHz, QPI speeds of 8 GT/s, up to 40 PCIe 3.0 lanes and a TDP of up to 130W. The previous-generation Intel Xeon Sandy Bridge processors used a 32 nm process, whereas the new Intel Xeon Ivy Bridge-EP processors use a 22 nm process, delivering higher density and more performance within the same power envelope.
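    For a sense of scale, the 1866 MHz DDR3 support translates into a per-socket peak memory bandwidth that can be computed directly. The channel count and bus width below are standard for this server class, not values stated in the post.

```python
# Peak DDR3-1866 bandwidth per socket (assumed: 4 memory channels and a
# 64-bit (8-byte) bus per channel, standard for this class of Xeon server).

transfers_per_sec = 1866e6        # DDR3-1866: 1866 MT/s per channel
bytes_per_transfer = 8            # 64-bit channel width
channels = 4

gb_per_sec = transfers_per_sec * bytes_per_transfer * channels / 1e9
print(f"Peak memory bandwidth: {gb_per_sec:.1f} GB/s per socket")  # ~59.7 GB/s
```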

    Red Hat Enterprise Linux 6.4 (kernel-2.6.32-358.el6.x86_64)

    RHEL 6.4 (kernel 2.6.32-358.el6.x86_64) is the minimum supported operating system for Intel Ivy Bridge-EP processors. Some of the highlights of RHEL 6.4:

    1. Updated resource management capabilities.
    2. New tools and improved productivity support.
    3. Updated network drivers and fixes for supported Intel and Broadcom network adapters.

    Mellanox OFED 2.0-3

    Support for Mellanox OFED 2.0-3 has been added. Mellanox OFED 2.0-3 includes drivers for mlx4 devices (ConnectX-3 and ConnectX-2) and mlx5 devices (Connect-IB). Officially supported devices are ConnectX-3 and ConnectX-2, with signaling rates of 20, 40 and 56 Gb/s in InfiniBand mode.

    PEC Tools for systems management of PE-C servers

    PEC Tools are the official tools for systems management on PowerEdge-C servers. These tools can be used for configuring BMC and BIOS parameters. The tools are included in BCM 6.1 under the folder /opt/dell/pec.

    Examples of tool usage:

    /opt/dell/pec/setupbios setting save > filename (To save the BIOS settings)

    /opt/dell/pec/setupbios setting readfile filename (To read and apply settings from a saved file)

    /opt/dell/pec/setupbios setting set [setting] [value] (To set value for a particular option)  

    Hardware Match Check by BCM

    This tool is available with Bright Cluster Manager and is used to verify hardware consistency across a large number of nodes being added to a cluster. It works by comparing the hardware profile of one node against the rest of the nodes in the same node group. The hardware match check can be automated; in case of any mismatch, the cluster administrator is notified accordingly. The monitoring sweep rate, which controls how often alerts are raised, can be adjusted.

  • America's Cup: Real Time HPC Aims To Build a Faster Boat

    It wasn't THAT long ago that going to college clearly delivered an advantage to kids growing up and looking to enter the workplace. Today, attending college seems to have become a basic requirement even to be considered for many professions. You could argue the same is true of the use of technology in competition.

    Photo Credit: Emirates Team New Zealand website.

    Take the America's Cup as an example. The America's Cup is a sailing race of yachts that features the world's best sailors and the world's fastest ships. Below is the definition I pulled from Wikipedia:

    The America's Cup, affectionately known as the "Auld Mug", is a trophy awarded to the winner of the America's Cup match races between two sailing yachts. One yacht, known as the defender, represents the yacht club that currently holds the America's Cup and the second yacht, known as the challenger, represents the yacht club that is challenging for the cup. The timing of each match is determined by an agreement between the defender and the challenger. The America's Cup is the oldest international sporting trophy.

    I can only imagine the level of skill and training required to become a sailor on one of the two competing yachts. It's truly amazing to watch the sailors in the America's Cup respond with such strength and precision during the race. The other part of this competition is the challenge of building the fastest boat. This is where things have gotten interesting, and where high performance computing (HPC) is playing a large role.

    Emirates Team New Zealand needed to design an entirely new boat, based on the new multihull requirements of this year's America's Cup. Amazingly, with the help of a Dell HPC cluster, Emirates Team was able to evaluate 300-400 boat designs through accurate computer testing in their quest to build the fastest boat possible. This is in contrast to the 30-40 physical designs they were able to test for the 2007 competition! Simply amazing.

    The use of HPC clearly paid off early in the design phases: the computer modeling allowed Emirates Team to develop a boat that could hydrofoil, lifting out of the water while staying within the regulations of the competition. The competition eventually achieved hydrofoiling as well, but Emirates Team's head start allowed them to focus on other aspects of the boat design.

    Photo Credit: ANSYS Website.

    The key tools used by Emirates Team to design the boat included ANSYS simulation software running on the Dell HPC Cluster, and Latitude laptops. This marked the first time they were able to rely completely on numerical analysis and digital prototyping, which the team believes has helped them create a boat design that was 30-40 percent faster than their original concepts. 

    In the end, Oracle Team USA was able to make the biggest comeback in the history of the America's Cup, winning an unbelievable eight straight races. This competition is where technology and design meet the talent, experience, and strength of sailors. I would guess that the competition for the next America's Cup is probably already underway. With the stakes so high, and the difference between winning and losing so slim, it's a good bet that HPC will play an even larger role in the future.

    Other news coverage:

    TechHive: The America's Cup: nerves, skill, and computer design

    Video: ANSYS CFD in Action: Emirates Team New Zealand Profile 

    HPC Wire: America's Cup Challenger Emirates Team New Zealand Transform Boat Design with Dell Solution

    Attached Case Study: Team New Zealand takes on the America’s Cup with game-changing technology (see bottom of blog post to download)

    If you haven't seen how exciting America's Cup can be to watch, I embedded a short video below.