High Performance Computing Blogs

High Performance Computing
A discussion venue for all things high performance computing (HPC), supercomputing, and the technologies that enable scientific research and discovery.
  • Kevin Shinpaugh of Virginia Tech Shares his Thoughts on Why HPC Matters

    Kevin Shinpaugh, Director of Information Technology and Computing Services at Virginia Tech, shares his thoughts on why HPC matters and other issues facing the industry.

  • William Edsall of Dow Chemical Provides His Thoughts on Why HPC Matters

    Up next providing his thoughts on why HPC matters and other issues facing the industry is William Edsall, HPC Lead Analyst at Dow Chemical.

  • Clemson's Boyd Wilson Offers His Thoughts on Why HPC Matters

    Boyd Wilson, software CTO at Clemson University and Executive Director of Omnibond, offered his insights on why HPC matters, as well as some of the other important issues in the industry, during SC14.

  • Jimmy Pike of Dell Offers His Insights on Why HPC Matters at SC14

    Senior Fellow and Chief Architect and Technologist at Dell, Jimmy Pike, offered his insights on why HPC matters, as well as some of the other important issues in the industry, during SC14.

  • TACC's Dan Stanzione Offers His Thoughts on Why HPC Matters

    Dan Stanzione, Ph.D., the executive director of the Texas Advanced Computing Center, offers his thoughts at SC14 on why HPC matters, and some of the other important issues in the industry.

  • Skip Garner of Virginia Tech on Why HPC Matters

    Next up, Virginia Tech's Skip Garner, Ph.D. offers his thoughts on why HPC matters, and other important issues in the industry, at SC14.  Dr. Garner is also the chief scientist at Genomeon and Heliotext.

  • Dell PowerEdge C4130 Performance with K80 GPUs - HPL

    by Saeed Iqbal and Mayura Deshmukh 

    Demand for compute power keeps growing, and it has pushed server designs toward higher hardware accelerator density. However, most such designs have a standard system configuration, which may not be optimal for maximum performance across all application classes. The latest high-density design from Dell, the PowerEdge C4130, offers up to four GPUs in a 1U form factor. What sets the PowerEdge C4130 apart is its configurable system design, which potentially makes it a better fit for a wider variety of extreme HPC applications.

    This blog characterizes the performance of the C4130 on HPL (High Performance Linpack). We present data on the performance achieved, power consumption, and performance per watt for various system configurations.

    The latest HPC-focused Tesla series general-purpose GPU released by NVIDIA is the Tesla K80. From the HPC perspective, the most important improvement is the 1.87 TFLOPS (double precision) compute capacity, which is about 30% more than the K40, the previous Tesla card. The K80 auto-boost feature automatically provides additional performance if additional power headroom is available. The two internal GPUs are based on the GK210 architecture and have a total of 4,992 cores, a 73% increase over the K40. The K80 has a total memory of 24 GB divided equally between the two internal GPUs, which is 100% more memory capacity than the K40. The memory bandwidth of the K80 is improved to 480 GB/s. The rated power consumption of a single K80 is a maximum of 300 watts.
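
    The percentage gains quoted above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the K40 reference values (1.43 TFLOPS double precision, 2,880 cores, 12 GB) are an assumption taken from NVIDIA's published K40 specifications and do not appear in this post, and the names `K40`, `K80`, `percent_increase` and `compare` are made up for the example.

```python
# Reference K40 figures are an ASSUMPTION taken from NVIDIA's published
# specifications; they are not stated in the post itself.
K40 = {"dp_tflops": 1.43, "cuda_cores": 2880, "memory_gb": 12}

# K80 figures as quoted in the post.
K80 = {"dp_tflops": 1.87, "cuda_cores": 4992, "memory_gb": 24}


def percent_increase(new, old):
    """Return how much larger `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0


def compare(new_card, old_card):
    """Percent increase for every metric present in both cards."""
    return {
        key: percent_increase(new_card[key], old_card[key])
        for key in new_card
        if key in old_card
    }


if __name__ == "__main__":
    for metric, gain in compare(K80, K40).items():
        print(f"{metric}: {gain:.0f}% more than K40")
```

    Under these assumed K40 values, the core count and memory gains come out at 73% and 100%, and the double precision gain lands close to the "about 30%" quoted above.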

    The C4130 offers five configurations, “A” through “E”. Since GPUs provide the bulk of the compute horsepower, the configurations can be divided into two groups based on expected performance: the first group of three configurations, “A”, “B” and “C”, has four GPUs each, and the second group of two configurations, “D” and “E”, has two GPUs each. The first two quad GPU configurations (“A” and “B”) have an internal PCIe switch module. The details of the various configurations are shown in Table 1 and the block diagram (Figure 1) below:

    Table 1: C4130 Configurations



    Figure 1: C4130 Configuration Block Diagram

    Table 2 gives more information about the hardware configuration and the benchmark details used for the tests. For HPL, the problem size used was approximately 85% to 90% of the system memory.
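
    As a rough illustration of what "85% to 90% of the system memory" means for the HPL input, the sketch below uses the common rule of thumb that the N x N double precision matrix (8 bytes per element) should fill the chosen fraction of memory, with N rounded down to a multiple of the block size NB. The 128 GiB memory size and NB of 256 are hypothetical values chosen for the example, not the settings used in these tests, and `hpl_problem_size` is an illustrative helper name.

```python
import math


def hpl_problem_size(total_mem_bytes, fraction=0.85, block_size=256):
    """Estimate an HPL problem size N for a given memory budget.

    The HPL matrix holds N * N doubles (8 bytes each), so
    N ~ sqrt(fraction * memory / 8), rounded down to a multiple
    of the block size NB.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    n = int(math.sqrt(fraction * total_mem_bytes / 8))
    return (n // block_size) * block_size


if __name__ == "__main__":
    # Hypothetical node with 128 GiB of system memory (an assumption).
    mem = 128 * 2**30
    for frac in (0.85, 0.90):
        print(f"{frac:.0%} of memory -> N = {hpl_problem_size(mem, frac)}")
```

    Rounding N down to a multiple of NB keeps the matrix within the memory budget while avoiding a partial trailing block.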

    Table 2: Hardware Configuration and Benchmark Details

                     

    Figure 2: HPL Performance, Efficiency and Acceleration on the Five C4130 Configurations

    Figure 2 shows the HPL performance characterization of the PowerEdge C4130. Configurations “A”, “B” and “C” are four-GPU configurations with performance ranging from 6.5 to 7.3 TFLOPS. The difference between “A” and “B” is due to the extra CPU in configuration “B”. Overall, configuration “C” has the highest performance at 7.3 TFLOPS. The difference between “B” and “C” is due to their different GPU-to-CPU ratios; both have the same compute resources. Configuration “C” is balanced, with two GPUs per CPU, while “B” has all four GPUs attached to a single CPU. Of the two-GPU configurations, “D” is higher at 3.8 TFLOPS, compared with 3.6 TFLOPS for “E”. The difference is explained by configuration “E” having one less CPU.

    Compared to CPU-only performance, an acceleration of 9X is obtained with four K80 GPUs and 4.7X with two K80 GPUs. The HPL efficiency is significantly higher on the K80 (in the low to upper 80% range) than on previous generations of GPUs.
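
    The two acceleration figures can be sanity-checked against each other by working backwards to the CPU-only baseline they imply. This is a sketch under one assumption the post does not state explicitly: that the 9X and 4.7X figures refer to the best quad and dual GPU results (7.3 TFLOPS for “C” and 3.8 TFLOPS for “D”). The function name is illustrative.

```python
def implied_baseline_tflops(accelerated_tflops, speedup):
    """CPU-only performance implied by an accelerated result and its speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return accelerated_tflops / speedup


# ASSUMPTION: the quoted speedups correspond to configurations "C" and "D".
quad_baseline = implied_baseline_tflops(7.3, 9.0)
dual_baseline = implied_baseline_tflops(3.8, 4.7)

if __name__ == "__main__":
    print(f"Baseline implied by quad GPU result: {quad_baseline:.2f} TFLOPS")
    print(f"Baseline implied by dual GPU result: {dual_baseline:.2f} TFLOPS")
```

    Both results point to a CPU-only baseline of roughly 0.81 TFLOPS, so the two speedups are consistent with a single CPU-only reference run.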

                

    Figure 3: Total Power and Performance/Watt on the Five C4130 Configurations

    Figure 3 shows the power consumption data for the HPL runs in Figure 2. In general, GPUs can consume substantial power when loaded with compute-intensive workloads. As shown above, the power consumption of configurations “A”, “B” and “C” is significantly higher (2.9X to 3.3X) than that of CPU-only runs; this is due to the four K80 GPUs. The power consumption of “D” and “E” is lower (1.8X to 2.0X that of CPU-only runs).

    The power efficiency, i.e. the useful work delivered for every watt of power consumed, is in the 4+ GFLOPS/W range for quad GPU configurations and about 1.8X to 2X range for dual GPU configurations. Configuration “C” offers the highest performance per watt, at about 4.23 GFLOPS/W.

    Compared to the CPU-only performance per watt of just 1.5 GFLOPS/W, the quad GPU configurations show a 2.7X improvement and the dual GPU configurations show a 2.3X improvement in overall performance per watt.
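
    Performance per watt ties the performance and power figures together, so the quoted numbers can be cross-checked with simple arithmetic. The sketch below is an estimate rather than measured data: it assumes the 9X acceleration applies to configuration “C”, derives the CPU-only performance from that, and then computes the power draw each run implies. All names are illustrative.

```python
def power_watts(tflops, gflops_per_watt):
    """Power draw implied by a performance figure and its efficiency."""
    if gflops_per_watt <= 0:
        raise ValueError("gflops_per_watt must be positive")
    return tflops * 1000.0 / gflops_per_watt


# Figures quoted in the post.
CONFIG_C_TFLOPS = 7.3
CONFIG_C_GFLOPS_PER_WATT = 4.23
CPU_ONLY_GFLOPS_PER_WATT = 1.5

# ASSUMPTION: the 9X acceleration refers to configuration "C".
ASSUMED_SPEEDUP = 9.0
cpu_only_tflops = CONFIG_C_TFLOPS / ASSUMED_SPEEDUP

config_c_watts = power_watts(CONFIG_C_TFLOPS, CONFIG_C_GFLOPS_PER_WATT)
cpu_only_watts = power_watts(cpu_only_tflops, CPU_ONLY_GFLOPS_PER_WATT)

power_ratio = config_c_watts / cpu_only_watts
efficiency_gain = CONFIG_C_GFLOPS_PER_WATT / CPU_ONLY_GFLOPS_PER_WATT

if __name__ == "__main__":
    print(f"Implied power, configuration C: {config_c_watts:.0f} W")
    print(f"Implied power, CPU only:        {cpu_only_watts:.0f} W")
    print(f"Power ratio:                    {power_ratio:.1f}X")
    print(f"Performance/watt gain:          {efficiency_gain:.1f}X")
```

    Under these assumptions the implied power ratio is about 3.2X, which falls inside the 2.9X to 3.3X range reported for the quad GPU configurations, and the performance per watt gain for “C” is about 2.8X, slightly above the 2.7X quoted for the quad GPU group as a whole.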

    In conclusion, the C4130 meets the current challenges of a high-density accelerator-enabled compute node. Targeted specifically towards the HPC market, it offers world class performance and unique configurability options to fit extreme HPC requirements.  

     

  • Tulane's Charlie McMahon on Why HPC Matters

    Offering his thoughts on why HPC matters and other important issues in our industry is Charlie McMahon of Tulane University.

  • Larry Smarr from UCSD Offers his Thoughts on Why HPC Matters

    Next up, offering his views on why HPC matters and other hot topics of the day, is Larry Smarr, Ph.D., from the University of California, San Diego's Calit2.

  • John D'Ambrosia Tells Us Why HPC Matters

    John D'Ambrosia of Dell offers his views on why HPC matters and other hot topics of the day.