High Performance Computing Blogs

High Performance Computing
A discussion venue for all things high performance computing (HPC), supercomputing, and the technologies that enable scientific research and discovery.
  • Application Performance Study on Intel Haswell EX Processors

    by Ashish Kumar Singh

    This blog describes, in detail, a performance study carried out on the Intel Xeon E7-8800 v3 family of processors (architecture codenamed Haswell-EX). The performance of the Intel Xeon E7-8800 v3 was compared to that of the Intel Xeon E7-4800 v2 to ascertain the generation-over-generation performance improvement. The applications used for this study are HPL, STREAM, WRF and ANSYS Fluent. The Intel Xeon E7-8890 v3 processor has 18 cores/36 threads with 45MB of L3 cache (2.5MB per slice). With AVX workloads, the clock speed of the Intel Xeon E7-8890 v3 drops from 2.5GHz to 2.1GHz. These processors support a QPI speed of 9.6 GT/s.

    Server Configuration (PowerEdge R920  |  PowerEdge R930)

        Processors:   4 x Intel Xeon E7-4870 v2 @ 2.3GHz (15 cores, 30MB L3 cache, 130W)  |  4 x Intel Xeon E7-8890 v3 @ 2.5GHz (18 cores, 45MB L3 cache, 165W)
        Memory:       512GB = 32 x 16GB DDR3 RDIMMs @ 1333MHz  |  1024GB = 64 x 16GB DDR4 RDIMMs @ 1600MHz

    BIOS Settings (PowerEdge R920  |  PowerEdge R930)

        BIOS Version:                               Version 1.1.0  |  Version 1.0.9
        Processor Settings > Logical Processors:
        Processor Settings > QPI Speed:             Maximum Data Rate  |  Maximum Data Rate
        Processor Settings > System Profile:

    Software and Firmware (PowerEdge R920  |  PowerEdge R930)

        Operating System:   RHEL 6.5 x86_64  |  RHEL 6.6 x86_64
        Intel Compiler:     Version 14.0.2  |  Version 15.0.2
        Intel MKL:          Version 11.1  |  Version 11.2
        Intel MPI:          Version 4.1  |  Version 5.0

    Benchmark and Applications (PowerEdge R920  |  PowerEdge R930)

        HPL:            v2.1 from MKL 11.1  |  v2.1 from MKL 11.2
        STREAM:         v5.10, Array Size 1800000000, Iterations 100  |  v5.10, Array Size 1800000000, Iterations 100
        WRF:            v3.5.1, Input Data Conus12KM, Netcdf-  |  v3.6.1, Input Data Conus12KM, Netcdf-4.3.2
        ANSYS Fluent:   v15, Input Data: eddy_417k, truck_poly_14m, sedan_4m, aircraft_2m  |  v15, Input Data: eddy_417k, truck_poly_14m, sedan_4m, aircraft_2m


    The objective of this comparison was to show the generation-over-generation performance improvement in the enterprise 4S platforms. The performance differences between the two server generations were due to improvements in system architecture, a greater number of cores, and higher-frequency memory. The software versions were not a significant factor.


    High Performance LINPACK (HPL) is a benchmark that solves a (random) dense linear system in double-precision (64-bit) arithmetic on distributed-memory systems. The HPL benchmark was run on both the PowerEdge R930 and the PowerEdge R920 with a block size of NB=192 and a problem size N chosen to use 90% of total memory.
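    As a rough illustration of how such a problem size is chosen (this sketch is not from the original study; the rounding rule below is a common convention), the following computes N so that the N x N double-precision matrix fills about 90% of memory and is a multiple of NB:

    import math

    def hpl_problem_size(total_mem_bytes, nb=192, mem_fraction=0.90):
        """Return N such that an N x N matrix of doubles fills ~mem_fraction of memory."""
        n = int(math.sqrt(mem_fraction * total_mem_bytes / 8))  # 8 bytes per double
        return (n // nb) * nb                                   # round down to a multiple of NB

    # Example: the PowerEdge R930 configuration in this study had 1024 GB of memory.
    print(hpl_problem_size(1024 * 1024**3))                     # roughly 351,000 at NB=192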


    As shown in the graph above, LINPACK showed a 1.95X performance improvement with four Intel Xeon E7-8890 v3 processors in the R930 server in comparison to four Intel Xeon E7-4870 v2 processors in the R920 server. This was due to the substantial increase in the number of cores, memory speed, and flops per cycle, as well as improvements in the processor architecture.
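    A rough cross-check of that result under assumed per-core throughput figures (8 double-precision flops per cycle for the Ivy Bridge-EX parts with AVX, 16 for the Haswell-EX parts with AVX2/FMA at the 2.1GHz AVX clock noted earlier; these are assumptions, not figures from the study) suggests a theoretical peak ratio in the same neighborhood:

    # Rough theoretical-peak comparison of the two 4-socket systems (illustrative only).
    def peak_gflops(sockets, cores, ghz, flops_per_cycle):
        return sockets * cores * ghz * flops_per_cycle

    r920 = peak_gflops(4, 15, 2.3, 8)     # E7-4870 v2: ~1,104 GFLOPS
    r930 = peak_gflops(4, 18, 2.1, 16)    # E7-8890 v3 at its AVX clock: ~2,419 GFLOPS
    print(f"peak ratio: {r930 / r920:.2f}x")

    Since HPL efficiency is somewhat below theoretical peak and varies between platforms, a measured 1.95X gain is consistent with the roughly 2.2X increase in assumed peak throughput.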


    STREAM is a simple synthetic benchmark that measures sustained memory bandwidth using the COPY, SCALE, SUM and TRIAD kernels.

    The operations performed by these kernels are shown below:

    COPY:       a(i) = b(i)
    SCALE:      a(i) = q*b(i)
    SUM:        a(i) = b(i) + c(i)
    TRIAD:      a(i) = b(i) + q*c(i)
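
    For illustration only, the sketch below mimics the TRIAD kernel in NumPy and shows how a bandwidth figure is derived from bytes moved and elapsed time; the actual benchmark is the standard C/Fortran stream.c, and the array length here is an arbitrary illustrative choice.

    # Illustrative TRIAD-style bandwidth estimate (not the real STREAM benchmark).
    import time
    import numpy as np

    n = 80_000_000                        # long enough that the arrays exceed the caches
    a = np.empty(n)
    b = np.full(n, 2.0)
    c = np.full(n, 0.5)
    q = 3.0

    t0 = time.perf_counter()
    np.multiply(c, q, out=a)              # a = q*c
    a += b                                # a = b + q*c  (the TRIAD result)
    t1 = time.perf_counter()

    # STREAM credits TRIAD with three array transfers (read b, read c, write a);
    # this two-step NumPy version actually moves somewhat more data than that.
    bytes_credited = 3 * n * 8            # 8 bytes per double
    print(f"approx. TRIAD bandwidth: {bytes_credited / (t1 - t0) / 1e9:.1f} GB/s")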

    The chart above compares the sustained memory bandwidth of the PowerEdge R920 and PowerEdge R930 servers. STREAM measured 231GB/s on the PowerEdge R920 and 260GB/s on the PowerEdge R930, a 12% improvement in memory bandwidth. This increase is due to the higher DIMM speed available on the PowerEdge R930.


    The WRF (Weather Research and Forecasting) model is a next-generation mesoscale numerical weather prediction system designed to serve both atmospheric research and operational forecasting needs. The model serves a wide range of meteorological applications across scales from tens of meters to thousands of kilometers. WRF can generate atmospheric simulations based on real data (observations, analyses) or idealized conditions.

    The WRF performance analysis was run with the Conus12KM dataset. Conus12KM is a single-domain, medium-size, 48-hour, 12KM-resolution case over the continental US (CONUS) with a time step of 72 seconds.


    With the Conus12KM dataset, WRF showed an average time of 0.22 seconds on the PowerEdge R930 server versus 0.26 seconds on the PowerEdge R920 server, an 18% improvement.

    ANSYS Fluent

    ANSYS Fluent provides broad physical modeling capabilities for modeling flow, turbulence, heat transfer, and reactions for industrial applications ranging from air flow over an aircraft wing to combustion in a furnace, from bubble columns to oil platforms, from blood flow to semiconductor manufacturing, and from clean-room design to wastewater treatment plants.



    We used four different datasets for Fluent and considered ‘Solver rating’ (higher is better) as the performance metric. For all the test cases, Fluent on the PowerEdge R930 showed a 24% to 29% performance improvement in comparison to the PowerEdge R920.


    The PowerEdge R930 server outperforms its previous-generation counterpart, the PowerEdge R920, in both the benchmark and application comparisons. Thanks to newer processors with a higher core count, higher-frequency memory and CPU architecture improvements, the PowerEdge R930 delivered better performance than the PowerEdge R920. The PowerEdge R930 platform with four Intel Xeon EX processors is a very good choice for HPC applications that can scale up to a large number of cores and a large amount of memory.







  • The Right Mix for Today’s Data Environments

    Three takeaways from Dell’s John Whittaker on leveraging both big data analytics and traditional database management tools today...

    Delving into the results of a recent Unisphere Research survey of 300 database administrators (DBAs) and corporate data managers, Dell’s Executive Director of Information Management John Whittaker gives straightforward advice for tackling today’s complex database environments in an Industry Perspectives article on the Data Center Knowledge site.

    While organizations and DBAs have become hyper-focused on big data, analytics, and unstructured data tools, Whittaker gives a timely reminder that structured data still matters.

    Indeed, according to the Unisphere survey he references, structured data still accounts for 75 percent of the data stack at more than two-thirds of today’s enterprises. What’s more, nearly one-third of all organizations haven’t begun actively managing unstructured data at all to this point.

    That means paying attention to tools like Oracle and Microsoft SQL Server still needs to be a priority for DBAs, even as they try to incorporate Hadoop and NoSQL into their organizations.

    But that doesn’t mean Whittaker is turning a blind eye toward these more modern technologies. On the contrary, he makes a clear case for ramping up predictive analytics so an organization can see not only where it’s been but where it’s going, and thereby stay a step ahead of the competition.

    The key to doing both is recognizing that even with the rise of big data, you need to leverage the right combination of both traditional and modern database tools today. Knowing which serves each situation best, and giving your team the tools it needs for each, is the balancing act DBAs and data managers must pull off today.

    Read the entire article on the Data Center Knowledge site here.


  • Integrating hooks and tools for easier management of HPC Cluster

    Managing tens of thousands of local and remote server nodes in a cluster is always a challenge. To reduce the cluster-management overhead and simplify the setup of cluster nodes, admins seek the convenience of a single snapshot view. Rapid changes in technology make management, tuning, customization, and settings updates an ongoing necessity, one that needs to be performed quickly and easily as infrastructure is refactored and refreshed.

    To address some of these challenges, it’s important to fully integrate hardware management with the cluster management solution. The integration between server hardware and the cluster management solution detailed in this blog provides an example of some of the best practices achievable today.

    Critical to this integration and design is the Integrated Dell Remote Access Controller (iDRAC). Since the iDRAC is embedded into the server motherboard for in-band and out-of-band system management, it can display and modify BIOS settings as well as perform firmware updates through the Lifecycle Controller and remote console. Collectively, each server’s in-depth system profile information is gathered using system tools and utilities and made available in a single graphical user interface for ease of administration, reducing the need to physically access the servers themselves.

    Figure 1. BIOS-level integration between Dell PowerEdge servers and cluster management solution (Bright 7.1)

    Figure 1 (above) depicts the configuration setup for a single node in the cluster. The fabric can be accessed via the dedicated iDRAC port or shared with the LAN-on-Motherboard capability. The cluster administration fabric is configured at the time of deployment with the help of built-in scripts in the software stack that automate this step. The system profile of the server is captured in an XML-based schema file that is retrieved from the iDRAC using racadm commands. Relevant data such as optimal system BIOS settings, boot order, console redirection and network configuration are then parsed and displayed on the cluster dashboard of the graphical user interface. By reversing this process, it is possible to change and apply other BIOS settings to a server to tune and set system profiles from the graphical interface. These choices are stored in an updated XML-based schema file on the head node and pushed out to the appropriate nodes during reboots.
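
    As a hypothetical sketch of the parsing step described above (the element and attribute names below are assumptions about what such an exported XML schema file might contain, not a documented format), BIOS settings could be pulled out of a profile file like this:

    # Hypothetical sketch: extracting BIOS attributes from an exported system-profile XML file.
    # The tag names ("Component", "Attribute", "FQDD", "Name") are illustrative assumptions.
    import xml.etree.ElementTree as ET

    def bios_settings(path):
        """Return {attribute_name: value} for BIOS-related components in the profile file."""
        root = ET.parse(path).getroot()
        settings = {}
        for component in root.iter("Component"):
            if "BIOS" in component.get("FQDD", ""):
                for attr in component.iter("Attribute"):
                    settings[attr.get("Name")] = (attr.text or "").strip()
        return settings

    if __name__ == "__main__":
        for name, value in sorted(bios_settings("node_profile.xml").items()):
            print(f"{name:45s} {value}")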

    Figure 2. Snapshot of the Cluster Node Configuration via cluster management solution.

    Figure 2 is a screenshot showing BIOS version and system profile information for a number of Dell PowerEdge servers of the same model. This is a particularly useful overview as inappropriate settings and versions can be easily and rapidly identified. 

    A typical use case is when new servers are added to or replaced in a cluster. The integration described above helps ensure that all servers have homogeneous performance, BIOS versions, firmware, system profiles and other tuning configurations.

    This integration is also helpful for users who need custom settings (i.e., not the default settings) applied on their servers. For example, latency-sensitive codes may require a custom profile with C-states disabled. These servers can be categorized into a node group, with specific BIOS parameters applied to that group.

    This tightly coupled BIOS-level integration delivers a significantly enhanced solution for HPC cluster maintenance, providing a single snapshot view for simplified updates and tuning. As a validated and tested solution on the given hardware, it provides seamless operation and administration of clusters at scale.


    1. http://www.brightcomputing.com/Bright-Cluster-Manager
    2. http://en.community.dell.com/techcenter/systems-management/w/wiki/3204.dell-remote-access-controller-drac-idrac
    3. http://www.brightcomputing.com/Linux-Cluster-Architecture
    4. http://en.community.dell.com/techcenter/high-performance-computing/b/general_hpc/archive/2014/09/23/bios-tuning-for-hpc-on-13th-generation-haswell-servers
    5. http://en.community.dell.com/techcenter/high-performance-computing/b/general_hpc/archive/2015/04/29/linpack-benchmarking-on-a-4-nodes-cluster-with-intel-xeon-phi-7120p-coprocessors
  • Congratulations to Team South Africa

    by David Detweiler

    Congratulations to Team South Africa on their second place finish in the Student Cluster Competition at the International Supercomputing Conference (ISC) in Frankfurt, Germany earlier this month. The students hailing from the University of Witwatersrand narrowly missed three-peating as champions.

    The team was comprised of Ari Croock, James Allingham, Sasha Naidoo, Robert Clucas, Paul Osei Sekyere, and Jenalea Miller, with reserve team members Vyacheslav Schevchenko and Nabeel Rajab. Together, they represented the Centre for High Performance Computing (CHPC) at the competition.


    The South African students competed against teams from seven other nations over a sleep-depriving three days. During the competition, the teams were tasked with designing and building their own small cluster computers and running a series of HPC benchmarks and applications. In addition, the students were assigned to optimize four science applications, three of which were announced before the competition, with the fourth introduced during the event.

    The competition was sponsored by ISC and the HPC Advisory Council. Each team was scored based on three criteria:

    • Performance on the HPCC benchmark run (10%)
    • A suite of test applications (80%)
    • An explanation before a panel of experts of the strategy used and the results achieved (10%)

    With young people like team South Africa entering the field, the future of HPC looks brighter than ever. Congratulations on a job well done!

  • Big Data is Unlocking New Opportunities in Scientific Research

    Scientific research has reaped the rewards offered by big data technologies. New insights have been discovered in a wide range of disciplines thanks to the collection, analysis and visualization of large data sets. In a recent series of articles, insideBigData examined some of the noteworthy benefits researchers are realizing when adopting big data technologies.

    The results of big data adoption have been impressive. Just a few examples of how big data is being employed across a variety of disciplines include:

    • Health sciences – Data around genes, proteins, small molecules and other important indicators can now be stored in single repositories. This data can then be shared with researchers around the world.
    • Neurosciences – The activities of neurons in the human brain are currently being mapped. Researchers hope to discover not only how the brain functions and develops, but also new ways to treat disease and trauma.
    • Climate sciences – Massive amounts of information about climate and weather are now being collected and analyzed. Climate and environmental scientists are gaining heretofore unimaginable insights into everything from warming oceans to severe weather.

    Data has afforded researchers tremendous opportunities. Researchers are collaborating across companies, disciplines and even continents in manners never before available to them. As with any rapid adoption of technology, some difficulties are to be expected. However, the benefits offered by big data far outweigh any potential issues.

    Big data has already ushered in exciting scientific advancements across a myriad of disciplines. The benefits for researchers – and society – are just beginning.

    You can learn more about Big Data and scientific research in this white paper.

  • HPC and Big Data are Growing More Aligned

    The alignment between high performance computing (HPC) and big data has steadily gained traction over the past few years. As analytics and big data continue to be top of mind for organizations of all sizes and industries, traditional IT departments are considering HPC solutions to help provide rapid and reliable information to business owners so they can make more informed decisions.

    This alignment is clearly seen in the increasing sales of hyper-converged systems. IDC predicts sales of these systems will increase 116% this year compared to 2014, reaching an impressive $807 million. This significant growth is expected to continue over the next few years. Indeed, the market is expected to experience a nearly 60% compound annual growth rate (CAGR) from 2014 to 2019, at which time it will generate more than $3.9 billion in total sales.

    To meet this growing customer demand, more hyper-converged systems are being offered. For example, the latest offering in the 13th-generation Dell PowerEdge server portfolio, the PowerEdge C6320, is now available. These types of solutions help organizations meet their increasingly demanding workloads by offering improved performance, better power efficiency, and cost-efficient compute and storage. This allows customers to optimize application performance and productivity while conserving energy and traditional datacenter space.

    Among the top research organizations and enterprises utilizing the marriage between HPC and big data is the San Diego Supercomputer Center (SDSC). Comet, its new, recently deployed petascale supercomputer, leverages 27 racks of PowerEdge C6320 servers, totaling 1,944 nodes or 46,656 cores. This represents a five-fold increase in compute capacity versus their previous system. In turn, this affords SDSC the ability to provide HPC to a much larger research community. You can read more about Comet and how it is being used in this Q&A with Rick Wagner, SDSC’s high-performance computing systems manager. (LINK)

    Learn more about the PowerEdge C6320 here.

  • The Democratization of HPC

    The democratization of HPC is under way. Removing the complexities traditionally associated with HPC and focusing on making insightful data more easily accessible to a company’s users are the linchpins to greater adoption of high performance computing by organizations beyond the more traditional groups.

    HPC is no longer simply about crunching information. The science has evolved to include predicting and developing actionable insights. That is where the smaller, newer adopters uncover the true value of HPC.

    However, these organizations can become overwhelmed by the amount, size, and types of information they’re collecting, storing, and analyzing. Increasingly, these enterprises are identifying HPC as an efficient and cost effective solution to quickly glean valuable insights from their big data applications.

    That cost-effective efficiency can yield impressive, measurable results. In just one example, Onur Celebioglu, Dell’s director of HPC & SAP HANA solutions, Engineered Solutions and Cloud, cited how HPC has allowed life sciences organizations using big data to cut genetic sequencing from four days to just four hours per patient. That reduction has provided an untold improvement in treatment plans, which has bettered the lives of patients and their families.

    Greater democratization also occurs when companies realize it is possible to leverage HPC, cloud, and big data to benefit their business without abandoning their existing systems. Having the ability to build onto an existing system as business needs warrant allows more organizations that otherwise couldn’t reap the benefits of HPC to do so.

    You can read more about the democratization of HPC at EnterpriseTech.


  • New IT Academy in South Africa Helps Students Pursuing HPC Careers

    Promising students in South Africa will now have an exciting new opportunity to obtain greater, more in-depth experiences in high performance computing (HPC). A partnership between the South African Department of Trade and Industry (DTI), the Center for High Performance Computing (CHPC), and Dell Computers has resulted in a new IT academy.

    Slated to open in January of 2016, each year the Khulisa IT Academy will play host to promising students from economically disadvantaged areas throughout the country. "Khulisa" translates as "nurturing" in the isiZulu language.

    The purpose of the academy is to grow the skill set and experience of young South Africans pursuing careers in HPC. During their two-year terms at the academy, students will be able to marry the theoretical aspects of HPC they have learned in the classroom with real-life, practical experiences offered through various industry internships.

    To allow the students to concentrate on their education and future professions, each will receive a stipend for the duration of their time at the academy. Upon graduation, these rising HPC stars will be ready to enter into careers in any number of industries.

    Dell is honored to be able to play a small role in helping these worthy students. The company is investing financially in the academy, as well as offering startup funding for the ventures of students with proven entrepreneurial skills.



  • The Democratization of Genomics Continues: How Health IT Professionals Can Enable Genomic-Driven Precision Medicine

    by Seth Feder

    Genomics is no longer solely the domain of university research labs and clinical trials. Commercial entities such as tertiary care hospitals, cancer centers, and large diagnostics labs are now sequencing genomes. Perhaps ahead of the science, consumers are seeing direct marketing messages about genomic tumor assessments on TV. Not surprisingly, venture capitalists are looking for their slice of the pie, last year investing approximately $248 million in personalized medicine startups.

    So how can health IT professionals get involved? As in the past, technology coupled with innovation (and the right use-case) can drive new initiatives to widespread adoption. In this case, genomic medicine has the right use-case and IT innovation is driving adoption.   

    While the actual DNA and RNA sequencing takes place inside very sophisticated instrumentation, sequencing is just one step in the process. The raw data has to be processed, analyzed, interpreted, reported, shared, and then stored for later use. Sound familiar? It should, because we have seen this before in fields such as digital imaging, which drove the widespread deployment of Picture Archiving and Communication Systems (PACS) in just about every hospital and imaging clinic around the world.

    As in PACS, those in clinical IT must implement, operationalize, and support the workflow. The processing and analysis of genomic data is essentially a big data problem, solved by immense amounts of computing power. In the past, these resources were housed inside large, exotic supercomputers available only to elite institutions. But today, HPC systems built on scale-out x86 architectures with multi-core processors have made this power attainable to the masses, and thus democratized it. Parallel file systems that support HPC are much easier to implement and support, as are standard high-bandwidth InfiniBand and Ethernet networks. Further, the public cloud is emerging as a supplement to on-premises computing power. Some organizations are exploring off-loading part of the work beyond their own firewall, either for added compute resources or as a location for long-term data storage.

    For example, in 2012, others at Dell and I worked with the Translational Genomics Research Institute (TGen) to tune its system for genomics input/output demands by scaling its existing HPC cluster to include more servers, storage and networking bandwidth. This allowed researchers to get the IT resources they needed faster without having to depend on shared systems. TGen worked with the Neuroblastoma and Medulloblastoma Translational Research Consortium (NMTRC) to develop a methodology for fast sequencing of childhood cancer tumors, allowing NMTRC doctors to quickly identify appropriate treatments for young patients.

    You can now get pre-configured HPC systems that work with genomic software toolsets, which enables clinical and translational research centers like TGen to run large-scale sequencing projects. The ROI and price/performance are compelling for anyone doing heavy genomic workloads. Essentially, with one rack of gear, any clinical lab now has all the compute power needed to process and analyze multiple genome sequences per day, which is a clinically relevant pace.

    Genomic medicine is here, and within a few years it will become standard care to sequence many diseases in order to determine the proper treatment. As the science advances, the HPC community will be ready to contribute to making this a reality. You can learn more here.


  • SDSC Transitions to Early Operations Stages of Comet

    by Tom Raisor

    The San Diego Supercomputer Center (SDSC) at the University of California, San Diego has transitioned into the early operations stages of its new Comet supercomputer. When it is fully functioning, the new cluster will have an overall peak performance approaching two petaflops.

    Comet has been designed as a solution for the "long tail" of science, which refers to the significant amount of research that is computationally-based, but modest-sized. Together, these projects represent a great amount of research and potential scientific impact. Much of this research is being conducted in disciplines that are new to high performance computing such as economics, genomics and social sciences.

    The Comet cluster includes the following (a quick consistency check of these figures appears after the list):

    • Two Intel Xeon E5-2600 v3 family processors per node, with 12 cores per processor running at 2.5GHz.
    • 128 gigabytes (GB) of traditional DRAM and 320 GB of local flash memory on each compute node.
    • 27 racks of 72 nodes each (1,728 cores per rack), with a full bisection InfiniBand FDR interconnect from Mellanox and 4:1 over-subscription across the racks.
    • A total of 1,944 nodes, or 46,656 cores.
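
    As a rough sanity check of the node, core, and peak-performance figures above (the 16 double-precision flops per cycle per core is an assumption typical of Haswell-class Xeon cores with FMA, not a figure from the announcement):

    # Rough consistency check of the Comet figures quoted above.
    racks, nodes_per_rack = 27, 72
    cores_per_node, ghz = 2 * 12, 2.5
    flops_per_cycle = 16                                 # assumed: AVX2 FMA on Haswell-class cores

    nodes = racks * nodes_per_rack                       # 1,944 nodes
    cores = nodes * cores_per_node                       # 46,656 cores
    peak_pflops = cores * ghz * flops_per_cycle / 1e6    # GHz x flops/cycle gives GFLOPS; /1e6 gives PFLOPS
    print(nodes, cores, round(peak_pflops, 2))           # 1944 46656 1.87 -> approaching two petaflops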

    You can learn more about Comet and its mission to serve the long tail of science here.