Authors: Saeed Iqbal, Shawn Gao

Computational structural mechanics is commonly used by scientists and engineers to shorten product development cycles in industries ranging from aerospace to structural biology. One of the most successful techniques for computational structural analysis is the finite element method, which divides a structure into small elements and turns the governing partial differential equations into large systems of linear equations. Solving these systems is what makes structural analysis a compute and memory intensive task.
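
To illustrate how the finite element method produces a linear system whose size grows with the number of degrees of freedom, here is a minimal sketch. It assumes a textbook toy problem (a 1D elastic bar split into equal two-node linear elements with constant axial stiffness EA), not one of the ANSYS benchmarks, and the function names are hypothetical. Each element contributes a 2x2 block to the global stiffness matrix, which has one row per degree of freedom.

```python
def assemble_bar_stiffness(n_elements, length=1.0, EA=1.0):
    """Assemble the global stiffness matrix of a 1D elastic bar.

    Toy model (an assumption for illustration): n_elements equal-length,
    two-node linear elements with constant axial stiffness EA.
    Returns a dense (n_elements + 1) x (n_elements + 1) list of lists.
    """
    n_nodes = n_elements + 1  # one displacement DOF per node
    h = length / n_elements   # element length
    k = EA / h                # element axial stiffness
    K = [[0.0] * n_nodes for _ in range(n_nodes)]
    for e in range(n_elements):
        i, j = e, e + 1  # global DOFs of element e
        # Scatter the 2x2 element matrix k * [[1, -1], [-1, 1]] into K.
        K[i][i] += k
        K[i][j] -= k
        K[j][i] -= k
        K[j][j] += k
    return K


def count_nonzeros(K):
    """Count the nonzero entries of a dense matrix."""
    return sum(1 for row in K for value in row if value != 0.0)


K = assemble_bar_stiffness(4)
print(len(K), "DOFs,", count_nonzeros(K), "nonzeros out of", len(K) ** 2)
# prints: 5 DOFs, 13 nonzeros out of 25
```

With n elements this matrix has n + 1 rows but only 3(n + 1) - 2 nonzeros, which is why finite element codes store it in sparse form; real 3D models with millions of DOFs have many more nonzeros per row, and the matrix is singular until displacement boundary conditions are applied.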

ANSYS Mechanical is a well-known and widely used software package for computational structural mechanics. It can perform comprehensive static and dynamic analyses of structures. It uses the finite element method to model the structure or process of interest and offers several built-in solvers for the resulting linear systems. In addition, its library of material models makes it easy to use and supports coupled-physics simulations.

Typically, the available processing power limits the size and number of ANSYS Mechanical simulations. Traditionally, parallel processing has been used to reduce simulation runtimes. Recently, using Graphics Processing Units (GPUs) to accelerate simulations has generated interest in the ANSYS community, because GPUs combined with parallel processing can reduce simulation runtimes significantly further. ANSYS Mechanical has supported GPU acceleration since version 13. In this study, we evaluate the acceleration provided by a single Tesla M2090 GPU on seven standard ANSYS Mechanical benchmarks.

Table 1 lists the benchmarks along with their problem sizes and the solvers they use.

Table 1: Benchmarks

Benchmark   Problem size (degrees of freedom, DOFs)   Solver
CG-1        1100K                                     JCG
SP-1        400K                                      Sparse
SP-2        1000K                                     Sparse
SP-3        2300K                                     Sparse
SP-4        1000K                                     Sparse
SP-5        2100K                                     Sparse
SP-6        4900K                                     Sparse
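
The CG-1 benchmark uses the JCG solver, the Jacobi (diagonally) preconditioned conjugate gradient iterative solver, while the SP benchmarks use the sparse direct solver. The following is a minimal textbook sketch of Jacobi-preconditioned conjugate gradient in pure Python, assuming a small dense symmetric positive-definite matrix. It illustrates the algorithm only; it is not the ANSYS implementation, which operates on large sparse matrices, and the function name jacobi_pcg is hypothetical.

```python
def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A using conjugate
    gradients with a Jacobi (inverse diagonal) preconditioner.

    Returns (x, iterations). Stops when ||r|| <= tol * ||b||.
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner M^-1
    x = [0.0] * n
    r = b[:]                                      # residual b - A x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = z[:]                                      # first search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0              # avoid dividing by zero for b = 0

    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)                   # step length along p
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # keeps directions A-conjugate
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


# Example: small SPD system whose exact solution is x = [1/11, 7/11].
x, iterations = jacobi_pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(x, iterations)
```

Each iteration needs only a matrix-vector product, a few dot products, and a diagonal scaling, which is why iterative solvers like JCG use far less memory than a sparse direct factorization.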

Configuration

The Dell PowerEdge R720 is used to run the ANSYS benchmarks. The R720 is a feature-rich, dual-socket 2U server that can be configured with two internal GPUs and can also act as a host for external GPUs. We used the R720 in both internal and external configurations. For external GPUs, we used the Dell PowerEdge C410X, a unique, flexible and powerful 3U PCIe expansion chassis that can house up to 16 external GPUs. The C410X can connect up to eight hosts simultaneously and share the GPUs among them by mapping 2, 4, or 8 GPUs to each host. Table 2 shows the hardware and software configuration used for this study.
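
The C410X sharing rules above can be expressed as a small validity check. This is a hypothetical sketch, not a Dell tool or API: it assumes a mapping is given as a dictionary from host name to GPU count and checks it only against the limits stated here (16 GPUs per chassis, at most eight hosts, and 2, 4, or 8 GPUs per host).

```python
CHASSIS_GPUS = 16           # GPUs a single C410X can house
MAX_HOSTS = 8               # hosts that can connect simultaneously
ALLOWED_PER_HOST = {2, 4, 8}  # GPUs that can be mapped to one host


def validate_mapping(mapping):
    """Return a list of problems with a host -> GPU-count mapping.

    An empty list means the mapping satisfies the stated limits.
    """
    problems = []
    if len(mapping) > MAX_HOSTS:
        problems.append(f"{len(mapping)} hosts exceeds the limit of {MAX_HOSTS}")
    for host, n_gpus in mapping.items():
        if n_gpus not in ALLOWED_PER_HOST:
            problems.append(f"{host}: {n_gpus} GPUs is not one of {sorted(ALLOWED_PER_HOST)}")
    total = sum(mapping.values())
    if total > CHASSIS_GPUS:
        problems.append(f"{total} GPUs requested but the chassis holds {CHASSIS_GPUS}")
    return problems


# Hypothetical host names: one 8-GPU host and two 4-GPU hosts fill the chassis.
print(validate_mapping({"host-a": 8, "host-b": 4, "host-c": 4}))
# prints: []
```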

Table 2: Hardware and Software Configuration

PowerEdge R720
    Processor: Two Intel Xeon E5-2660, 2.2 GHz, 95 W
    Memory: 128 GB @ 1333 MHz
    OS: RHEL 6.2
    CUDA: 4.0

GPU
    Model: NVIDIA Tesla M2090
    GPU cores: 512
    GPU memory: 6 GB
    GPU memory bandwidth: 177 GB/s
    Theoretical peak performance, single precision: 1331 GFLOPS
    Theoretical peak performance, double precision: 665 GFLOPS
    Power capping: 225 W

Benchmark suite
    ANSYS Mechanical: version 14

External GPU chassis
    PowerEdge C410X: 3U, sixteen GPUs
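
The theoretical peak figures for the M2090 follow from its core count and clock. A minimal check, assuming a 1.3 GHz processor clock (not listed in Table 2), one fused multiply-add (two floating-point operations) per CUDA core per cycle in single precision, and half that rate in double precision; the function name is hypothetical:

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle=2, dp_ratio=0.5):
    """Theoretical peak (single, double) precision GFLOPS.

    Assumes every core retires one FMA (2 flops) per cycle in single
    precision and that double precision runs at dp_ratio of that rate.
    """
    sp = cores * clock_ghz * flops_per_cycle
    return sp, sp * dp_ratio


sp, dp = peak_gflops(512, 1.3)  # 1.3 GHz clock is an assumption
print(f"SP peak: {sp:.1f} GFLOPS, DP peak: {dp:.1f} GFLOPS")
# prints: SP peak: 1331.2 GFLOPS, DP peak: 665.6 GFLOPS
```

The table values of 1331 and 665 GFLOPS are these results rounded down to whole GFLOPS.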

 

ANSYS offers several license models, which limit the number of CPU cores that can be used for ANSYS runs. The two-core license is common, and we use a two-core license for this study.

Results and Conclusion

We measure the acceleration that a single M2090 GPU provides on each benchmark listed in Table 1. The results are shown in Figure 1. Total runtime, including I/O, is used as the performance metric in each case; a lower runtime is better. As the graph shows, across the seven benchmarks the geometric mean acceleration from a single GPU is 79.1% in the internal GPU configuration and 77.9% in the external GPU configuration. The slight difference is presumably due to the higher CPU-to-GPU bandwidth of the internal GPU configuration.
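
A geometric mean summarizes per-benchmark speedups as a single figure. The sketch below shows the arithmetic; it assumes speedup is CPU-only runtime divided by GPU-accelerated runtime and that a percentage acceleration is reported as (speedup - 1) x 100. Both conventions are assumptions for illustration, and the runtimes in the example are hypothetical, not measurements from this study.

```python
import math


def geometric_mean(values):
    """Geometric mean computed as exp(mean(log v)) to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def mean_acceleration(cpu_times, gpu_times):
    """Return (geometric-mean speedup, acceleration in percent).

    Speedup per benchmark is cpu_time / gpu_time; the percentage uses the
    assumed convention (speedup - 1) * 100.
    """
    speedups = [c / g for c, g in zip(cpu_times, gpu_times)]
    s = geometric_mean(speedups)
    return s, (s - 1.0) * 100.0


# Hypothetical runtimes in seconds for two benchmarks.
s, pct = mean_acceleration([200.0, 100.0], [100.0, 100.0])
print(f"{s:.3f}x, {pct:.1f}%")
# prints: 1.414x, 41.4%
```

The geometric mean is the conventional way to average ratios such as speedups because its result does not depend on which configuration is taken as the baseline: the geometric mean of the inverted ratios is exactly the reciprocal.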

Figure 1: Runtimes of the seven ANSYS benchmarks.

For more information: