Computational structural mechanics is commonly used by scientists and engineers to shorten the product development cycle in industries ranging from aerospace to structural biology. One of the most successful computational techniques in structural analysis is the finite element method, which is used to solve the partial differential equations that govern the structure. Solving these equations is a compute- and memory-intensive task.
ANSYS Mechanical is a well-known and widely used software package for computational structural mechanics. It can perform comprehensive static and dynamic analysis on structures. It uses the finite element method to model the associated structure or process and offers various built-in solvers to solve the resulting linear system. In addition, it includes a library of material models, which makes it easy to use, and it can perform coupled-physics simulations.
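To make the "model with finite elements, then solve the resulting linear system" workflow concrete, the following is a minimal sketch on a toy problem: a 1D bar fixed at one end and pulled at the other. It is an illustration only and does not represent how ANSYS Mechanical is implemented. The function names (`assemble_bar_stiffness`, `jacobi_pcg`, `solve_bar`) and all parameter values are hypothetical, and the iterative solver shown is a textbook conjugate gradient iteration with a Jacobi (diagonal) preconditioner.

```python
def assemble_bar_stiffness(n_elems, length, ea):
    """Assemble the global stiffness matrix of a 1D bar from 2-node elements."""
    k = ea / (length / n_elems)          # stiffness of one element, EA / h
    n_nodes = n_elems + 1
    K = [[0.0] * n_nodes for _ in range(n_nodes)]
    for e in range(n_elems):
        # Add the element matrix k * [[1, -1], [-1, 1]] at nodes e and e + 1.
        K[e][e] += k
        K[e][e + 1] -= k
        K[e + 1][e] -= k
        K[e + 1][e + 1] += k
    return K


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    return [dot(row, x) for row in A]


def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A using conjugate
    gradients with a Jacobi (diagonal) preconditioner."""
    inv_diag = [1.0 / A[i][i] for i in range(len(b))]
    x = [0.0] * len(b)
    r = list(b)                                   # residual b - A x, with x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:      # relative residual test
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


def solve_bar(n_elems=8, length=2.0, ea=1.0e6, tip_force=500.0):
    """Return nodal displacements of a bar fixed at node 0 with an axial
    force applied at the last node. All default values are hypothetical."""
    K = assemble_bar_stiffness(n_elems, length, ea)
    K_free = [row[1:] for row in K[1:]]           # drop the fixed node 0
    f = [0.0] * (n_elems - 1) + [tip_force]       # load on the last node only
    return [0.0] + jacobi_pcg(K_free, f)


if __name__ == "__main__":
    u = solve_bar()
    print("tip displacement:", u[-1])
```

For this toy problem the tip displacement can be checked against the analytic value `tip_force * length / ea`. Production models differ mainly in scale: the benchmarks in this study have hundreds of thousands to millions of degrees of freedom, which is why the linear solve dominates the cost.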
Typically, the available processing power limits the size and number of ANSYS Mechanical simulations. Traditionally, parallel processing has been used to reduce simulation runtimes. Recently, the use of Graphics Processing Units (GPUs) to accelerate simulations has generated interest in the ANSYS community, because GPUs coupled with parallel processing can further reduce the simulation runtime significantly. ANSYS Mechanical has supported GPU acceleration since version 13. In this study we evaluate the acceleration with a single M2090 GPU on seven standard ANSYS Mechanical benchmarks.
Table 1 lists the benchmarks along with their problem sizes and the solvers they use.
Table 1: Benchmarks

| ANSYS Mechanical benchmark | Problem size (degrees of freedom, DOFs) | Solver |
|---|---|---|
| CG-1 | 1100K | JCG solver |
| SP-1 | 400K | Sparse solver |
| SP-2 | 1000K | Sparse solver |
| SP-3 | 2300K | Sparse solver |
| SP-4 | | |
| SP-5 | 2100K | Sparse solver |
| SP-6 | 4900K | Sparse solver |
The Dell PowerEdge R720 is used to run the ANSYS benchmarks. The R720 is a feature-rich dual-socket 2U server that can be configured with two internal GPUs and can also act as a host for external GPUs. We used the R720 in both internal and external GPU configurations. For external GPUs we used the Dell PowerEdge C410X, a flexible and powerful 3U PCIe expansion chassis that houses up to 16 external GPUs. The C410X can connect up to eight hosts simultaneously and share the GPUs among them by mapping 2, 4 or 8 GPUs per host. Table 2 shows the hardware and software configuration used for this study.
Table 2: Hardware and Software Configuration

| Component | Specification |
|---|---|
| Processors | Two Intel Xeon E5-2660, 2.2 GHz, 95 W |
| Memory | 128 GB @ 1333 MHz |
| GPU | NVIDIA Tesla M2090 |
| GPU memory bandwidth | |
| Theoretical peak performance: single precision | |
| Theoretical peak performance: double precision | |
| External GPU chassis | PowerEdge C410X (3U, sixteen GPUs) |
ANSYS offers several license models, which limit the number of CPU cores available to an ANSYS run. The two-core license is common, and it is the one we use for this study.
We measure the acceleration that a single M2090 GPU provides on the benchmarks listed in Table 1. The results are shown in Figure 1. The performance metric in each case is the total runtime, including I/O; a lower runtime is better. The graph shows that the geometric mean acceleration from a single GPU, across the seven benchmarks, is 79.1% for the internal GPU configuration and 77.9% for the external GPU configuration. The slight difference is assumed to be due to the higher CPU-to-GPU bandwidth in the internal GPU configuration.
Figure 1: Runtimes of the seven ANSYS benchmarks.
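The way a geometric mean acceleration can be derived from per-benchmark runtimes is sketched below. The text does not define the acceleration percentage, so this sketch assumes it is the speedup ratio (CPU-only runtime divided by CPU-plus-GPU runtime) minus one, expressed in percent, with the geometric mean taken over the speedup ratios. The function names and the runtimes in the example are hypothetical placeholders, not the measured data from this study.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive numbers."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one positive value")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def mean_acceleration_percent(runtimes):
    """Geometric mean acceleration, in percent, over (cpu_only, with_gpu)
    runtime pairs. Assumed definition: (cpu_only / with_gpu - 1) * 100."""
    speedups = [cpu_only / with_gpu for cpu_only, with_gpu in runtimes]
    return (geometric_mean(speedups) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical (CPU-only, CPU + one GPU) total runtimes in seconds.
    example_runtimes = [(100.0, 50.0), (200.0, 100.0), (90.0, 60.0)]
    print(f"{mean_acceleration_percent(example_runtimes):.1f}%")
```

The geometric mean is the usual choice for averaging ratios such as speedups, because a single benchmark with a very large or very small ratio influences it less than it would an arithmetic mean.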