We often look at containers today and see the potential of accelerated application delivery, scaling and portability and wonder how we ever got by without them. This blog discusses the performance of LSTC LS-DYNA® within Singularity containers and on bare metal. This blog is the third in the series of blogs regarding container technology by the HPC Engineering team. The first blog Containers, Docker, Virtual Machines and HPC explored the use of containers. The second blog Containerizing HPC Applications with Singularity provided an introduction to Singularity and discussed the challenges of using Singularity in HPC environments. This third blog will focus on determining if there is a performance penalty while running the application (LS-DYNA) in a containerized environment.
Containerizing LS-DYNA using Singularity
An application specific container is a lightweight bundle of an application and its dependencies. The primary advantages of an application container are its portability and reproducibility. When we started assessing what type of workloads/applications could be containerized using singularity, interestingly enough, the first application that we tried to containerize using singularity was Fluent®, which presented a constraint. Since Fluent bundles MPI libraries, mpirun has to be invoked from within the container, instead of calling mpirun from outside the container. Adoption of containers is difficult in this scenario and requires an sshd wrapper. So we shifted gears, and started working on LS-DYNA which is a general-purpose finite element analysis (FEA) program, capable of simulating complex real world problems. LS-DYNA consists of a single executable file and is entirely command line driven. Therefore, all that is required to run LS-DYNA is a command line shell, the executable, an input file, and enough disk space to run the calculation. It is used to simulate a whole range of different engineering problems using its fully automated analysis capabilities. LS-DYNA is used worldwide in multiple engineering disciplines such as automobile, aerospace, construction, military, manufacturing, and bioengineering industries. It is robust and has worked extremely well over the past 20 years in numerous applications such as crashworthiness, drop testing and occupant safety.
The first step in creating the container for LS-DYNA would be to have a minimal operating system (CentOS 7.3 in this case), basic tools to run the application and support for InfiniBand within the container. With this, the container gets a runtime environment, system tools, and libraries. The next step would be to install the application binaries within the container. The definition file used to create the LS-DYNA container is given below. In this file, the bootstrap references the kind of base to be used and a number of options are available. “shub” pulls images hosted on Singularity Hub, “docker” pulls images from Docker Hub and here yum is used to install Centos-7.
# basic-tools and dev-tools
yum -y install evince wget vim zip unzip gzip tar perl
yum -y groupinstall "Development tools" --nogpgcheck
# InfiniBand drivers
yum -y --setopt=group_package_types=optional,default,mandatory groupinstall "InfiniBand Support"
yum -y install rdma yum -y install libibverbs-devel libsysfs-devel
yum -y install infinipath-psm
# Platform MPI
mkdir -p /home/inside/platform_mpi
chmod -R 777 /home/inside/platform_mpi
chmod 777 platform_mpi-09.1.0.1isv.x64.bin
./platform_mpi-09.1.0.1isv.x64.bin -installdir=/home/inside/platform_mpi -silent
mkdir -p /home/inside/lsdyna/code
chmod -R 777 /home/inside/lsdyna/code
cd /home/inside/lsdyna/code/usr/bin/wget http://192.168.41.41/ls-dyna_mpp_s_r9_1_113698_x64_redhat54_ifort131_avx2_platformmpi.tar.gz
tar -xvf ls-dyna_mpp_s_r9_1_113698_x64_redhat54_ifort131_avx2_platformmpi.tar.gz
export PATH LSTC_LICENSE LSTC_MEMORY
It is quick and easy to build a ready to use singularity image using the above definition file. In addition to the environment variables specified in the definition file, the value of any other variable such as LSTC_LICENSE_SERVER, can be set inside the container, with the use of the prefix “SINGULARITYENV_”. Singularity adopts a hybrid model for MPI support and the ‘mpirun’ is called, from outside the container, as follows:
/home/nirmala/platform_mpi/bin/mpirun -env_inherit -np 80 -hostfile hostfile singularity exec completelsdyna.simg /home/inside/lsdyna/code/ls-dyna_mpp_s_r9_1_113698_x64_redhat54_ifort131_avx2_platformmpi i=Caravan-V03c-2400k-main-shell16-120ms.k memory=600m memory2=60m endtime=0.02
LS_DYNA car2car Benchmark Test
The car2car benchmark presented here is a simulation of a two vehicle collision. The performance results for LS-DYNA are presented by using the Elapsed Time metric. This metric is the total elapsed time for the simulation in seconds as reported by LS-DYNA. A lower value represents better performance. The performance tests were conducted on two clusters in the HPC Innovation Lab, one with Intel® Xeon® Scalable Processor Family processors (Skylake) and another with Intel® Xeon® E5-2600 v4 processors (Broadwell). The software installed on the Skylake cluster is shown in Table 1
The configuration details of the Skylake cluster are shown below in Table 2.
Figure1 shows that there is no perceptible performance loss while running LS-DYNA inside a containerized environment, both on a single node and across the four node cluster. The results for the bare-metal and container tests are within 2% of each other, which is within the expected run-to-run variability of the application itself.
Figure 1 Performance inside singularity containers relative to bare metal on Skylake Cluster.
The software installed on the Broadwell cluster is shown in Table 3
The configuration details of the Broadwell cluster are shown below in Table 4 .
Figure 2 shows that there is no significant performance penalty while running LS-DYNA car2car inside Singularity containers on Broadwell cluster. The performance delta is within 2% only.
Figure 2 Performance inside singularity containers relative to bare metal on Broadwell Cluster
Conclusion and Future Work
In this blog, we discussed how the performance of LS-DYNA within singularity containers is almost at par with running the application on bare metal servers. The performance difference while running LS-DYNA within Singularity containers remains within 2%, which is within the run-to-run variability of the application itself.. The Dell EMC HPC team will focus on containerizing other applications, and this blog series will be continued. So stay tuned for the next blog in this series!