Join us at Super Computing 2011! We invite you to visit us at the SC2011 conference in Seattle, Nov 14-17, at Booth #2040. See first-hand how we are enabling research discovery with Dell HPC solutions.
Introduction by Dell's Dr. Jeff Layton

There is little doubt that GPUs (Graphics Processing Units) are having an impact on High Performance Computing (HPC). GPUs offer the potential of increased performance for many, but not all, algorithms through massive parallelism and very high memory bandwidth, at commodity-level pricing. GPUs are also challenging in that they use a different programming model, requiring many applications to be ported or re-examined to take advantage of the performance increase.

One of the limitations of the architecture is that GPUs cannot directly perform I/O to storage devices or communicate with network devices. Consequently, data has to be sent from the CPUs to the GPUs through the PCIe bus. When communicating with other nodes, a GPU has to hand its data to the host CPU, which then sends the data to the CPU on the other node. This increases the run time of the application, which is never something that anyone wants.

This blog from Gilad Shainer at Mellanox describes a technique for reducing the communication time that Mellanox and Nvidia have been developing.

-- Dr. Jeff Layton
The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, has made graphics accelerators a compelling platform for computationally demanding tasks in a wide variety of application domains. Due to the great computational power of the GPU, the GPGPU method has proven valuable in various areas of science and technology. GPU-based clusters are being used to perform compute-intensive tasks such as finite element computations, Computational Fluid Dynamics and Monte-Carlo simulations. Several of the world's leading supercomputers use GPUs in order to achieve the desired performance. Since GPUs provide a high core count and high floating point capability, a high-speed network such as InfiniBand is required to connect the GPU platforms and provide the throughput and low latency needed for GPU to GPU communications.

While GPUs have been shown to provide worthwhile performance acceleration, yielding benefits to both price/performance and power/performance, several areas of GPU-based clusters could be improved in order to provide higher performance and efficiency. One of the main performance issues with deploying clusters consisting of multi-GPU nodes involves the interaction between the GPUs, or the GPU to GPU communication model. Prior to the GPU-Direct technology, any communication between GPUs had to involve the host CPU and required a buffer copy. The GPU communication model required the CPU to initiate and manage memory transfers between the GPUs and the InfiniBand network. Each GPU to GPU communication had to go through the following steps:

1. The GPU writes data to host memory dedicated to the GPU.
2. The host CPU copies the data from the GPU-dedicated host memory to host memory that the InfiniBand devices can use for RDMA communications.
3. The InfiniBand device reads the data from that area and sends it to the remote node.
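The three steps above can be pictured with a small model. The Python sketch below is only an illustration: plain byte buffers stand in for GPU memory and for the two host memory regions, the function and variable names are hypothetical, and no real GPU or InfiniBand API is involved. It simply records each hop so the extra CPU copy in step 2 is visible.

```python
# Toy model of the send path before GPU-Direct.
# Byte buffers stand in for GPU memory and host memory; all names here
# are hypothetical and no real GPU or InfiniBand call is made.

def send_staged(gpu_memory):
    """Return the bytes handed to the network and a log of the hops taken."""
    steps = []

    # Step 1: the GPU writes its data to host memory dedicated to the GPU.
    gpu_host_buffer = bytearray(gpu_memory)
    steps.append("gpu write: gpu memory -> gpu host buffer")

    # Step 2: the host CPU copies the data into host memory that the
    # InfiniBand device can use for RDMA.
    ib_host_buffer = bytearray(gpu_host_buffer)
    steps.append("cpu copy: gpu host buffer -> ib host buffer")

    # Step 3: the InfiniBand device reads that buffer and sends it out.
    wire = bytes(ib_host_buffer)
    steps.append("ib read: ib host buffer -> remote node")

    return wire, steps


payload = bytes(range(16))
wire, steps = send_staged(payload)
cpu_copies = sum(step.startswith("cpu copy") for step in steps)
for step in steps:
    print(step)
print("hops:", len(steps), "CPU copies:", cpu_copies)
```

In this model the data arrives unchanged, but only after the CPU has copied it between two host buffers, which is the hop the rest of the article is concerned with.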
The requirement of having the CPU involved in GPU communications, and the need for a buffer copy, created bottlenecks in the system and slowed data delivery between the GPUs.

The new GPU-Direct technology from NVIDIA and Mellanox enables NVIDIA Tesla and Fermi GPUs to communicate faster by eliminating the need for a CPU to be involved in the communication loop and by eliminating the need for the buffer copy. The result is increased overall system performance and efficiency, with GPU to GPU communication time reduced by 30%. GPU-Direct is based on a new interface between the GPU and the InfiniBand adapters that enables both devices to share the same system memory.
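To put the 30% figure in context, it helps to see how a reduction in communication time carries over to total run time. The sketch below is a deliberately simple model and not a measurement: it assumes the run time splits cleanly into compute and GPU to GPU communication, that only the communication part shrinks, and that the two do not overlap. The function name and the communication fractions are hypothetical values chosen for illustration.

```python
# Simple run-time model: only the communication share of the run time
# shrinks; compute time is assumed unchanged and not overlapped.

COMM_REDUCTION = 0.30  # reduction in GPU to GPU communication time quoted in the text


def relative_runtime(comm_fraction, reduction=COMM_REDUCTION):
    """Run time after the reduction, as a fraction of the original run time."""
    if not 0.0 <= comm_fraction <= 1.0:
        raise ValueError("comm_fraction must be between 0 and 1")
    compute = 1.0 - comm_fraction
    return compute + comm_fraction * (1.0 - reduction)


# Hypothetical shares of run time spent in GPU to GPU communication.
for fraction in (0.10, 0.25, 0.50):
    runtime = relative_runtime(fraction)
    print(f"communication share {fraction:.2f}: "
          f"relative run time {runtime:.3f}, speedup {1.0 / runtime:.3f}x")
```

Under these assumptions, an application that spends little time communicating sees only a small change, while a communication-heavy one benefits more.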
The performance gain for high-performance applications depends on the amount of GPU communication being used. Applications that demonstrate good parallel execution can see a performance gain or productivity increase of up to 42%. Other applications may show a smaller increase, but all of them will show performance and efficiency improvements with Mellanox GPU-Direct technology. GPU-Direct is an essential technology for GPU-based systems. It makes it possible to maximize the performance of the GPUs and of the overall system.

Gilad Shainer Biographical Sketch

Gilad Shainer is an HPC evangelist who focuses on high-performance computing, high-speed interconnects, leading-edge technologies and performance characterizations. He is a senior director of HPC and technical computing at Mellanox Technologies and the chairman of the HPC Advisory Council (www.hpcadvisorycouncil.com). Mr. Shainer holds an M.Sc. degree (2001, Cum Laude) and a B.Sc. degree (1998, Cum Laude) in Electrical Engineering from the Technion Institute of Technology in Israel. He also holds patents in the field of high-speed networking.