GPUs are arguably one of the hottest trends in HPC. They can greatly improve performance while reducing the power consumption of systems. However, because the field of GPU Computing is still evolving, both in application development and in tool set creation, GPU systems need to be as flexible as possible. This blog presents some benchmarks of the configurations presented in Part 4 of this series. In some cases the benchmarks are compared to a Supermicro system that has 2 GPUs in 1U (the first GPU Computing server released), and in some cases the benchmarks discuss the scalability of certain applications that only Dell GPU configurations can deliver.

First things first – Bandwidth Testing

When you first develop or buy a new GPU Computing solution, the first benchmark you are likely to run is a simple bandwidth test between the host and the GPU (and vice versa). You should know how the host node and the GPU are connected so you have an idea of the maximum bandwidth. For example, a PCIe Generation 2 x16 connection has a theoretical bandwidth of 8 GB/s (http://en.wikipedia.org/wiki/PCI_Express) in one direction (which is exactly what we're testing in the following bandwidth tests).

Figure 1 below is a plot from a presentation by Dell's Dr. Mark Fernandez at the 2010 Nvidia GTC Conference. It shows the bandwidth as a function of the number of lanes from a Dell C6100 with Intel processors and a single HIC connected to a single GPU in a Dell C410x.
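As a quick sanity check on the 8 GB/s figure, here is a minimal Python sketch that derives it from the standard PCIe 2.0 link parameters. The signaling rate (5 GT/s per lane) and the 8b/10b line encoding are assumptions drawn from the PCIe 2.0 specification, not from this blog itself:

```python
# Sketch: derive the theoretical one-direction PCIe bandwidth cited above.
# Assumed PCIe 2.0 parameters: 5 GT/s raw rate per lane, 8b/10b encoding
# (8 payload bits carried per 10 transmitted bits).

def pcie_theoretical_bandwidth_gbps(transfers_per_sec, lanes, encoding=8 / 10):
    """Theoretical one-direction bandwidth in GB/s (decimal GB)."""
    payload_bits_per_sec = transfers_per_sec * lanes * encoding
    return payload_bits_per_sec / 8 / 1e9  # bits -> bytes -> GB

gen2_x16 = pcie_theoretical_bandwidth_gbps(5e9, 16)
print(f"PCIe Gen2 x16: {gen2_x16:.1f} GB/s per direction")  # prints 8.0 GB/s
```

Real transfers will come in below this number because of protocol overhead (packet headers, flow control), which is why measured host-to-GPU bandwidth tests report somewhat less than the theoretical peak.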