By: Nishanth Dandapanthula, Munira Hussain
The new generation of Intel Sandy Bridge servers provides PCIe Generation 3 (Gen3) slots. These slots offer higher bandwidth and lower latency, which benefits inter-node communication in a High Performance Computing (HPC) cluster. PCIe Gen3 supports a transfer rate of up to 8 GT/s per lane, compared with 5 GT/s for the older PCIe Gen2 slots. PCIe Gen3 also uses a more efficient encoding scheme (128b/130b instead of Gen2's 8b/10b). Its lower overhead further increases usable bandwidth and reduces latency.
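The encoding difference is easier to see with numbers. The short Python sketch below computes the usable per-lane rate for each generation from its transfer rate and line code (8b/10b for Gen2, 128b/130b for Gen3). It assumes an x8 slot width, which is not stated above. It also ignores packet-level protocol overhead, so the results are upper bounds rather than achievable throughput.

```python
# Effective PCIe data rate per direction after line encoding.
# Gen2 uses 8b/10b encoding; Gen3 uses 128b/130b encoding.
# Packet (TLP/DLLP) protocol overhead is ignored, so real throughput is lower.

GENERATIONS = {
    # name: (transfer rate in GT/s, payload bits, encoded bits)
    "Gen2": (5.0, 8, 10),
    "Gen3": (8.0, 128, 130),
}


def lane_gbps(gen: str) -> float:
    """Usable Gbit/s per lane, per direction, after encoding overhead."""
    rate, payload, encoded = GENERATIONS[gen]
    return rate * payload / encoded


def slot_gbytes(gen: str, lanes: int = 8) -> float:
    """Usable GB/s per direction for a slot of `lanes` width (x8 is an assumption)."""
    return lane_gbps(gen) * lanes / 8  # 8 bits per byte


for gen in GENERATIONS:
    print(f"{gen}: {lane_gbps(gen):.3f} Gb/s per lane, "
          f"{slot_gbytes(gen):.2f} GB/s per direction on x8")
```

This prints about 4.00 GB/s for Gen2 x8 and 7.88 GB/s for Gen3 x8. The 60% higher transfer rate therefore yields nearly double the usable bandwidth, because Gen3 wastes far fewer bits on encoding.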
In this blog, we compare PCIe Gen2 with PCIe Gen3 and quantify the impact on bandwidth and latency. We measure the improvement for both Quad Data Rate (QDR) and Fourteen Data Rate (FDR) InfiniBand adapters on the 12G servers.
Mellanox Fourteen Data Rate (FDR) InfiniBand adapters are PCIe Gen3 cards that deliver up to 54 Gbit/s of theoretical bandwidth over a 4X InfiniBand link. Quad Data Rate (QDR) InfiniBand adapters are available in both PCIe Gen2 and PCIe Gen3 versions. Their theoretical bandwidth is 32 Gbit/s over a 4X link.
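The 54 Gbit/s and 32 Gbit/s figures follow from three factors: the per-lane signalling rate, the four lanes of a 4X link, and each speed's line encoding. QDR signals at 10 Gb/s per lane with 8b/10b encoding. FDR signals at 14.0625 Gb/s per lane with 64b/66b encoding. These are standard InfiniBand parameters rather than details of our test setup. A minimal Python sketch of the arithmetic:

```python
# Theoretical 4X InfiniBand data rate after line encoding.
# QDR: 10 Gb/s signalling per lane, 8b/10b encoding.
# FDR: 14.0625 Gb/s signalling per lane, 64b/66b encoding.

LINK_LANES = 4  # a "4X" InfiniBand link has four lanes

RATES = {
    # name: (signalling rate per lane in Gb/s, payload bits, encoded bits)
    "QDR": (10.0, 8, 10),
    "FDR": (14.0625, 64, 66),
}


def link_data_gbps(speed: str, lanes: int = LINK_LANES) -> float:
    """Usable data rate in Gbit/s for the given link speed and width."""
    signalling, payload, encoded = RATES[speed]
    return signalling * lanes * payload / encoded


for speed in RATES:
    gbps = link_data_gbps(speed)
    print(f"{speed} 4X: {gbps:.2f} Gb/s ({gbps / 8:.2f} GB/s)")
```

This gives 32.00 Gb/s (4.00 GB/s) for QDR and about 54.55 Gb/s (6.82 GB/s) for FDR. Assuming an x8 slot, the FDR rate exceeds the roughly 4 GB/s that a PCIe Gen2 x8 slot can carry after its 8b/10b encoding. This is consistent with FDR adapters being built for PCIe Gen3.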
With the introduction of FDR InfiniBand adapters and PCIe Gen3 slots, the 12G servers offer a substantial improvement in interconnect bandwidth and latency. We used the test configurations shown in Table 1 (12G) and Table 2 (11G) to quantify this advantage over the previous 11G servers at the micro-benchmark level. To obtain the best possible latency, the BIOS options were set as listed in Table 3. The servers were connected back to back, without a switch, so the results show the absolute improvement without any switch overhead.
Table 1: R620 Configuration
Table 2: R610 Configuration
Table 3: BIOS Settings
The following results were obtained using MVAPICH 1.2 and OSU Micro-Benchmarks 3.1.1. We compared three interconnect configurations: FDR PCIe Gen3, QDR PCIe Gen3 and QDR PCIe Gen2. For each, we ran the latency, bandwidth and bidirectional bandwidth benchmarks from the OSU benchmark suite.
From Figure 1, we can infer that:
- Bandwidth improves by 87% from QDR PCIe Gen2 (11G) to FDR PCIe Gen3 (12G).
- Bandwidth improves by 16% from QDR PCIe Gen2 (11G) to QDR PCIe Gen3 (12G). Because the adapter speed is the same in both cases, this gain can be attributed to the PCIe Gen3 slot.
Figure 1: OSU Bandwidth
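The percentages above are relative gains over the QDR PCIe Gen2 (11G) baseline. We assume the usual definition, (new - baseline) / baseline. The helper name and the placeholder inputs below are illustrative only and are not taken from Figure 1.

```python
def percent_improvement(baseline: float, new: float) -> float:
    """Percentage gain of `new` over `baseline` for a higher-is-better
    metric such as bandwidth in MB/s."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


# Placeholder values, not measured results: a metric that rises from
# 1.00 to 1.87 (in any unit) corresponds to an 87% improvement.
print(f"{percent_improvement(1.00, 1.87):.0f}%")
```

Because the ratio is unit-free, the same helper works whether the bandwidth is reported in MB/s or GB/s, as long as both values use the same unit.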
Figure 2 shows the results of the OSU Bidirectional Bandwidth benchmark:
- Bidirectional bandwidth improves by 69% from QDR PCIe Gen2 (11G) to FDR PCIe Gen3 (12G).
- Bidirectional bandwidth improves by 20% from QDR PCIe Gen2 (11G) to QDR PCIe Gen3 (12G).
Figure 2: OSU Bidirectional Bandwidth
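A bidirectional bandwidth test counts the traffic in both directions, so its figure can exceed the single-direction result. The sketch below shows this accounting for a windowed streaming test, the style of test the OSU bandwidth benchmarks use. The function and its parameters are our own illustration, not the benchmark's source code, and the inputs in the usage line are arbitrary.

```python
def reported_mbps(msg_bytes: int, window: int, iterations: int,
                  elapsed_s: float, bidirectional: bool = False) -> float:
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for a windowed streaming test.

    Each iteration sends `window` back-to-back messages of `msg_bytes`.
    In the bidirectional case both ranks stream at the same time, so the
    bytes counted double for the same elapsed time.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    directions = 2 if bidirectional else 1
    total_bytes = msg_bytes * window * iterations * directions
    return total_bytes / 1e6 / elapsed_s


# Arbitrary example inputs: 1 MB messages, a window of 64, 10 iterations, 0.2 s.
uni = reported_mbps(1_000_000, 64, 10, 0.2)
bi = reported_mbps(1_000_000, 64, 10, 0.2, bidirectional=True)
print(f"unidirectional: {uni:.0f} MB/s, bidirectional: {bi:.0f} MB/s")
```

In a real run the two directions are timed separately, so the measured bidirectional figure depends on how well the adapter and PCIe link sustain traffic in both directions at once. The percentage gains in Figure 2 therefore need not match those in Figure 1.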
Figure 3 and Figure 4 show the OSU Latency benchmark results for the different interconnect configurations. The new 12G servers with FDR PCIe Gen3 adapters achieve the lowest micro-benchmark latency.
- For small message sizes, latency is 40% lower with FDR PCIe Gen3 (12G) than with QDR PCIe Gen2 (11G).
- For small message sizes, QDR PCIe Gen3 and FDR PCIe Gen3 latencies differ only minimally. For large message sizes, FDR PCIe Gen3 performs significantly better than QDR PCIe Gen3.
Figure 3: OSU Latency (Small Message Size)
Figure 4: OSU Latency (Large Message Size)
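Ping-pong latency tests such as the OSU latency benchmark time many round trips and report half the average round-trip time as the one-way latency. The sketch below shows that calculation and the "lower by X%" figure used in this post. It assumes the reduction is measured against the QDR PCIe Gen2 (11G) baseline. The function names and inputs are illustrative only.

```python
def one_way_latency_us(elapsed_s: float, iterations: int) -> float:
    """One-way latency in microseconds from a ping-pong loop.

    Each iteration is one round trip (send, then wait for the reply),
    so the one-way figure is half the average round-trip time.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    return elapsed_s * 1e6 / (2 * iterations)


def percent_reduction(baseline: float, new: float) -> float:
    """Percentage decrease of `new` relative to `baseline` (lower is better)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline * 100.0


# Arbitrary inputs: 10,000 round trips taking 0.02 s in total.
print(f"{one_way_latency_us(0.02, 10_000):.2f} us one-way")
```

Note the direction of the comparison. A 40% latency reduction means the new latency is 0.6 times the baseline, not that the baseline is 1.4 times the new value.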
In summary, comparing an FDR PCIe Gen3 adapter with a QDR PCIe Gen2 adapter using the OSU benchmark suite shows that:
- Bandwidth is 87% higher
- Bidirectional bandwidth is 69% higher
- Latency is 40% lower for small message sizes
In subsequent blogs, we plan to carry out application-level studies on a larger cluster to understand performance at scale.