By Nishanth Dandapanthula

What can you expect from the latest servers from Dell? What kind of performance and energy efficiency do all those speeds and feeds translate to? We spent the last several weeks in the HPC lab at Dell putting the 12G servers through a series of tests, and this blog captures some of those results.

Dell’s all-new 12th generation (12G) dual-socket PowerEdge® servers feature the Intel® Xeon® E5-2600 series processors, which are based on the latest Intel microarchitecture, codenamed Sandy Bridge. The 12G servers include many features beyond Sandy Bridge: enhancements in systems management and power efficiency, network adapters with options to mix 1GbE and 10GbE, SSD drives, and more.

In this blog, we focus on compute performance and energy efficiency. We quantify the performance improvement provided by the 12G servers over the previous 11th generation (11G) servers. The 11G servers, released in 2009, were based on the Intel Xeon 5500 and 5600 series processors (Nehalem-EP and Westmere-EP). We use a variety of applications and micro-benchmarks for the comparison. This article gives a detailed account of single-server performance, comparing 11G and 12G servers.

Sandy Bridge vs. Westmere

The 11G servers offered the Xeon X5600 series processors (Westmere). Table 1 describes the basic differences between Sandy Bridge and Westmere. With the increased number of cores, memory channels, QPI links, etc., it is reasonable to expect Sandy Bridge to deliver a substantial performance gain over Westmere. Intel also introduced Advanced Vector Extensions (AVX) [1] in Sandy Bridge. Among its many advantages, AVX doubles the number of double-precision floating-point operations per cycle per core compared to Westmere or Nehalem, because its vector registers are 256 bits wide instead of the 128 bits used by SSE. For vectorized floating-point code, this provides a large boost in performance. A complete description of AVX is provided in [2].
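The doubling can be checked with a back-of-the-envelope count. The sketch below assumes, as holds for both Westmere and Sandy Bridge, that each core can issue one vector add and one vector multiply per cycle (there is no FMA on either), so the double-precision rate is the number of 64-bit lanes per register times two. The function and constant names are illustrative only.

```python
# Double-precision FLOPs per cycle per core.
# Assumption: one vector add and one vector multiply issue each cycle (no FMA),
# which holds for both Westmere (SSE) and Sandy Bridge (AVX).
BITS_PER_DOUBLE = 64


def dp_flops_per_cycle(vector_bits: int, fp_pipes: int = 2) -> int:
    """Doubles per vector register times the number of FP pipes (add + multiply)."""
    lanes = vector_bits // BITS_PER_DOUBLE
    return lanes * fp_pipes


westmere = dp_flops_per_cycle(128)      # 128-bit SSE registers
sandy_bridge = dp_flops_per_cycle(256)  # 256-bit AVX registers

print(f"Westmere: {westmere}, Sandy Bridge: {sandy_bridge}, "
      f"ratio: {sandy_bridge // westmere}x")
```

This yields 4 FLOPs/cycle for Westmere and 8 for Sandy Bridge; real code only reaches these rates when it is vectorized and balanced between adds and multiplies.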

Table 1: E5-2600 vs. X5600

Machine Configuration & Experimental Setup

To quantify the performance advantage of 12G servers over 11G servers for HPC compute workloads, we compare the PowerEdge R620 (12G) server to the PowerEdge R610 (11G) server. Both are 1U, two-socket servers. Tables 2 and 3 give the configuration of each machine. The BIOS and iDRAC (integrated Dell Remote Access Controller) versions used were the latest revisions available at the time of the experiments, and Table 4 lists the BIOS settings used. Note that the R620 used for the tests was an engineering prototype running the latest test firmware and BIOS available at the time.

Table 2: R620 Configuration

Table 3: R610 Configuration

Our base processor on the R620 is the Intel Xeon E5-2680, a 2.7 GHz, 130W processor (C1 stepping, prototype). To match the core speed of this processor on the R610, we picked the X5660, a 2.8 GHz, 95W Westmere processor. We also use the X5690, a 3.46 GHz, 130W Westmere processor, to match the wattage of the base processor.
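For reference, the theoretical peak (Rpeak) of each configuration follows from sockets × cores × clock × FLOPs per cycle. The sketch below assumes base clocks (Turbo Boost ignored), 8 cores per socket for the E5-2680 and 6 for the X5660 and X5690, and 8 (AVX) versus 4 (SSE) double-precision FLOPs per cycle per core. The `Cpu` class and `peak_gflops` helper are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    name: str
    cores_per_socket: int
    base_ghz: float           # base clock; Turbo Boost is ignored
    dp_flops_per_cycle: int   # 4 for SSE (Westmere), 8 for AVX (Sandy Bridge)


def peak_gflops(cpu: Cpu, sockets: int = 2) -> float:
    """Theoretical peak (Rpeak) = sockets x cores x GHz x FLOPs/cycle."""
    return sockets * cpu.cores_per_socket * cpu.base_ghz * cpu.dp_flops_per_cycle


CPUS = [
    Cpu("E5-2680 (R620)", 8, 2.70, 8),
    Cpu("X5660 (R610)", 6, 2.80, 4),
    Cpu("X5690 (R610)", 6, 3.46, 4),
]

for cpu in CPUS:
    print(f"{cpu.name}: {peak_gflops(cpu):.1f} GFLOPS per two-socket server")
```

With these inputs the sketch gives about 345.6, 134.4 and 166.1 GFLOPS respectively; measured HPL results are always some fraction of these numbers.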

Table 4: BIOS Settings

Memory Bandwidth

Figure 1: Stream Triad Memory Bandwidth

The STREAM benchmark [4] is used to measure the memory bandwidth of the system. Figure 1 shows that, relative to the previous-generation PowerEdge R610 server:

-       Memory bandwidth increases by ~85% when 1600 MHz DIMMs are used on the R620.

-       Memory bandwidth increases by ~61% when 1333 MHz DIMMs are used on the R620.

-       Even after accounting for the additional cores on the R620 (16 vs. 12 per server), memory bandwidth per core on the R620 is still 21-39% higher than on the R610.
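The per-core figure follows directly from the server-level gains: scale the bandwidth ratio by the core-count ratio (12 cores on the R610, 16 on the R620). A minimal sketch, assuming the rounded ~85% and ~61% values as inputs; `per_core_gain` is a hypothetical helper name.

```python
def per_core_gain(server_gain: float, cores_old: int, cores_new: int) -> float:
    """Per-core bandwidth gain implied by a whole-server bandwidth gain."""
    return (1.0 + server_gain) * cores_old / cores_new - 1.0


# R610: 2 sockets x 6 cores; R620: 2 sockets x 8 cores.
# The server-level gains are the rounded (~) figures quoted above.
for dimms, gain in (("1600 MHz", 0.85), ("1333 MHz", 0.61)):
    print(f"{dimms} DIMMs: {per_core_gain(gain, 12, 16):.0%} more bandwidth per core")
```

This prints about 39% for 1600 MHz DIMMs and 21% for 1333 MHz DIMMs, which is where the 21-39% range comes from.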

HPL Performance and Energy Efficiency


To run HPL (High-Performance Linpack) on both server configurations, we use Intel MPI 4.0.3.008 and Intel MKL 10.3. For each HPL run, the problem size is set so that the matrix occupies about 90% of total server memory. HPL efficiency, the ratio of sustained performance (Rmax) to theoretical peak performance (Rpeak), is 6% higher on the R620 than on the R610.
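Sizing HPL to a fixed memory fraction works because the N × N double-precision matrix needs 8·N² bytes, so N ≈ sqrt(fraction × memory / 8). The sketch below also rounds N down to a multiple of the block size NB, a common tuning convention. The 64 GiB memory size and NB = 192 are hypothetical placeholders, not the values used in these tests (the actual memory configurations are in Tables 2 and 3).

```python
import math


def hpl_problem_size(mem_bytes: int, fraction: float = 0.90, nb: int = 192) -> int:
    """Largest N, rounded down to a multiple of the block size NB, such that the
    N x N double-precision matrix (8 bytes per element) fits in `fraction` of memory."""
    n = math.isqrt(int(fraction * mem_bytes / 8))
    return n - n % nb


def hpl_efficiency(rmax_gflops: float, rpeak_gflops: float) -> float:
    """HPL efficiency: sustained performance divided by theoretical peak."""
    return rmax_gflops / rpeak_gflops


# Hypothetical 64 GiB server and NB = 192; substitute the real values.
n = hpl_problem_size(64 * 2**30)
print(f"N = {n}, matrix = {8 * n * n / 2**30:.1f} GiB")
```

In practice the operating system and MPI buffers also need memory, which is why a fraction such as 90% is used rather than all of it.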

Figure 2 shows the results for a single server. Results are presented relative to the PowerEdge R610 configured with 2.8 GHz processors.

-       Absolute HPL performance increases by 175% when processors with similar core speeds are used (bars for the R610 with 2.8 GHz, 95W processors and the R620 with 2.7 GHz, 130W processors and 1333 MHz DIMMs).

-       HPL performance on the 12G server barely changes when 1333 MHz DIMMs are replaced with 1600 MHz DIMMs. This is consistent with the study in [3], which showed that HPL is not sensitive to memory speed.
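Because Rmax = efficiency × Rpeak, the measured speedup splits into a peak ratio and an efficiency ratio. As a rough consistency check, assuming base clocks and the core counts and FLOP rates of the E5-2680 (8 cores, AVX) and X5660 (6 cores, SSE): a 175% increase (2.75x) over a Rpeak ratio of about 2.57x implies roughly 7% higher efficiency, in line with the ~6% efficiency improvement reported above, given that 175% is a rounded figure. The helper name is hypothetical.

```python
def implied_efficiency_ratio(speedup: float, rpeak_new: float, rpeak_old: float) -> float:
    """Rmax = efficiency x Rpeak, so efficiency ratio = speedup / Rpeak ratio."""
    return speedup / (rpeak_new / rpeak_old)


rpeak_r620 = 2 * 8 * 2.7 * 8   # E5-2680: sockets x cores x GHz x DP FLOPs/cycle (AVX)
rpeak_r610 = 2 * 6 * 2.8 * 4   # X5660: sockets x cores x GHz x DP FLOPs/cycle (SSE)
speedup = 1.0 + 1.75           # "175% increase" in absolute HPL performance

ratio = implied_efficiency_ratio(speedup, rpeak_r620, rpeak_r610)
print(f"Rpeak ratio {rpeak_r620 / rpeak_r610:.2f}x, implied efficiency ratio {ratio:.2f}")
```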

Figure 2: HPL Performance

Figure 3 shows the energy efficiency of a server while running HPL. Energy efficiency is measured as the performance delivered per watt of power consumed (Performance/W, or GFLOPS/W). Results are presented relative to the PowerEdge R610 configured with 3.46 GHz processors.

-       GFLOPS/Watt increases by 100% compared to the R610. That is, a 12G server delivers double the performance of an 11G server while consuming the same amount of power. This gain can be attributed to the 33% more cores on the Sandy Bridge processors compared to Westmere (8 vs. 6 cores per socket) and to the doubled number of floating-point operations executed per clock cycle on Sandy Bridge. It also reflects the overall energy-efficient design of the Dell PowerEdge R620.
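The metric itself is simple: sustained HPL GFLOPS divided by the power drawn during the run. The post does not say how power was sampled, so the sketch below assumes a list of power-meter readings taken during the HPL run and uses their mean. The function names and the readings in the usage lines are hypothetical.

```python
from statistics import mean
from typing import Sequence


def gflops_per_watt(rmax_gflops: float, power_samples_w: Sequence[float]) -> float:
    """Sustained HPL GFLOPS divided by the mean power drawn during the run."""
    return rmax_gflops / mean(power_samples_w)


def relative_change(value: float, baseline: float) -> float:
    """0.0 means no change; 1.0 means a 100% increase (double the baseline)."""
    return value / baseline - 1.0


# Hypothetical readings: double the GFLOPS at the same average power.
old = gflops_per_watt(100.0, [250.0, 252.0, 248.0])
new = gflops_per_watt(200.0, [250.0, 252.0, 248.0])
print(f"{relative_change(new, old):.0%}")  # prints 100%
```

This is exactly the "double the performance at the same power" reading of a 100% GFLOPS/W increase.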

Figure 3: Power Consumption when running HPL

Idle Power

Figure 4: Idle Power Consumption

Idle power is the power consumed by a server after it has reached a stable state (the boot process is complete) while it sits idle with no jobs running. Most data centers have some downtime during off-peak hours, so idle power is an important metric for the energy efficiency of a data center when no jobs are running. Figure 4 depicts the relative idle power usage for different server configurations. Results are presented relative to the PowerEdge R610 configured with 3.46 GHz processors.

-       The 12G server consumes 21% less idle power than an 11G server when processors with similar wattage are used (bars for the R610 with 3.46 GHz, 130W processors and 1333 MHz DIMMs and the R620 with 2.7 GHz, 130W processors and 1333 MHz DIMMs).

The idle power consumed by the 12G machines is strikingly low. This is due to energy-efficiency improvements not just in the Intel Sandy Bridge processors but across the overall Dell platform. As mentioned before, Dell 12G servers have several new features and enhancements beyond Intel’s processors.
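"Stable state" can be made concrete by waiting until readings stop drifting before averaging. The post does not describe its exact procedure, so the sketch below is one possible approach under stated assumptions: the hypothetical `idle_power` helper returns the mean of the first run of `window` consecutive readings whose spread stays within `tolerance_w`, and the defaults (60 readings, 2 W) are illustrative placeholders rather than values from these tests.

```python
from statistics import mean
from typing import Optional, Sequence


def idle_power(samples_w: Sequence[float], window: int = 60,
               tolerance_w: float = 2.0) -> Optional[float]:
    """Mean of the first run of `window` consecutive readings whose spread
    (max - min) is within `tolerance_w`; None if the readings never settle."""
    for start in range(len(samples_w) - window + 1):
        chunk = samples_w[start:start + window]
        if max(chunk) - min(chunk) <= tolerance_w:
            return mean(chunk)
    return None


def percent_less(value: float, baseline: float) -> float:
    """How much lower `value` is than `baseline`, as a fraction (0.21 = 21% less)."""
    return 1.0 - value / baseline
```

With idle power measured this way for both servers, `percent_less(r620_idle, r610_idle)` gives the kind of relative reduction reported in Figure 4.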

Summary & Conclusion

Our tests comparing a 12G server and an 11G server indicate that:

  • HPL on the new 12G servers is better by
    • 175% in performance when machines with similar core speed processors are compared
    • 100% in energy efficiency (GFLOPS/Watt)
  • Memory bandwidth is better by up to 85% on the 12G servers
  • Idle power is 21% lower on the 12G servers when processors with similar wattage are compared

Subsequent blogs will give a detailed account of the communication performance of the 12G servers and its advantages over the 11G servers. We also plan to follow up with a blog that provides insight into application-level performance.

References

  1. http://software.intel.com/en-us/avx/
  2. http://software.intel.com/en-us/articles/intel-avx-new-frontiers-in-performance-improvements-and-energy-efficiency/
  3. http://i.dell.com/sites/content/business/solutions/whitepapers/ja/Documents/HPC_Dell_11g_BIOS_Options_jp.pdf
  4. http://www.cs.virginia.edu/stream/