13G PowerEdge Server Performance Sensitivity to Memory Configuration


Author: Bruce Wagner, September 2016 (Solutions Performance Analysis Lab)

 

The goal of this blog is to illustrate the performance impact of DDR4 memory selection. Measurements were made on a Broadwell-EP CPU system configuration using the industry-standard benchmarks listed in Table 1.

 

Table 1: Detail of Server and Applications used with Intel Broadwell processor

| Component | Detail |
| Server | Dell PowerEdge R630 |
| Processor | 2 x E5-2699 v4 @ 2.2GHz, 22 core, 145W, 55MB L3 Cache |
| Memory | DDR4 product offerings including: 8GB 1Rx8 2400MT/s RDIMM (DPN 888JG); 32GB 2Rx8 2400MT/s RDIMM (DPN CPC7G); 64GB 4Rx8 2400MT/s LR-DIMM (DPN 29GM8) |
| Power Supply | 1 x 750W |
| Operating System | Red Hat Enterprise Linux 7.2 (3.10.0-327.el7.x86_64) |
| BIOS options | Memory Operating Mode: Optimizer; Node Interleaving: Disabled; Snoop mode: Opportunistic Snoop Broadcast; Logical Processor: Enabled; System profile: Performance |
| BIOS Firmware | 2.1.7 |
| iDRAC Firmware | 2.30.30.30 |
| SPECcpu2006 | Intel optimized 16.0.0.101 linux64 binaries (http://www.spec.org/cpu2006) |
| STREAM | v5.10 source from https://www.cs.virginia.edu/stream/, Intel Parallel Studio 2016 update2 compilation |

 

Table 2 and Figure 1 detail the memory subsystem of the 13G PowerEdge R630, which contains 24 DIMM sockets split into two sets of 12, one set per processor. Each 12-socket set is organized into four channels with three DIMM sockets per channel.

 

Table 2: Memory channels

| Processor | Channel 0 DIMM Slots | Channel 1 DIMM Slots | Channel 2 DIMM Slots | Channel 3 DIMM Slots |
| CPU 1 | A1, A5, A9 | A2, A6, A10 | A3, A7, A11 | A4, A8, A12 |
| CPU 2 | B1, B5, B9 | B2, B6, B10 | B3, B7, B11 | B4, B8, B12 |
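
The slot numbering in Table 2 follows a regular pattern: slots 1 to 4 are the first DIMM on channels 0 to 3, slots 5 to 8 the second, and slots 9 to 12 the third. The sketch below encodes that pattern as a lookup. The function name `slot_to_channel` is a hypothetical helper introduced here for illustration and is not part of any Dell tool.

```python
import re
from typing import Tuple


def slot_to_channel(slot: str) -> Tuple[str, int]:
    """Map a DIMM slot label such as 'A6' to (processor, channel index).

    Assumes the R630 layout from Table 2: bank A belongs to CPU 1,
    bank B to CPU 2, and slot numbers 1..12 wrap across four channels.
    """
    match = re.fullmatch(r"([AB])(\d{1,2})", slot.strip().upper())
    if not match:
        raise ValueError(f"unrecognized slot label: {slot!r}")
    bank, number = match.group(1), int(match.group(2))
    if not 1 <= number <= 12:
        raise ValueError(f"slot number out of range: {slot!r}")
    cpu = "CPU 1" if bank == "A" else "CPU 2"
    # Slots 1, 5, 9 -> channel 0; 2, 6, 10 -> channel 1; and so on.
    return cpu, (number - 1) % 4


print(slot_to_channel("A6"))   # ('CPU 1', 1)
print(slot_to_channel("B12"))  # ('CPU 2', 3)
```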

 

Figure 1: Memory socket locations

 

Figure 2: Performance Impact of Memory Type

 

Figure 2 shows that a memory configuration built from Registered DIMMs (RDIMMs) delivers 3.1% higher overall performance than an equivalently sized configuration built from Load-Reduced DIMMs (LR-DIMMs), even though both run at 2400 MT/s. LR-DIMMs make larger memory capacities possible, but their inherently higher access latency reduces application performance. LR-DIMMs also impose a nearly 30% power consumption penalty over an RDIMM of equivalent size and speed. Use LR-DIMMs only when the total system memory capacity requirement dictates a three-DIMMs-per-channel (3DPC) configuration.
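
To see where capacity starts to drive the DIMM choice, the total memory for a given DIMM size and population depth can be worked out from the topology in Table 2 (two processors, four channels each). The following is a minimal arithmetic sketch only. It uses the DIMM sizes from Table 1, the helper name `total_capacity_gb` is hypothetical, and it does not check which size and population combinations a given platform actually supports.

```python
SOCKETS = 2              # two processors in the R630 under test
CHANNELS_PER_SOCKET = 4  # four memory channels per processor (Table 2)


def total_capacity_gb(dimm_gb: int, dimms_per_channel: int) -> int:
    """Total system memory in GB for a uniform DIMM population."""
    if dimms_per_channel not in (1, 2, 3):
        raise ValueError("each channel has three DIMM sockets")
    return SOCKETS * CHANNELS_PER_SOCKET * dimms_per_channel * dimm_gb


# 32GB RDIMMs at two and three DIMMs per channel
print(total_capacity_gb(32, 2))  # 512
print(total_capacity_gb(32, 3))  # 768
# 64GB LR-DIMMs at two DIMMs per channel
print(total_capacity_gb(64, 2))  # 1024
```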

 

 Table 3: Memory speed limits for 13G PowerEdge Models

 

 

Figure 3: Performance Impact of DIMM Rank Organization
 

Figure 3 shows that a one-DIMM-per-channel (1DPC) memory configuration built from dual-rank DIMMs outperforms one built from single-rank DIMMs by 14%. DRAM incurs a large inherent delay when a given rank switches between read and write accesses, which significantly reduces throughput on the memory channel. With dual-rank DIMMs, or with multiple DIMMs per channel, the CPU's integrated memory controller can overlap reads and writes on the memory channel to minimize the impact of read/write turnaround time.


 

Figure 4: Performance Impact of Memory Speed

 

Figure 4 shows that a 2400 MT/s memory configuration provides 14% higher overall application performance than a 2133 MT/s one, all other factors being equal. With modern 8Gbit 1.2V DDR4 DIMM technology, the higher speed incurs only a nominal increase in power consumption and thermal dissipation. Pricing and availability of 2400 MT/s DIMMs are also rapidly trending toward the commodity sweet spot.

 

Figure 5: Performance Impact of DIMM Slot Population

Figure 5 shows that a two-DIMMs-per-channel (2DPC) population yields a slight 0.9% workload performance uplift over a 1DPC population, attributable to the same memory controller transfer-overlap efficiency discussed for Figure 3. The 3DPC result is included to highlight the marked performance degradation caused by having to downclock the memory subsystem from 2400 MT/s to 1866 MT/s.
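
For context on the three memory speeds discussed for Figures 4 and 5, the theoretical peak bandwidth per processor can be estimated from the transfer rate. The sketch below assumes the standard 64-bit (8-byte) DDR4 data path per channel and the four channels per processor from Table 2. The function name is hypothetical, and the result is an upper bound, not a measured STREAM figure.

```python
BYTES_PER_TRANSFER = 8   # assumed 64-bit DDR4 data bus per channel (ECC bits excluded)
CHANNELS_PER_SOCKET = 4  # four memory channels per processor (Table 2)


def peak_bandwidth_gbs(mt_per_s: int, channels: int = CHANNELS_PER_SOCKET) -> float:
    """Theoretical peak bandwidth in GB/s for one processor."""
    # MT/s x bytes per transfer = MB/s; divide by 1000 for GB/s.
    return mt_per_s * BYTES_PER_TRANSFER * channels / 1000


for speed in (2400, 2133, 1866):
    print(f"{speed} MT/s: {peak_bandwidth_gbs(speed):.1f} GB/s per processor")
# 2400 MT/s: 76.8 GB/s per processor
# 2133 MT/s: 68.3 GB/s per processor
# 1866 MT/s: 59.7 GB/s per processor
```

By this estimate, dropping from 2400 MT/s to 1866 MT/s removes roughly 22% of the peak bandwidth per processor. The measured results in the figures come from the benchmarks, not from this formula.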

Figure 6: Performance Impact of DIMM Population Balance

 

Figure 6 shows a wide disparity in overall system memory bandwidth depending on DIMM population balance.

Although the default Optimizer (also known as Independent Channel) Memory Operating Mode supports odd numbers of DIMMs per CPU, there is a severe performance penalty in doing so.

The full list of memory module installation guidelines can be found in the product owner's manual, available through www.dell.com.

In summary, to maximize workload performance on 13G two-socket servers, populate every available channel with two dual-rank, registered 2400 MT/s DIMMs.
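
The balance guidance above can be sketched as a simple check over a list of populated slots. This is an illustrative helper only, assuming the slot-to-channel layout from Table 2. The name `population_warnings` is hypothetical, and the check covers only even distribution across channels, not the full installation guidelines in the owner's manual.

```python
from collections import Counter
from typing import Iterable, List


def population_warnings(slots: Iterable[str]) -> List[str]:
    """Flag processors whose four channels do not hold equal DIMM counts."""
    per_cpu = {"A": Counter(), "B": Counter()}
    for slot in slots:
        bank, number = slot[0].upper(), int(slot[1:])
        if bank not in per_cpu or not 1 <= number <= 12:
            raise ValueError(f"unrecognized slot label: {slot!r}")
        # Slots 1, 5, 9 -> channel 0; 2, 6, 10 -> channel 1; and so on.
        per_cpu[bank][(number - 1) % 4] += 1

    warnings = []
    for bank, counts in per_cpu.items():
        per_channel = [counts[ch] for ch in range(4)]
        if len(set(per_channel)) > 1:
            warnings.append(
                f"bank {bank}: uneven DIMMs per channel {per_channel}"
            )
    return warnings


# Recommended layout: two DIMMs on every channel of both processors.
balanced = [f"{bank}{n}" for bank in "AB" for n in range(1, 9)]
print(population_warnings(balanced))  # []

# Three DIMMs per processor leaves channel 3 empty on each.
print(population_warnings(["A1", "A2", "A3", "B1", "B2", "B3"]))
```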
