For as long as I can remember, the recommendation, and sometimes the requirement, for memory on a High Performance Computing (HPC) compute node has been 2GB/core. Note that this is a capacity-only requirement. Often the price of this memory would dominate the price of the compute node. I have personally struggled to meet this 2GB/core requirement within budget; or worse, within the projected budget, since memory prices are like the Dow Jones Industrial Average and go up and down almost daily. But things have been changing consistently and rapidly these past few years. And our tendency to stick to that old memory standard needs to change too.

I first noticed the trend in November of 2011, about a year and a half ago, while preparing to attend Supercomputing in Seattle, WA. The purchase price of a 2-GB DIMM was less than the price of a 1-GB DIMM. In mid-2012, the 1-GB DIMM entered End-Of-Life (EOL) at Dell. That is, the 1-GB DIMM was no longer even offered or available. If one wanted a memory capacity based upon 1-GB DIMMs, well, one got twice the memory at the same speed or faster, and for a lower price. Being from Louisiana, we call that lagniappe!

To further complicate things, the Intel Sandy Bridge-EP processor (E5-2600) was introduced in early 2012. It featured 4 memory channels per socket vs. 3 memory channels for the previous Nehalem/Westmere-EP generations (5500/5600). Depending on the core count of the processor selected, applying a hard-wired GB/core requirement without regard to the number of memory channels can lead to some very bad memory configurations in terms of memory performance. The icing on this cake of confusion was the introduction of Intel’s Flex Memory, which allows an infinite number of what I will call “unbalanced” configurations. [For this blog, I will focus on making optimal balanced memory recommendations. More on the pros and cons of unbalanced and near-balanced memory configurations can be found in the links below.]

Today in HPC we generally prefer balanced memory configurations, since they provide the best performance. The rules of thumb for a balanced memory configuration with maximum performance are simple:

  1. Always populate all memory channels with the same number of DIMMs (that is, use the same DIMMs Per Channel, or DPC, on all processors)
  2. Always use identical DIMMs across a memory bank.
  3. Always use 1 DPC (if possible to meet the required memory capacity)
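The first two rules above can be expressed as a quick check. This is a minimal sketch of my own, not a Dell tool; the `is_balanced` helper and its list-of-lists layout (one list of DIMM sizes per memory channel, across all sockets) are hypothetical conveniences for illustration:

```python
def is_balanced(channels):
    """Check a proposed DIMM layout against the balanced-memory rules.

    `channels` is one list of DIMM sizes (in GB) per memory channel,
    covering every channel on every socket.
    """
    # Rule 1: every channel populated with the same number of DIMMs (same DPC)
    dpc = len(channels[0])
    if any(len(ch) != dpc for ch in channels):
        return False
    # Rule 2: identical DIMMs across the memory bank
    sizes = {size for ch in channels for size in ch}
    if len(sizes) > 1:
        return False
    # Rule 3 (use 1 DPC) is a preference for capacity headroom, not checked here
    return True

# Example: 2-socket Sandy Bridge-EP, 4 channels per socket, one 4-GB DIMM each
print(is_balanced([[4]] * 8))            # all 8 channels at 1 DPC
print(is_balanced([[4]] * 6 + [[], []])) # two channels left empty
```

Rule 3 is left as a preference rather than a hard check, since 2 DPC is still balanced; it just spends more parts than necessary.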

By now you may have discovered that I am proposing to turn things upside-down: create the “best” memory configuration for a given compute node first, and then check whether the memory-per-core capacity is sufficient. In other words, in high performance computing, take memory performance into account first, in addition to the age-old capacity requirements. And, as usual, price comes into play.

To make a memory recommendation, I first visit Dell’s R620 webpage to get the current list prices of memory [ ]. The Dell R620 is a general-purpose, workhorse, 1U, rack-mounted, 2-socket, Intel Sandy Bridge-EP (E5-2600) compute node platform. Below is a snapshot of the memory options and their prices taken on 12-July-2013.

Notice that 1-GB DIMMs are not available, as explained above. But also notice that the trend has continued: the 4-GB DIMM is now less expensive than the 2-GB one. I see another EOL coming…

Additionally, the $/GB leader is now the 16-GB DIMM as shown in the table and figures below.

This $/GB leader opens up a lot of possibilities, but for now let’s remain focused on the 2GB/core standard and address these other possibilities in a future blog.

Now suppose we use the least expensive DIMM, the 4-GB DIMM, and populate a Westmere (5600) or Sandy Bridge (E5-2600) socket for optimal performance. In the figures below, I have followed the population rules of thumb previously presented for balanced memory configurations. A socket is depicted with its corresponding memory channels. All the memory channels have been populated identically, at 1 DPC. The corresponding GB/core for each possible core count of the processors that could be placed in the socket is listed below each figure.
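The GB/core arithmetic behind those figures can be sketched as follows. The channel counts come from the platforms above (3 per socket for Westmere, 4 for Sandy Bridge-EP); the `gb_per_core` helper itself is just my illustration:

```python
DIMM_GB = 4   # the least expensive DIMM at the time of writing
SOCKETS = 2   # 2-socket compute node

def gb_per_core(channels_per_socket, cores_per_socket):
    """Capacity per core for a balanced 1 DPC population of DIMM_GB DIMMs."""
    total_gb = DIMM_GB * channels_per_socket * SOCKETS
    return total_gb / (cores_per_socket * SOCKETS)

# Westmere-EP (5600): 3 channels/socket, 6-core parts -> 24 GB total
print(gb_per_core(3, 6))            # 2.0 GB/core
# Sandy Bridge-EP (E5-2600): 4 channels/socket, 32 GB total
for cores in (4, 6, 8):
    print(cores, gb_per_core(4, cores))
```

With the 8-core Sandy Bridge-EP parts this lands exactly on 2GB/core, and every lower core count exceeds it.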

In all cases the 2GB/core capacity is met or exceeded in a configuration set up for optimal performance. This configuration uses the minimum amount of the lowest cost DIMMs to achieve the maximum memory performance. Additionally, this represents the lowest number of parts to fail and provides for memory capacity expansion if needed. This is today’s sweet spot memory configuration for HPC. The GB/core capacity is a side effect. Oh, and the sweet spot changes.

In contrast, one might naively pick the better $/GB DIMMs, partially populate the memory channels, and exploit Flex Memory to address the exact capacity requirements. Some options are shown below, separating the core counts and addressing them independently.

Yes, these will work, as in they will function and not generate any errors, thanks to Flex Memory. Yes, they exactly meet the 2GB/core capacity-only requirement. Yes, they are also less expensive. And they will perform terribly. Please don’t do this (unless you really know your memory bandwidth requirements)!

Note that in all cases, one or more of the four memory channels are unused, leaving memory bandwidth on the table. For an overview of memory performance, see David Morse’s blog; for much deeper details, see John Beckett’s whitepaper:

Notice especially Figure 29 of the whitepaper, which indicates a potential 50% hit in memory bandwidth for unbalanced configurations, an understandable amount when using half of the available memory channels.
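As a back-of-the-envelope check, suppose peak bandwidth scales roughly linearly with the number of populated channels. The per-channel figure below is an assumption for illustration (DDR3-1600 moves 8 bytes at 1600 MT/s, about 12.8 GB/s per channel); the conclusion only depends on the ratio:

```python
PER_CHANNEL_GBS = 12.8  # assumed DDR3-1600 per-channel peak, for illustration

def peak_bandwidth(populated_channels):
    """First-order estimate: peak bandwidth scales with populated channels."""
    return populated_channels * PER_CHANNEL_GBS

full = peak_bandwidth(4)  # all four Sandy Bridge-EP channels populated
half = peak_bandwidth(2)  # Flex Memory configuration with two channels empty
print(half / full)        # 0.5: roughly the 50% hit noted above
```

Real workloads will not track this simple model exactly, but it shows why leaving channels empty is so costly.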

So, there you have it. The “best” memory configuration today that meets or exceeds 2GB/core uses the 4-GB DIMM and follows the maximum-performance rules of thumb. The GB/core capacity is a side effect, but it beneficially meets or exceeds the common 2GB/core requirement. Blindly meeting GB/core requirements with whatever DIMM sizes drive the price down, without regard to performance, is ill-advised.

If you have an interest in additional general information about memory, memory types, etc., this is a good place to start:

How will all this change with Intel Xeon Ivy Bridge (E5-2600 v2), coming in just a few months? Well, faster memory will be available, but probably at a premium price. There will be more cores per socket available. And memory prices continue to fluctuate. I’ll get back to you… ;-)

If you have comments or can contribute additional information, please feel free to do so. Thanks. --Mark R. Fernandez, Ph.D.