05-05-09 HPC Performance
Join our own Dr. Jeff Layton and other Dell experts in the high-performance computing (HPC) space for a discussion on performance with Intel® Xeon® processor 5500 series microprocessors.
Technical Community - Background Reading
HPCC on the Dell TechCenter
Jeff's HPCC Blog
Blog on Nehalem Memory Performance
Blog: Introduction to Nehalem-EP
Related Links
Platform Cluster Manager Webinar
Chat Transcript
DELL-ScottH
Kong!
DELL-ScottH
Everyone welcome Kong, the newest member of our TechCenter family
Wm._Hank_Lea_Intel
Hi Kong
Dell-KongY
Hello world. :)
DELL-ScottH
Follow him on twitter @kongy_dell...he needs stalkers
Dell-KongY
Thanks for the intro Scott
DELL-ScottH
Welcome Jeff
DELL-ScottH
Alright, let's kick off the official chat
Dell-JeffL
Sorry about that. Network trouble. Operating on 3G, well 2G :)
Wm._Hank_Lea_Intel
Ronak Singhal is the Intel Xeon processor 5500 series architecture expert; let's see who can stump him
DELL-ScottH
Couple of quick things…
Dell-JeffL
Okay, if a tree falls and no one is around to hear it... :)
DELL-ScottH
This is an informal chat, so jump in any time; we have a loose structure, but it's driven by your questions. This week is all about Intel Nehalem HPC performance on our Dell 11th-generation PowerEdge servers. Joining us today we have Jeff Layton from the HPC team
Dell-JeffL
HPC on Nehalem rocks!
erson
Intel-ronak, which bumps on the E5520 CPU are not used? :)
DELL-ScottH
Along with Onur from the Dell HPC team
erson
/me tries to stump Ronak
DELL-ScottH
And with them we have some great guests from Intel: Hank and Ronak. Please feel free to say a few words, and then we'll let Jeff dive right into the meat
DELL-ScottH
Oh, and we also have Juan Ortiz from the Dell HPC team
Wm._Hank_Lea_Intel
@erson, what is your concern about the bumps on E5520?
DELL-ScottH
If you haven't seen Jeff's latest blog posts, be sure to check them out; they have great information on Nehalem architecture:
http://en.community.dell.com/blogs/hpcc/default.aspx
Wm._Hank_Lea_Intel
Oh, I see, that's a good one
DELL-ScottH
FYI, right-click links, or you might be exited from the chat
erson
Wm._hank_lea_intel, no concern really, it's just a pretty hard question to answer. :)
Wm._Hank_Lea_Intel
All the bumps are used if you think about physical connections. You may have us stumped right there
DELL-ScottH
@jeff, did we lose you on the 2G network?
Dell-JeffL
I'm still here; didn't know if you wanted me to jump in
DELL-ScottH
FYI, you can reach Hank and Ronak on the Intel Server Room:
http://communities.intel.com/openport/community/server
DELL-ScottH
Yes, Jeff, please start us off on this week's discussion
Dell-JeffL
Okay
Dell-JeffL
In general terms we've been seeing really great performance from Nehalem for a wide range of HPC applications.
DELL-ScottH
Welcome Aziz. Use Action, Recent Room History at any time to catch up
Dell-JeffL
Those that are memory bandwidth limited typically see a huge performance gain with Nehalem. I can blab on and on, but I'll start with questions if anyone has any
erson
Should HPC customers wait for Beckton?
mrobbert
Do you have any idea what kinds of codes are typically memory-bandwidth limited?
Dell-JeffL
@erson, I'm not familiar with Beckton
mrobbert
Or how should we check the applications that we use?
DELL-ScottH
What have you seen when going from 1,333 to 1,066 to 800 on the memory bus? Memory configurations certainly are interesting in this architecture
Wm._Hank_Lea_Intel
The performance will vary depending on which benchmark you are interested in. For a summary of the performance comparison, please refer to:
http://www.intel.com/performance/server/xeon/summary.htm?iid=perf_server_lhn+dp_sum
intel-ronak
Mrobbert, various applications can be memory bandwidth limited. We see very significant bandwidth needs in oil/gas reservoir simulations, for instance, and in many other HPC applications. If you look at SPECfp, you can see a wide variety of different applications and the gains that Nehalem gets on each
DELL-ScottH
@erson, I've been in the processor game for 20 years, and I love how the "should we wait" thread consistently comes up
Dell-JeffL
A good example of memory-bandwidth limited applications is CFD codes. Weather codes are also good examples, as are QCD codes
erson
Dell-jeffl, Beckton is Nehalem-EX: 4S and up, eight cores per CPU and two threads per core. Release time is Q4 2009 to Q1 2010
Dell-JeffL
@erson, Got it
erson
Scotth, yeah, I love it too, but there are usually good and not so good times to purchase things
intel-ronak
@erson, Beckton and Nehalem-EP both have their purposes. Depends on your application and what is the system configuration you are trying to optimize for (how many threads, what memory capacity, etc.)
DELL-ScottH
Hello grimsrud
Dell-JeffL
@erson, Personally I wouldn't wait for Nehalem-EX. The memory bandwidth per core is less than -EP. For some HPC applications -EX will be great, but for memory-bandwidth limited applications it may not be a big win
DELL-ScottH
Hello brady
Brennels
Does it handle kernel memory more efficiently?
Dell-JeffL
Ronak?
Brady.Lambert
Hi Scott, sorry am late
DELL-ScottH
Welcome newcomers; use Action, Recent Room History to catch up. And jump in at any time with questions or discussion
erson
If the memory bandwidth is a big concern you are also limiting yourself to using only six DIMM slots with Nehalem-EP
Dell-JeffL
@erson, that's true for the moment :) I wouldn't be surprised to see Dell support up to 12 DIMM slots at 1,333 :)
Wm._Hank_Lea_Intel
@erson, It can be costly to wait for the next generation; check out the ROI calculator here:
https://roianalyst.alinean.com/roi_calculators/autologin.do?d=238900127442387057
erson
Jeffl, that's great news
Dell-JeffL
In addition, going from 1,333 to 1,066 isn't too bad a drop. The drop to 800 is bad. Plus, there are some gotchas that can make things worse
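An illustrative aside (not part of the chat): a minimal sketch that puts rough numbers on the memory speeds mentioned above. It assumes three DDR3 channels per socket, 8 bytes per transfer, and a two-socket server, and it computes theoretical peak bandwidth only. Sustained bandwidth, and the "gotchas" Jeff alludes to, are not modeled. The function name and defaults are illustrative.

```python
def peak_bandwidth_gbs(mega_transfers_per_s, channels=3, bytes_per_transfer=8, sockets=2):
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes).

    Assumed defaults: 3 DDR3 channels per socket, 64-bit (8-byte) channels,
    2 sockets. These are assumptions for illustration, not measured values.
    """
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels * sockets / 1e9


speeds = [1333, 1066, 800]
baseline = peak_bandwidth_gbs(speeds[0])
for speed in speeds:
    bw = peak_bandwidth_gbs(speed)
    drop = 100 * (1 - bw / baseline)
    print(f"DDR3-{speed}: {bw:.1f} GB/s theoretical peak, {drop:.0f}% below DDR3-1333")
```

A memory-bandwidth-limited code can be expected to track these ratios more closely than a compute-bound one, which is why the memory speed matters so much for the CFD and weather codes discussed earlier.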
erson
Wm._hank_lea_intel, especially since 4S servers are more than twice as expensive as 2S servers. You might see some improvement on power and cooling with the former though, I guess
Wm._Hank_Lea_Intel
Yes, we believe in most cases you can see full return on investment in less than one year
Dell-JeffL
But it depends on the applications, correct? @erson, what applications are you running?
erson
Jeffl, sad to say nothing HPC-related really. I'm just here to feast on the technology chat :)
Dell-JeffL
:)
DELL-ScottH
Welcome newcomers. You can use Action, Recent Room History to catch up
Dell-JeffL
No sweat. BTW, on one application we're running, CFD, we saw over a three-times boost from a 3.0 GHz Harpertown to a 2.66 GHz Nehalem (1,066 memory). Quite the performance bump
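An illustrative aside (not part of the chat): the boost Jeff quotes comes despite a lower clock speed, so the per-clock gain is larger than the headline number. The sketch below uses 3.0 as the speedup because the chat only says "over a three-times boost"; the exact figure was not given.

```python
def per_clock_speedup(overall_speedup, old_ghz, new_ghz):
    """Normalize an overall speedup by the ratio of clock frequencies."""
    return overall_speedup * old_ghz / new_ghz


# 3.0x is a lower bound taken from the chat; 3.0 GHz and 2.66 GHz are the quoted clocks.
print(round(per_clock_speedup(3.0, old_ghz=3.0, new_ghz=2.66), 2))
```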
erson
Is Microsoft getting any traction with their HPC version of Windows Server among those who buy HPC clusters from Dell?
DELL-ScottH
Jeff, are these benchmarks posted?
Wm._Hank_Lea_Intel
CFD is very cool for Formula One racing such as the BMW-Sauber team
Dell-JeffL
The benchmarks aren't posted yet. :) I'm working on posting them and should have an upcoming white paper on CFD applications on Nehalem
Wm._Hank_Lea_Intel
They can do multiple tests without using a wind tunnel, saving $$$$$
Dell-JeffL
@erson, We're seeing some uptake on Windows HPCC. I don't know the exact numbers, but it's been fairly steady and lots of people are looking at it
DELL-ScottH
Okay, cool. FYI for anyone reading the chat transcript, look here:
http://www.dell.com/benchmarks
erson
Jeffl, so a 2S Nehalem could probably take on a server with four hexacore Dunningtons then?
Dell-JeffL
@erson, For some applications I expect a 2S Nehalem to pretty much kill a 4S Dunnington, but it really depends on the application. For some applications I would expect the 4S Dunnington to beat 2S Nehalem (but I haven't done any direct comparisons yet)
erson
Jeffl, why is that? What are they seeing as the reason to go Microsoft in this area that usually has been totally dominated by UNIX/Linux?
kurtis_wagner
Scott, I have a question regarding a customer's experiences in testing a PowerEdge M600 versus a PowerEdge M610
Dell-JeffL
@erson, You don't ask hard questions do you?
DELL-ScottH
@kurtis, shoot
kurtis_wagner
Okay
intel-ronak
@erson, You can look at some of the standard benchmarks (SPECint, SPECfp, SPECjbb, etc.) and compare 2S Nehalem and 4S Dunnington. As Jeff said, which one wins will vary by benchmark
Dell-JeffL
@erson, I've heard lots of reasons (but I don't work for Microsoft). One reason is that HPC is pretty good about squeezing performance from multi-core machines, so understanding HPC gives them a leg up on multi-core enterprise servers. Another reason is that HPC is actually becoming a reasonably big part of CPU sales. So Microsoft thinks the market is worth going after
kurtis_wagner
I have a customer who is testing a proprietary nuclear application and is running a PowerEdge M610 with HT turned on but only using eight cores. He is experiencing a 40 percent performance decline from the PowerEdge M600
Dell-JeffL
@kurtis_wagner, Turn off HT and perhaps even Turbo
kurtis_wagner
However, when he uses all 16 cores with HT, he experiences a 40 percent gain in performance from the PowerEdge M600. What is causing this?
intel-ronak
@kurtis, Which OS is being used?
Dell-Onur_C
@kurtis, It may be due to the way the OS scheduler is scheduling the processes
kurtis_wagner
Microsoft Windows
Dell-JeffL
@kurtis, I wonder if he doesn't have a fair number of I/O wait states and using the HT cores allows the I/O threads to process independently of the real cores?
erson
Jeffl, are there actually cases when Turbo would be better turned off?
Dell-JeffL
BTW, We've seen some HPC applications do better turning on HT and Turbo. Mostly they are life science and chemistry applications
Dell-Onur_C
@kurtis, it may be the case that the processes are being scheduled on the same physical processors instead of distributing them evenly. The recommendation would be to spawn as many processes as virtual processors; 16 if HT is on, eight if HT is off
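An illustrative aside (not part of the chat): Onur's rule of thumb can be written down as a small helper. This is a sketch under the assumption of two hardware threads per core with HT on; the function name is hypothetical. On a live system, Python's os.cpu_count() reports the number of logical processors the OS sees, which already reflects whether HT is enabled.

```python
import os


def recommended_process_count(physical_cores, ht_enabled, threads_per_core=2):
    """One process per logical processor: cores * threads with HT on, cores with HT off."""
    return physical_cores * threads_per_core if ht_enabled else physical_cores


# The two-socket, quad-core case from the chat.
print("HT on: ", recommended_process_count(8, ht_enabled=True))
print("HT off:", recommended_process_count(8, ht_enabled=False))

# Logical processors visible to the OS on the machine running this sketch.
print("logical CPUs here:", os.cpu_count())
```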
Dell-JeffL
@erson, Yep. CFD :) We turn off HT and Turbo and we get faster results. I think Stream is that way as well (not that Stream is a real application). I'm not sure about HPL because I've seen conflicting recommendations
erson
I can see that turning off HT might be advantageous depending on the OS and application, but Turbo just squeezes out a little extra MHz if possible. Odd that Turbo would be bad for performance
kurtis_wagner
A load balancing issue?
Dell-JeffL
@erson, Well that's a good point. I tend to turn off HT and Turbo together, so I can't say for sure. Might be something good to test :) I'll have to do that. One thing that Turbo can do is give you variability in run times. One time you may get a fast run (Turbo kicks in), and the next time Turbo doesn't kick in and the run is slower. Then you may start pulling your hair out trying to figure out why
Wm._Hank_Lea_Intel
Turbo swaps thermal headroom for peak performance
erson
But if it's an application that stresses all four cores, then Turbo would boost each of them by the same amount? Maybe Ronak could confirm that?
kurtis_wagner
What’s weird is that if HT is on and you don’t fully utilize the cores with associated threads spread across each core, there is performance degradation.
Dell-Onur_C
@erson, many HPC codes synchronize processes across multiple nodes. Turbo may cause some nodes to run faster and some at base speed, in which case the synchronization will slow the whole application down to the speed of the slowest node
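An illustrative aside (not part of the chat): a toy model of the effect Onur describes. It assumes a bulk-synchronous code in which every node must finish a step before any node starts the next one, so each step costs as much as the slowest node. The per-step times are invented for illustration.

```python
def synchronized_runtime(step_times_per_node):
    """Total runtime when all nodes synchronize after every step.

    step_times_per_node: one list of per-step compute times for each node.
    Each step takes as long as the slowest node for that step.
    """
    return sum(max(times_for_step) for times_for_step in zip(*step_times_per_node))


# Hypothetical data: three nodes get a Turbo boost, one stays at base speed.
boosted = [0.9, 0.9, 0.9]
base = [1.0, 1.0, 1.0]
print(synchronized_runtime([boosted, boosted, boosted, base]))
```

In this model the three boosted nodes gain nothing; they simply wait at each synchronization point for the node running at base speed.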
Dell-JeffL
@erson, Ideally I would love an OS that knew the difference between HT cores and real cores. Then I would schedule the application on the real cores, and use the HT cores for clearing up I/O wait states—that's a really killer use of cores—just wish I could get Linux to do that :)
erson
Dell-onur_c, very good point. I took kurtis_wagner's load-balancing comment to mean within a single CPU, not "per node"
Wm._Hank_Lea_Intel
Here is a cool video showing Turbo in action:
http://www.youtube.com/watch?v=yvvc9nywrtg
Dell-JeffL
@kurtis, ask the customer to rerun the application with HT off and use all eight cores and compare it to the PowerEdge M600. Then ask them to turn HT on and run it on 16 cores and compare
kurtis_wagner
Yup, that’s what I was going to do :)
Dell-JeffL
@kurtis, send me an e-mail with the results; I'm in the company directory :)
kurtis_wagner
He is also wondering whether it's Microsoft or our Nehalem
intel-ronak
@kurtis, There can be cases where threads are not optimally scheduled (e.g., if you have eight threads, they may get scheduled onto fewer than eight cores). How the OS schedules for NUMA awareness also comes into play
Dell-JeffL
@ronak, very interesting. I didn't know that. It's an OS issue overall?
intel-ronak
@erson, With Turbo, all cores in a socket turbo up equally. We do not allow active cores in the same socket to run at different frequencies (of course, an "inactive" core is not running)
kurtis_wagner
I’m a 15-year UNIX vet. Is there a process table per CPU that he can reference in Microsoft to determine where his processes are being scheduled?
intel-ronak
@jeff, yes, it is an OS issue. From a CPU perspective, we just execute the threads where the OS places them. If the OS does a poor job in scheduling, not much the CPU can do
intel-ronak
@kurtis, yes
Dell-JeffL
@ronak, any hints what OS or kernels don't do a good job? Any way to tell there's a problem?
intel-ronak
@kurtis, we have done quite a bit of work with all the OS vendors on this topic
intel-ronak
@jeff, I won't single out any operating systems in a forum like this :) But, the more modern the OS, the better off you usually are
erson
Interesting that the E5540 actually has a different Turbo Boost binning than the E5530 and E5520
Dell-JeffL
@ronak, understood. I was wondering if any specific Linux kernel version showed more problems than others
jpenney
@ronak, by modern do you mean "more frequently updated/released"?
Dell-JeffL
I think "modern" is a good definition by itself :)
intel-ronak
@erson, why does the turbo configuration surprise you?
Dell-JeffL
BTW, I thought I would take a moment and let everyone know that Onur_c is the head of HPCC engineering at Dell! Truly a great person to know and a very good engineer
kurtis_wagner
Sounds good. Is there a way to interact with the scheduler to send processes to specific processors?
Dell-Onur_C
Thanks Jeff, we know you are the real superhero
intel-ronak
@erson, and actually, not sure you have that data right. The 5540 is 2.53 w/ 1/1/1/2 Turbo (1 bin of Turbo if 2–4 cores are active, 2 bins if 1 core is active). The 5530 is 2.4 with the same 1/1/1/2
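An illustrative aside (not part of the chat): a sketch that converts the bin notation Ronak uses into frequencies. It assumes one Turbo bin equals one step of a 133.33 MHz base clock and that the bins are listed for four, three, two, and one active cores, matching his explanation of 1/1/1/2. The function name is hypothetical.

```python
BIN_MHZ = 133.33  # assumption: one Turbo bin is one 133.33 MHz base-clock step


def turbo_frequencies(base_ghz, bins):
    """Map active-core count to maximum Turbo frequency in GHz.

    bins: string such as "1/1/1/2", listed for 4, 3, 2, 1 active cores.
    """
    steps = [int(b) for b in bins.split("/")]
    total = len(steps)
    return {total - i: round(base_ghz + s * BIN_MHZ / 1000, 2) for i, s in enumerate(steps)}


for active, ghz in sorted(turbo_frequencies(2.53, "1/1/1/2").items(), reverse=True):
    print(f"{active} active core(s): up to {ghz} GHz")
```

Passing "1/1/2/2" instead shows what the alternative binning debated in the chat would mean for two active cores.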
Dell-JeffL
I'm just good looking, that's all :) (and for those who know me you can close your mouth and stop laughing so hard)
DELL_garima_kochhar
@kurtis, I know you need the answer for Windows. For Linux it would be numactl
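An illustrative aside (not part of the chat): on Linux, a process can also restrict itself to specific logical CPUs from inside the program. The sketch below uses Python's os.sched_setaffinity, which is only available on platforms that support it (Linux, for example), so the code checks for it first. It is a sketch of the general technique, not the Windows answer Kurtis needs, and the helper name is hypothetical.

```python
import os


def pin_to_cpu(cpu):
    """Restrict the calling process to one logical CPU.

    Returns the resulting affinity set, or None where the platform
    does not provide os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {cpu})  # pid 0 means the calling process
    return os.sched_getaffinity(0)


if hasattr(os, "sched_getaffinity"):
    allowed = sorted(os.sched_getaffinity(0))
    print("allowed CPUs before pinning:", allowed)
    print("allowed CPUs after pinning: ", pin_to_cpu(allowed[0]))
else:
    print("CPU affinity calls are not available on this platform")
```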
Wm._Hank_Lea_Intel
Previous Turbo link was bad. Try this one:
http://www.youtube.com/watch?v=yvvc9nywrtg&feature=channel_page
Dell-JeffL
BTW, we have a couple of HPCC engineering people on this chat in addition to Onur. I wanted to acknowledge them and say thanks as well!
Dell-JeffL
@garima, with Linux does numactl know the difference between HT cores and real cores?
erson
intel/ronak, I thought E55?0 would all have 1/1/2/2 as Turbo Boost binning but E5540 has 1/1/1/2
DELL_garima_kochhar
@jeffl, I don't think Linux distinguishes real cores versus HT. Something to dig into
erson
intel-ronak, I'm reading this from the spec update .pdf from Intel
DELL_garima_kochhar
I mean numactl in Linux
kurtis_wagner
I’m just wondering if he can tell the programmers to write some code to utilize the scheduler to increase the performance of their application when they do not want to fully utilize all 16 cores with HT turned on
Dell-JeffL
@garima, I haven't seen anything yet. I think that the CPU doesn't tell the OS which are HT and which are real (at least I don't think so)
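An illustrative aside (not part of the chat): on Linux kernels that expose CPU topology through sysfs, each logical CPU has a thread_siblings_list file naming the logical CPUs that share its physical core. Whether a given kernel or distribution provides it is an assumption to verify on the system in question. The sketch below parses those lists and picks one logical CPU per physical core; the helper names are hypothetical.

```python
import glob


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as "0,8" or "0-1" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def one_cpu_per_core(sibling_lists):
    """Return the lowest-numbered logical CPU of each physical core."""
    return sorted({min(parse_cpu_list(t)) for t in sibling_lists if t.strip()})


def read_sibling_lists(pattern="/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"):
    """Read every logical CPU's sibling list; returns [] if sysfs is absent."""
    lists = []
    for path in glob.glob(pattern):
        with open(path) as handle:
            lists.append(handle.read())
    return lists


print("one logical CPU per physical core:", one_cpu_per_core(read_sibling_lists()))
```

The resulting list could then be handed to an affinity tool so that an eight-process job lands on eight distinct physical cores rather than on sibling threads.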
intel-ronak
@erson, Turbo upside is another configuration parameter that you will see vary by product
erson
Turbo upside?
kurtis_wagner
Can you turn off Turbo?
Dell-JeffL
@wagner, yes
intel-ronak
@kurtis, Turbo can typically be disabled in the BIOS
erson
Page 17 of this spec update shows the E5540 having a different Turbo Boost binning than the E5520 and E5530:
http://www.intel.com/assets/pdf/specupdate/321324.pdf
kurtis_wagner
Okay. Thanks.
DELL_garima_kochhar
@kurtis, you can use OpenManage CLI (omconfig) to turn Turbo on and off
intel-ronak
@erson, my notes say that 5540 and 5530 are the same turbo (1/1/2/2). Need to go double check
DELL_garima_kochhar
Remember to reboot!
Dell-JeffL
@kurtis, I tend to turn off Turbo for the simple reason that it makes my testing consistent (i.e., I should see about the same run time from run to run). Otherwise, I would turn it on :)
intel-ronak
@jeff, have you seen variation with Turbo enabled from run to run that is greater than normal run to run variation?
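An illustrative aside (not part of the chat): one simple way to answer Ronak's question is to compare the coefficient of variation (standard deviation divided by mean) of repeated runs with Turbo on and off. The run times below are invented placeholders, not measurements from Dell or Intel.

```python
import statistics


def coefficient_of_variation(run_times):
    """Sample standard deviation divided by the mean of the run times."""
    return statistics.stdev(run_times) / statistics.mean(run_times)


# Hypothetical wall-clock times in seconds; replace with real measurements.
turbo_off = [100.2, 100.1, 100.3, 100.2]
turbo_on = [95.0, 99.8, 95.2, 99.5]

print(f"Turbo off: CV = {coefficient_of_variation(turbo_off):.4f}")
print(f"Turbo on:  CV = {coefficient_of_variation(turbo_on):.4f}")
```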
Dell-JeffL
@kurtis, one other problem we see in Linux is that if you are running on only eight cores with HT activated (16 logical cores) and a daemon wakes up, the kernel scheduler can move the process to an unused core. Consequently, a process could start on a real core and get moved to an HT core. This usually manifests itself as what we call "OS jitter," but it's pretty easy to see
Dell-JeffL
@ronak, yes, but only a couple of early runs. Then I started turning off Turbo. But it may have been a function of the early BIOS. How about you?
intel-ronak
@jeff, we have been running everything we do w/ Turbo. We have not seen an issue on the Nehalem server with Turbo and variation
Dell-JeffL
@ronak, did you test w/ Turbo and w/o to see if there was much difference in the variation?
intel-ronak
@jeff, yes. If you want the best performance with normal run-to-run variation, you want Turbo enabled
erson
Has any customer purchased a Dell HPC cluster with Nehalem yet?
Dell-JeffL
@ronak, agreed
Dell-JeffL
@erson, yes. There is quite a large system at Lawrence Livermore: 4,000 Nehalem cores :)
erson
I can imagine those crawling up on the Top 500 list pretty easily
Dell-JeffL
@erson, me too :) I'm not sure it's on the list though. It's part of a system called Hyperion that is a community system for testing scaling of operating systems, applications, etc. It's a joint project with Lawrence Livermore, Dell, Intel, and some other companies to look at these issues
intel-ronak
@erson, and I think the Top 500 understates how great Nehalem is for HPC (since Top 500 is based solely on Linpack, which doesn't show the benefit of the great memory bandwidth for instance)
kurtis_wagner
I have a customer who is testing a PowerEdge M610 right now with the intention of purchasing a 256-node cluster
erson
I'm guessing blades are the preferred configuration for HPC clusters? 64 PowerEdge M610 in one rack is a mighty punch
Dell-JeffL
@ronak, that's one of the crimes of the Top 500; it doesn't show off Nehalem :) We have one customer who wanted to reach a certain Tflops with Nehalem, but we would be getting three times the performance (he's running CFD). We tried to convince him he needed fewer nodes, but it became political so he had to stick with the Tflops measurement
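An illustrative aside (not part of the chat): sizing a cluster to a Tflops target usually means sizing it to theoretical peak, which depends only on core count and clock speed. The sketch assumes four double-precision floating-point operations per core per cycle and a two-socket, quad-core 2.66 GHz node; the 10 Tflops target is a made-up example, and the function names are hypothetical.

```python
import math


def peak_gflops(sockets, cores_per_socket, ghz, flops_per_cycle=4):
    """Theoretical peak GFLOPS of one node (assumed 4 DP flops per core per cycle)."""
    return sockets * cores_per_socket * ghz * flops_per_cycle


def nodes_for_target(target_tflops, node_gflops):
    """Smallest whole number of nodes whose combined peak meets the target."""
    return math.ceil(target_tflops * 1000 / node_gflops)


node_peak = peak_gflops(sockets=2, cores_per_socket=4, ghz=2.66)
print(f"per-node peak: {node_peak:.2f} GFLOPS")
print("nodes for a hypothetical 10 Tflops target:", nodes_for_target(10.0, node_peak))
```

Nothing in this calculation reflects memory bandwidth, which is the point made in the chat: a peak-Tflops requirement can call for more nodes than a bandwidth-limited application such as CFD actually needs.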
Dell-JeffL
@erson, blades aren't always the best solution. There are a lot of factors that figure into blades versus racks.
DELL-ScottH
Please keep going for as long as you need. I've got to run to a meeting...I would like to thank Jeff and team and Hank and Ronak from Intel before I run...and thanks to everyone for coming to the chat
Dell-JeffL
I'll stay here for a while, but if anyone needs to run I understand
erson
Jeffl, cooling could make blades troublesome; are there any other factors that would be against blades in an HPC scenario?
Wm._Hank_Lea_Intel
Ronak, can you stay for a few extra minutes?
Dell-JeffL
@erson, cost can be one thing, but that depends on the exact configuration. Blades also have limited local disks (some applications need lots of local I/O). If you need PCIe cards, such as GPGPUs or video cards, then blades aren't the best solution either
intel-ronak
Sure, I can stay for a few minutes more
Brennels
Thanks, guys. Double-Take Software loves Intel...see you next time
Dell-JeffL
@ronak, @wm, thanks guys for helping out! It's been really useful. Send me an e-mail at jeffrey_layton _at_ dell.com (or anyone for that matter)
Wm._Hank_Lea_Intel
Thanks Brennels!
erson
Jeffl, very true, but cost is often less with a filled PowerEdge M1000e compared to R-series servers, at least that was why I bought a PowerEdge M1000e recently instead of the PowerEdge R710 I was first gunning for
Dell-JeffL
@erson, hang on a sec.
kurtis_wagner
Thanks everyone
Dell-JeffL
@erson, sorry…quick phone call. Yes, for some scenarios, blades are cheaper than racks. But I haven't done a big study, so I don't know if this is true for every situation
intel-ronak
Any final questions?
Dell-JeffL
@ronak, many thanks!
Wm._Hank_Lea_Intel
Thanks to all for joining the conversation; if you have Intel-related questions, please check out The Server Room:
www.intel.com/server
erson
Thanks everyone; first talk about HPC clusters for me. Looking forward to upcoming chats as always
intel-ronak
Thanks all for the questions
Wm._Hank_Lea_Intel
You can also check out www.clusterconnection.com for more information
Dell-JeffL
Thanks everyone! We're hoping to have more HPC chats coming up. If you have any topics, please let me know at jeffrey_layton _at_ dell.com
erson
Your alias, I assume, is hpcdoc@dell.com? ;)
Dell-JeffL
I don't know that alias :) Pretty cool one though. I might have to ask for that one.
Dell-JeffL
Thanks everyone!
jpenney
Is there a transcript of this chat available?
erson
Does Dell have a blog or any other resource related to HPC?
erson
Jpenney, probably tomorrow
Dell-JeffL
Yes, check back to the TechCenter, and you will see transcripts listed there
Dell-Onur_C
Thanks to all for joining. Check out DELL.COM/HPCC for more information on Dell's HPC program. Also, we typically have many HPC white papers and best practices in Dell Power Solutions magazine at DELL.COM/PowerSolutions
erson
Jpenney, you can use Action, Recent Room History to see about the last half hour or so
Dell-JeffL
@erson, yep, I write a blog on HPCC at this Web site. Just look at the links in the top left-hand corner, and you will see links to HPCC and a link to the blog