Come join the Dell PowerEdge M610x and C410x Server teams as they enter the chat room with Special Guests from NVIDIA to talk about all things Dell and GPGPU!
From CUDA cores to x16 Gen2 PCIe slots, this collection of specialists will be standing by, ready to field questions on the powerful Dell and NVIDIA partnership and how it can accelerate your datacenter's applications!

Technical Community - Background Reading

http://en.community.dell.com/dell-blogs/b/inside-enterprise-it/archive/2010/06/09/dell-bolsters-blade-offerings-with-new-poweredge-m610x-and-m710hd.aspx
http://bladesmadesimple.com/2010/06/dell-announces-new-blade-servers-m710hd-and-m610x/
http://en.community.dell.com/dell-blogs/enterprise/b/inside-enterprise-it/archive/2010/08/05/paving-the-way-for-powerful-thinkers.aspx
http://en.community.dell.com/dell-blogs/enterprise/b/inside-enterprise-it/archive/2010/08/06/poweredge-c410x-whiteboard-topology.aspx
http://www.nvidia.com/object/fermi_architecture.html


Chat Transcript



Dell-BrandonD Good afternoon everyone.
Dell-KongY Today's chat is around GPGPU on the Dell PowerEdge Server Portfolio.
Dell-KongY Our guests are: Chris Christian, Dell Server Pg Marketing, and Brandon, Dell Data Center Solutions Marketing.
Dell-KongY From NVIDIA, we have Travis and Roy.
Chris_Christian Well, we have spoken about the Dell PowerEdge M610x previously in Dell Chats, but with the recent launch of the Dell PowerEdge C410x, Dell has really rounded out its portfolio of GPU platforms.
Chris_Christian The Dell PowerEdge M610x, as you all know, can support up to two x16 PCIe expansion slots, or one double-width, full-length card.
Chris_Christian Such as the NVIDIA Tesla with Fermi Architecture!
Chris_Christian The Dell PowerEdge C410x provides up to 16 such card slots in a very small form factor.
trentg The Dell PowerEdge M610x would take two slots in an M1000e, correct? (compared to Dell PowerEdge M605)
Dell-KongY Wow! The Dell PowerEdge C410x is really cool. Can't wait to get it in our lab. :)
Chris_Christian Both of these platforms can bring many cores of CUDA processing power to bear on your datacenters’ application needs!
Dell-KongY @trentg: That's correct, the Dell PowerEdge M610x is a full height blade server.
Roy_Kim Dell PowerEdge C410x is the densest GPU platform on the planet! It's quite a technical feat.
Roy_Kim To introduce myself, I'm a product manager from NVIDIA in the Tesla group.
trentg Alright, thanks. I need to move around some blades to make room for them.
Dell-BrandonD Yes, we are very excited. The 16.5 TFLOPS of compute in 3U (rack unit) is pretty neat stuff.
Chris_Christian @roy: Indeed it is. A massive amount of processing power can be deployed in that small form factor.
Chris_Christian @NVIDIA: What are some of the ways that the CUDA core and Fermi architecture accelerate data processing?
Roy_Kim That's a great question, Chris. A loaded one.
Roy_Kim From a 10,000-foot level, Fermi is a huge transition for us (NVIDIA).
Roy_Kim We took many risks to ship this to market because, as you may know, we added many must-have features into Fermi.
Chris_Christian @Roy: Loaded with processing power!
Roy_Kim Yes. We now have peak double precision performance of 515 GFLOPS.
Roy_Kim That's up from 78 GFLOPS in the previous generation.
Roy_Kim That's a huge amount of chip dedicated to compute.
Roy_Kim We also have 1.03 TFLOPS of single precision performance.
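A quick back-of-the-envelope check of the figures quoted so far, as a small Python sketch. The per-card numbers (515 and 78 GFLOPS double precision, 1.03 TFLOPS single precision) and the 16-slot count come from this chat; the function and variable names are illustrative, and the calculation assumes every C410x slot holds an identical Fermi-based Tesla card running at its quoted peak.

```python
# Peak figures quoted in the chat (per Tesla card).
FERMI_DP_GFLOPS = 515.0      # Fermi, double precision
PREV_GEN_DP_GFLOPS = 78.0    # previous generation, double precision
FERMI_SP_TFLOPS = 1.03       # Fermi, single precision
C410X_SLOTS = 16             # card slots in one 3U C410x chassis


def generation_gain(new_gflops, old_gflops):
    """Ratio of new peak throughput to old peak throughput."""
    return new_gflops / old_gflops


def chassis_peak_tflops(cards, tflops_per_card):
    """Aggregate peak, assuming every slot holds an identical card."""
    return cards * tflops_per_card


dp_gain = generation_gain(FERMI_DP_GFLOPS, PREV_GEN_DP_GFLOPS)
chassis_sp = chassis_peak_tflops(C410X_SLOTS, FERMI_SP_TFLOPS)

print(f"Double precision gain: {dp_gain:.1f}x")
print(f"Fully populated C410x: {chassis_sp:.2f} TFLOPS single precision")
```

Under these assumptions the sketch gives roughly a 6.6x double precision gain and about 16.5 TFLOPS of peak single precision per chassis, which lines up with the 3U figure Brandon mentioned. These are peak numbers, so they are upper bounds rather than expected application throughput.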
Chris_Christian @Roy: What is a typical deployment of Tesla cards to hosts? What do you see as a customer’s typical setup?
Roy_Kim It really depends on the needs of the customer.
Dell-KongY NVIDIA: Fermi Architecture http://www.nvidia.com/object/fermi_architecture.html
Roy_Kim For example, in oil and gas, we have customers with large amounts of parallel data. Some of them do 4:1 (GPU to CPU) deployments.
Roy_Kim We also find that a popular deployment is 1:1 (GPU to CPU).
Chris_Christian @Roy: Tell us about some of the first steps customers can take to see how they can deploy GPU technology in their datacenters.
Dell-KongY Also, there will be a transcript of today's chat, and it will be posted sometime tomorrow.
Roy_Kim Good question, Chris. I'd say start off by looking at the code that you want to accelerate. Perhaps you can start off with a Dell Precision T7500 with an NVIDIA Tesla C2050 just to do some development.
Roy_Kim Or the expansion chassis.
Roy_Kim The Dell PowerEdge C410x.
Roy_Kim There is a lot of CUDA material the customer can access for learning how to accelerate their code.
Roy_Kim Typically, we've seen customers get speedups of 5x to 50x over the CPU version of their code.
trentg @Roy: What about GPU to CPU ratio for Gaussian and GAMESS?
Chris_Christian @Roy: Are there specific porting tools available? Are there specific types of code or apps that you see as readily transferable to GPU?
Roy_Kim On GAMESS, I'd have to ask the specialist on that. Can I get back to you on that?
trentg Sure.
Roy_Kim On Gaussian, it's not quite ready yet. I believe there is a third party trying to port it over to CUDA.
trentg LAMMPS is also popular on our clusters.
Roy_Kim Okay, I just found out. On GAMESS, there's a lot of activity going on, but right now it runs only on a single GPU.
trentg Ok, good to know.
Roy_Kim LAMMPS is a great app. It's multi-GPU.
Roy_Kim We get tremendous performance on LAMMPS.
Chris_Christian @Roy: What other technologies play well with GPU? Virtualization? NAND flash storage (i.e., Fusion-io)?
Roy_Kim In LAMMPS, one GPU (previous generation) can outperform 12 CPUs.
Roy_Kim According to Oak Ridge National Lab.
Roy_Kim With Fermi, I believe LAMMPS will do much better.
Roy_Kim In LAMMPS, with two GPUs, it scales so well that it delivers 4x the performance of 24 CPUs.
trentg Great. We look forward to testing apps with a couple of Dell PowerEdge M610x before we move further into GPGPU territory.
Roy_Kim Yes. LAMMPS is a tremendous app for CUDA, so there’s a lot of material out there for you to look into for accelerating LAMMPS.
trentg Thanks Roy.
Roy_Kim One of the LAMMPS code developers, a well-known one, spoke at the NVIDIA booth last year at Supercomputing. http://nvidia.fullviewmedia.com/sc09/nvidia-sc09-paul-crozier-sandia-national-laboratory.html
Roy_Kim @chris: Question about porting tools.
Roy_Kim We have a very mature SDK and set of tools, including a Visual Studio profiler and debugger.
Roy_Kim It's a very extensive set of tools, and you can find them on NVIDIA.com. It's not "easy" to code parallel, even with multi-cores.
Roy_Kim But (not biased) :) we have a very good set of tools to help developers code parallel.
Roy_Kim @chris: Question on other technologies. As you can imagine, we are working with many partners in the HPC Community.
Roy_Kim Fusion-io is an interesting technology.
Roy_Kim I can't announce anything right now, but all I can say is it'd be a shame for us not to work with technology advancers like Fusion-io.
Roy_Kim :)
Roy_Kim In terms of virtualization, yes, we are working on that as we speak.
Roy_Kim It takes a lot of software and driver work to make this happen, but we have important customers in the cloud market.
Dell-BrandonD @Roy: Can you talk about the GPUDirect technology NVIDIA and Mellanox are developing and the benefits it will offer to HPC customers?
Roy_Kim @brandon
Roy_Kim Good question.
Roy_Kim GPUDirect is a technology that reduces the latency of data movement from the InfiniBand (IB) HCA to GPUs.
Roy_Kim Right now, the way it's done is that there are multiple memory copies.
Roy_Kim From HCA to user memory, to kernel memory, to user memory, then to GPU.
Roy_Kim With GPUDirect technology, we cut out the kernel memory copy so the GPU and HCA share the same user memory.
Roy_Kim Namely, it's "pinned" memory.
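The copy chain Roy describes can be counted with a small Python sketch. The first path is taken directly from his description. The second is only an assumed reading of "GPU and HCA share the same user memory" (a single shared pinned buffer), not a statement of how the GPU direct (GPUDirect) driver actually behaves. The names are illustrative and nothing here calls a real API.

```python
# Each path lists the buffers a received message passes through, in order.
# PATH_WITHOUT_GPUDIRECT follows the description in the chat.
# PATH_WITH_GPUDIRECT is an ASSUMPTION: one pinned user buffer shared
# by the HCA and the GPU, with the kernel memory copy removed.
PATH_WITHOUT_GPUDIRECT = [
    "IB HCA",
    "user memory",
    "kernel memory",
    "user memory",
    "GPU",
]
PATH_WITH_GPUDIRECT = [
    "IB HCA",
    "pinned user memory",
    "GPU",
]


def count_copies(path):
    """Every hop between adjacent buffers counts as one data copy."""
    return len(path) - 1


copies_before = count_copies(PATH_WITHOUT_GPUDIRECT)
copies_after = count_copies(PATH_WITH_GPUDIRECT)

print(f"Without GPUDirect: {copies_before} copies")
print(f"With GPUDirect:    {copies_after} copies")
print(f"Copies avoided:    {copies_before - copies_after}")
```

This is a counting model only. It shows why removing intermediate buffers shortens the data path, but it says nothing about the latency actually saved on real hardware.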
Roy_Kim Many, many HPC customers are excited about this feature.
Roy_Kim You can imagine that IB HCAs are not the only technology we are interested in reducing data latency to.
Roy_Kim @trentg: On GAMESS, I have more detail. It supports multi-GPU in a single node, but not GPUs in multiple nodes.
trentg Noted. Thanks Roy.
Roy_Kim Let me throw out a question just to understand. Are most of you familiar with CUDA or GPGPU?
Dell-KongY @Roy: For the benefit of our community, can you please explain CUDA and GPGPU?
trentg Yep.
Roy_Kim Yes. GPGPU is a new computing platform on GPUs.
Roy_Kim It's a co-processor in a system. CPUs are used for serial tasks and GPUs are great for data-parallel tasks.
Roy_Kim CUDA is the software architecture on top of our GPUs.
Roy_Kim CUDA is the most widely used parallel programming language in the world.
Roy_Kim The innovation in CUDA is that it basically looks like C programming.
Roy_Kim And now, we support C++.
Roy_Kim So, from your perspective, it's like coding in C or C++ and getting 5x to 10x more FLOPS.
Roy_Kim Then when you get into certain apps, performance increase can be significantly more.
trentg @Roy: Can you comment on the current role of OpenCL?
Roy_Kim @trent: That's a good question. Just for everyone, OpenCL is an "open source" platform to code on GPUs.
Roy_Kim It's another option to CUDA.
Roy_Kim And NVIDIA supports OpenCL.
Roy_Kim NVIDIA is one of the few vendors who has an OpenCL compliant platform. OpenCL is interesting though.
Roy_Kim While the benefit is supposed to be that you code once and run everywhere (ATI cards, IBM, etc.)…
Roy_Kim At the core, it's really difficult to do because coders have to optimize code for a particular architecture.
Roy_Kim So OpenCL has benefits as well as drawbacks.
trentg So it sounds like you'd steer folks toward CUDA instead of OpenCL on current NVIDIA GPUs?
Roy_Kim Well, yes, for the most part. Depends on what you are looking for.
Roy_Kim CUDA will always give more performance.
Roy_Kim But NVIDIA will always support OpenCL.
Roy_Kim And our commitment in OpenCL is that we want to be leaders, not followers, so we will continually innovate on OpenCL.
Roy_Kim But it's difficult to be on par with CUDA because OpenCL is run by an industry committee. And it takes a long time to get anything through there.
Roy_Kim @trent: I believe you are interested in LAMMPS and GAMESS.
Roy_Kim In that case, they are already ported over to CUDA, so you don't have to worry about these details.
trentg Right. Unfortunately, other codes aren't yet.
trentg CUDA seems like the logical choice for the other stuff needing work, as far as I can tell.
Roy_Kim @trent: What other codes are you interested in?
Roy_Kim I ask because molecular dynamics is one of the most advanced areas for GPUs.
trentg We have a few in-house codes and some off-the-shelf stuff like Abaqus, MOLPRO, LAMMPS, Gaussian, VASP, DL_POLY.
trentg Yeah, I saw that MOLPRO is supposed to have some GPU support now.
Roy_Kim I can't say officially, but don't be surprised if some of the major CAE ISVs, like SIMULIA, are working on GPUs.
Roy_Kim :)
Roy_Kim SIMULIA does Abaqus. FYI.
trentg Right.
trentg Cool.
Dell-KongY We are coming to the end of the hour.
Dell-KongY I'd like to thank our featured guests, Roy, Chris, Brandon and Travis.