Dell's Dr. Jeff Layton

Every SC (Supercomputing Conference) is interesting, and this year’s conference, SC10 (http://sc10.supercomputing.org), will be no different. Next week in New Orleans there will be over 10,000 of the world’s smartest people, all focused on talking about, learning about, and using computation to solve some of the largest and most difficult problems in the world. This isn’t just any computation but supercomputing (i.e., really huge amounts of computation).

If you haven’t been before, SC is a great place to learn about HPC. There is a HUGE number of Ph.D.s at the show, they really enjoy talking about what they do, and they are eager to help you learn about HPC. There is a great variety of people there and a great number of interesting stories about people using HPC to solve interesting problems. You can see people using HPC to design diapers, skin lotions, and detergent containers. You can see HPC being used to decode the genomes of humans, plants, and animals. You can see HPC being used to design and test new drugs and to screen for cancers in MRIs and CAT scans. You can see people using HPC to analyze the depths of matter and to look back in time at how the universe started. You can see people using HPC to predict the weather, predict traffic patterns during emergencies, analyze financial bond risk, and design aircraft, cars, golf clubs, and tennis rackets. You can rest assured that HPC touches your life in many, many ways, often without you even being aware of it. SC10 is a chance to learn more about HPC and how you can use it in your life or how it affects your life.

Usually at expos you only see the vendors, but SC is a bit different because you will see both HPC vendors and their customers on the same show floor. The consumers, or users, of HPC are on the expo floor because the HPC community is all about solving problems, and many times this is a collaboration between vendor and customer. Plus there is strong feedback from HPC consumers to the vendors to further push the boundaries of what the technology can do. Consequently, having both vendors and customers on the same floor makes perfect sense in the HPC world.

It’s a great show to learn about the very bleeding edge of what people are doing with HPC. From the very largest supercomputers in the world, which are faster than 100,000 laptops, to people solving critical and unique problems that cannot be solved any other way than by computation, SC10 will be a wonderful opportunity for you to learn about HPC technology, where it is now, and where it’s headed.

In this blog I want to start by taking a quick look at what Dell will be showing in our booth (and we have a lot of cool stuff). My apologies if this sounds a little too much like marketing, but I tend to get excited about our new products. Then I want to end the blog with some predictions or general trends you’ll see at the show.

Dell and GPU Computing

Dell HPC has a great booth at SC10, with a number of different technologies. But perhaps more importantly, we’ll be talking about and showing how people are using these technologies to solve problems and make discoveries.

We have a very long list of speakers in our booth from some very prestigious universities, and you can find the agenda at http://dell.to/cAKKau. The list includes a number of presenters from the US and Europe, illustrating Dell’s commitment to the HPC world. You will hear about topics such as visualization of very large data sets (University of Texas/Texas Advanced Computing Center) as well as how researchers are using HPC to improve the quality of their research or even enable their research. Dell is working with researchers world-wide to enable them to perform research they couldn’t have performed previously, or to address problems that have never even been tackled before. Michael Dell has committed to supporting research world-wide using HPC, and the scope of research institutions that Dell is collaborating with shows that Dell is in the HPC world to stay.

One of the significant technologies you can see and learn about in the Dell booth is GPUs. In the past, GPUs were used just for displaying images. People wanted faster and better graphics in their game consoles and their computers, so GPU companies such as Nvidia and ATI developed increasingly fast and powerful GPUs. At one point, GPUs gained the ability to be programmed so that users could get more sophisticated images and effects from their games (let’s face it - people didn’t need that capability for displaying spreadsheets). Then just a few years ago, people realized that these chips were capable of running programs that compute numbers, not just images. At that point, people began writing HPC applications that ran not just on the CPUs, but on the GPUs as well.

GPUs hold the promise of a massive increase in computational capability. People are reporting speed increases of 10 times to 100+ times. That means that they can run their applications at least 10 times faster than they could before, allowing them to tackle problems that they never even considered because they would have taken too long. But to reach these levels of performance some work is needed (TANSTAAFL - http://en.wikipedia.org/wiki/TANSTAAFL).

The first step in harnessing this computational power is making sure your algorithm is amenable to GPUs. GPUs are a bit different from CPUs, so you have to rethink your algorithm a bit (but just keep whispering to yourself – “100x, 100x”). Next, you need to rewrite your code. This is one of the hardest steps since GPUs and their programming tools are relatively young, but people are doing it every day and the tools are getting better. They are taking their applications and rewriting them for GPUs with the aim of greatly improving performance. But rewriting code can be a long and difficult process. Is it worth it? If you get a 10x or 100x improvement in performance, the answer is pretty obvious. At the very least you should be reading and learning about GPUs, since the concepts are making their way into future CPUs (http://www.intel.com/pressroom/archive/releases/20100531comp.htm).
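
To make “amenable to GPUs” a little more concrete, here is a minimal sketch in plain Python (it is not GPU code and does not use any GPU toolkit). The function names and the choice of SAXPY and a running sum as examples are my own illustrative assumptions. The first computation treats every element independently, which is the shape of work that maps naturally onto many GPU threads. The second carries a value from one step to the next, so it would have to be rethought before it could be spread across threads.

```python
# Illustrative only: plain Python standing in for the structure of the work.
# Names (saxpy_element, saxpy, running_sum) are hypothetical examples.

def saxpy_element(a, x_i, y_i):
    """Work for ONE element; it depends only on its own inputs."""
    return a * x_i + y_i


def saxpy(a, x, y):
    """Element-wise a*x + y.

    Every saxpy_element call is independent of the others, so on a GPU
    each call could in principle be handled by a separate thread.
    """
    return [saxpy_element(a, x_i, y_i) for x_i, y_i in zip(x, y)]


def running_sum(x):
    """Cumulative sum written as a sequential loop.

    Each step needs the previous total (a loop-carried dependency), so
    this form cannot simply be split across threads as written.
    """
    total = 0.0
    out = []
    for x_i in x:
        total += x_i
        out.append(total)
    return out


print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
print(running_sum([1.0, 2.0, 3.0]))
```

Code shaped like the first function tends to port easily; code shaped like the second is where the “rethink your algorithm” step comes in.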

Dell has developed GPU solutions for a variety of needs, from systems for developing and writing GPU applications to production-ready systems for running them, and will be showing these at SC10. In particular, Dell will be showing the Dell T7500 Precision Workstation, which can accommodate GPUs inside the chassis (http://www.dell.com/us/en/enterprise/desktops/workstation-precision-t7500/pd.aspx?refid=workstation-precision-t7500&s=biz&cs=555).

Figure 1 – Dell T7500 Precision Workstation

If you come by the Dell booth you can see this workstation as well as Dell’s mobile workstations (laptops with SERIOUS horsepower). Dell’s Precision line is great for enabling developers to write GPU applications in their office, at home, or, if they really want to, at the beach (you’ll have to ask an HPC geek about developing at the beach – it really happens!).

Once the applications are ready to be rolled into the data center, Dell has production-ready GPU solutions for you as well. Dell recently introduced a new blade, the Dell PowerEdge M610x (http://www.dell.com/us/en/enterprise/servers/poweredge-m610x/pd.aspx?refid=poweredge-m610x&s=biz&cs=555), that can accommodate GPUs internally.
Figure 2 – Dell PowerEdge M610x blade

The Dell M610x gives you all the benefits of a blade, such as very efficient power and cooling and fantastic management capabilities, including plug-n-play: you slide the blade into the chassis and it is automatically configured. You can also install extremely powerful storage devices such as Fusion-IO PCIe based SSD cards in the M610x.

The third GPU product Dell will be showing is the PowerEdge C410x (http://www.dell.com/content/products/productdetails.aspx/poweredge-c410x?c=us&l=en&s=biz). This unit has been very well received by people doing production GPU computing because it gives you a massive amount of GPU capability with extreme density. This chassis serves as “room and board” for GPUs, providing power, cooling, and management, along with the ability to connect to servers, but it doesn’t contain any networking or CPUs.

Figure 3 – Dell PowerEdge C410x – front view showing 10 of the slots

The C410x is a 3U chassis that can accommodate up to 16 GPUs. It has up to 8 PCI Express (PCIe) connections to connect to servers. You can use all 8 connections or only 4 of them depending upon your configuration.

The C410x is designed for flexibility. Since production GPU applications are just being made available and people are still rapidly writing new ones, the “best” configuration of GPUs for computation is still unknown. Dell has designed the C410x so that it can adapt to the needs of the applications and the developers. If you need 1 GPU per server, you can do that with the C410x. Two GPUs – no problem. Three GPUs – sure thing. Four GPUs – no sweat. Right now, you can deploy up to 4 GPUs per PCIe connection using the C410x.
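
As a rough sketch of the arithmetic behind these configurations, the Python below works out how many servers one chassis could feed for a given number of GPUs per server. The limits (16 GPUs, 8 connections, 4 GPUs per connection) come from the text above. The function name, the assumption that each server uses one connection, and the assumption that GPUs can be divided freely among connections are mine; the real chassis may only support particular mappings.

```python
C410X_MAX_GPUS = 16           # GPUs per chassis (from the text)
C410X_MAX_CONNECTIONS = 8     # PCIe connections to servers (from the text)
MAX_GPUS_PER_CONNECTION = 4   # current limit quoted in the text


def servers_supported(gpus_per_server):
    """Return how many servers one chassis can feed at this GPU ratio.

    Assumes one PCIe connection per server and that GPUs can be divided
    freely among connections (a simplification of the real hardware).
    """
    if not 1 <= gpus_per_server <= MAX_GPUS_PER_CONNECTION:
        raise ValueError("gpus_per_server must be between 1 and 4")
    return min(C410X_MAX_CONNECTIONS, C410X_MAX_GPUS // gpus_per_server)


for ratio in range(1, MAX_GPUS_PER_CONNECTION + 1):
    servers = servers_supported(ratio)
    print(f"{ratio} GPU(s) per server: {servers} servers, "
          f"{servers * ratio} GPUs in use")
```

Under these assumptions, two GPUs per server fills the chassis using all 8 connections, and four GPUs per server fills it using only 4, which lines up with the two connection counts mentioned above.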

If you want to learn more about this amazing new technology, stop by the Dell booth, where we will have one on display that you can touch and examine (“just feel the electrons!”). If Dell weren’t serious about HPC, why would we develop the densest GPU platform in the world?

Dell Servers

GPUs are great at computation but at this time they aren’t general purpose enough to run an entire system alone. They need a host system with CPUs, networking, and storage to be effective. In addition, GPU applications are just becoming more widespread so servers that use CPUs as the main source of computation are still the primary source of HPC capability today.

Dell will be displaying several servers in the booth at SC10. We will be showing our newest blades in the booth and you will be able to pop the cover and examine them yourself.

We will also be showing the latest rack servers from Dell, including our PowerEdge C line of rack-based servers. In particular we will be showing the Dell PowerEdge C6100 and C6105, each of which contains four independent servers, based on Intel (C6100 - http://www.dell.com/us/en/enterprise/servers/poweredge-c6100/pd.aspx?refid=poweredge-c6100&s=biz&cs=555) and AMD (C6105 - http://www.dell.com/us/en/enterprise/servers/poweredge-c6105/pd.aspx?refid=poweredge-c6105&s=biz&cs=555) processors respectively.

Figure 5 – Dell PowerEdge C6100

These servers have remarkable density and connectivity. For example, each of the four systems in the C6100 has a dual-port QDR InfiniBand card connected to a PCIe x8 slot, plus a PCIe x16 slot that can be used to connect to the Dell PowerEdge C410x.

The Dell booth will have all of this hardware available for you to learn about, touch, and examine.

Dell Storage

Arguably one of the thorniest problems in HPC is data storage. Researchers need very high performance from their storage, and they need lots of it. Just a year ago it was somewhat uncommon to hear people discuss 1 PB (petabyte, which is 1,000 terabytes) of storage. There were few HPC systems in the world with that much storage and only a slightly larger number of people talking about needing that much. Today, researchers with systems as small as 32 nodes routinely talk about 1 PB of storage.

A major source of the explosion in data storage is sequencing data. The human genome was first sequenced several years ago by the Human Genome Project (http://en.wikipedia.org/wiki/Human_Genome_Project). The project started in 1990 and released the first draft of the genome in 2000, with a complete sequence in 2003. Overall the Human Genome Project cost about $3 billion. Today, just 7 years later, we can sequence a human genome for under $40,000 and we can do it in a few days. Moreover, these sequencers only cost about $100,000, so research institutions have a large number of them. Each sequencer can generate several TB (terabytes) of data in a single sequencing run. A single instrument can therefore generate about 10 TB of data per month, and research institutions can have up to 100 of these instruments. This means it is possible to generate up to 1 PB of data per month!
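
The arithmetic behind that 1 PB figure can be sketched in a few lines of Python. The 10 TB per instrument per month and the 100 instruments are the numbers quoted above; the function names and the 30-day month used to turn the monthly total into a sustained ingest rate are my own assumptions.

```python
TB_PER_PB = 1000                     # decimal units, as used in the text
GB_PER_TB = 1000
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumes a 30-day month


def monthly_output_tb(instruments, tb_per_instrument=10):
    """Total data generated per month, in TB."""
    return instruments * tb_per_instrument


def sustained_rate_gb_per_s(tb_per_month):
    """Average rate storage must absorb to keep up, in GB/s."""
    return tb_per_month * GB_PER_TB / SECONDS_PER_MONTH


total_tb = monthly_output_tb(100)
print(f"{total_tb} TB/month = {total_tb / TB_PER_PB} PB/month")
print(f"average ingest rate: {sustained_rate_gb_per_s(total_tb):.2f} GB/s")
```

Averaged over a month, 1 PB works out to roughly 0.39 GB/s of continuous writes, before anyone reads the data back for analysis.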

In some ways we are victims of our own success in being able to generate so much data. We have very large systems with performance over 1 PFLOPS (PetaFLOPS), and people are deploying GPUs for computation that increase application performance by 10x to 100x. This means we are generating data faster than ever before. However, what is also important is that these applications run so fast that I/O can easily become a bottleneck (there goes your 10x or 100x speedup). So researchers need storage that is faster than ever before so that I/O is not a bottleneck.
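
To see how quickly I/O can eat into a GPU speedup, here is a small sketch that applies Amdahl’s law (my choice of model, not something from a Dell benchmark). It assumes the GPU accelerates only the compute portion of a run while the I/O time stays the same. The 10% I/O fraction is a made-up number purely for illustration, and the function name is hypothetical.

```python
def overall_speedup(io_fraction, compute_speedup):
    """Amdahl's law: only the compute portion of the runtime gets faster.

    io_fraction     -- share of the ORIGINAL runtime spent in I/O (0.0 to 1.0)
    compute_speedup -- factor by which the compute portion is accelerated
    """
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    if compute_speedup <= 0:
        raise ValueError("compute_speedup must be positive")
    compute_fraction = 1.0 - io_fraction
    return 1.0 / (io_fraction + compute_fraction / compute_speedup)


IO_FRACTION = 0.10  # hypothetical: 10% of the original runtime is I/O

for gpu_speedup in (10, 100):
    result = overall_speedup(IO_FRACTION, gpu_speedup)
    print(f"{gpu_speedup}x on compute -> {result:.1f}x overall")
```

With that hypothetical 10% I/O share, a 10x compute speedup yields only about 5.3x overall, and a 100x compute speedup yields only about 9.2x, because the unchanged I/O time comes to dominate the run.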

Dell has been working on new storage products oriented toward the needs of HPC (http://content.dell.com/us/en/enterprise/hpcc-storage.aspx). Specifically, Dell has partnered with Terascala (http://www.terascala.com) to make Lustre a much easier to use file system.

Lustre has developed a reputation for being brittle and sometimes difficult to work with. Dell and Terascala have partnered to develop a Lustre based appliance called the Dell | Terascala HPC Storage Solution (http://content.dell.com/us/en/enterprise/d/hpcc/Storage_Lustre.aspx) that makes deploying Lustre in HPC a much easier process.

Figure 6 – Dell | Terascala HPC Storage Solution

The current version combines the reliability, cost-effectiveness, and performance of Dell PowerVault Storage with Terascala’s Lustre control nodes, Lustre software stack, and management software. The system has been architected so that the Lustre Metadata Server (MDS) is redundant in an HA (active-passive) configuration and the Lustre Object Storage Server (OSS) nodes are also HA redundant (active-active), to bring as much stability and resiliency as possible to Lustre storage solutions. Plus, the Terascala management console allows much easier management and monitoring of your Lustre storage solution.

To make your job easier, Dell has created several standard configurations from 48 TB to 336 TB, all within a single rack, that are priced to include 3 years of support as well as installation. Performance is also very good, with each OSS pair (active-active) achieving up to 2.6 GB/s in throughput over QDR InfiniBand.
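
For a back-of-the-envelope feel for what 2.6 GB/s per OSS pair means, the Python sketch below estimates how many pairs a target throughput would need and how long a data set takes to stream at a given rate. It assumes throughput adds up linearly across OSS pairs and that the peak figure is sustained, both of which are simplifications; the 10 GB/s target and 10 TB data set are hypothetical inputs, and the function names are mine.

```python
import math

OSS_PAIR_GB_PER_S = 2.6   # peak per active-active OSS pair (from the text)
GB_PER_TB = 1000          # decimal units


def oss_pairs_needed(target_gb_per_s):
    """OSS pairs required for a target rate, assuming linear scaling."""
    return math.ceil(target_gb_per_s / OSS_PAIR_GB_PER_S)


def transfer_minutes(dataset_tb, throughput_gb_per_s):
    """Minutes to stream a data set at a sustained throughput."""
    return dataset_tb * GB_PER_TB / throughput_gb_per_s / 60


print(oss_pairs_needed(10.0))                             # hypothetical 10 GB/s target
print(round(transfer_minutes(10, OSS_PAIR_GB_PER_S), 1))  # hypothetical 10 TB data set
```

Under these assumptions a 10 GB/s target needs 4 OSS pairs, and a single pair would take a little over an hour (about 64 minutes) to stream 10 TB.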

Dell HPC Storage has also adopted the strategy that if you want to just buy Dell hardware and use it for storage yourself, which we call the “roll-it-your-own” option, then you are welcome to do that. In fact to help you, Dell is developing a number of whitepapers to explain how to configure Dell hardware and the file system for good performance. The first example of this is a whitepaper from the Dell Cambridge HPC Solution Centre on how to use Dell hardware to construct a Lustre storage solution (http://content.dell.com/us/en/enterprise/d/shared-content~solutions~en/Documents~lustre-storage-brick-white-paper.pdf.aspx).

In addition to I/O performance, other key considerations are data reliability and ease of use. Dell has developed an HPC storage solution that addresses these aspects of storage while keeping the cost very low. The solution, called the Dell HPC NFS Storage Solution (NSS - http://content.dell.com/us/en/enterprise/d/hpcc/storage-dell-nss.aspx), takes standard Dell servers and storage units to create a truly open storage solution. The NSS uses Red Hat Enterprise Linux in combination with Red Hat’s Scalable Storage Solution, which is XFS based. What Dell has done is develop a set of best practices to improve performance from the storage. You can think of this as a “tuned” NFS server configuration. In our studies, we get about 30% more performance from a “tuned” configuration than from one that just uses XFS fairly standard out of the box.

Figure 7 – Dell HPC NFS Storage Solution (NSS)

We offer three standard configurations (S, M, and L) that have varying capacities, and we also offer the option of a 10GigE interface or an InfiniBand interface (IPoIB), although you are welcome to use just a single GigE interface if desired. The peak performance we have achieved with a single IB connection from the NSS is about 1.5 GB/s when using multiple NFS clients (using IPoIB). This is with NFSv3. You are welcome to use NFSv4, but we have not tested it to determine its performance.

If you need SMB or CIFS connectivity, you can install and configure Samba on the NSS gateway node. As with NFSv4, we have not measured Samba’s performance or worked out how to tune it properly, but it has been tested for basic functionality.

Finally, we deliver NSS complete with 3 years of 4-hour support for the hardware and software, including the file system. Dell can install it with your cluster or you can install it yourself.

In keeping with Dell’s HPC storage strategy of offering fully supported storage solutions or “roll-it-your-own” solutions, Dell has published a whitepaper describing the tuning options we made on the NSS to boost performance (http://content.dell.com/us/en/enterprise/d/business~solutions~hpcc~en/Documents~Dell-NSS-NFS-Storage-solution-final.pdf.aspx).

Overall Trends

While SC10 hasn’t really started yet, I’m already starting to see some trends from early announcements. The first trend, which I already mentioned, is around GPUs.

SC09, last year’s SC conference, showed some GPU based solutions, but they were fairly early, and people were still waiting to see the general direction that GPUs were taking and were still porting applications to use GPUs. Now that applications are becoming more widespread, we are starting to see movement toward production level GPU systems. Dell will be showing our GPU solutions, including our Precision workstations and mobile workstations, our blade based GPU solution (the M610x), and our ultra-dense GPU chassis, the C410x. Be sure to stop by the booth and see these solutions and how they might fit your needs (BTW – no one has a denser GPU solution than the C410x!).

The second trend is the huge need for storage within HPC. GPUs have given us the ability to generate massive amounts of data at an unprecedented rate while sequencing instruments are producing data at a fantastic rate as well. In addition, the HPC industry is seeing a need for very high performance storage solutions that need to keep up with our computational ability. Keep an eye out for Dell’s HPC storage solutions which you can see in our booth. We’ll also have the ability to run remote tests during the show if you want to see some performance numbers and you’ll see some pretty cool storage solutions with the stability of Dell behind them.

One more cool thing you can see in the Dell booth is our “cloud” station. Cloud is a word that used to refer to the puffy things floating in the sky. Now, cloud is either a savior for the IT world or an over-hyped word that has lost all meaning. But the concepts embodied in “Cloud Computing” are real enough. In the Dell booth we will be talking about Cloud Computing concepts and how they apply to HPC. If you want to influence Dell’s direction in Cloud Computing in HPC or want to participate in some real-world testing, or you just want to learn about Cloud Computing in HPC, be sure to stop by the Dell booth. We have some pretty cool ideas about how to integrate the “Cloud” into HPC.

One obvious conclusion you can draw from this rather long blog is that Dell is in HPC to stay. We’re not leaving HPC, and Michael Dell has stated that Dell is committed to HPC. If you look at the innovative compute and storage solutions Dell has developed, coupled with our efforts to bring Cloud Computing to a practical level in HPC, it’s obvious that Dell is in HPC and we are here to help our customers solve their HPC problems.

Be sure to stop by the booth and see the great things Dell has been doing in HPC.