In Polynesian culture, mana means life force or power. To say that someone has mana means that they have influence or authority. The Maui High Performance Computing Center increased its mana with a new 1,152-node cluster of PowerEdge 11G servers. The compute nodes are M610 blades connected via an InfiniBand DDR fabric. To my knowledge, this is the largest PowerEdge 11G cluster deployed by Dell to date.

Being an HPC engineer at Dell is not always glamorous. Usually they just keep me in the lab and poke me with a stick when they need a benchmark run. However, I was lucky enough to travel to Maui to assist with the deployment and benchmarking.

These were my impressions of the Maui deployment:

  1. The logistics involved in installing and testing an 1,100+ node cluster are staggering. The Dell Deployment Services team did it in only 5 weeks with a team of 4 engineers. A contact at the DoD told us that clusters like this normally take other vendors 3 months to deploy with much larger teams. So kudos to the Deployment team.
  2. Bugs are a fact of life. In a cluster this large, a bug that affects only 1% of nodes shows up 10+ times. Deploying a massive cluster gives us invaluable insight into product quality, which I shared with the server platform engineering teams.
  3. That being said, I was impressed by the quality of the 11G servers. This graph shows the HPL performance of every node in the cluster, with each node running a separate job simultaneously:
[Figure: MHPCC Single HPL]

HPL is a valuable tool for finding hardware problems because it fully exercises the CPU, RAM, and environmental capabilities of the cluster.

95% of the nodes (in red) performed at or above 90% efficiency.

Another 4% (in green) performed at 87% efficiency or higher -- within the standard deviation.

Finally, about 1% of the nodes (in blue) underperformed. Half of those were powered off during the run and the other half had minor issues.
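For anyone who wants to do the same kind of triage on their own cluster, here is a rough Python sketch of the bucketing described above. It is not the script we used on site. The function names, the node names, and the per-node peak figure are all made up for illustration; the only things taken from this post are the 90% and 87% thresholds and the color labels. Efficiency here means measured HPL GFLOPS divided by the node's theoretical peak.

```python
from collections import Counter

# Thresholds from the graph above: red >= 90%, green >= 87%, blue below that.
RED_THRESHOLD = 0.90
GREEN_THRESHOLD = 0.87


def hpl_efficiency(measured_gflops, peak_gflops):
    """Fraction of the theoretical peak that the HPL run achieved."""
    return measured_gflops / peak_gflops


def classify(efficiency):
    """Map an efficiency value to the color band used in the graph."""
    if efficiency >= RED_THRESHOLD:
        return "red"
    if efficiency >= GREEN_THRESHOLD:
        return "green"
    return "blue"


def summarize(results, peak_gflops):
    """Return the fraction of nodes in each band.

    `results` maps a node name to its measured GFLOPS. A node that was
    powered off can be recorded as 0.0 so that it lands in the blue band.
    `peak_gflops` is the per-node theoretical peak (a hypothetical input;
    substitute the real figure for your hardware).
    """
    bands = Counter(
        classify(hpl_efficiency(gflops, peak_gflops))
        for gflops in results.values()
    )
    total = len(results)
    return {band: bands.get(band, 0) / total for band in ("red", "green", "blue")}


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise each band.
    sample = {"n001": 95.0, "n002": 88.0, "n003": 0.0, "n004": 91.0}
    print(summarize(sample, peak_gflops=100.0))
```

The nodes that land in the blue band are the ones worth a closer look: check first whether they were simply powered off, then dig into the hardware.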

This was the state of the cluster about 5 days after I arrived on the island. The graph shows that the servers were shipped overseas and plugged into the data center without problems!

(Of course, we also fixed the problem nodes before we turned the keys over to the customer.)




Here are a few articles about MHPCC and the cluster:

[Photo: Jacob at La Perouse]

Finally, even though the cluster was located in paradise, it was a short trip, and I saw little of the island. We did take one afternoon after cluster acceptance to visit La Perouse Bay, a moonscape of dried lava rocks on Maui's southeast coast.

Aloha and mahalo!