Join us at Super Computing 2011! We invite you to visit us at the SC2011 conference in Seattle, Nov 14-17, at Booth #2040. See first-hand how we are enabling research discovery with Dell HPC solutions.
We will now discuss some of the key components that need high reliability in HPC systems. Let’s start with the distribution of power to HPC systems and subsequently to the individual components.

HPC systems by nature require large quantities of power. The details of power generation and distribution to a data center are not covered here; we start with a look at power once it arrives at a data center.

To provide continuous power without interruption, a data center needs a fallback for outages, such as backup generators and battery power. How far a data center should go in implementing multiple generators and redundant battery power generally depends on the business criticality of the HPC systems involved.

Once we have a continuous power feed into a data center, it is important to deliver this feed over multiple Power Distribution Units (PDUs). Assuming we architect the power delivery correctly, the loss of one PDU will not cause an HPC system outage.

Once we have multiple PDU feeds into an HPC system, we should be careful about how power is delivered to each individual component. Components that need high reliability, as discussed earlier, should be configured with multiple power supplies, and each of those power supplies should be connected to a different PDU feed. We should never use a single power feed from one PDU to power multiple power supplies in a single subsystem. Doing so would cause an outage of the subsystem on the failure of a single PDU, even though the component has redundant power supplies. It is also advisable to configure components with hot-swappable power supplies, so that failed supplies can be replaced without interrupting service.

Next Up… COOLING

-- Blake Gonzales, Dell HPC Scientist

See other posts from this Blog series: INTRO, SMP, CLUSTERED SYSTEMS, CLUSTERED SYSTEM INFRASTRUCTURE
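
As a postscript to the power-feed rule above (redundant power supplies in one subsystem should not share a single PDU), here is a minimal Python sketch of how such a check might be automated. Everything in it is an assumption for illustration: the component names, the PDU names, and the `POWER_MAP` layout are hypothetical and do not come from any Dell tool or real inventory.

```python
# Hypothetical inventory: component name -> PDU feeding each of its power
# supplies (one list entry per power supply). All names are made up.
POWER_MAP = {
    "head-node": ["pdu-a", "pdu-b"],            # two supplies, two PDUs: good
    "storage-controller": ["pdu-a", "pdu-a"],   # two supplies, one PDU: risky
    "compute-001": ["pdu-b"],                   # single supply
}


def single_pdu_risks(power_map):
    """Return components whose power supplies all depend on one PDU."""
    return sorted(
        name for name, feeds in power_map.items() if len(set(feeds)) < 2
    )


def outage_on_pdu_loss(power_map, failed_pdu):
    """Return components that lose all power if failed_pdu goes down."""
    return sorted(
        name
        for name, feeds in power_map.items()
        if set(feeds) <= {failed_pdu}
    )


if __name__ == "__main__":
    print("Depends on a single PDU:", single_pdu_risks(POWER_MAP))
    for pdu in ("pdu-a", "pdu-b"):
        print(f"Down if {pdu} fails:", outage_on_pdu_loss(POWER_MAP, pdu))
```

In this made-up inventory, the storage controller has redundant power supplies but would still go down with `pdu-a`, which is exactly the misconfiguration described above. A single-supply compute node is also flagged; whether that is acceptable depends on how critical the component is.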