By Anil Maurya


High Performance Computing Clusters (HPCC) are a popular class of supercomputer, used to solve large problems through parallelization. Dell offers Linux-based HPCC solutions. Deploying an HPC cluster is a comprehensive process, and this document presents a simplified method for installing or updating drivers on a cluster built from heterogeneous platforms. The proposed methodology uses the Dell Lifecycle Controller to supply the recommended drivers while the Linux-based HPCC image is being provisioned.


Dell Lifecycle Controller (LCC) is an onboard systems management device that is part of iDRAC Express and above on 11th-generation Dell servers. The LCC includes 1 GB of managed, persistent storage that embeds systems management features such as driver updates, firmware updates, and RAID configuration, in addition to the iDRAC features. The LCC contains a driver repository for supported operating systems, i.e. Windows, Red Hat Linux, SuSe Linux, etc. Remote systems management is further enhanced with the LCC, because the integrated persistent storage completely eliminates the need for media-based tools. Users can further upgrade to iDRAC Enterprise and vFlash for advanced iDRAC features.

In a typical HPCC offering from Dell, the recommended drivers for each of the supported servers are included in the HPC software package and installed during the initial deployment. However, there are situations where device drivers need to be updated during the lifetime of the cluster: a self-deployed HPC cluster needs consistent drivers across all servers, and newer driver versions may provide critical bug fixes or better performance. To obtain similar performance on all compute nodes, the driver and firmware versions should be identical. By leveraging the LCC features, the administrator can automate all device driver updates with a one-time configuration of the LCC.

This makes it easy for any enterprise to install the latest firmware and drivers available from http://support.dell.com. The manual process of individually monitoring for updated firmware and drivers and then deploying them is avoided, as the process is now completely automated through the LCC.
Administrators can access the LCC:

  • Locally by pressing F10 during server POST.
  • Remotely, by using a remote enablement tool with the iDRAC IP address.

In order to utilize the LCC remotely, the following tools are available:

  1. OpenWSMAN: The OpenWSMan client is the WS-Management CLI from the open-source Openwsman project. See openwsman.org for links to download, build, and install the WS-Management CLI and OpenWSMan packages from sourceforge.net.
  2. WinRM: This tool is Windows-based. The WinRM package ships with Windows Vista, Windows Server 2003 R2, and Windows Server 2008.
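Before scripting against the whole cluster, it helps to confirm that a node's iDRAC answers WS-Management calls at all. The sketch below builds a typical `wsman identify` invocation; the IP address and the root/calvin credentials are placeholders, and the TLS flags (`-V -v -c dummy.cert`) follow common openwsman client usage rather than anything mandated by the LCC.

```shell
# Placeholder iDRAC address and factory-default credentials -- replace
# with values from your own node database.
IDRAC_IP=192.168.2.1
IDRAC_USER=root
IDRAC_PASS=calvin

# WS-Management Identify request over HTTPS with basic authentication.
# -V -v relax certificate checking for this first connectivity test.
WSMAN_IDENTIFY="wsman identify -h $IDRAC_IP -P 443 -u $IDRAC_USER -p $IDRAC_PASS -V -v -c dummy.cert -j utf-8 -y basic"

# Printed rather than executed here, since it needs a live iDRAC:
echo "$WSMAN_IDENTIFY"
```

A successful Identify response confirms the iDRAC's WS-Management service is reachable before any LCC driver-pack operations are attempted.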

This blog provides instructions for remotely enabling the LCC using OpenWSMan in a Linux environment.
An HPC architecture is based on head nodes and compute nodes. To use the LCC features in an HPC deployment, a one-to-many implementation is required (exposing the LCC remotely): install OpenWSMan on the master node and use it to access each LCC remotely via the iDRAC IP address and logon credentials.
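Getting the client onto the master node can be sketched as below, assuming a RHEL/CentOS-style system where the CLI ships in the `wsmancli` package; package names differ across distributions, so treat this as a sketch rather than a universal recipe.

```shell
#!/bin/sh
# Install the WS-Management CLI ("wsman") on the master node.
# "wsmancli" is the usual package name on RHEL/CentOS; adjust for
# other distributions.
if command -v wsman >/dev/null 2>&1; then
    STATUS="already-installed"
elif command -v yum >/dev/null 2>&1; then
    yum install -y wsmancli && STATUS="installed" || STATUS="install-failed"
else
    STATUS="manual-install-needed"
fi
echo "openwsman client: $STATUS"
```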

Below is a schematic representation of a basic HPC cluster:
[Figure: Schematic representation of a basic HPC cluster]
Prerequisites:
  1. The anaconda installer is LCC-aware, and the driver package for the respective operating system is available.
  2. Servers have iDRAC Express or above.
  3. The IP source for the iDRAC of the member nodes is set to DHCP.
  4. Separate DHCP scopes exist for the LOM interface and the iDRAC interface.
  5. The master node has connectivity to ftp.dell.com, or to a local FTP mirror, to download and maintain the driver-pack repository.
  6. The master node has the openwsman suite installed.
  7. The master node also has remote racadm installed.
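The tool-related prerequisites above can be sanity-checked with a short script. The binary names `wsman` and `racadm` are the conventional ones installed by the openwsman suite and remote racadm; adjust if your packages install them elsewhere.

```shell
#!/bin/sh
# Check that the CLI tools the later steps rely on are present on the
# master node.
MISSING=0
for tool in wsman racadm; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
        MISSING=$((MISSING + 1))
    fi
done
echo "$MISSING tool(s) missing"
```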

Steps:
Steps for leveraging the Dell Lifecycle Controller features in a new HPC cluster deployment:

1. Install master node.
2. Make the appropriate network connections to connect all nodes in the cluster. Confirm that AC power is applied to all nodes but that the servers are not powered on.
3. Confirm that the master node received DHCP requests from the iDRACs of all member nodes. Create a database of the assigned IPs.
4. Once the database of IP addresses is updated on the head node, the remote racadm utility detects the system type and name. A template of the database is given here:
   IP address    Hostname     Server Model
   192.168.2.1   compute000   R710
5. Use appropriate logic to detect the first system of each type. For example, in a cluster with 20x R610, 20x R710, and 20x R715, the logic should pull out the first R610, R710, and R715. Using the openwsman tool, we can then expose the LCC of one server of each type, instead of all 20, and copy the required drivers to the master node. Exposing one LCC per server type instead of one per server saves time, network traffic, and maintenance effort.
6. Use the openwsman tool to expose the LCC of each system identified in the step above.
7. The openwsman tool unpacks the driver pack and copies the RPMs to the NFS contrib folder on the master node:
#wsman invoke -a GetDriverPackInfo [This command lists supported OS information]
#wsman invoke -a UnpackAndAttach [This command unpacks the drivers and places them into the OEMDRV device, which the LCC exposes]
8. Now add the compute nodes using the appropriate method. For Rocks+, use the insert-ethers command, and push all the Dell drivers to the respective servers with a post-install script.
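Step 5's "first system of each type" selection can be sketched in a few lines of shell. The node-database file name and its three-column layout follow the template from step 4 and are otherwise illustrative:

```shell
#!/bin/sh
# Build a sample node database of "IP hostname model" rows, then keep
# only the first node seen for each server model (column 3), so only
# one LCC per platform needs to be exposed.
cat > nodes.db <<'EOF'
192.168.2.1 compute000 R610
192.168.2.2 compute001 R610
192.168.2.3 compute002 R710
192.168.2.4 compute003 R710
192.168.2.5 compute004 R715
EOF

# awk prints a line only the first time its model appears.
FIRST_OF_EACH=$(awk '!seen[$3]++ { print $1, $3 }' nodes.db)
echo "$FIRST_OF_EACH"
# -> 192.168.2.1 R610
#    192.168.2.3 R710
#    192.168.2.5 R715
```

Each IP in the resulting list is then the target for the wsman calls in step 7.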

Commands:
1- Retrieve the iDRAC's SSL certificate (save the certificate block from the output to a file for use in later wsman calls):
#openssl s_client -connect 192.168.10.2:443
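With the certificate from the openssl output saved to a file (cert.cer is an assumed name), the two step 7 calls can be issued against the DCIM_OSDeploymentService. The endpoint reference and the OSType parameter follow Dell's Lifecycle Controller remote services documentation; the credentials and the RHEL6_64 value are placeholders, and OSType must be one of the strings returned by GetDriverPackInfo.

```shell
# Placeholder iDRAC address and credentials; cert.cer holds the
# certificate captured from the openssl s_client output.
IDRAC_IP=192.168.10.2
AUTH="-h $IDRAC_IP -P 443 -u root -p calvin -c cert.cer -j utf-8 -y basic"

# Endpoint reference for the LCC's OS deployment service.
EPR='http://schemas.dell.com/wbem/wscim/1/cim-schema/2/DCIM_OSDeploymentService?SystemCreationClassName=DCIM_ComputerSystem,CreationClassName=DCIM_OSDeploymentService,SystemName=DCIM:ComputerSystem,Name=DCIM:OSDeploymentService'

# List the operating systems the stored driver pack supports:
CMD_INFO="wsman invoke -a GetDriverPackInfo \"$EPR\" $AUTH"
# Unpack the drivers for one OS; the LCC then exposes them as OEMDRV:
CMD_ATTACH="wsman invoke -a UnpackAndAttach \"$EPR\" $AUTH -k OSType=RHEL6_64"

# Printed rather than executed here, since both need a live iDRAC:
echo "$CMD_INFO"
echo "$CMD_ATTACH"
```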
Dell Life Cycle Controller features for High Performance Computing Cluster Deployment - The Dell TechCenter



About Anil Maurya | Senior Development Engineer, Massively Scale-out Systems Team – BDC


Anil works with the HPC Solutions Engineering Team at the Bangalore Design Center. He has a Bachelor of Engineering degree in Information Technology from the University of Rajasthan, India. He is an RHCE and a CCNA and has extensive experience in Linux systems design and deployment. He has been with Dell for the past year. Prior to Dell, Anil worked with John Deere and Locuz Enterprise Solution Ltd.

