By: Munira Hussain and Kevin Tubbs

 

The Intel Xeon Phi coprocessor aggregates parallel processing power to boost the computational throughput of a cluster. It is designed to extend the parallel programming model of Intel Xeon processors and to benefit applications that are able to scale. Like a cache-coherent, shared-memory multiprocessor system, the Intel Xeon Phi coprocessor is an SMP on a chip, and it connects to other devices in the system via the PCIe bus.

 

To install and configure Intel Xeon Phi coprocessors, the administrator must install the Intel Manycore Platform Software Stack (MPSS) and provide the initial configuration for all the coprocessors in a cluster. Installing and configuring a new piece of technology can be complex and time-consuming. This blog provides detailed steps and best practices for getting started with the Xeon Phi.

Note that this setup has also been simplified with the Bright Cluster Manager software: the drivers and software needed for the Intel Xeon Phi are integrated into its software stack for ease of deployment and provisioning.

 

Pre-Install Configuration:

If the Xeon Phi coprocessor is detected on the PCI bus (“lspci”) but is not recognized by the Intel tools, confirm that the following BIOS setting is enabled:

Enable the large BAR setting: BIOS Settings >> Integrated Devices >> Memory I/O larger than 4GB >> Enabled

This can also be done from the operating system with the Dell Deployment Toolkit 4.2 or later, using the “syscfg” tool:

>> /opt/dell/toolkit/bin/syscfg --MmioAbove4Gb=enable
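As a quick sanity check after the BIOS change, a short script along these lines can confirm that each card shows up on the PCI bus. This is a sketch: the “Co-processor” class string is how lspci typically labels the Phi, and the helper name is our own, not part of the Dell or Intel tooling.

```shell
#!/bin/sh
# Sketch: count Xeon Phi cards visible on the PCI bus. lspci reports the
# Phi with a "Co-processor" class string; adjust the pattern if your
# lspci output differs.
count_phis() {
    # $1: captured lspci output; prints the number of co-processor lines
    printf '%s\n' "$1" | grep -c 'Co-processor'
}

lspci_out=$(lspci 2>/dev/null || true)
echo "coprocessors visible: $(count_phis "$lspci_out")"
```

If the count is zero here but a card is physically present, revisit the large-BAR BIOS setting above.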

 

Setting up Host Nodes:

Install the host nodes with Bright Cluster Manager 6.1, which includes the Intel MPSS. The host nodes are the nodes to which the Intel Xeon Phi coprocessors are connected via PCIe slots.
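Once a host node has been provisioned, you can quickly verify that the MPSS rpms landed on it. The snippet below is a sketch: the intel-mic-* pattern matches the component names listed in the installation steps that follow, and the helper name is our own.

```shell
#!/bin/sh
# Sketch: check that the Intel MPSS rpms are present on a host node.
# The intel-mic-* prefix matches the components Bright Cluster Manager deploys.
list_mic_pkgs() {
    # $1: captured `rpm -qa` output; prints only the intel-mic packages, sorted
    printf '%s\n' "$1" | grep '^intel-mic-' | sort
}

pkgs=$(rpm -qa 2>/dev/null || true)
list_mic_pkgs "$pkgs"
```

An empty result here means the MPSS packages were not deployed to this node and the installation steps below have not been completed.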

 

Installation:

  1. Install the Intel MPSS package on the compute node that owns the Intel Xeon Phi. Bright Cluster Manager packages the software for easy installation.
  2. The main drivers and tools are bundled in rpm format in Bright Cluster Manager. These are extracted from the Intel MPSS and made easy to deploy. Five main components are installed on the host nodes: intel-mic-cross, intel-mic-driver, intel-mic-ofed, intel-mic-flash and intel-mic-runtime.

    The k1om packages are meant to run inside the Intel Xeon Phi. The packages deployed are libgcrypt-k1om, slurm-client-k1om, strace-k1om, munge-k1om and libgpg-error-k1om.

  3. Once the host node is installed, verify with the “lspci” command that the operating system detects the coprocessor.
  4. Load the module on the host: module add intel/mic/runtime/<2.x.version>, which loads the driver modules and provides access to the MIC tools.
  5. Update the boot loader and flash on the Intel Xeon Phi coprocessors. With Bright Cluster Manager this can be done for all of them at once rather than logging in to individual nodes.
    1. Stop the Intel mpss service before proceeding to flash the Phi (service mpss stop), from the CMGUI or cmsh.
    2. Reset the Intel Xeon Phi cards: micctrl -r -f -w (this resets the Phi and then puts it in the wait state; at this point the mic is in a blank state, ready to be updated with an image/firmware). If the micctrl command is not available, load the module as in step 4.
    3. Update the respective boot loader and firmware image on the mic: micflash -v -update -noreboot -device all (this updates the flash and SMC boot loader for all Intel Xeon Phi coprocessors attached to the specific host/compute node).
    4. Once the Phi has been flashed, start the mpss service on the host/compute node (service mpss start).
    5. Reboot the compute node. Make sure it is a power reset rather than an OS reboot.
  6. Once the host node comes up:

a. Go into the CMGUI and set up the MIC nodes using the MIC setup wizard, which lets you easily attach the respective number of Intel Xeon Phi coprocessors present in each host node.

b. Then configure the network bridge IP between the host node and the Intel Xeon Phi for communication purposes. Bright Cluster Manager automatically checks for IP conflicts across the connected Intel Xeon Phi coprocessors in the cluster.

c. The tool automatically creates and assigns IP addresses to the Intel Xeon Phi coprocessors. At this stage the Intel Xeon Phi can be recognized and managed/monitored from Bright Cluster Manager.
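The flash-and-restart sequence from step 5 can be collected into a small per-node script. This is a sketch: the RUN dry-run switch is our own convention, while the service, micctrl and micflash invocations come from the steps above; run it for real only with the runtime module loaded and coprocessors attached.

```shell
#!/bin/sh
# Sketch of steps 5.1-5.4 for one host node. Defaults to a dry run that
# only prints the commands; set RUN to the empty string to execute them.
RUN=${RUN-echo}

flash_phis() {
    $RUN service mpss stop                          # 5.1 stop the MPSS service
    $RUN micctrl -r -f -w                           # 5.2 reset cards into wait state
    $RUN micflash -v -update -noreboot -device all  # 5.3 flash boot loader/firmware
    $RUN service mpss start                         # 5.4 restart the service
}

flash_phis
```

After the script completes, power-cycle the node as described in step 5.5; an OS-level reboot is not sufficient.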

Dell’s HPC offerings with the Intel Xeon Phi are supported on the PowerEdge R720 and C8220x servers. More info:

Dell HPC Solutions: http://www.dellhpcsolutions.com/

Bright Computing: http://info.brightcomputing.com/intel-xeon-phi/

Intel Corporation: http://www.dellhpcsolutions.com/dellhpcsolutions/static/XeonPhi.html