Dr. Jeff Layton

There was a recent article in Scientific Computing about scripting languages in HPC (http://www.scientificcomputing.com/articles-HPC-From-Scripting-to-Scaling-041210.aspx). The article talks about the rise in popularity of scripting languages for solving problems and why scripting is perhaps a good idea for researchers/scientists (I’ll just call them HPC-er’s). This article set off a great discussion within the HPC team at Dell, especially between Glen Otero (http://www.delltechcenter.com/page/Science+and+Silicon%3A+Smarter+Conversations) and me. So we conspired to tag-team a blog series on Python (www.python.org) in HPC. What tools are available for Python in HPC? How can you combine conventional compiled languages with Python for HPC applications? How can you write Python applications for HPC? And how can you combine Python and GPUs in HPC?

We chose Python because we both have written code with it and are reasonably comfortable with the language. Perhaps more importantly, there are a huge number of tools, libraries, add-ons, etc. for Python, which make it attractive for HPC-er’s who are looking to put something together very quickly and simply. We aren’t saying that Python is the end-all, be-all of languages – we are just using it as an example of a scripting eco-system that has great potential for HPC applications.

I’m going to start off the series with a quick overview of Python and highlight some of the tools/extensions/add-ons that are available. Glen is going to cover his favorite tools in Python and some aspects of parallel programming in Python, and I will chime in as well. Then Glen and I will divide the hot world of GPU computing: he will cover CUDA and Python and I will cover OpenCL and Python. But overall we want to present how you can use Python, as the representative scripting tool base, in HPC.

This series should be fun. If you are a Python coder you may learn something new about how to apply Python to HPC. If you aren’t a Python coder then maybe you will learn about what the possibilities are and can apply them to your scripting language or tool set or perhaps even switch over to the dark side with me and Luke… I mean Glen.

Quick Introduction to Python:
If you haven’t heard about Python, the one thing you have to remember is that it is named after Monty Python (http://en.wikipedia.org/wiki/Monty_Python%27s_Flying_Circus) and not the snake (that seems to be the #1 question for people new to the language). With that behind us, let’s take a brief look at Python.

This introduction is not intended to be complete by any stretch of the imagination. There are many introductions, tutorials, and books around Python. But to make sure we’re all on the same page, I’ll point out some features (and quirks) of Python.

Python is a very popular language that has a very clear and easy to use syntax. It has a very large library of functions and a VERY large number of add-ons, applications, etc. that can be interfaced with Python. It is an interpreted language, for the most part, with some attempts at a true compiler, but nothing that has developed popularity (yet).

Python syntax is fairly clean, but one of the quirks of the language is that instead of using curly braces, { }, to denote blocks of code, you use indentation. For an if statement or a loop, the leading spaces (whitespace) determine which code is in the code block and which isn’t. For example,

g = 0.0;
if (i > 0):
    aIdea = (c*b)/a;
    d = aIdea/1.14159;
    aIdea = aIdea*d;
    g = aIdea;
f = g;

The last line in the code snippet, f = g, is not indented relative to the code above it. So it’s not considered part of the “if” block.
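To see the effect of indentation in action, here is a minimal runnable sketch. The function name and the sample values are hypothetical, picked only for illustration; they are not from the snippet above.

```python
def scale(i, a, b, c):
    """Hypothetical helper showing which lines belong to the if block."""
    g = 0.0
    if i > 0:
        # Indented lines: executed only when the condition is true.
        a_idea = (c * b) / a
        g = a_idea
    # Not indented relative to the if: always executed.
    f = g
    return f


print(scale(1, 2.0, 3.0, 4.0))   # condition is true
print(scale(-1, 2.0, 3.0, 4.0))  # condition is false
```

In the first call the indented block runs; in the second it is skipped and only the unindented assignment executes.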

Python has the range of typical statements and constructs one would expect in a language:

  • if statements (including if/elif/else)
  • for statements (looping)
  • while statements (more looping)
  • try statements (exception handling)
  • functions or subroutines (also methods)
  • global (for declaring that a name inside a function refers to a global variable)
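The constructs in the list above can be sketched in a few lines. All of the names here (counter, bump, classify, safe_divide) are made up for this example.

```python
counter = 0  # module-level (global) variable


def bump():
    """Increment the module-level counter."""
    global counter  # without this, the assignment would create a local
    counter += 1


def classify(x):
    """Return a label using if/elif/else."""
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    else:
        return "positive"


def safe_divide(num, den):
    """Use try/except to handle division by zero."""
    try:
        return num / den
    except ZeroDivisionError:
        return None


labels = []
for value in [-2, 0, 5]:  # for loop over a list
    labels.append(classify(value))

while counter < 3:  # while loop
    bump()

print(labels, counter, safe_divide(1.0, 0.0))
```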

Python is also object-oriented, so you can create classes, instantiate objects from them, etc.
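As a rough illustration of the object-oriented side, here is a small class. Particle and its attributes are hypothetical and only meant to show the syntax.

```python
class Particle:
    """Hypothetical class used only to illustrate the syntax."""

    def __init__(self, mass, velocity):
        # __init__ runs when an object is instantiated.
        self.mass = mass
        self.velocity = velocity

    def kinetic_energy(self):
        """Return 1/2 * m * v^2 for this particle."""
        return 0.5 * self.mass * self.velocity ** 2


p = Particle(2.0, 3.0)      # instantiate an object
print(p.kinetic_energy())   # call a method on it
```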

Python has a range of data types as well:

  • str (string)
  • bytes
  • list (basically arrays but can contain mixed data types)
  • tuple (like a list, but it cannot be changed once created)
  • set (an unordered collection of unique items)
  • dict (a dictionary that has “key” and “value” pairs)
  • int (integer)
  • float (floating-point variable)
  • complex (one of the few languages besides Fortran where a native data type is complex!)
  • bool (Boolean value)
  • long (integer of unlimited magnitude; in Python 3 this is merged into int)

With these data types you can create all kinds of data structures – pretty much anything you can imagine including multi-dimensional arrays, trees, heaps, hash data structures, linked-lists, and on and on.
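Here is a small sketch of how the built-in types combine into larger structures. The variable names and values are just illustrative.

```python
# A 2 x 3 "array" built from nested lists
matrix = [[1, 2, 3],
          [4, 5, 6]]
element = matrix[1][2]  # row 1, column 2 (indices start at zero)

point = (1.0, 2.0)                  # tuple: cannot be changed once created
unique = set([1, 2, 2, 3])          # set: duplicates are dropped
masses = {"H": 1.008, "O": 15.999}  # dict: key/value pairs

z = complex(3.0, 4.0)  # native complex number
magnitude = abs(z)     # modulus of z

# A tiny tree expressed as nested dictionaries and lists
tree = {"value": 1, "children": [{"value": 2, "children": []}]}

print(element, len(unique), magnitude, tree["children"][0]["value"])
```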

With Python there are a number of built-in “methods” that allow you to manipulate the data structures. If you aren’t into object-oriented programming, these are basically functions that operate on the data. For example, we can add data to a list very easily.

a = []; # define “a” to be an empty list (array)
a.append(1.0); # add a value (illustrative) to the end of “a”

The “method” is “append”, which adds the data to the end of the list (or array). Also notice that the pound symbol (#) starts a comment.
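Beyond append, lists have several other built-in methods. A minimal sketch with made-up values:

```python
a = []            # start with an empty list
a.append(3)       # [3]
a.append(1)       # [3, 1]
a.extend([2, 5])  # [3, 1, 2, 5]
a.insert(0, 4)    # [4, 3, 1, 2, 5]
a.sort()          # [1, 2, 3, 4, 5]
last = a.pop()    # removes and returns the last item
print(a, last)
```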

Python also comes with an interactive shell for entering commands (think of it as “bash” or “csh” but devoted solely to Python). Python also ships with a simple graphical environment called IDLE, but if you are on Linux, you can just type the command python and the interactive interpreter will start up. The command prompt is typically “>>>”, or three greater-than signs. At the prompt you can type in Python commands and the interpreter will execute them (pretty easy to do).

The alternative is to create a “script” that can be executed. You create the script in an editor (vi or emacs for example) and save the script to a file. I typically name the file “something.py”, where “something” is the name of the script, and I end it with “.py” so that I know the file is a Python script (you don’t have to do this but I recommend some way of tracking the scripts). Then at the top of the script you put something like the following:

#!/usr/bin/python

This assumes that your python executable has the path /usr/bin/python. Then you make the script executable and execute it,

# chmod 770 something.py
# ./something.py

(Before anyone brings in security concerns about changing the mode of the file, I understand. If you have some notes on how to do this securely, please post them. Until then I’m trying to make things easy.)

That’s it! Pretty easy to create Python scripts and run them.
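For reference, a complete script might look something like the sketch below. The contents are purely illustrative, and I’m assuming the common alternative first line, #!/usr/bin/env python, which looks up python on your PATH instead of hard-coding its location.

```python
#!/usr/bin/env python
"""something.py: a minimal example script (contents are illustrative)."""


def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / float(len(values))


if __name__ == "__main__":
    # Runs only when the file is executed, not when it is imported.
    print(mean([1.0, 2.0, 3.0, 4.0]))
```

The check on __name__ lets the same file be run as a script or imported as a module by other Python code.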

Python Add-ons/Extensions:
For HPC-er’s doing research or science with HPC, you might be interested in the wide range of add-ons for Python. I’m not going to list all of them (that’s way too long), but I will focus on a few of the more important ones.

NumPy: (http://numpy.scipy.org/)
NumPy is a set of extensions for Python that allows the creation and manipulation of multi-dimensional arrays or matrices. It includes a large number of functions that can operate on these array objects. Here’s a small snippet from the NumPy tutorial for a 2-D array.

>>> from numpy import array
>>> b = array( [ (1.5,2,3), (4,5,6) ] )
>>> b
array([[ 1.5, 2. , 3. ],
       [ 4. , 5. , 6. ]])

Note that “>>>” is the prompt for the Python interpreter. In addition to being able to create array objects, there is a whole set of array functions available (far too numerous to list here).
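To give a flavor of those array functions, here is a short sketch that reuses the same 2-D array. It assumes NumPy is installed and imported under the conventional alias np; the variable names are mine.

```python
import numpy as np

b = np.array([(1.5, 2, 3), (4, 5, 6)])

shape = b.shape           # dimensions of the array: 2 rows, 3 columns
doubled = 2.0 * b         # arithmetic is applied element by element
col_sums = b.sum(axis=0)  # sum down each column
transposed = b.T          # rows and columns swapped

print(shape)
print(doubled)
print(col_sums)
print(transposed.shape)
```

Note that there are no explicit loops here: the operations apply to the whole array at once.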

SciPy (http://www.scipy.org)
SciPy is an Open Source library of scientific tools for Python, primarily using NumPy. It is probably THE key source of scientific tools for Python and has produced a wealth of tools and papers for Python. In addition, the SciPy community sponsors a conference around scientific Python, appropriately called SciPy.

The SciPy website also has a great set of links to other tools for Python: http://www.scipy.org/Topical_Software.

Matplotlib: (http://matplotlib.sourceforge.net/)
In addition to just performing computation, there is also a definite need for plotting results. Python has a very large number of plotting add-ons (here’s a partial list: http://www.scipy.org/Topical_Software#head-b98ffdb309ccce4e4504a25ea75b5c806e4897b6). One of the most popular packages is matplotlib, which produces very good 2D plots. From the website, “Matplotlib can be used in python scripts, the python and ipython shell (ala’ matlab or mathematica), web application servers, and six graphical user interface toolkits.” Here’s a sample image from the website of a very simple plot,

[Image: sample matplotlib chart]

Lots of great capability with matplotlib.

Mayavi2: (http://code.enthought.com/projects/mayavi/#Mayavi2)
There aren’t too many 3D plotting libraries for Python but there are some in development. In the meantime, there are tools that allow you to import data and then manipulate it and visualize it. One of the more popular tools is called Mayavi2. It is written in Python and has Python scripting capability.

Veusz: (http://home.gna.org/veusz/)
There are also tools to manipulate data and create 2D plots. Veusz is written in Python and is also very scriptable. It can produce some really great output and allows you to manipulate the data.

H5py: (http://h5py.alfven.org/)
This is an interface library that allows Python to interface to HDF5 libraries. It can read, write, and manipulate HDF5 data files.

Netcdf4-python: (http://code.google.com/p/netcdf4-python/)
There are several interface libraries to netCDF data files. One is netcdf4-python. It can read and write files in netCDF 4 and netCDF 3 formats.

There are also ways to integrate Python code with code from other languages (e.g. C++, Fortran, etc.). One approach is to identify parts of your code that need extra speed beyond what Python can provide and then code these parts in a compiled language such as C++ or Fortran. Some examples are below:

Boost.Python: (http://www.boost.org/doc/libs/1_43_0/libs/python/doc/index.html)
It is a C++ library that allows interfacing between C++ and Python.

F2PY: (http://cens.ioc.ee/projects/f2py2e/)
This is a library that allows you to interface your Fortran code with Python. It used to be separate but now it’s part of NumPy (http://www.scipy.org/F2py).

Weave: (http://www.scipy.org/Weave)
Weave allows you to call C/C++ code from the middle of Python code. It too is a package within SciPy.

Pyrex: (http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/)
Pyrex allows you to write code that mixes Python and C data types and then it compiles it into a C extension for Python (i.e. you can call it as any other Python function).

If you are interested in how these interfaces can be used, please read a great example of writing a simple finite difference code in pure Python and then using these interfaces to improve performance. (http://www.scipy.org/PerformancePython)

Another set of libraries for Python is all around parallel and distributed programming (http://www.scipy.org/Topical_Software#head-cf472934357fda4558aafdf558a977c4d59baecb). Here are some selected packages or capabilities.

PyMPI: (http://sourceforge.net/projects/pympi/)
While this package hasn’t seen too much activity it is still available for use.

MPI4Py: (http://mpi4py.scipy.org/)
This is one of the more active MPI projects for Python. If you have written MPI code before then you should be very comfortable with this package.

PyPar: (http://code.google.com/p/pypar/)
This library allows you to write parallel distributed Python code. It is somewhat like MPI but it allows you to send Python objects over the network.

IPython: (http://ipython.scipy.org/moin/)
This is a Python shell that allows you to run Python code on other systems (distributed Python if you will). It is still under active development but shows great promise.

ParallelPython: (http://www.parallelpython.com/)
This is a Python module that allows you to execute Python code on SMP or cluster systems.

Parting Words
I’m not trying to convince you that you should abandon your C and Fortran code and start writing everything in Python. However, Python, or almost any “scripting” language for that matter, can help people integrate and adapt other code into a cohesive whole. Glen likes to call it a “glue” language, which is a great description, but it is also a good language in its own right.

It is a dynamically typed language that has a range of data types, including complex numbers (one of the few languages where a complex data type is part of the language). It has a very large library of functions and a VERY large set of extensions and add-ons that can help you quickly develop very sophisticated applications.

If you haven’t used Python before, take a look. It is well worth your time. If you have used it before, then stay tuned to the blog series and keep us honest and let us know your opinion!

-- Dr. Jeff Layton