New technologies appear almost daily in today’s world.  Most don’t last.  The technologies that survive solve problems for the user.  Some technologies are sold off, absorbed, and never see a user again, leaving that user bewildered, angry, and looking for a replacement.  Others are used and accepted for a while, then left forgotten on a shelf, waiting for revitalization.  It is this last type that I wish to discuss.  But first, let’s describe the problem as seen by one particular kind of user.

Image from ATLAS Experiment. Caption: Candidate Higgs decay to four electrons recorded by ATLAS in 2012.

The ATLAS experiment, part of the Large Hadron Collider (LHC), is collecting data at an enormous rate: about 3 Petabytes (PB) a year, and it will keep collecting for 20 years or more.  There is also a requirement that the data remain available for years to come, so that new research ideas and techniques can be applied to it.  The data should therefore stay nearline, ready to be re-analyzed if needed in the future.  The collected data is distributed around the globe on a worldwide grid, a construct implemented by the LHC community members, with each member responsible for its own implementation.  There are exceptions, but most members have a specific task in the search for new physics.  Breaking down how this is done and who is responsible for what is beyond the scope of this blog.   Storage, which is the scope of this blog, can be summarized as high-speed data for simulation and reconstruction, high-throughput write-once read-many data for analysis, and long-term nearline data.  This, of course, is the classic definition of tiered storage.
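To make that tier idea concrete, here is a minimal, purely illustrative sketch of how data might be routed across such tiers based on how it is used.  The tier names, dataset categories, and rules are my own assumptions for the sake of example, not ATLAS policy or OrangeFS behavior.

```python
# Illustrative only: a toy router that places datasets on storage tiers.
# Tier names and rules are assumptions for the example, not ATLAS policy.

from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    size_tb: float
    purpose: str   # e.g. "simulation", "reconstruction", "analysis", "archive"

def choose_tier(ds: Dataset) -> str:
    """Map a dataset to a tier, mirroring the classic tiered model:
    fast disk for simulation/reconstruction, high-throughput write-once
    read-many storage for analysis, and nearline tape for long-term data."""
    if ds.purpose in ("simulation", "reconstruction"):
        return "tier1-fast-disk"       # high-speed scratch for active jobs
    if ds.purpose == "analysis":
        return "tier2-throughput"      # write once, read many times
    return "tier3-nearline-tape"       # keep around for future re-analysis

if __name__ == "__main__":
    for ds in (Dataset("mc-sample-042", 120.0, "simulation"),
               Dataset("run2012-egamma", 800.0, "analysis"),
               Dataset("run2010-raw", 2500.0, "archive")):
        print(f"{ds.name:16s} ({ds.size_tb:7.1f} TB) -> {choose_tier(ds)}")
```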

Now imagine that you are a large user of this data. Let’s say you have 15 PB on hand and are responsible for analysis, simulation, and reconstruction using ATLAS data.  You have two versions of Scientific Linux, four types of storage software (dCache, ext3, NFS, and HPSS), and six different CERN Openlab software versions to make your storage tiers work.  It really doesn’t matter what hardware you have on the floor, but it is tons of disks and tapes.  I can see why meetings are held at the pub.  To my mind this is a severe problem, and wouldn’t it be great if the LHC community had a better technology?  Something that is easy to manage, scales up, scales out, is fast, runs happily on any version of Linux, is capable of high throughput, and can manage tape drives.  Well, maybe soon, there will be.

Last August, I was invited to work on a whitepaper that may provide this type of solution using Dell servers and storage.  Some very smart guys and girls at Clemson University decided to pull the Parallel Virtual File System (PVFS) off the shelf and have been busy revitalizing it for the 21st century.  They call it OrangeFS.  OrangeFS is currently at version 2.0 and will do most of the items described above.  However, many improvements are planned for version 3.0, including tape drivers for managing tape systems; that addition would make OrangeFS a complete tiered system.  Is it the storage management solution that the LHC community, and HPC in general, have long been waiting for?  Let’s step back for a very quick look at the technical claims.

Metadata is distributed across the storage system, so when a new user does an “ls” on the whole system, OrangeFS should not crash the way some HPC file systems do.  It is supposed to handle small files better than other HPC file systems.  It is open source and free as a bird, but what about support?  Clemson has a support service in place called Omnibond.  I understand there is a cost for this, and it needs further investigation from yours truly.  The user will be able to define any type of tier in the upcoming version 3.  The most interesting claim to me, after being burned more than once by the sold-off code mentioned above, is that this is Clemson’s code.  Clemson is not a company that can be bought or sold.  That screams stability to me, which is a rare thing indeed. 
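To see why distributed metadata matters, here is a minimal sketch of the basic idea: directory entries are hashed across several metadata servers, so the lookups behind a big “ls” spread out instead of hammering a single server.  This is my own conceptual illustration, not OrangeFS source code; the server names and four-server layout are assumptions.

```python
# Illustrative only: shows how hashing paths across metadata servers
# spreads lookup load. Conceptual sketch, not OrangeFS code.

import hashlib
from collections import Counter

METADATA_SERVERS = ["meta0", "meta1", "meta2", "meta3"]  # assumed layout

def server_for(path: str) -> str:
    """Pick the metadata server responsible for a path by hashing it."""
    digest = hashlib.sha1(path.encode()).hexdigest()
    return METADATA_SERVERS[int(digest, 16) % len(METADATA_SERVERS)]

if __name__ == "__main__":
    # Simulate the lookups behind an "ls -R" over many small files.
    paths = [f"/atlas/user{u}/run{r}/event{e}.dat"
             for u in range(4) for r in range(10) for e in range(25)]
    load = Counter(server_for(p) for p in paths)
    for server, hits in sorted(load.items()):
        print(f"{server}: {hits} lookups")  # load ends up roughly even
```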

With the claims aside, I see OrangeFS as a large box, as opposed to the triangle so often used to describe storage: high-speed storage on top, NFS in the middle, and tape on the bottom, or something similar.  I use the box description because the same file system can be used across the entire environment. If you need more high-speed storage for a week, just define it.  Need more capacity in the middle?  Just add it.  I don’t know if or when OrangeFS will move into the mainstream, but when I look back, I want to know that I gave it a good push.
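As a thought experiment, here is what “just define it” might look like inside that one big box.  This is purely hypothetical code of my own; OrangeFS version 3 has not shipped, and none of these names describe a real OrangeFS interface.

```python
# Purely hypothetical illustration of "one file system, many tiers."
# The tier names and resize() helper are invented for this sketch.

tiers = {
    "fast":     {"media": "fast disk", "capacity_tb": 500},
    "capacity": {"media": "SATA disk", "capacity_tb": 5000},
    "nearline": {"media": "tape",      "capacity_tb": 20000},
}

def resize(tier: str, extra_tb: int) -> None:
    """Grow a tier in place: the 'box' model, where only the tier
    definition changes and the file system underneath stays the same."""
    tiers[tier]["capacity_tb"] += extra_tb
    print(f"{tier}: now {tiers[tier]['capacity_tb']} TB of {tiers[tier]['media']}")

resize("fast", 200)        # more high-speed storage for a week
resize("capacity", 1000)   # more capacity in the middle
```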

Please follow the link to the Clemson whitepaper below.

http://content.dell.com/us/en/enterprise/d/business~solutions~engineering-docs~en/Documents~orange-fs-reference-architecture.pdf.aspx