We all get old, and as a result we slow down. What many people don’t know is that file systems also get slower as they age. Just as people are different, so are file systems. Some age gracefully and still perform well, while others age quickly and their performance tanks. This applies to any file system, but somehow it goes unnoticed for parallel file systems in the HPCC world. In this post I want to discuss the problem and urge people to examine their options carefully.

The Fundamental Issue

At the core of the performance decrease as file systems age is something many people forget about: fragmentation. When a file system is first formatted, everything is clean and it performs extremely well. Then you start putting files on it: creating them, deleting them, modifying them, moving them, and so on. File sizes can vary greatly, from small input files of a few KB to TB-sized files (I’m thinking of an HPCC parallel file system here). It’s even worse for bioinformatics applications, which can create millions of very small (KB-sized) files, resulting in millions of files per TB of storage.

A young file system can store data efficiently, as prescribed by its configuration. But as you add, delete, and modify files of widely varying sizes, pieces of each file end up scattered across a disk. In a parallel file system, those pieces can also be spread across multiple storage devices in an unbalanced way (i.e., some storage devices hold more of a file than others). As a result, the file system has to do extra work to deliver the data to the application. Moreover, when data is not evenly distributed in a parallel file system, the result is “hot spots” that become real performance bottlenecks.
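To make the mechanism concrete, here is a toy simulation in Python. It is a sketch under loudly stated assumptions: a single disk of fixed-size blocks, a naive allocator that always fills the lowest-numbered free blocks first (splitting a file across holes when needed), and a made-up mix of file sizes. Real file systems use much smarter allocators, so the numbers mean nothing; only the trend matters. With no deletions, every file lands in one contiguous extent; with random create/delete churn, new files start getting split into several pieces.

```python
import random

# Toy model only: one disk of fixed-size blocks, a naive allocator, and an
# assumed file-size mix. Real file systems allocate far more cleverly.
FILE_SIZES = [1, 2, 4, 64, 256]  # assumed mix of small and large files, in blocks


def allocate(free, nblocks):
    """Take the lowest-numbered free blocks, splitting a file across holes.

    `free` is a sorted list of (start, length) free runs. Returns the file's
    extents as (start, length) pairs, or None if the disk lacks space.
    """
    if sum(length for _, length in free) < nblocks:
        return None
    extents, remaining, new_free = [], nblocks, []
    for start, length in free:
        if remaining == 0:
            new_free.append((start, length))  # run left untouched
            continue
        take = min(length, remaining)
        extents.append((start, take))
        remaining -= take
        if take < length:
            new_free.append((start + take, length - take))  # leftover tail
    free[:] = new_free
    return extents


def release(free, extents):
    """Return a deleted file's extents to the free list, merging neighbors."""
    merged = []
    for start, length in sorted(free + extents):
        if merged and merged[-1][0] + merged[-1][1] == start:
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((start, length))
    free[:] = merged


def simulate(disk_blocks=10_000, steps=5_000, delete_prob=0.5, seed=1):
    """Churn the disk; count newly created files that landed in >1 extent."""
    rng = random.Random(seed)
    free = [(0, disk_blocks)]
    files = {}
    created = fragmented = 0
    for step in range(steps):
        if files and rng.random() < delete_prob:
            victim = rng.choice(list(files))
            release(free, files.pop(victim))
            continue
        extents = allocate(free, rng.choice(FILE_SIZES))
        if extents is None:
            continue  # disk full: skip this create
        files[step] = extents
        created += 1
        if len(extents) > 1:
            fragmented += 1
    return {"created": created, "fragmented": fragmented}


if __name__ == "__main__":
    print("young:", simulate(delete_prob=0.0))  # create only, no deletes
    print("aged: ", simulate(delete_prob=0.5))  # random create/delete churn
```

The "young" run can never fragment, because its free space is always one contiguous run at the end of the disk; the "aged" run fragments as soon as a request is larger than the first hole left behind by a deletion.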

So it’s pretty evident that performance will decrease as a file system ages. It’s a fact. What you really want to know is how much, and how quickly, performance decreases with age.

Where’s My File System’s Walker?

It’s almost impossible to give exact numbers on how much performance will decrease for a particular file system, because it depends on the usage details: file sizes, how files are created and deleted, patterns of use, etc. But file system developers (and vendors) should be able to provide some examples. This is exactly where information about file systems is completely lacking.

In a prior post, I pointed to a great article that discussed what happened to the Intel X25-M SSD as it was used over time. Basically, its performance tanked. While that wasn’t really a file system issue, the fundamental causes of the degradation were exactly the same.

So how do you overcome the performance problem? (That is, how do you give the file system a “walker”?) The answer is that for most file systems you really can’t. The only option is the nuclear one: copy the data off, reformat, and put the data back on the file system (try that with a few hundred TBs and see if you finish before you retire). Or you could copy the data from the old storage system to a new one.

Some file systems, such as PanFS (the Panasas file system), have great potential for reducing the impact of file system aging. Because PanFS stores data in objects, it can move data around the file system as needed. It could move the data off a few drives, reformat those drives, and copy the objects back. The really cool thing is that this could happen while the file system is in use, without you even noticing (and with no loss of data!). There are other really cool possibilities for PanFS because it’s object-based.
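Here is a rough Python sketch of the drain-and-refill idea described above. It is not PanFS’s actual algorithm; it is a hypothetical illustration that assumes every object is the same size, that any object may live on any drive, and that “reformatting” a drive simply leaves it empty. The function names loads, drain, and rebalance are made up for this example.

```python
# Hypothetical sketch, not PanFS's actual algorithm. Assumes equal-sized
# objects that may live on any drive.


def loads(placement, drives):
    """Count how many objects each drive holds."""
    counts = {d: 0 for d in drives}
    for drive in placement.values():
        counts[drive] += 1
    return counts


def drain(placement, drives, target):
    """Move every object off `target` to the least-loaded remaining drive."""
    others = [d for d in drives if d != target]
    for obj, drive in sorted(placement.items()):
        if drive == target:
            counts = loads(placement, drives)
            placement[obj] = min(others, key=lambda d: counts[d])


def rebalance(placement, drives):
    """Move objects from the fullest to the emptiest drive until loads differ by at most 1."""
    while True:
        counts = loads(placement, drives)
        hot = max(drives, key=lambda d: counts[d])
        cold = min(drives, key=lambda d: counts[d])
        if counts[hot] - counts[cold] <= 1:
            return
        obj = next(o for o, d in placement.items() if d == hot)
        placement[obj] = cold


if __name__ == "__main__":
    drives = [0, 1, 2, 3]
    # Unbalanced start: drive 0 holds 10 objects, drive 1 holds 2 (a "hot spot").
    placement = {f"obj{i:02d}": 0 if i < 10 else 1 for i in range(12)}
    drain(placement, drives, target=0)  # empty drive 0 so it can be reformatted
    print("after drain:    ", loads(placement, drives))
    rebalance(placement, drives)  # copy objects back onto the fresh drive
    print("after rebalance:", loads(placement, drives))
```

Each move in rebalance shrinks the gap between the hottest and coldest drive, so the loop always terminates; with 12 equal objects on 4 drives it ends with 3 objects per drive. Real object placement also has to respect RAID layout, object sizes, and live client traffic, none of which this sketch models.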

Other than PanFS, I don’t know of any file systems that have this capability. So what can you do?

What Can I Do?

Unfortunately, there is little you can do. File system aging will hit you, and you have to be ready for it. But here are some things I think you can do to get ready for this problem:

  • Don’t become wedded to a centralized storage system without a plan to move to something new in a few years.
  • Choose a file system with great potential for alleviating the impact of file system aging on performance (the only one I know of today is PanFS).
  • Don’t put all your eggs in one basket: split data across file systems by application (this strategy can reduce the impact of fragmentation). For example:
      • Put bioinformatics data on one file system.
      • Put large CFD files on a different file system.
  • Insist that the file system developers or vendors publish tests showing performance degradation over time.
      • If a test shows little degradation even though the file system does no work behind the scenes to counter it, then the test is worthless.
      • They need to show tests with a wide range of file sizes and fragmentation patterns (make sure they cover multiple scenarios).
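For the kind of vendor test the list above asks for, here is the rough shape of an aging benchmark in Python: age a scratch directory in rounds of random creates and deletes, measure read throughput after each round, and repeat for several file-size scenarios. Everything specific here is an assumption for illustration: the scenario names and size mixes, the 40% delete rate, the number of operations, and the round count. On a real system you would point it at the file system under test, use far more data, and flush client and server caches before each measurement; otherwise you are mostly timing the page cache.

```python
import os
import random
import tempfile
import time

# Assumed scenarios: file-size mixes in bytes. A real aging test should use
# the size distribution of the site's actual workload.
SCENARIOS = {
    "many_small": [1_024, 4_096, 16_384],
    "mixed": [4_096, 262_144, 1_048_576],
}


def churn(root, rng, sizes, ops=100, delete_prob=0.4):
    """Age `root` with random creates and deletes (delete rate is assumed)."""
    for _ in range(ops):
        existing = os.listdir(root)
        if existing and rng.random() < delete_prob:
            os.remove(os.path.join(root, rng.choice(existing)))
        else:
            name = f"f{rng.getrandbits(32):08x}"
            with open(os.path.join(root, name), "wb") as f:
                f.write(os.urandom(rng.choice(sizes)))


def read_throughput(root):
    """Read every file once and return bytes per second."""
    total = 0
    start = time.perf_counter()
    for name in os.listdir(root):
        with open(os.path.join(root, name), "rb") as f:
            total += len(f.read())
    elapsed = time.perf_counter() - start
    return total / elapsed if elapsed > 0 else float("inf")


def aging_test(sizes, rounds=3, seed=0, target=None):
    """Age a scratch directory in rounds; record read throughput after each.

    `target` is a directory on the file system under test; None uses the
    system temp directory, which is only good for trying the script out.
    """
    rng = random.Random(seed)
    results = []
    with tempfile.TemporaryDirectory(dir=target) as root:
        for _ in range(rounds):
            churn(root, rng, sizes)
            results.append(read_throughput(root))
    return results


if __name__ == "__main__":
    for scenario, sizes in SCENARIOS.items():
        rates = aging_test(sizes)
        print(scenario, [f"{r / 1e6:.1f} MB/s" for r in rates])
```

What you would want to see from a vendor is this kind of curve, throughput per round, for each scenario, run long enough to reach a realistically aged state.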
The most important thing you can do is to be aware of the issue of file system aging and ask questions.

Jeff