In the last installment of this blog, we discussed the Hierarchical Storage Management system, or HSM, and the differences between Backup and Archive by way of example use cases.  Here we will go further into those differences, specifically the value-adds available for each, their goals, and their appropriateness.
 
Recall that an HSM:

  • automatically manages storage subsystems in a tiered virtual environment
  • continuously monitors and automatically moves files and data between different storage tiers
  • allows files and data to always appear to be online to end users and applications, regardless of the actual storage location

In general, an HSM has no concept of a copy:  there is only one version of each file, and there always appears to be exactly one; that file’s location appears “normal” to end users and applications, but its actual location may be anywhere in the tiered storage environment.
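One informal way to observe this location transparency is a minimal Python sketch like the one below. It assumes an HSM that leaves a sparse stub on the primary tier when it migrates a file's data (not every HSM works this way, and ordinary sparse files can trigger the same signal), so treat the result only as a hint, not a definitive check.

    import os

    BLOCK_UNIT = 512  # os.stat() reports st_blocks in 512-byte units on POSIX systems

    def looks_migrated(path):
        """Heuristic: a file whose apparent size far exceeds the space it actually
        occupies on the primary tier may have been stubbed/migrated by the HSM."""
        st = os.stat(path)
        on_primary_tier = st.st_blocks * BLOCK_UNIT
        return st.st_size > 0 and on_primary_tier < st.st_size / 2

    # Walk the current directory: every file still "appears" to be here,
    # whether or not its data is actually resident on this tier.
    for name in sorted(os.listdir(".")):
        if os.path.isfile(name):
            state = "possibly migrated" if looks_migrated(name) else "resident"
            print(f"{name}: {state}")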
 
In contrast, both a Backup System and an Archive System, in the simplest of terms, deal with copies of files, and often with multiple copies of each file.  This becomes a significant consideration when selecting and implementing either a Backup System or an Archive System: the storage capacity required for those copies can become prohibitive.
 
Enter de-duplication (or dedup) and compression.  These were once optional features of Backup Systems and Archive Systems, but are now becoming requirements.   Storage capacity is at a premium, and keeping multiple copies of files may be prohibitive.  The ability to find identical files, remove redundant copies, and maintain a database of file names and locations can improve the effectiveness of a Backup or Archive system.  Similarly, compression algorithms are sufficiently mature that they can be used effectively in backup and archive to maximize storage utilization.  Modern Backup and Archive Systems take advantage of both dedup and compression to maximize storage capacity efficiency.
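As a rough illustration (not how any particular product implements it), here is a minimal Python sketch of file-level dedup plus compression.  The sha256 hashing, gzip compression, and index layout are illustrative choices only.

    import gzip
    import hashlib
    import os

    def backup_with_dedup(src_paths, store_dir, index):
        """Toy illustration of the dedup + compression copy step: identical file
        contents are stored (compressed) only once, and the index maps each
        original path to the hash of the stored object."""
        os.makedirs(store_dir, exist_ok=True)
        for path in src_paths:
            with open(path, "rb") as f:
                data = f.read()
            digest = hashlib.sha256(data).hexdigest()   # identify identical content
            obj = os.path.join(store_dir, digest + ".gz")
            if not os.path.exists(obj):                 # store each unique blob only once
                with gzip.open(obj, "wb") as out:
                    out.write(data)                     # compression saves further space
            index[path] = digest                        # remember where the copy lives
        return index

A real product would chunk files, manage metadata, verify integrity, and so on; the point here is only that one stored (and compressed) object can satisfy many logical copies.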
 
However, both dedup and compression take time and processing capability.
 
Since HPC is focused on high performance and highly efficient use of high-speed, low-latency storage subsystems, taking the time to search a database and then decompress a file or files may increase the turnaround time of a typical HPC workload.  This process also involves a level of risk (beyond that of a simple file move) that must be evaluated.
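To make those extra steps concrete, here is the restore side of the toy dedup sketch above, using the same hypothetical store_dir and index.  Again, this is a sketch, not any vendor's implementation; each step shown is work that a plain file move on a fast scratch tier would not incur.

    import gzip
    import os

    def restore_file(dest_path, store_dir, index):
        """Restore path for the toy store sketched above: index lookup, read,
        decompress, and rewrite all add latency relative to a plain file move."""
        digest = index[dest_path]                                  # database/index lookup
        with gzip.open(os.path.join(store_dir, digest + ".gz"), "rb") as f:
            data = f.read()                                        # decompression
        with open(dest_path, "wb") as out:
            out.write(data)                                        # rewrite onto fast storage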
 
If the HSM package under evaluation features or requires dedup and/or compression, it may or may not be appropriate for Tier-1 or fast-scratch storage, and it warrants further investigation and careful configuration.
 
If you have comments or can contribute additional information, please feel free to do so.  Thanks.  --MRF