I’m sure pretty much everyone has heard about SSDs. They are the current technology rage in storage. They have, as their name suggests, no moving parts. They use NAND flash for storing data, much like flash drives. What makes them so attractive is:

  • They have no moving parts, so you can drop them without losing data (as long as they don’t smash).
  • Theoretically, they use less power than a typical hard drive.
  • They can have better performance than standard hard drives (IOPS and throughput).
Therefore, it’s fairly evident why SSDs are the subject of so much development, investment, and curiosity.

In this blog, I want to tell you a bit about SSDs, focusing on aspects of them that you probably didn’t know. More importantly, I want to point out their current problems and limitations that you probably didn’t know about (TANSTAAFL).


Introduction to SSD


I don’t want to take too much of this blog space to explain what SSDs are and how they work, since it’s fairly easy to get information on the Web or in magazine articles.

Most SSDs are constructed from NAND cells. These cells typically come in two forms, SLC (single-level cell) or MLC (multi-level cell). SLC allows you to store 1 bit of information per cell while MLC allows you to store 2 bits of information per cell.

Without diving into details of the physics behind NAND and Flash chips, there are some differences between SLC and MLC. One fundamental problem that both suffer from is that each cell has a limited number of rewrite cycles. That is, each cell can be rewritten only a certain number of times before it can’t hold data any longer. Here is a quick summary of the differences:

SLC:
  • 100,000-cycle write endurance
  • Faster read and write programming time
  • Higher signal-to-noise ratio requiring a less complex controller
  • 1 to 2 bit ECC
  • Single bit per cell

MLC:
  • 10,000-cycle write endurance
  • Slower read and write programming time
  • Lower signal-to-noise ratio requiring a more complex controller
  • 4-bit ECC
  • Two bits per cell

In general, MLC is cheaper than SLC, which allows higher-density flash solutions to be built at a lower cost. The trade-off is that MLC has fewer rewrite cycles, is slower than SLC, and requires a more complicated controller, which adds some cost back.
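To put those endurance numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes perfectly even wear and no extra internal writes by the controller, both of which are optimistic, and the workload of 10 full-drive rewrites per day is a made-up value purely for illustration.

```python
def best_case_lifetime_years(rewrite_cycles, full_drive_writes_per_day):
    """Best-case years until every cell has used up its rewrite cycles.

    Assumes (optimistically) that writes are spread perfectly evenly over
    all cells and that the drive writes no more than the host asks for.
    """
    days = rewrite_cycles / full_drive_writes_per_day
    return days / 365.0


# Hypothetical workload: the whole drive is rewritten 10 times per day.
for name, cycles in (("SLC", 100_000), ("MLC", 10_000)):
    years = best_case_lifetime_years(cycles, full_drive_writes_per_day=10)
    print(f"{name}: about {years:.1f} years")
# prints "SLC: about 27.4 years" and "MLC: about 2.7 years"
```

The factor of 10 between the two carries straight through, so under the same write-heavy workload an MLC drive could wear out within a normal hardware refresh cycle while an SLC drive would not.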


All Types of Drives for All Types of Requirements


Flash solutions are generally made by combining NAND cells with a controller and then packaging them with an interface (for example, SD, USB, PATA, SATA, SAS, and FC). As one would expect, as drives get faster they probably get more expensive. At the lowest end of performance are flash drives—the classic thumb drives (USB key), which are also the cheapest solid-state storage.

The next level up in performance are SSD drives that use a PATA interface that runs at a maximum of UDMA-100. These drives are a bit faster and more expensive than flash drives. There are a number of examples of this type of drive and most of them use MLC. These drives are the most likely ones you will find in laptops, primarily because of cost.

The next level of performance is SSDs that have a SATA interface with a maximum bandwidth of 3 Gb/s. Again, these drives are faster but more expensive than the previous drives. This is also where you start to see a transition from MLC to SLC in the drives. There is also a transition from a SATA interface to a SAS interface for enterprise applications. SATA adds complexity in enterprise systems because it needs protocol translation from SCSI command sets to SATA command sets, plus additional hardware to support dual-port operation.

Again, moving up in performance and cost from SATA drives are SAS/FC SSD drives. Most of these drives are based on SLCs. They are ideal for enterprise systems and have pretty good performance but are also expensive. These drives are at least an order of magnitude more expensive than spinning drives on a per-gigabyte basis (usually 20 times more expensive).

The final set of SSD solutions that are, at this time, the fastest drives and the most expensive drives, have a PCIe interface. That is, you can plug them into a PCIe slot and with the correct drivers they appear as storage devices to the operating system.

There are also companies such as Texas Memory Systems that make SSD-based storage devices, such as the RamSan-500, that combine multiple SSD devices internally and then apply an interface to the resulting device. The RamSan-500 has a capacity of 1 to 2 TB with a 16 to 64 GB memory cache. It can sustain 2 GB/s to the flash storage and can perform 100,000 read IOPS. It also has up to 8 FC ports (4 Gb/s each) for lots of throughput.

Performance of SSDs varies extremely widely. At the low end, USB flash drives typically deliver 10-20 MB/s for reads and writes with very low IOPS. At the high end, we have the Texas Memory Systems RamSan-500 with 2 GB/s of throughput and 100,000 read IOPS. Fusion-io has a PCIe device with read/write IOPS of 100,000 for SLC versions and a throughput of about 600 MB/s (write) and 700 MB/s (read).
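IOPS and throughput are tied together by the block size, which vendors don’t always state. A quick sketch of the arithmetic (the block sizes below are my assumptions, not vendor specifications):

```python
def throughput_mb_per_s(iops, block_size_kib):
    """Throughput implied by an IOPS figure at a given block size.

    Uses MB = 10**6 bytes and KiB = 1024 bytes.
    """
    return iops * block_size_kib * 1024 / 1e6


# What would 100,000 IOPS mean at a few assumed block sizes?
for block_kib in (4, 8, 64, 128):
    mb_s = throughput_mb_per_s(100_000, block_kib)
    print(f"{block_kib:>3} KiB blocks: {mb_s:8.1f} MB/s")
```

At 8 KiB blocks, 100,000 IOPS would already need more than 800 MB/s, which is above the roughly 700 MB/s read throughput quoted above. So a headline IOPS number like that almost certainly refers to small blocks, and it is worth asking which block size was used.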

You can pay a few dollars for a simple flash drive or you can pay $$$ for a much faster device, but, fundamentally, both are solid-state storage devices. However, it’s pretty evident that not all SSDs are created equal. Moreover, they are not perfect devices and have their problems.


Now for the Bad Things


So far everything about SSDs sounds pretty good. They are expensive but they promise massive amounts of performance, primarily IOPS, but with very good throughput as well. Prices are coming down so people are getting very excited at the possibility of low-power, solid-state storage with very high levels of performance. But the story isn’t all rosy :)


Asymmetric Performance

The first thing you need to realize is that NANDs are asymmetric. That is, the read performance is different from the write performance. While one could argue that hard drives are also asymmetric, the difference between reads and writes for hard drives is not nearly as pronounced as for flash drives. As an example, there is one SSD drive that has a specification of 130 random writes per second and 18,000 random reads per second (reference – InfoStor Sept. 2008 article). So you can see that the differences in IOPS performance can be quite startling.
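To get a feel for what that asymmetry does to a mixed workload, here is a deliberately simple model in Python. It assumes the drive services one operation at a time, so the average time per operation is just the weighted average of the read and write times. Real drives have caches, queues, and parallelism, so treat this as an illustration and not a prediction. The 18,000 and 130 figures are the ones quoted above.

```python
def mixed_iops(read_iops, write_iops, read_fraction):
    """Effective IOPS for a read/write mix under a simple serial model.

    Average time per operation = r / read_iops + (1 - r) / write_iops,
    where r is the fraction of operations that are reads.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    avg_time = read_fraction / read_iops + (1.0 - read_fraction) / write_iops
    return 1.0 / avg_time


for pct_reads in (100, 99, 90, 50, 0):
    iops = mixed_iops(18_000, 130, pct_reads / 100)
    print(f"{pct_reads:3d}% reads: {iops:8.0f} IOPS")
```

In this model, just 1% writes is enough to cut the effective IOPS to less than half of the pure-read number, because each write takes so much longer than a read.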

As another example, Figure 1 below plots the IOPS as a function of the read/write ratio. It normalizes the IOPS by the read IOPS at 100 percent reads. It plots this normalized IOPS for an SSD drive and a hard drive for various block sizes as the mix of read/writes varies. While I can’t give out the explicit name of the SSDs (two were tested) and the hard drive, I can say that the SSDs are so-called "second-generation" drives and have very good performance with 3 Gb/s SATA interfaces (IOPS provided by the manufacturer are in the 10,000 range for reads) that are built from SLC technology (the fastest available). The hard drive is a simple 2.5” 10K SAS drive with 16 MB of cache.


Figure 1 – Normalized IOPS as the Read/Write Ratio Changes for SSD and Hard Drive (various block sizes)




Notice that as the ratio of reads to writes decreases to the point of 100% writes, the IOPS performance of the SSD drops to less than 5% of its 100% read value. The hard drive, by contrast, drops only to a little under 95% of its read value. This shows you the dramatic impact of writes on the performance of SSDs.

One might argue that even if the IOPS write performance drops off to less than 5% of the 100% read value it still might be better than a hard drive (after all, 5% of 10,000 is still a reasonable number). The actual answer is dependent on the actual drive. So let’s take the same drives we used in the previous study and look at real numbers. Figure 2 below is the same plot as Figure 1 except it is not normalized.

Figure 2 – IOPS as the Read/Write Ratio Changes for SSD and Hard Drive (various block sizes)



This chart is also interesting because it shows that, in general, for block sizes greater than 8K and write-heavy workloads, SSDs don’t have the same level of IOPS performance as a hard drive (at least for the drives tested). At 100% reads, the SSDs have astronomical read IOPS, way off the plot. For 8K blocks, the SSD matches the 100% write performance of the hard drive. However, for the 64K and 128K block sizes the SSDs have much worse write IOPS performance than the hard drive. At 100% writes, the SSDs achieve about 40 IOPS at 64K blocks and about 20 IOPS at 128K. The hard drive, however, stays at about 140-150 IOPS regardless of the read/write ratio.

What is equally interesting is the crossover point between the SSDs and the hard drive. For 64K blocks, the SSD IOPS performance matches the hard drive’s performance at a read/write ratio of about 80/20 (80% reads and 20% writes). For 128K blocks, the SSD IOPS performance matches the hard drive’s performance at a ratio of a little less than 90/10 (90% reads and 10% writes). These two crossover points show how much writes impact the IOPS performance of SSDs. Just a few writes relative to reads can drop the IOPS performance below that of a simple hard drive.
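You can get a rough feel for where these crossover points come from with a small Python sketch. It uses a simple serial model (average time per operation is the weighted average of read and write times) and treats the hard drive’s IOPS as constant. The write IOPS (40 and 20) and the hard drive figure (145, the middle of the 140-150 range) come from the chart discussion above. The SSD read IOPS of 2,000 is purely a placeholder of mine, since the real value is off the plot.

```python
def crossover_read_fraction(ssd_read_iops, ssd_write_iops, hdd_iops):
    """Read fraction at which the SSD's mixed IOPS equals the hard drive's.

    Solves r / read + (1 - r) / write = 1 / hdd for r, where r is the
    fraction of operations that are reads.
    """
    if not ssd_write_iops < hdd_iops < ssd_read_iops:
        raise ValueError("no crossover: need write < hdd < read IOPS")
    inv_read = 1.0 / ssd_read_iops
    inv_write = 1.0 / ssd_write_iops
    inv_hdd = 1.0 / hdd_iops
    return (inv_write - inv_hdd) / (inv_write - inv_read)


ASSUMED_SSD_READ_IOPS = 2_000  # placeholder, not a measured value
for block, write_iops in (("64K", 40), ("128K", 20)):
    r = crossover_read_fraction(ASSUMED_SSD_READ_IOPS, write_iops, 145)
    print(f"{block}: SSD matches the hard drive at about {100 * r:.0f}% reads")
```

With these inputs the model puts the crossovers at roughly 74% and 87% reads, which is in the same ballpark as the measured 80/20 and just-under-90/10 points. The result barely depends on the assumed read IOPS because the slow writes dominate.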

So the picture for SSDs is perhaps not as rosy as people think at this point. They are really asymmetric storage devices and their write performance, in terms of IOPS, is not the best at this time—with the caveat that this observation is limited to the drives tested (there are drives that have better IOPS but the cost is moving up into the range of a new car). Not to burst your bubble, but there are some additional problems that you need to be aware of—SSDs have data retention issues.


Data Retention and Write Endurance
When we write data to a storage device we assume that the data is going to be there later when we need it. In essence we don’t want the device to suffer from “bit-rot.” However, I think those of us who are a little older know that this isn’t always true. Some forms of media degrade over time, losing data. Classic examples are floppy disks (really showing my age now), and CDs or DVDs. Spinning media don’t necessarily suffer bit-rot in the same manner as CDs or floppy disks, but depending upon the media, how it’s used, etc., you can have blocks of the drive go bad (this is a completely separate topic since it is possible for you to lose data even if it’s just sitting on the drive). At that point, file system features like Panasas’ Tiered Parity become vital.

SSDs suffer from the same problem but in a slightly different way. The difference between SSDs and spinning media is that cells on SSDs can perform only so many rewrites before the cell is incapable of storing new data. So any data retention requirements are likely to have to be a function of the number of rewrites.

JEDEC is the leading developer of standards for solid-state storage. JEDEC has specified that when a drive has used 10% of its rewrite capacity, it should be able to hold the data for at least 10 years. They also say that when the drive has used 100% of its rewrite capacity, it should be able to hold the data for at least one year.
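Those two points can be written down as a small lookup in Python. This is only a sketch of the two numbers quoted above: the source gives no values between 10% and 100%, so I conservatively assume the requirement steps down to one year as soon as usage passes 10%. The default of 10,000 rated cycles is the MLC endurance figure from earlier.

```python
def required_retention_years(rewrites, rated_cycles=10_000):
    """Minimum retention (years) implied by the two points quoted above.

    Assumption of this sketch: a step from 10 years to 1 year once more
    than 10% of the rated cycles are used. Returns None past the rated
    limit, where no requirement is quoted.
    """
    used = rewrites / rated_cycles
    if used <= 0.10:
        return 10
    if used <= 1.0:
        return 1
    return None


def meets_requirement(measured_years, rewrites, rated_cycles=10_000):
    """True if a measured retention time satisfies the requirement."""
    required = required_retention_years(rewrites, rated_cycles)
    return required is not None and measured_years >= required


print(required_retention_years(500))    # 5% of rated cycles used
print(required_retention_years(5_000))  # 50% of rated cycles used
print(meets_requirement(5.0, 1_000))    # 5 years measured at 10% used
```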

Figure 3 below shows a plot of the data illustrating retention time vs. the rewrite endurance. The endurance is the number of rewrites that have occurred (smaller is better). This particular chart is for MLC, which has a generally accepted number of rewrites of 10,000. So at an endurance of 0 to 10%, the SSD has to be able to retain the data for 10 years. When you get close to the 10,000 limit for the number of rewrites, the data retention time drops to one year.


Figure 3 – Data Retention Time vs. Rewrite Endurance (Number of Rewrites)


Notice that 60nm MLCs can meet JEDEC’s requirements but they do not exceed them. As the size of the cells decreases to 50nm and 4x nm, notice that the retention time drops off significantly. For example, for 50nm, at about 5,000 rewrites, the device can retain data for only about one year. It can’t retain data for any length of time beyond the 5,000 limit. As the size of the cell shrinks, the limit moves to the left, indicating that the drive cannot retain data after a small number of rewrites. The reasons get very detailed and are well beyond this blog (I don’t even understand them, to be honest). But the basic reason is that as the cells shrink there are fewer electrons in the floating gate (see, I told you it was detailed).

In projections for the next generation of NAND cells that are smaller, the number of rewrites will be reduced by about half. For example, MLC will move to 5,000 rewrites and SLC will have to move to 50,000. This only makes the problem of rewrites worse.

There are reports of NANDs being “cherry picked” that can handle up to 500,000 rewrites. However, these were just selected from a normal production batch and are not a result of a new manufacturing technique. So while the vast majority of the production run is either 100,000 or 10,000 rewrites, you can get 500,000 rewrites from certain cherry-picked NANDs. But you will pay a dear price for it, I’m sure. It’s also disconcerting to see such wide variations in rewrite endurance from the same manufacturing line.


Hot Spots - and I Don’t Mean T-Mobile
Given the fact that Flash-based storage devices have limited lifetimes, virtually all of the drives have what is called a wear-leveling algorithm in the controller. This algorithm spreads the writes to all cells in the most even manner possible. The idea is to have each cell written approximately the same number of times. However, how the drive interacts with the file system can affect the wear leveling of the device. So it’s almost impossible to reach the ideal of having the same number of writes for each cell. Therefore, there will always be “hot spots” on the drive that will wear out faster than others.
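Here is a toy Python sketch of the idea. It is not any vendor’s algorithm: it only counts writes per physical block and ignores everything a real controller has to do (mapping tables, erase blocks, garbage collection, data that never changes). The workload, where 9 of every 10 writes hit the same logical block, is made up to mimic something like a file system journal.

```python
def direct_mapped_wear(writes, num_blocks):
    """Write counts when each logical block always maps to the same physical block."""
    wear = [0] * num_blocks
    for logical_block in writes:
        wear[logical_block] += 1
    return wear


def least_worn_wear(writes, num_blocks):
    """Write counts when every write is redirected to the least-worn block."""
    wear = [0] * num_blocks
    for _ in writes:
        target = wear.index(min(wear))  # first block with the lowest count
        wear[target] += 1
    return wear


# Made-up skewed workload: 9 of every 10 writes go to logical block 0,
# and the remaining writes cycle through all blocks.
num_blocks = 8
writes = [0 if i % 10 else (i // 10) % num_blocks for i in range(8_000)]

print("no wear leveling, worst block:", max(direct_mapped_wear(writes, num_blocks)))
print("least-worn policy, worst block:", max(least_worn_wear(writes, num_blocks)))
# prints 7300 for the first line and 1000 for the second
```

Without leveling, one block absorbs 7,300 of the 8,000 writes. The idealized policy spreads them to 1,000 per block. A real drive lands somewhere in between, which is exactly why hot spots still show up.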

An Approach to Help Write Endurance and Hot Spots
What some vendors are doing to help limit the impact of hot spots and to improve the overall write endurance of the device is to reserve some cells to replace ones that have reached their rewrite cycle limit. This is typically called over-provisioning. For example, you could take a 64 GB drive and reserve perhaps 14 GB for “backup” cells. So you have a 50 GB drive with 14 GB held in reserve. The controller firmware knows when cells are approaching their limit and will “remap” the drive to use the cells held back. It’s somewhat similar to how hard disk drives set aside a number of spare blocks for when blocks go bad. In hard disk drives the remapping is automatic, just as it is in the SSD drives with spare cells.

This approach helps hot spots and also helps the overall write endurance of the drive. But there are no standards for over-provisioning so it’s very implementation-specific.
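Because there is no standard, even the percentage can be expressed in more than one way. A quick Python sketch of the arithmetic for the 64 GB example above (the function and key names are mine):

```python
def over_provisioning(raw_gb, reserved_gb):
    """Usable capacity and reserve percentages for an over-provisioned drive."""
    usable_gb = raw_gb - reserved_gb
    return {
        "usable_gb": usable_gb,
        "reserve_pct_of_raw": 100.0 * reserved_gb / raw_gb,
        "reserve_pct_of_usable": 100.0 * reserved_gb / usable_gb,
    }


print(over_provisioning(64, 14))
# {'usable_gb': 50, 'reserve_pct_of_raw': 21.875, 'reserve_pct_of_usable': 28.0}
```

The same 14 GB reserve is about 22% of the raw capacity but 28% of the usable capacity, so when you compare drives it helps to ask which number you are being quoted.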


Summary

I’m sure there are some people who will skip all of the previous discussion and read the summary first (not a bad strategy when reading something for the first time). So, to help them, I’ve got a few bullets that summarize some things I’ve discussed in this blog.

  • SSDs hold great promise because they have no moving parts, they can potentially use less power than spinning drives, their throughput can be very good, and their IOPS performance is very good.
  • However, SSDs suffer from some problems:
      • Write Endurance – NAND cells can be programmed and erased only a specified number of times. SLC flash is better than MLC flash.
      • Data Retention – JEDEC has specified data retention criteria for SSDs, and current drives are just barely meeting those criteria. Plus, future NANDs, as they are shrunk, will have a reduced number of rewrites (i.e., reduced write endurance).
      • Asymmetric Performance – Reads are much faster than writes, and write performance in terms of IOPS may not be as good as people think, depending upon the read/write ratio.


Unfortunately, there are other problems that SSDs face that I haven’t covered (perhaps another blog?). But I’m sure there will be people who think that I’m trying to beat down a new technology and I truly am not trying to do that. What I’m hoping my blog has done is to open people’s eyes to the fact that SSDs have problems much like any other technology. All new technologies are going to have good points and bad points. It’s pretty easy to spot unbelievably great technology because someone would be extraordinarily rich and everyone and their mom would be using it. I just want people to understand that they need to ask questions about any new technology, in this case SSDs, and understand that there are problems.

Thanks!

Jeff