Doug Rollins is a principal SSD systems marketing engineer at Micron Technology, Inc., who holds 13 U.S. patents and is an active member of the Storage Networking Industry Association (SNIA).
In this article we examine two enterprise solid state drive (SSD) lifespan extension techniques that are used by top-tier SSD manufacturers: dynamic read tuning and NAND-level redundancy. These techniques are critical to getting the most from your investment in SSDs. When choosing an SSD, it is important to ensure that the supplier can explain how techniques like these are implemented in their products and the net benefits of each.
Note: Both part 1 and part 2 of this article refer to NAND-based SSDs and to commands executed inside them (as opposed to commands issued by the host).
Dynamic read tuning
Optimal methods for reading data from an SSD are not static. As the NAND in an SSD ages, specific characteristics of the command used to read data should be dynamically tuned by the SSD controller and firmware. Such tuning has a direct impact on data reliability. Proper tuning improves READ command performance in terms of both immediate data access and long-term data reliability, which are key requirements of enterprise applications.
Figure 1a shows the default settings used to read data from the media on the SSD. These are factory presets and are optimal for new NAND devices. Figure 1b shows how the optimal read settings can change over time as the drive is used (shown in green). The amount of data written to and read from the drive, for instance, can impact the optimal settings. Adaptive read management dynamically tunes these settings to ensure best performance and data integrity for the SSD.
Dynamic read adjustment can operate in both background and foreground modes. In background mode, the SSD controller and firmware read data from the NAND before the host requests it. Unlike a caching prefetch design, this is a proactive method of pre-tuning the NAND, so that when the host reads data from the NAND device, the NAND read settings have already been optimized by this background process. In foreground mode, when a read error occurs, the SSD controller and firmware retune the NAND read settings on the fly and retry the read. This process can be applied iteratively, with the number and order of retries determined by the SSD design.
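The reactive retry path and the proactive background pass can be illustrated with a toy model. The sketch below is not any vendor's firmware: the notion of a single numeric "read offset", the retry table, the ECC tolerance, and all function names are hypothetical and chosen only to show the control flow.

```python
# Toy model (all values and names are hypothetical, for illustration only).
# Each page has an "ideal" read offset that drifts as the NAND ages.
# A read is correctable when the applied offset is close enough to the ideal.

FACTORY_OFFSET = 0                         # factory preset, optimal for new NAND
ECC_TOLERANCE = 2                          # max distance from ideal that still decodes
RETRY_OFFSETS = [0, -2, 2, -4, 4, -6, 6]   # assumed retry table

def raw_read(ideal_offset, applied_offset):
    """Return True if the page decodes at the applied read offset."""
    return abs(ideal_offset - applied_offset) <= ECC_TOLERANCE

def foreground_read(ideal_offset, tuned_offset=FACTORY_OFFSET):
    """Reactive path: try the current setting, then retune and retry.

    Returns (success, offset_used, attempts).
    """
    candidates = [tuned_offset] + [o for o in RETRY_OFFSETS if o != tuned_offset]
    for attempts, offset in enumerate(candidates, start=1):
        if raw_read(ideal_offset, offset):
            return True, offset, attempts
    # Retuning was not enough; a real drive would fall back to another mechanism.
    return False, None, len(candidates)

def background_tune(ideal_offset):
    """Proactive path: find a working offset before the host asks for the data."""
    ok, offset, _ = foreground_read(ideal_offset)
    return offset if ok else FACTORY_OFFSET

# Aged page whose ideal offset has drifted to 5.
untuned = foreground_read(5)                  # several retries needed
pretuned_offset = background_tune(5)          # done ahead of the host read
tuned = foreground_read(5, pretuned_offset)   # first attempt succeeds
```

In this model the untuned read of the aged page succeeds only after stepping through part of the retry table, while the pre-tuned read succeeds on its first attempt, which is the latency benefit the background mode is meant to provide.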
For cases where dynamically tuning the NAND read settings does not enable a successful read, many enterprise-grade SSDs employ parity protection as a secondary, fallback protection system. This additional protection mechanism operates in real time and uses well-proven parity techniques to generate parity data and store it alongside the user data. The details of each implementation are design-specific, but the SSD supplier should be able to clearly articulate the core elements:
- Data-to-Parity Ratio: Expressed as X data + Y parity (or X:Y), this ratio is optimized for intended drive workload, performance, media type, and several other factors. It is also referred to as the stripe size.
- Parity Storage Location: The parity may be stored in a fixed, relative, or rotating location.
- Protection Level: NAND-level parity can protect user data from catastrophic media failures.
- Hardware Acceleration: SSD suppliers can choose to manage parity in the firmware or accelerate it via hardware.
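Two of these elements lend themselves to a quick calculation. The sketch below computes the capacity overhead of an X data + Y parity ratio and shows one possible rotating placement for a single parity element. The function names and the particular rotation rule are assumptions made for illustration; actual drives may place parity differently, and the "relative" placement option is not modeled here.

```python
def parity_overhead(data_elems, parity_elems):
    """Fraction of each stripe's capacity that holds parity for an X:Y ratio."""
    return parity_elems / (data_elems + parity_elems)

def parity_position(stripe_index, stripe_width, scheme="rotating"):
    """Index within a stripe that holds the single parity element.

    'fixed' always uses the last element; 'rotating' moves the parity
    element back by one position on each successive stripe (an assumed rule).
    """
    if scheme == "fixed":
        return stripe_width - 1
    if scheme == "rotating":
        return (stripe_width - 1 - stripe_index) % stripe_width
    raise ValueError(f"unknown scheme: {scheme}")

# Example: 7 data elements + 1 parity element per stripe.
overhead = parity_overhead(7, 1)                          # 0.125, i.e. 12.5%
positions = [parity_position(s, 8) for s in range(4)]     # [7, 6, 5, 4]
```

A wider stripe lowers the overhead but means more elements must be read to rebuild a lost one, which is one reason the ratio is tuned per design.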
The figure below shows a data-to-parity ratio of 7:1, with seven elements of user data and one element of parity data. However, this NAND-level parity scheme, which Micron calls RAIN (redundant array of independent NAND), is not limited to 7:1; the ratio can be designed specifically to balance data protection, drive design, intended workload, and cost.
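To make the 7:1 case concrete, here is a minimal sketch of how one lost element can be rebuilt from the other seven. It assumes simple bytewise XOR parity, which is a common single-parity technique; the article does not state which parity function a given SSD uses, and the function names here are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_stripe(data_blocks):
    """Return the data elements followed by one XOR parity element."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def recover(stripe, lost_index):
    """Rebuild the element at lost_index by XOR-ing all surviving elements."""
    survivors = [blk for i, blk in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)

# Seven 4-byte data elements plus one parity element (7:1).
data = [bytes([i] * 4) for i in range(1, 8)]
stripe = make_stripe(data)

# Simulate losing data element 3 (for example, a failed NAND location).
rebuilt = recover(stripe, 3)
assert rebuilt == data[3]
```

With a single XOR parity element, any one element in the stripe can be rebuilt, including the parity element itself, but two simultaneous losses in the same stripe cannot.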
The ability to dynamically tune the NAND (both proactively and reactively) is a key feature offered in many enterprise-class SSDs that helps to ensure more reliable operation and greater SSD lifespan. As with most enterprise-class storage designs, a single protection mechanism is not enough. When dynamically tuning the NAND for the best read operation is not sufficient, many enterprise SSDs also adopt a fallback protection system of parity generation and storage, which enables protection against, and recovery from, even catastrophic media failures.
Part 2 of this article will discuss techniques that enterprise-grade SSDs can use to protect user data as it moves inside the SSD, as well as ways to manage background operations to improve SSD responsiveness.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.