Solid-state drives are often marketed as being more reliable than hard drives. But some evidence suggests that isn’t always true. How much more reliable than hard drives can SSDs really be?
A retailer of SSDs and hard drives published its return data last month. Some of the 1 TB hard drives were more reliable than some of the SSDs. And when you consider the much larger number of bits on each hard drive, the per-bit reliability looks even better.
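To make the per-bit comparison concrete, here is a minimal sketch in Python. The return rates and capacities are illustrative placeholders, not the retailer's figures, and the function name `returns_per_tb` is hypothetical. It simply divides a drive's return rate by its capacity, so two drives with the same return rate can be compared per terabyte stored.

```python
def returns_per_tb(return_rate: float, capacity_tb: float) -> float:
    """Normalize a return rate by capacity: returns per terabyte sold."""
    if capacity_tb <= 0:
        raise ValueError("capacity_tb must be positive")
    return return_rate / capacity_tb


# Hypothetical numbers for illustration only: both drives are assumed to
# have the same 2% return rate, but very different capacities.
hdd = returns_per_tb(return_rate=0.02, capacity_tb=1.0)   # assumed 1 TB hard drive
ssd = returns_per_tb(return_rate=0.02, capacity_tb=0.08)  # assumed 80 GB SSD

print(f"HDD: {hdd:.3f} returns per TB")
print(f"SSD: {ssd:.3f} returns per TB")
```

With equal return rates, the smaller drive always looks worse per terabyte. That is the effect described above, and it says nothing about which individual unit is more likely to fail.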
Why do hard drives fail?
Google’s study of hard drive failures – see “Google’s disk failure experience” and “Everything you know about disks is wrong” – found that 36 percent of failed hard drives did not exhibit a single SMART-monitored failure.
Why? Because drive failures have two components: mechanical and electrical.
Hard drives are mechanical devices. Over time components wear: platters start wobbling; actuators lose precision; lubricants dry. The result: more retries; more corrupted data requiring ECC recovery; higher drive temperatures; greater power draw. These are the kinds of things that SMART records.
If SMART (Self-Monitoring, Analysis, and Reporting Technology) warns you of an impending drive failure, you should respond. But SMART is almost useless because of the failure modes it can’t predict. Power regulators, capacitors, traces, firmware and connectors can all cause hard drive failures. And SMART can’t warn about those.
What’s different with SSDs
SSDs replace the platters, heads, bearings and motors of a hard drive with flash. But they don’t replace the electrical components that cause many hard drive failures.
If all flash chips were the same, we could calculate how much more reliable an SSD should be. But they aren’t: manufacturers bin chips into different grades just as they do with CPUs. Manufacturers who build SSDs, such as Intel, Toshiba, and Samsung, often use the highest grade chips for their own SSDs.
The lower-quality chips are sold on the open market. Most such chips go into USB drives and SD cards, but some go into SSDs – which probably explains the reported return rates for disk drives and SSDs. The high-quality SSDs had fewer chip failures and the lower-quality ones had more.
Fifteen years ago disk drives had significant differences in performance and quality. Today disk drives of a given spec are much more similar.
SSDs today are where disk drives were 15 years ago: big differences – even from generation to generation within a single vendor’s line. The firmware layer that makes flash look like disk – the Flash Translation Layer – is evolving rapidly.
There is good news. Over the next year SSD prices will drop by 50%. Likewise, the quality of controllers and chip error detection and correction is rapidly improving.
Over a five-year life I would expect an SSD to offer a 30% to 40% lower annual failure rate than a mature disk drive. On the other hand, that disk drive will store considerably more information for lower cost. New, higher reliability drive technologies – such as HAMR and patterned media – are coming, but we’ll have to wait and see how good they are.
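As a rough illustration of what a 30% to 40% lower annual failure rate means over five years, here is a small Python sketch. The 5% baseline disk failure rate is a made-up placeholder, not a measured figure. The calculation also assumes a constant, independent failure rate each year, which real drives do not strictly follow. The function name `cumulative_failure` is hypothetical.

```python
def cumulative_failure(afr: float, years: int) -> float:
    """Probability of at least one failure within `years`,
    assuming a constant, independent annual failure rate `afr`."""
    if not 0.0 <= afr <= 1.0:
        raise ValueError("afr must be between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    return 1.0 - (1.0 - afr) ** years


DISK_AFR = 0.05  # hypothetical baseline, for illustration only
YEARS = 5

disk_risk = cumulative_failure(DISK_AFR, YEARS)
print(f"Disk: AFR {DISK_AFR:.1%}, {YEARS}-year failure risk {disk_risk:.1%}")

# Apply the 30% and 40% reductions suggested in the text.
for reduction in (0.30, 0.40):
    ssd_afr = DISK_AFR * (1.0 - reduction)
    ssd_risk = cumulative_failure(ssd_afr, YEARS)
    print(f"SSD ({reduction:.0%} lower): AFR {ssd_afr:.1%}, "
          f"{YEARS}-year failure risk {ssd_risk:.1%}")
```

Under these assumptions the gap in five-year risk is real but modest, which fits the point that capacity and cost still favor the disk drive.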
Given the trends, 2011 is the year even conservative data center managers will begin integrating SSDs into their server infrastructure.
Comments welcome, of course. Thanks to Intel Fellow Neal Mielke and Application Engineer James Meyers for briefing me on quality issues. Conclusions are my own.