Will the cold storage data center of the future include a DNA synthesizer? According to a new research paper by the University of Washington and Microsoft, it’s a strong possibility.
Today, we generate data faster than we can increase storage capacity. The volume of digital data worldwide is projected to exceed 16 zettabytes sometime next year, the paper’s authors wrote, citing a forecast by IDC Research. “Alarmingly, the exponential [data] growth rate easily exceeds our ability to store it, even when accounting for forecast improvements in storage technologies,” they said.
A big portion of the world’s data sits in archival storage, where the densest medium currently is tape, offering a maximum density of about 10 GB per cubic millimeter. One research project has demonstrated an optical disk technology that’s 10 times denser than tape.
Nature’s Data Storage
But there’s another approach that promises a storage density of 1 exabyte per cubic millimeter, or eight orders of magnitude higher than tape. That approach is encoding data the same way nature encodes instructions for building every living thing on Earth: DNA.
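To see where that density comes from: DNA has four nucleotides, so each base can in principle carry two bits. The sketch below shows that naive two-bits-per-base mapping in Python. It is for intuition only and is not the paper’s actual encoding scheme, which is more elaborate precisely to avoid error-prone patterns such as long runs of the same base.

```python
# Naive illustration: map each 2-bit pair to one of the four
# nucleotides. Real schemes avoid homopolymer runs and other
# synthesis/sequencing pitfalls; this only shows the density math.
BASES = "ACGT"

def bytes_to_dna(data: bytes) -> str:
    """Encode raw bytes as a DNA string, 4 bases per byte."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):      # high bits first
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def dna_to_bytes(strand: str) -> bytes:
    """Decode a DNA string back into the original bytes."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)

print(bytes_to_dna(b"hi"))  # prints "CGGACGGC"
```

At four bases per byte, the information density is set by molecular size rather than by lithography, which is where the eight-orders-of-magnitude gap over tape comes from.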
In addition to density, DNA storage addresses another big limitation of archival storage: longevity. Tape can hold data for 10 to 30 years before data integrity starts to degrade, and spinning disks are rated for three to five years. DNA’s observed half-life is more than 500 years in harsh environments, according to the paper.
The idea of storing data in synthetic DNA has been around for a long time, but huge recent improvements in the cost and efficiency of synthesizing and sequencing genes have made it far more feasible. The state of the art went from a 23-character message in 1999 to a 739 kB message in 2013.
As today’s booming biotech industry delivers orders-of-magnitude cost and efficiency improvements in DNA sequencing and synthesis, those gains keep raising the ceiling on how much data the method can store. Growth in sequencing productivity eclipses Moore’s Law, the paper’s authors wrote.
Big DNA Storage Improvements Proposed
The work presented in the paper pushes the technology further in two big ways: the researchers propose a way to improve the integrity of stored data (current DNA storage error rates are about 1 percent per nucleotide) and a way to randomly access individual pieces of data (with the current approach, you have to sequence and decode an entire DNA pool to access a single byte within it).
The paper proposes an architecture for a DNA storage system that includes a DNA synthesizer, a storage container, and a DNA sequencer. The synthesizer encodes data to be stored, the container holds pools of DNA that map to a volume, and the sequencer reads DNA sequences and converts them to digital data.
It addresses the error problem with redundancy, an approach that has been proposed before but without regard to the impact of redundancy on storage density. The new encoding scheme introduced in the paper offers “controllable redundancy,” where you can specify a different level of reliability and density for each type of data.
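As a rough illustration of how redundancy can be tuned against density, here is a minimal Python sketch of XOR-style parity. This is not the paper’s exact encoding, just the underlying idea: storing a parity strand alongside two data strands lets any one of the three be reconstructed from the other two, at a 50 percent density cost.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Two data payloads (equal length for simplicity) plus their parity.
a = b"payload-A"
b_ = b"payload-B"
parity = xor_bytes(a, b_)

# Suppose strand A is lost or unreadable: recover it from the rest.
recovered_a = xor_bytes(parity, b_)
assert recovered_a == a
```

“Controllable” redundancy then comes from choosing what gets XORed together: pairing critical strands with more parity (up to outright replication) and letting less critical data share parity across many strands, trading reliability against density per data type.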
The problem of random access is solved with the same technique molecular biologists use to isolate specific regions of a DNA sequence in research: Polymerase Chain Reaction (PCR), which “amplifies” a piece of DNA through repeated cycles of heating and cooling. The DNA storage researchers use PCR to amplify only the desired data, which they say accelerates reads and lets specific data be accessed without sequencing the entire DNA pool.
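A loose software analogy for PCR-based random access, with entirely hypothetical names: treat each stored object’s primer pair as its key, and “amplification” as selecting only the strands flanked by that pair, so the sequencer never has to read unrelated data.

```python
# Hypothetical sketch, not the paper's implementation. Each object's
# strands carry a unique forward/reverse primer pair at their ends;
# a PCR "read" amplifies only strands flanked by that pair.
def pcr_select(pool, fwd_primer, rev_primer):
    """Return only the strands flanked by the given primer pair."""
    return [s for s in pool
            if s.startswith(fwd_primer) and s.endswith(rev_primer)]

pool = [
    "ACGT" + "CGGACGGC" + "TTAA",   # object 1, primers ACGT / TTAA
    "GGCC" + "AAAACCCC" + "TGCA",   # object 2, primers GGCC / TGCA
]
print(pcr_select(pool, "ACGT", "TTAA"))  # prints ['ACGTCGGACGGCTTAA']
```

In the real system the selection is chemical rather than computational: amplification makes copies of the matching strands so they dominate the sample that is actually sequenced.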
While DNA storage is not practical today, the rate of progress in DNA sequencing and synthesis in the biotech industry and the “impending limit of silicon technology” make it something computer architects should seriously consider today, the researchers conclude. They envision hybrid silicon and biochemical archival storage systems as the ultimate cold storage of the future.