DNA: The hard drive of the future

Credit: Pixabay

Humans create a lot of digital data. And figuring out the best way to store it is a challenge.

Well, researchers think they may have started to solve that problem by finding an efficient way to store digital data: on DNA. But how does it work?

In IT Blogwatch, we get our science caps on.

So what exactly is going on? Eva Botkin-Kowacki has some background:

Computer engineers have created some amazingly small devices...But geneticists say Mother Nature can do even better.
DNA...is incredibly dense. The whole genome of an organism fits into a cell that is invisible to the naked eye.
That's why computer scientists are turning to microbiology to design the next best way to store humanity's ever-increasing collection of digital data.

But how do you store digital data on DNA? Robert Service has the details:

Researchers report that they’ve come up with a...way to encode digital data in DNA to create the highest-density large-scale data storage scheme ever invented...the system could...store every bit of datum ever recorded by humans in a container about the size and weight of a couple of pickup trucks.
[Researchers]...converted...files into binary strings of 1s and 0s, compressed them into one master file, and...split the data into short strings of binary code. They devised an algorithm called a DNA fountain, which randomly packaged the strings into so-called droplets, to which they added extra tags to help reassemble them in the proper order later.
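To make the "droplet" idea concrete, here is a minimal Python sketch. It assumes the droplets work like a Luby transform fountain code, the family of codes DNA Fountain is built on: each droplet is the XOR of a small random set of segments, and its "tag" is the random seed that regenerates that choice. The segment length, the toy degree rule (one to three segments per droplet) and the names `make_segments`, `chosen_segments`, `make_droplet` and `fountain` are all hypothetical; the real scheme uses a carefully tuned degree distribution.

```python
import random
import zlib


def make_segments(files: list[bytes], seg_len: int = 8) -> list[bytes]:
    # Concatenate and compress the files into one "master file", then cut it
    # into fixed-length segments, zero-padding the last one.
    master = zlib.compress(b"".join(files))
    master += b"\x00" * (-len(master) % seg_len)
    return [master[i:i + seg_len] for i in range(0, len(master), seg_len)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def chosen_segments(seed: int, n_segments: int) -> list[int]:
    # The seed alone determines which segments go into a droplet, so a decoder
    # that reads the seed can regenerate exactly the same choice later.
    rng = random.Random(seed)
    degree = rng.randint(1, min(3, n_segments))  # toy degree rule (assumption)
    return rng.sample(range(n_segments), degree)


def make_droplet(segments: list[bytes], seed: int) -> tuple[int, bytes]:
    # A droplet is (tag, payload): the seed plus the XOR of the chosen segments.
    payload = bytes(len(segments[0]))
    for i in chosen_segments(seed, len(segments)):
        payload = xor_bytes(payload, segments[i])
    return seed, payload


def fountain(segments: list[bytes], n_droplets: int) -> list[tuple[int, bytes]]:
    # Callers normally ask for more droplets than segments, so that some
    # droplets can be lost without losing any data.
    return [make_droplet(segments, seed) for seed in range(n_droplets)]
```

For example, `fountain(make_segments([b"hello" * 20]), 40)` produces 40 seed-tagged droplets. Because the seed is the only bookkeeping a droplet carries, the tag stays tiny no matter how many segments were mixed into it.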

But how exactly does it work? Charles Choi fills us in:

The...technique...essentially encodes files in DNA as very simple Sudoku puzzles...In Sudoku, players are given mostly empty grids, and the...numbers provided within the grids serve as hints [for] the rest of the grids...In much the same way, DNA Fountain generates many "hints" about the contents of files...when it comes to retrieving data from these molecules, even if a few "hints" and fragments of the files are lost, the other hints can help reveal what data was lost.
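The Sudoku-style recovery can be sketched as a "peeling" decoder, the textbook way to decode Luby transform fountain codes; treat it as an illustration under that assumption rather than DNA Fountain's exact decoder. Each droplet is modelled as a set of segment indices plus the XOR of those segments, and `combine` and `peel_decode` are hypothetical names. Any droplet that covers exactly one unknown segment reveals it, like a Sudoku cell with only one possible number, and each solved segment is then cancelled out of the remaining droplets, which may expose new single-unknown hints.

```python
from __future__ import annotations


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def combine(segments: list[bytes], indices: set[int]) -> bytes:
    # Build a droplet payload: the XOR of the segments listed in `indices`.
    payload = bytes(len(segments[0]))
    for i in indices:
        payload = xor_bytes(payload, segments[i])
    return payload


def peel_decode(droplets: list[tuple[set[int], bytes]], n_segments: int) -> list[bytes] | None:
    # Returns the recovered segments in order, or None if the surviving
    # droplets do not contain enough "hints" to solve every segment.
    known: dict[int, bytes] = {}
    pending = [(set(idx), payload) for idx, payload in droplets]  # copies, inputs untouched
    changed = True
    while changed and len(known) < n_segments:
        changed = False
        still_pending = []
        for idx, payload in pending:
            # Cancel out every segment that has already been solved.
            for i in [i for i in idx if i in known]:
                payload = xor_bytes(payload, known[i])
                idx.discard(i)
            if len(idx) == 1:
                # Only one unknown left: this hint solves it outright.
                known[idx.pop()] = payload
                changed = True
            elif idx:
                still_pending.append((idx, payload))
        pending = still_pending
    if len(known) < n_segments:
        return None
    return [known[i] for i in range(n_segments)]
```

With segments `[b"AB", b"CD", b"EF", b"GH"]`, droplets covering `{0,1}`, `{1,2}`, `{2,3}` and `{3}` are enough to recover everything, while dropping the single-segment droplet leaves no starting hint and `peel_decode` returns `None`. That is why a fountain encoder produces surplus droplets.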

So what did the scientists store on the DNA strands? We let Columbia University give us the official line:

Study coauthor Yaniv Erlich...and his colleague Dina Zielinski, an associate scientist at NYGC, chose six files to encode...into DNA: a full computer operating system, an 1895 French film, “Arrival of a train at La Ciotat,” a $50 Amazon gift card, a computer virus, a Pioneer plaque and a 1948 study by information theorist Claude Shannon.

How exactly did the researchers store the information on the DNA? And how did they then read it again? Brooks Hays has that info:

The coding process produced 72,000 DNA strands, each 200 bases long. Researchers sent the DNA file to...a startup in San Francisco that turns digital DNA into biological DNA. Two weeks later, the company sent the researchers a vial containing their DNA strands.
Researchers Yaniv Erlich and Dina Zielinski used standard DNA sequencing software to re-digitize their DNA. A...program helped them translate the nucleotide sequences back into binary code. They found their files with zero coding errors.
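Turning binary into nucleotides and back can be sketched with the simplest possible mapping, two bits per base (00→A, 01→C, 10→G, 11→T). That mapping, the function names and the screening thresholds below are assumptions for illustration; practical schemes, reportedly DNA Fountain among them, also reject strands with long runs of a single letter or lopsided G/C content, because those are hard to synthesize and sequence. The same two-bits-per-base figure gives a quick ceiling on the quoted numbers: 72,000 strands × 200 bases is 14.4 million bases, or at most 3.6 MB before seeds, error correction and any primer sequences take their share.

```python
BASES = "ACGT"  # index = 2-bit value: 00→A, 01→C, 10→G, 11→T (assumed mapping)


def bytes_to_dna(data: bytes) -> str:
    # Each byte becomes four bases, most significant bit pair first.
    return "".join(BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0))


def dna_to_bytes(seq: str) -> bytes:
    # Inverse mapping: every four bases pack back into one byte.
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    value_of = {base: i for i, base in enumerate(BASES)}
    out = bytearray()
    for start in range(0, len(seq), 4):
        value = 0
        for base in seq[start:start + 4]:
            value = (value << 2) | value_of[base]
        out.append(value)
    return bytes(out)


def passes_screen(seq: str, max_run: int = 3, gc_low: float = 0.45, gc_high: float = 0.55) -> bool:
    # Reject strands with long single-letter runs or unbalanced G/C content
    # (thresholds are illustrative assumptions).
    if not seq:
        return False
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return gc_low <= gc <= gc_high and longest <= max_run


def capacity_bytes(strands: int, bases_per_strand: int, bits_per_base: int = 2) -> int:
    # Upper bound on raw payload, ignoring seeds, error correction and primers.
    return strands * bases_per_strand * bits_per_base // 8
```

Here `bytes_to_dna(b"\x1b")` gives `"ACGT"` (0x1b is 00 01 10 11 in bit pairs), and `capacity_bytes(72_000, 200)` gives 3,600,000 bytes.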

And what are the benefits of storing data this way? Ed Yong is in the know:

DNA has advantages that other storage media do not. It takes up...less space. It is...durable, as long as it is kept cold, dry, and dark -- DNA from mammoths that died thousands of years ago can still be extracted and sequenced. And...it has a 3.7-billion-year track record. Floppy disks, VHS, zip disks, laser disks, cassette tapes...every media format eventually becomes obsolete...But DNA will never become obsolete.

This isn't the first time this has been done, though, right? Vlad Dudau has some background:

For years, scientists have theorized and shown that DNA can be used as a data storage medium...Even Microsoft, which has to deal with huge amounts of files in datacenters, has been trialing DNA as a storage solution. But now, scientists have managed to pack more data than ever before in this nucleic acid...scientists proved they could effectively store 215 petabytes of data on a single gram of DNA.
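The density figure is easy to sanity-check with a little arithmetic. This sketch simply divides a data volume by the quoted 215 petabytes per gram, assuming decimal units (1 PB = 10^15 bytes, 1 EB = 1,000 PB); it ignores any extra DNA a practical system would need for redundant copies or handling, and the function names are hypothetical.

```python
PB_PER_GRAM = 215  # density quoted in the article


def grams_needed(petabytes: float, pb_per_gram: float = PB_PER_GRAM) -> float:
    # Mass of DNA needed to hold `petabytes` at the quoted density.
    return petabytes / pb_per_gram


def bytes_per_gram(pb_per_gram: float = PB_PER_GRAM) -> float:
    # The same density expressed in bytes (decimal petabytes assumed).
    return pb_per_gram * 10**15
```

At that density, `grams_needed(1_000)` says a full exabyte would fit in roughly 4.65 grams of DNA.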

Is there a downside to all this? Alyssa Navarro has a reality check:

The use of DNA in data storage is still in its early stages...and...the immediate use of the DNA data storage is for archiving. However, it's still quite expensive to archive...data on DNA. In fact, synthesizing the DNA costs $7,000 alone, while reading it costs $2,000.

So what does this all mean? Stuart Ponder has an "ah-ha" moment:

Perhaps we are all just files walking around on one drive in a server farm.

To express your thoughts on Computerworld content, visit Computerworld's Facebook page, LinkedIn page and Twitter stream.