CoreGenomics: Anyone fancy trying to “read DNA”? It goes something like this…01110010 01100101 01100001 01000100 00100000 01000100 01001110 01100001

George Church is one of the “godfathers of genomics”*. In one of his latest publications, Next-Generation Digital Information Storage in DNA, he demonstrates how to use DNA as an information storage medium. He’s not the first to do this, and the supplementary information to the paper lists ten other references, but his is the best example so far.

*George Church, along with 15 others including Walter Gilbert, Leroy Hood and John Sulston, was one of the attendees at the 1984 Alta conference where the Human Genome Project was conceived. He published the first method for sequencing methylation sites in 1984, “Genomic sequencing”, in PNAS. George is very open-access: see his unauthorised autobiography. In fact he is so open-access I wonder if he could be a godfather of that too!

The paper describes how the text, images and a JavaScript program from the book Regenesis were converted to DNA in a readily amplifiable and readable form. This is not something just anyone can read, though: in the paper they used 170M PE100bp reads from a HiSeq lane. That makes it a very expensive book, in a format not compatible with a Kindle!
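The paper’s trick, as I read it, is to store one bit per base: A or C stands for 0 and G or T stands for 1, which leaves a free choice of base that can be used to steer clear of sequences that are hard to synthesise or sequence. Here is a minimal Python sketch of that idea. The rule for picking between the two bases (never repeat the previous base, so there are no homopolymer runs) is my own hypothetical stand-in, and this is not the Bits2DNA.pl code.

```python
def text_to_bits(text: str) -> str:
    """Turn text into a string of '0'/'1' characters, 8 bits per byte (UTF-8)."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_bases(bits: str) -> str:
    """Encode one bit per base: 0 -> A or C, 1 -> G or T.

    Hypothetical choice rule: use A (for 0) or G (for 1) unless the previous
    base is that same letter, in which case switch to C or T. Every base
    therefore differs from the one before it, so no homopolymer runs occur.
    """
    bases = []
    for bit in bits:
        first, alt = ("A", "C") if bit == "0" else ("G", "T")
        bases.append(alt if bases and bases[-1] == first else first)
    return "".join(bases)


def bases_to_bits(bases: str) -> str:
    """Decode: A or C reads as 0, G or T reads as 1."""
    return "".join("0" if base in "AC" else "1" for base in bases)


def bits_to_text(bits: str) -> str:
    """Regroup the bits into bytes and decode them as UTF-8 text."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


if __name__ == "__main__":
    dna = bits_to_bases(text_to_bits("Regenesis"))
    print(dna)
    print(bits_to_text(bases_to_bits(dna)))
```

Decoding only needs to know which pair a base belongs to, so the reader never needs to know which choice rule the writer used.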

How do you turn a book into a library: George used Agilent’s programmable eArrays to make the DNA version of the book. After synthesis the oligos were cleaved from the array into a pool that was PCR amplified with Illumina-compatible primers, ready for sequencing. You can buy an 8x60k eArray for the equivalent of about £100 per book.

How much sequencing do you need to do: The sequencing was at 3000-fold coverage and, geivn the aibtily of Hmunas to raed smrlcbaed text, I suspect that level of redundancy is massive overkill. Reducing read lengths to PE75 and using slightly longer fragments (150 vs 115bp) would decrease the cost of sequencing. George used 54,898 115bp oligos, each carrying an address and 12x8-bit data bytes; increasing this to 16x8 bits would result in a 147bp oligo and require only about 41,000 fragments. Even low-coverage sequencing could be completed on a MiSeq or PGM.
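The numbers in this paragraph hang together with some simple arithmetic, sketched in Python below using only figures from the post. Two things are my assumptions: the 19-base address is inferred as 115 minus 12x8, and the lengths cover address plus data only, not any primer sites. Coverage is treated as reads per distinct oligo.

```python
import math

# Figures from the post: 54,898 oligos, each carrying 12 x 8-bit data bytes,
# 115bp of address + data per oligo, sequenced with 170M reads.
N_OLIGOS = 54_898
BYTES_PER_OLIGO = 12
OLIGO_LEN = 115
READS = 170_000_000

TOTAL_BITS = N_OLIGOS * BYTES_PER_OLIGO * 8      # one bit per base
ADDRESS_LEN = OLIGO_LEN - BYTES_PER_OLIGO * 8    # inferred: 19 bases


def layout(bytes_per_oligo: int) -> tuple[int, int]:
    """Return (address + data length in bp, oligos needed) for a payload size."""
    payload = bytes_per_oligo * 8
    return ADDRESS_LEN + payload, math.ceil(TOTAL_BITS / payload)


def mean_coverage(reads: int, n_oligos: int) -> float:
    """Average number of reads landing on each distinct oligo."""
    return reads / n_oligos


if __name__ == "__main__":
    print(TOTAL_BITS, ADDRESS_LEN)                 # 5270208 19
    print(layout(12))                             # (115, 54898)
    print(layout(16))                             # (147, 41174)
    print(round(mean_coverage(READS, N_OLIGOS)))  # 3097
```

By this arithmetic the 16-byte version comes to 147bp of address plus data in about 41,000 oligos, and the ~3000x figure is simply 170M reads spread over 54,898 oligos.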

Figure: “Encoding and decoding” DNA (from the paper)

As DNA read lengths increase, especially out to the 100kb reads Oxford Nanopore presented, reading becomes a matter of only a few reads: George’s book, about 5.3 million bases of data at one bit per base, could in principle be read with roughly 53 ONT reads of 100kb each. Perhaps combining the oligo production with Craig Venter’s artificial life methods would be the way to go?
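The long-read estimate is a single division. A minimal sketch, assuming the book’s data are laid end to end at one bit per base with no addresses, primers, errors or redundancy (a best case, not a practical design): 5,270,208 bases divided by 100kb is 52.7, so 53 reads once you round up.

```python
import math

# From the post: 54,898 oligos x 12 bytes x 8 bits, at one bit per base.
BOOK_BASES = 54_898 * 12 * 8   # 5,270,208 bases of data


def reads_needed(total_bases: int, read_length: int) -> int:
    """Minimum number of reads to cover total_bases once, end to end.

    Idealised: assumes error-free reads that tile the book exactly,
    with no overlap, no addresses, no primers and no redundancy.
    """
    return math.ceil(total_bases / read_length)


if __name__ == "__main__":
    print(reads_needed(BOOK_BASES, 100_000))    # 53
    print(reads_needed(54_898 * 115, 100_000))  # 64, keeping the 115bp address + data layout
```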

Fancy giving it a try yourself? The code is available here (Bits2DNA.pl), and some of you have sequencers ready to run in the lab.
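Because every oligo carries its own address, reading the book back is mostly bookkeeping: group reads by address, take a majority vote at each position to wash out sequencing errors (this is where the redundancy from deep coverage earns its keep), then stitch the blocks together in address order. Below is a toy Python sketch, assuming a 19-base address and a 96-base data block (the split implied by 115bp = address + 12x8 bits) and ignoring primers. The helper names are hypothetical, the encoder is deliberately naive, and none of this is Bits2DNA.pl.

```python
from collections import Counter, defaultdict

# Assumed layout, inferred from 115bp = address + 12 x 8-bit bytes:
ADDRESS_LEN = 19   # bases of address, one bit per base
DATA_LEN = 96      # bases of data: 12 bytes x 8 bits
BLOCK_BYTES = DATA_LEN // 8


def bits_to_bases(bits: str) -> str:
    """Toy encoder: 0 -> A, 1 -> G (ignores the A/C and G/T choice)."""
    return "".join("A" if bit == "0" else "G" for bit in bits)


def bases_to_bits(bases: str) -> str:
    """Decoder: A or C reads as 0, G or T reads as 1."""
    return "".join("0" if base in "AC" else "1" for base in bases)


def make_oligos(data: bytes) -> list[str]:
    """Split data into 12-byte blocks (zero-padded) and prefix each with its address."""
    oligos = []
    for address, start in enumerate(range(0, len(data), BLOCK_BYTES)):
        block = data[start:start + BLOCK_BYTES].ljust(BLOCK_BYTES, b"\0")
        bits = f"{address:0{ADDRESS_LEN}b}" + "".join(f"{b:08b}" for b in block)
        oligos.append(bits_to_bases(bits))
    return oligos


def consensus(bit_strings: list[str]) -> str:
    """Per-position majority vote across equal-length bit strings."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*bit_strings))


def decode(reads: list[str]) -> bytes:
    """Group reads by address, majority-vote each block, join blocks in address order."""
    by_address: dict[int, list[str]] = defaultdict(list)
    for read in reads:
        bits = bases_to_bits(read[:ADDRESS_LEN + DATA_LEN])
        by_address[int(bits[:ADDRESS_LEN], 2)].append(bits[ADDRESS_LEN:])
    out = bytearray()
    for address in sorted(by_address):
        block = consensus(by_address[address])
        out += bytes(int(block[i:i + 8], 2) for i in range(0, DATA_LEN, 8))
    return bytes(out)


if __name__ == "__main__":
    message = b"Regenesis: DNA as a book"   # 24 bytes -> two oligos
    oligos = make_oligos(message)
    reads = [oligo for oligo in oligos for _ in range(3)]   # 3x coverage
    # Corrupt one copy of the first oligo in its data region (flip one bit).
    bad = list(reads[0])
    bad[ADDRESS_LEN] = "G" if bad[ADDRESS_LEN] in "AC" else "A"
    reads[0] = "".join(bad)
    print(decode(reads))
```

A read with an error in its address would be filed under the wrong block; a real decoder would need some way of handling that, for example by ignoring addresses seen only rarely.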

PS: George Church did the experiments himself. His supplementary information is excellent, probably the best I have read for being able to actually repeat the experiment. It also appears that George has written like this for most of his research career; the methods in his 1984 paper are just as comprehensive and concise. I wish everyone (myself included) wrote in this level of detail so succinctly.

PPS: if George Church is reading this then please accept an open invitation to coffee next time you are in Cambridge, UK.



Monday, 1 October 2012
