Some comments and analysis from the exciting and fast-moving world of genomics. This blog focuses on next-generation sequencing and microarray technologies, although it is likely to go off on tangents from time to time.
We're currently testing some new methods in the lab to find an optimal exome library prep (not the capture, just the prep). The ideal would be a PCR-free exome; however, we want to work with limited material, so maximising library prep efficiency is key and we'll still use some PCR. The two main factors we're considering are ligation temperature/time and DNA:adapter molar ratio. The major impact of increasing ligation efficiency is to maximise library diversity, and this applies whatever your DNA input. Even if you're not working with low-input samples, high-diversity libraries minimise the sequencing required for almost all applications.
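If you want to sanity-check your own adapter ratios, here is a rough Python sketch of the back-of-the-envelope maths (it assumes ~660 g/mol per base pair of dsDNA; the 100ng input, 200bp fragment length and 10:1 ratio are illustrative numbers, not recommendations):

```python
# Rough DNA:adapter molar ratio calculator; a sketch, not a validated protocol.
# Assumes double-stranded DNA at ~660 g/mol per base pair.

def dsdna_pmol(mass_ng: float, mean_length_bp: float) -> float:
    """Picomoles of dsDNA fragments for a given mass and mean fragment length."""
    return (mass_ng * 1e3) / (660.0 * mean_length_bp)

def adapter_pmol_needed(dna_pmol: float, molar_ratio: float = 10.0) -> float:
    """Adapter needed for a chosen adapter:insert molar ratio (10:1 is a common starting point)."""
    return dna_pmol * molar_ratio

if __name__ == "__main__":
    insert = dsdna_pmol(mass_ng=100, mean_length_bp=200)   # e.g. 100 ng sheared to ~200 bp
    print(f"Insert: {insert:.2f} pmol")
    print(f"Adapter at 10:1 excess: {adapter_pmol_needed(insert):.1f} pmol")
```

The same arithmetic works in reverse for deciding how far to dilute stock adapters when inputs drop, which is where over- or under-adaptered ligations usually come from.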
During discussions with some users it became evident that not everyone knows what the critical bits of a DNA ligation reaction are, and since adapter ligation is key to the success of many NGS library preps I thought it would be worthwhile summarising some key points here.
Image taken from Bob Lehman's 1974 Science paper
Two woolly mammoths (Mammuthus primigenius) have had their genomes sequenced by a team led by the Swedish Museum of Natural History: Complete Genomes Reveal Signatures of Demographic and Genetic Declines in the Woolly Mammoth. The BBC story included coverage of the Long Now Foundation and their plans for de-extinction via genetic-rescue; "to produce new mammoths... repopulate [] tundra and boreal forest in Eurasia and North America", but "not to make perfect copies of extinct woolly mammoths, but to focus on the mammoth adaptations needed for Asian elephants to live in the cold climate of the tundra".
The Mammoth genome story is likely to be big news and I think that is unfortunate, not just for the elephants that are going to get fur coats and be shipped off to cooler climes, but also for the perception of science and scientists. It perpetuates the mad-scientist image and people will inevitably think of films like Jurassic Park. I find it difficult to think of reasons why we would actually need/want to adapt Asian elephants (why not African elephants too) for modern Siberia. Is anyone honestly going to use genome editing on a large scale to make a hairy elephant so it can live in the cold? This kind of coverage is not especially good for science, but it is probably great for your rating on Google!
The mammoth genomes: The two genomes are separated by 40,000 years; the first was from a 44,800-year-old (Late Pleistocene) juvenile found in Siberia, the second from a 4,300-year-old molar from a mammoth that lived in what was probably the last extant population, on Wrangel Island.
Pretty standard library construction, with the addition of a UNG step to remove uracil bases (resulting from cytosine deamination) that reduced C>T artefacts. Genomes were aligned to an (unpublished) 7x Sanger-seq African elephant (Loxodonta africana) genome, LoxAfr4. Alignment to the reference showed differences in the average length of perfectly mapping reads: 55bp and 69bp for the Siberian and Wrangel Island individuals respectively. Population size was estimated by measuring the density of heterozygous sites in the genomes; the authors are explicit in stating that their analysis is probabilistic and they "always quote a range of uncertainty". This analysis suggested two population bottlenecks: the first 280,000 years ago, and the second more recently, 12,500 years ago at the start of the Holocene. This second bottleneck indicated a probable significant drop in mammoth diversity in the time just before extinction, possibly due to inbreeding. The Wrangel Island sample had large regions termed "runs of homozygosity", covering about 23% of the genome.
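To make the heterozygosity idea concrete, here is a rough sketch (not the authors' method, which relies on model-based inference with explicit uncertainty) of how you might count heterozygous calls per 1Mb window from a single-sample VCF; the file name and window size are placeholders:

```python
# Count heterozygous sites per fixed-size window from a single-sample VCF.
# A minimal sketch of the idea only; the published analysis is far more involved.
import collections
import gzip

WINDOW = 1_000_000  # 1 Mb windows

def het_density(vcf_path):
    counts = collections.Counter()
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as vcf:
        for line in vcf:
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            chrom, pos = fields[0], int(fields[1])
            genotype = fields[9].split(":")[0]          # single-sample VCF assumed
            alleles = genotype.replace("|", "/").split("/")
            if len(set(alleles)) > 1:                   # heterozygous call
                counts[(chrom, pos // WINDOW)] += 1
    return counts

# e.g. het_density("wrangel_island.vcf.gz"); long stretches of near-zero windows
# correspond to the "runs of homozygosity" described in the paper.
```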
There is a possibility that more genomes are coming as the group sequenced DNA from 10 individuals to find one good one, and are on the hunt for more to better understand mammoth diversity and the reasons behind extinction.
In How to Clone a Mammoth, Beth Shapiro discusses the challenges, both scientific and ethical, that make de-extinction tough: "I question if it's something we should do at all"! Cloning is likely to be impossible, so we'll have to resort to genome editing or recombineering to get extinct species' DNA into their closest modern relatives.
We need good science stories to hook news broadcasters in whichever media we can, but with impact being one of the metrics many PIs are judged on nowadays it might be too tempting to spin a story a little too hard. Chris Mason's recent Cell paper on New York's subway metagenome got some tough criticism for over-playing the levels of Anthrax and Plague, although the paper itself is pretty clear about the realities of what the data show.
I don't imagine the headlines "Scientists discover people don't always wash their hands after going to the loo" or "Scientists confirm elephants are related to mammoths" would have elicited such high-profile coverage!
Do No Harm: Stories of Life, Death and Brain Surgery by Henry Marsh is one of the best books I've ever read; the last time a book made me cry I was a kid. This one had me in tears several times, including on a flight from London to Helsinki: no holding back, a pure emotional roller-coaster.
As a cancer research scientist (genomics core facility head in a cancer research institute) I rarely see or talk to cancer patients; we do have tours a few times a year and speaking to people living, and dying, with cancer brings home the real reason we're all doing our jobs.
Dr Henry Marsh is a neurosurgeon and this book describes his career through the lives of patients he has operated on; some are cured, some die and some are unlucky: they live, but with the terrible repercussions of treatment gone wrong. When something goes wrong in my lab I might get tied up in knots about the loss of an £11,000 Illumina PE125bp run: who's going to pay, were the samples irreplaceable, etc. But when surgery goes wrong for Dr Marsh the results are catastrophic. He discusses the good and bad of his career with unflinching honesty and genuine emotion.
Illumina launched NeoPrep at AGBT, and Keith Robison at OmicsOmics wrote a detailed summary of what Illumina say the NeoPrep is capable of (he compared this movie of the electro-wetting technology to video games of his, and my, youth). I thought I'd write down my thoughts on how this instrument might fit into labs like mine, and possibly yours too.
The main selling point of NeoPrep is that it provides a one-stop solution for NGS library prep. The price point is pretty good (around £30-35k), and speed and quality looked great in the data presented by Illumina at AGBT (by Gary Schroth and Kevin Meldrum; Illumina have run over 5000 libraries so far). So is NeoPrep a good option for every NGS lab? There are a couple of limitations I'll return to in a bit, but broadly speaking I can see that this system really could be a good, even a sensible, fit for many, many labs running NGS. In a core lab or heavy NGS research lab the NeoPrep looks like it will take some of the worry out of NGS library prep. And even in a lab that only does a dozen or so NGS experiments per year, removing the worry that library prep will go wrong with precious samples might be enough to warrant a purchase.
Because NeoPrep can be run in such a hands-off way, and because the quality and reproducibility of data are reportedly high (although we'll have to wait for user reports over the next six months for confirmation of this), a lab that is spending PostDoc or PhD time making small to medium-sized numbers of libraries might well buy a NeoPrep where they would never have considered purchasing a liquid-handling robot due to their complexity. This means Illumina might have hit the nail squarely on the head with this instrument.
For Core Labs the ability to offer library preps with an Illumina guarantee of quality is likely to be a positive, and something customers might approve of. And as Illumina release more library prep methods for NeoPrep the instrument might be perfect for those things you rarely get asked to do.
The positives: If the hands-on time claim of just 30 minutes for sequencing-ready libraries is true then NeoPrep is going to save people time, and that is probably our most precious commodity. Labs like mine like big projects and we tend to batch smaller ones together; this can mean a wait for users while other samples come in to fill a 96-well plate. NeoPrep would allow us to run projects as small as 16 samples. Alternatively it would allow us to provide automation solutions directly to users, running NeoPrep as a bookable instrument in the core.
The price per library is attractive and Illumina are aiming for parity with manual kits (which could generate a "why bother" attitude to manual prep). The TruSeq Nano kit comes in at around $30 per sample, and TruSeq stranded mRNA around $55. GenomeWeb quoted Illumina as saying that the "fully loaded cost per sample" using NeoPrep would be around $75 per sample, "which includes the cost of amortization of the instrument...assuming 1,000 samples are run per year." Compare that to the real cost of a top-flight post-doc spending two weeks making libraries for one experiment!
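As a quick sanity check on that figure, here is a back-of-the-envelope sketch using my own assumed numbers (roughly a $45k instrument, a three-year write-off and ~$55 of TruSeq stranded mRNA reagents per sample); Illumina haven't published their amortisation model, so this is purely illustrative:

```python
# Rough "fully loaded" cost per sample: reagents plus straight-line instrument amortisation.
# All figures below are my own assumptions, not Illumina's calculation.
def fully_loaded_cost(reagent_per_sample, instrument_price, samples_per_year, years=3):
    amortisation = instrument_price / (samples_per_year * years)
    return reagent_per_sample + amortisation

# ~$45k instrument (~£30-35k), 1,000 samples/year, 3-year write-off, ~$55 reagents
print(round(fully_loaded_cost(55, 45_000, 1_000), 2))   # ~$70, in the ballpark of the quoted $75
```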
Price is not everything, so it is also encouraging that Illumina's data on reproducibility appear very good in the results presented so far: concordance of 0.97 between RNA-seq libraries of varying inputs (100ng vs 10ng, and even 100ng vs 2ng). The NeoPrep is also being touted as requiring less starting material; in fact Illumina recommend starting with 25ng to 75ng of DNA for TruSeq Nano, which is several thousand genome equivalents. For TruSeq mRNA they recommend an input of just 25ng. They also tested down to 2ng but reported some drawbacks, including lower yield and increased numbers of duplicates. I do worry about biases in low-input experiments, and we previously showed a drop in sensitivity, although not specificity, as RNA input dropped from 100ng to 10ng on Illumina HT12 arrays. I guess we should repeat this experiment with TruSeq RNA on NeoPrep!
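Illumina don't say exactly how that concordance was calculated; one simple version is the Pearson correlation of log-transformed gene counts between the two libraries, something like this toy sketch (the counts below are made up):

```python
# Toy concordance calculation between two libraries of the same sample at different inputs.
# Not Illumina's method (they don't describe it in detail); purely illustrative.
import math

def log2_counts(counts):
    return [math.log2(c + 1) for c in counts]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# hypothetical per-gene counts from a 100 ng and a 10 ng library of the same RNA
lib_100ng = [520, 33, 0, 1800, 95, 7, 240]
lib_10ng  = [480, 41, 1, 1650, 110, 5, 210]
print(round(pearson(log2_counts(lib_100ng), log2_counts(lib_10ng)), 3))
```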
The negatives: The most obvious problem is that you can only run Illumina reagents on the NeoPrep, and whilst they are competitively priced some competitors are significantly cheaper. The range of library preps is also very limited, with just DNA Nano and TruSeq stranded mRNA at the moment (GenomeWeb quoted Illumina as saying the PCR-Free kit, followed by a "steady stream of protocols" including targeted resequencing panels, [would begin] in the second half of the year). I know I'd like to see Nextera exomes ASAP, and knowing a timescale for ChIP-seq and Ribo-Zero would be great.
There is a huge range of methods that have been developed to run on Illumina sequencers (download Jacques Retief's amazing poster with almost 150 apps), but probably only a handful of these will ever see the light of day on a NeoPrep. However, many of them use steps from Illumina's core library prep technology (end-repair, adapter-ligation), so if NeoPrep could be configured by users we might be able to make it do what we want. As an example, we've been making some RNaseH ribo-depletion libraries with NEB kits; we did this by eluting the ribo-depleted RNA from Agencourt RNAClean XP beads directly into the 19.5 μl of Fragment, Prime, Finish Mix in the Illumina TruSeq stranded mRNA kit, then carrying on with the protocol from "Incubate RFP" to make multiplexed RNA-seq libraries.
If it is unclear what else might come, and when, then making a purchase decision is tougher. Perhaps more important is understanding what methods Illumina cannot migrate to NeoPrep; I suspect anything that needs a gel is going to be difficult.
Some of the library prep kits from other companies are way better than Illumina's for specific applications. This is not because Illumina can't make those kits (unless IP stops them), but is probably more a case of Illumina looking to see which markets are largest, and possibly which competitors they want to crush. If someone does something Illumina can't, or won't, do then we'll not be running that on NeoPrep.
My biggest concern with NeoPrep is that it is a black (and white) box, so users don't need to know what is going on under the hood; you could say the same of my lab's library prep services. I am a strong believer that you (we) need to understand the library prep technology to innovate, and NeoPrep might reduce the likelihood of new users doing their homework. It could also be argued that as long as you can ask a sensible question and can validate and interpret the results, then using a black box to go from library prep (NeoPrep) to results (BaseSpace or similar) does not matter. However, much of the innovation in NGS methods has come from tweaks to library prep methods, and I'd hate to see a slow-down in this space.
Lastly, 16 samples at a time could be limiting if it is not easy to run a 96-plex experiment across six runs! High-plexity sequencing rocks!
How to choose if NeoPrep is for you: Illumina have data on their blog and website; they also have an unbiased buyer's guide to laboratory automation that is probably worth reading if you're thinking "should I buy a NeoPrep or an Agilent/Beckman/Hamilton etc?". Ultimately you need to consider the number and type of libraries you want to make over the next couple of years. You might decide that kits like ThruPLEX, KAPA Hyper+ or Lexogen make library prep so simple that you don't need a robot at all.
Your needs will ultimately guide you and my thoughts in this post are very squarely for Human genomics and transcriptomics. If you are working with small genomes then other technologies to reduce costs by orders of magnitude are probably more interesting e.g. high-plexity mRNA-seq with a single library prep.
Data from my lab: I'd love to be able to share RNA-seq data from my lab with you, but I can't because there appear to be no demo units available! However, I did run through the demo instrument at AGBT and the run setup wizard was a longer version of the same wizards found on HiSeq, MiSeq and cBot. I could see myself starting 16 RNA-seq libraries first thing in the morning and then doing the day job while NeoPrep makes the libraries. I could also see a lab like mine wanting three instruments so we can run 48 samples in a batch; hopefully they will rack nicely to save bench space.
Nature Methods recently carried a News & Views article from Nick Loman and Mick Watson, “Successful test launch for nanopore sequencing”, in which they discuss the early reports of MinION usage, including a paper in the same issue of Nature Methods (Jain et al). They recall the initial “launch” by Clive Brown at AGBT 2012; this caused a huge amount of excitement, which has been tempered by the slightly longer wait than many were hoping for. Nick ‘n’ Mick suggest that a “new branch of bioinformatics” is coming, dedicated to nanopore data (k-mers), which is very different from Sanger or NGS data (bases).
Jain et al: The paper in Nature Methods from Mark Akeson’s lab at UCSC presents the sequencing of the 7.2kb genome of M13mp18 (42%GC) and reported 99% of 2D MinION reads (the highest quality reads) mapping to the reference at 85% raw accuracy. They presented a SNP detection tool that increased the SNP-call accuracy up to 99%. To achieve this they modelled the error rate in this small genome at high coverage; 100% accuracy might be impossible in homopolymer regions, where the transition between k-mers is very, very difficult to interpret, but for much of the genome MinION looks like it will be usable. Whether this approach will work for targeted sequencing of human genomes will be something I’ll be working on myself.
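For anyone wanting to pull a similar raw-accuracy number out of their own MAP data, a crude per-read identity can be estimated from a BAM of mapped 2D reads using the NM tag; this is only a sketch (it lumps mismatches and indels together, assumes the aligner wrote NM tags, and the BAM file name is a placeholder), not the method Jain et al used:

```python
# Rough per-read identity ("raw accuracy") from a BAM of MinION reads mapped to a reference.
# Requires pysam; "m13_2d_reads.bam" is a placeholder file name.
import pysam

def read_identity(read):
    """Identity = matched bases / aligned bases, approximated via the NM (edit distance) tag."""
    aligned = read.query_alignment_length
    edits = read.get_tag("NM")
    return (aligned - edits) / aligned if aligned else 0.0

with pysam.AlignmentFile("m13_2d_reads.bam", "rb") as bam:
    identities = [read_identity(r) for r in bam if not r.is_unmapped and not r.is_secondary]

print(f"mean per-read identity: {sum(identities) / len(identities):.1%}")
```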
In the paper they also reported very long-read sequencing of a putative 50-kb assembly gap on human Xq24 containing a 4,861-bp tandem repeat cluster. They sequenced a BAC clone and obtained 9 2D reads that spanned the gap, allowing them to determine the presence of 8 repeats, confirmed by PFGE and “short” 10kb reads from fragmented BAC DNA.
The future looks bright for MinION: Jain et al discuss the rapid rate of improvement in MinION data quality, and Nick ‘n’ Mick also mention this when talking about why they're so upbeat about the MinION (hear it directly: both are speaking at the ONT "London Calling" conference). Their main reason is the success of the MinION Access Program in its first year (e.g. Jain et al reported the increase in 2D reads due to changes in sequencing chemistry from June (66%), July (70%) and October (78%) to November (85%), and Loman published a bacterial genome after just 3 months in the MAP, demonstrating the improvements in chemistry). They also point out that very long reads allow access to regions of the genome off-limits to short-read technologies, and they mention the hope of direct base-modification analysis, direct RNA-seq and protein sequencing. Jain et al also discuss the possibilities of detecting epigenetic modifications, etc. These all seem a very long way off to me, but with so many labs participating in the MAP who knows how soon we’ll be reading about these applications?
Jain et al and Nick ‘n’ Mick both mention the miniature size of the MinION and its portability. It is certainly small: I accidentally took mine home after a meeting because it was in my pocket! If this portability can move sequencing from bench to bedside then MinION could be the first point-of-care diagnostic sequencer. It may be premature to suggest this, but many cancer researchers would love to sequence DNA directly from blood with as little time as possible between collection and sequencing; if Clive’s AGBT 2012 claim “that sequencing can be accomplished directly from blood” proves to be accurate then this may just be a matter of technology (mainly sample prep) maturation.
I agree that the future looks bright for MinION. ONT tried something quite different with the MAP; it was a risk, but one that seems to be paying off. Year two is likely to see many, many more publications from the large number of MAPpers.
Disclosure: I am a participant in the MinION Access Program.