Friday, 24 April 2015

Mammoth de-extinction: not good for elephants, not good for science?

Two woolly mammoths (Mammuthus primigenius) have had their genomes sequenced by a team led by the Swedish Museum of Natural History: Complete Genomes Reveal Signatures of Demographic and Genetic Declines in the Woolly Mammoth. The BBC story included coverage of the Long Now Foundation and their plans for de-extinction via genetic rescue: "to produce new mammoths... repopulate [] tundra and boreal forest in Eurasia and North America", although "not to make perfect copies of extinct woolly mammoths, but to focus on the mammoth adaptations needed for Asian elephants to live in the cold climate of the tundra".



The mammoth genome story is likely to be big news and I think that is unfortunate; not just for the elephants that are going to get fur coats and be shipped off to cooler climes, but also for the perception of science and scientists. It perpetuates the mad-scientist image, and people will inevitably think of films like Jurassic Park. I find it difficult to think of reasons why we would actually need, or want, to adapt Asian elephants (why not African elephants too?) for modern Siberia. Is anyone honestly going to use genome editing on a large scale to make a hairy elephant so it can live in the cold? This kind of coverage is not especially good for science, but it is probably great for your rating on Google!

The mammoth genomes: The two genomes are separated by 40,000 years; the first was from a 44,800-year-old (Late Pleistocene) juvenile found in Siberia, the second from a 4,300-year-old molar from a mammoth that lived on Wrangel Island, probably in the last extant population.

Library construction was pretty standard, with the addition of a UNG step to remove uracil bases (resulting from cytosine deamination) that reduced C>T artefacts. Genomes were aligned to an (unpublished) 7x Sanger-sequenced African elephant (Loxodonta africana) genome, LoxAfr4. Alignment to the reference showed differences in the average length of perfectly mapping reads: 55bp and 69bp for the Siberian and Wrangel Island individuals respectively. Population size was estimated by measuring the density of heterozygous sites in the genomes; the authors are explicit in stating that their analysis is probabilistic and they "always quote a range of uncertainty". This analysis suggested two population bottlenecks: the first 280,000 years ago, and the second more recently, 12,500 years ago at the start of the Holocene. The second indicated a probable significant drop in mammoth diversity in the time just before extinction, possibly due to inbreeding. The Wrangel Island sample had large regions termed "runs of homozygosity", covering about 23% of the genome.
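The heterozygosity-based analysis above can be illustrated with a toy calculation: count heterozygous sites per genomic window, and call windows with very low density "runs of homozygosity". A minimal sketch; the function names, window size and threshold are my own illustrative choices, not the paper's actual method:

```python
# Sketch: per-window heterozygous-site density, the statistic underlying
# both the demographic inference and the runs-of-homozygosity (ROH)
# estimate. Genome length, window size and positions are illustrative.

def window_heterozygosity(het_positions, genome_length, window=1_000_000):
    """Return heterozygous sites per bp for each non-overlapping window."""
    n_windows = genome_length // window
    counts = [0] * n_windows
    for pos in het_positions:
        idx = pos // window
        if idx < n_windows:
            counts[idx] += 1
    return [c / window for c in counts]

def fraction_in_roh(densities, threshold=1e-5):
    """Fraction of windows whose heterozygosity falls below a cutoff,
    a crude proxy for the fraction of the genome in ROH."""
    low = sum(1 for d in densities if d < threshold)
    return low / len(densities)
```

On real data the thresholds would be calibrated against the genome-wide average; here they only show the shape of the calculation.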

There is a possibility that more genomes are coming as the group sequenced DNA from 10 individuals to find one good one, and are on the hunt for more to better understand mammoth diversity and the reasons behind extinction. 

De-extinction: Beth Shapiro at UCSC is the author of the book How to Clone a Mammoth, and she'll be doing a talk and book signing at Oxford University Museum of Natural History on Tuesday, 19 May 2015 at 6PM; she previously published research on the museum's Dodo.
In How to Clone a Mammoth she discusses the challenges, both scientific and ethical, that make de-extinction tough: "I question if it's something we should do at all"! Cloning is likely to be impossible, so we'll have to resort to genome editing or recombineering to get extinct-species DNA into their closest modern relatives.


We need good science stories to hook news broadcasters in whichever media we can, but with impact being one of the metrics many PIs are judged on nowadays it might be too tempting to spin a story a little too hard. Chris Mason's recent Cell paper on New York's subway metagenome got some tough criticism for over-playing the levels of Anthrax and Plague, although the paper itself is pretty clear about the realities of what the data show.

I don't imagine the headlines "Scientists discover people don't always wash their hands after going to the loo", or "Scientists confirm elephants are related to mammoths", would have elicited such high-profile coverage!

Monday, 20 April 2015

Book Review: Dr Henry Marsh "Do No Harm: Stories of Life, Death and Brain Surgery"

Do No Harm: Stories of Life, Death and Brain Surgery by Henry Marsh is one of the best books I've ever read; the last time a book made me cry I was a kid. I was in tears several times, including on a flight from London to Helsinki: no holding back, pure emotional roller-coaster.

As a cancer research scientist (genomics core facility head in a cancer research institute) I rarely see or talk to cancer patients; we do have tours a few times a year and speaking to people living, and dying, with cancer brings home the real reason we're all doing our jobs.

Dr Henry Marsh is a neurosurgeon, and this book describes his career through the lives of patients he has operated on; some are cured, some die, and some are unlucky: they live, but with terrible repercussions of treatment gone wrong. When something goes wrong in my lab I might get tied up in knots about the loss of an £11,000 Illumina PE125bp run (who's going to pay, were the samples irreplaceable, etc.), but when surgery goes wrong for Dr Marsh the results are catastrophic. He discusses the good and bad of his career with unflinching honesty and genuine emotion.

Read Chapter 1: Pineocytoma from the Orion Books website.


Wednesday, 15 April 2015

Should you buy a NeoPrep (or any other NGS automation)?

Illumina launched NeoPrep at AGBT, and Keith Robison at OmicsOmics wrote a detailed summary of what Illumina say the NeoPrep is capable of (he compared this movie of the electro-wetting technology to video games of his, and my, youth). I thought I'd write down my thoughts on how this instrument might fit into labs like mine, and possibly yours too.



The main selling point of NeoPrep is that it provides a one-stop solution for NGS library prep. The price point is pretty good (around £30-35k), and speed and quality looked great in the data presented by Illumina at AGBT (by Gary Schroth and Kevin Meldrum; Illumina have run over 5,000 libraries so far); so is NeoPrep a good option for every NGS lab? There are a couple of limitations I'll return to in a bit, but broadly speaking I can see that this system really could be a good, even a sensible, fit for many labs running NGS. In a core lab or heavy NGS research lab the NeoPrep looks like it will take some of the worry out of NGS library prep. And even in a lab that only does a dozen or so NGS experiments per year, removing the worry that library prep will go wrong with precious samples might be enough to warrant a purchase.

Because NeoPrep can be run in such a hands-off way, and because the quality and reproducibility of data are reportedly high (although we'll have to wait for user reports over the next six months for confirmation of this), a lab that is spending post-doc or PhD time making small to medium-sized numbers of libraries might well buy a NeoPrep where they would never have considered purchasing a liquid-handling robot due to their complexity. This means Illumina might have hit the nail squarely on the head with this instrument.

For core labs the ability to offer library preps with an Illumina guarantee of quality is likely to be a positive, and something customers might approve of. And as Illumina release more library prep methods for NeoPrep, the instrument might be perfect for those things you rarely get asked to do.

The positives: If the hands-on time claim of just 30 minutes for sequencing-ready libraries is true then NeoPrep is going to save people time, and that is probably our most precious commodity. Labs like mine like big projects and we tend to batch smaller ones together, which can mean a wait for users while other samples come in to fill a 96-well plate. NeoPrep would allow us to run projects as small as 16 samples. Alternatively it would allow us to provide automation solutions directly to users, running NeoPrep as a bookable instrument in the core.

The price per library is attractive, and Illumina are aiming for parity with manual kits (which could generate a "why bother" attitude to manual prep). The TruSeq Nano kit comes in at around $30 per sample, and TruSeq stranded mRNA around $55. GenomeWeb quoted Illumina as saying "the "fully loaded cost per sample" using NeoPrep would be around $75 per sample, which includes the cost of amortization of the instrument...assuming 1,000 samples are run per year." Compare that to the real cost of a top-flight post-doc spending two weeks making libraries for one experiment!
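The amortisation arithmetic behind a "fully loaded" figure like this is easy to sketch; the instrument price, write-off period and throughput below are illustrative assumptions of mine, not Illumina's published numbers:

```python
# Back-of-envelope check of a fully loaded cost per sample: spread the
# instrument price over an assumed lifetime and throughput, then add the
# per-sample kit cost. All inputs here are illustrative assumptions.

def fully_loaded_cost(kit_cost, instrument_price, lifetime_years, samples_per_year):
    amortisation = instrument_price / (lifetime_years * samples_per_year)
    return kit_cost + amortisation

# e.g. a ~$45k instrument written off over 3 years at 1,000 samples/year
# adds $15/sample on top of a $55 mRNA kit, landing in the same ballpark
# as the quoted ~$75 fully loaded figure.
cost = fully_loaded_cost(55, 45_000, 3, 1_000)
```

The obvious sensitivity is throughput: halve the samples per year and the amortisation per sample doubles.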

Price is not everything, so also encouraging is Illumina's data on reproducibility, which appears to be very good in the results presented so far: concordance of 0.97 between RNA-seq libraries of varying inputs (100ng vs 10ng, and even 100ng vs 2ng). The NeoPrep is also being touted as requiring less starting material; Illumina recommends starting with 25ng to 75ng of DNA for TruSeq Nano, which is several thousand genome equivalents. For TruSeq mRNA they recommend an input of just 25ng. They also tested down to 2ng but reported some drawbacks, including lower yield and increased numbers of duplicates. I do worry about biases in low-input experiments; we previously showed a drop in sensitivity, although not specificity, as RNA input dropped from 100ng to 10ng on Illumina HT12 arrays. I guess we should repeat this experiment with TruSeq RNA on NeoPrep!

The negatives: The most obvious problem is that you can only run Illumina reagents on the NeoPrep, and whilst they are competitively priced, some competitors are significantly cheaper. The range of library preps is also very limited, with just DNA Nano and TruSeq stranded mRNA at the moment (GenomeWeb quoted Illumina as saying the "PCR Free kit, followed by a "steady stream of protocols" [would begin] in the second half of the year and including targeted resequencing panels"). I know I'd like to see Nextera exomes ASAP; and knowing a timescale for ChIP-seq and Ribo-Zero would be great.

There is a huge range of methods that have been developed to run on Illumina sequencers (download Jaques Retief's amazing poster with almost 150 apps), but probably only a handful of these will ever see the light of day on a NeoPrep. However, many of them use steps from Illumina's core library prep technology (end-repair, adapter-ligation), so if NeoPrep could be configured by users we might be able to make it do what we want. As an example, we've been making some RNaseH libraries with NEB kits for ribo-depletion; we did this by eluting the ribo-depleted RNA from Agencourt RNAClean XP beads directly into the 19.5 μl of Fragment, Prime, Finish Mix in the Illumina TruSeq stranded mRNA kit, then carried on with the protocol from "Incubate RFP" to make multiplexed RNA-seq libraries.

If it is unclear what else might come, and when, then making a purchase decision is tougher. Perhaps more important is understanding which methods Illumina cannot migrate to NeoPrep; I suspect anything that needs a gel is going to be difficult.

Some of the library prep kits from other companies are way better than Illumina's for specific applications. This is not because Illumina can't make those kits (unless IP stops them), but is probably more a case of Illumina looking to see which markets are largest, and possibly which competitors they want to crush. If someone does something Illumina can't, or won't, do then we'll not be running it on NeoPrep.

My biggest concern with NeoPrep is that it is a black (and white) box, so users don't need to know what is going on under the hood; you could say the same of my lab's library prep services. I am a strong believer that you (we) need to understand the library prep technology to innovate, and NeoPrep might reduce the likelihood of new users doing their homework. It could also be argued that as long as you can ask a sensible question and can validate and interpret the results, then using a black box to go from library prep (NeoPrep) to results (BaseSpace or similar) does not matter. However, much of the innovation in NGS methods has come from tweaks to library prep methods, and I'd hate to see a slow-down in this space.

Lastly, 16 samples at a time could be limiting if it is not easy to run a 96-plex experiment across six runs! High-plexity sequencing rocks!

How to choose if NeoPrep is for you: Illumina have data on their blog and website; they also have an unbiased buyer's guide to laboratory automation that is probably worth reading if you're thinking "should I buy a NeoPrep or an Agilent/Beckman/Hamilton etc?". Ultimately you need to consider the number and type of libraries you want to make over the next couple of years. You might decide that kits like Thruplex, KAPA Hyper+ or Lexogen make library prep so simple that you don't need a robot at all.

Your needs will ultimately guide you, and my thoughts in this post are squarely focused on human genomics and transcriptomics. If you are working with small genomes then other technologies that reduce costs by orders of magnitude are probably more interesting, e.g. high-plexity mRNA-seq with a single library prep.

Data from my lab: I'd love to be able to share RNA-seq data from my lab with you, but I can't, because there appear to be no demo units available! However, I did run through the demo instrument at AGBT and the run setup wizard was a longer version of the same wizards on HiSeq, MiSeq and cBot. I could see myself starting 16 RNA-seq libraries first thing in the morning and then doing the day job while NeoPrep makes the libraries. I could also see a lab like mine wanting three instruments so we can run 48 samples in a batch; hopefully they will rack nicely to save bench space.

Wednesday, 8 April 2015

Rise of the Nanopore

Nature Methods recently carried a News & Views article from Nick Loman and Mick Watson: “Successful test launch for nanopore sequencing”, in which they discuss the early reports of MinION usage, including a paper in the same issue of Nature Methods (Jain et al). They recall the initial “launch” by Clive Brown at AGBT 2012; this caused a huge amount of excitement, which has been tempered by the slightly longer wait than many were hoping for. Nick ‘n’ Mick suggest that a “new branch of bioinformatics” is coming, dedicated to nanopore data (k-mers), which are very different from Sanger or NGS data (bases).

Jain et al: The paper in Nature Methods from Mark Akeson’s lab at UCSC presents the sequencing of the 7.2kb genome of M13mp18 (42% GC) and reported 99% of 2D MinION reads (the highest-quality reads) mapping to the reference at 85% raw accuracy. They presented a SNP detection tool that increased SNP-call accuracy up to 99%. To achieve this they modelled the error rate in this small genome at high coverage; 100% accuracy might be impossible in homopolymer regions, where the transition between k-mers is very difficult to interpret, but for much of the genome MinION looks like it will be usable. Whether this approach will work for targeted sequencing of human genomes will be something I’ll be working on myself.
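The intuition behind high-coverage error correction can be shown with a toy per-position majority vote; the real tool models k-mer transitions rather than independent base calls, so this is only an illustration of why deep coverage lifts consensus accuracy well above raw read accuracy:

```python
# Toy illustration: with many noisy reads over the same position, a
# per-position majority vote recovers the true base even when individual
# calls are only ~85% accurate. Not the paper's actual k-mer/HMM model.
from collections import Counter

def consensus(pileup):
    """pileup: list of columns, each a string of base calls at one position.
    Returns the majority-vote base at each position."""
    return "".join(Counter(col).most_common(1)[0][0] for col in pileup)

# Three positions, each covered by five reads; the one or two erroneous
# calls in each column are outvoted.
calls = ["AAATA", "CCGCC", "GGGGT"]
# consensus(calls) -> "ACG"
```

Homopolymers break this picture because errors there are systematic rather than random, which is exactly why the paper flags them as the hard case.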

In the paper they also reported very long-read sequencing of a putative 50-kb assembly gap on human Xq24 containing a 4,861-bp tandem repeat cluster. They sequenced a BAC clone and obtained nine 2D reads that spanned the gap, allowing them to determine the presence of eight repeats, confirmed by PFGE and “short” 10kb reads from fragmented BAC DNA.

The future looks bright for MinION: Jain et al discuss the rapid rate of improvement in MinION data quality, and Nick ‘n’ Mick also mention this when talking about why they're so upbeat about the MinION (hear it directly: both are speaking at the ONT "London Calling" conference). Their main reason is the success of the MinION Access Program in its first year (e.g. Jain et al reported the increase in 2D reads due to changes in sequencing chemistry from June (66%), July (70%), October (78%) and November (85%); and Loman published a bacterial genome after just 3 months in the MAP, demonstrating the improvements in chemistry). They also point out that very long reads allow access to regions of the genome off-limits to short-read technologies, and they mention the hope of direct base-modification analysis, direct RNA-seq and protein sequencing. Jain et al also discuss the possibilities of detecting epigenetic modifications. These all seem a very long way off to me, but with so many labs participating in the MAP, who knows how soon we’ll be reading about these applications?

Jain et al and Nick ‘n’ Mick both mention the miniature size of the MinION and its portability. It is certainly small; I accidentally took mine home after a meeting because it was in my pocket! If this portability can move sequencing from bench to bedside then MinION could be the first point-of-care diagnostic sequencer. It may be premature to suggest this, but many cancer researchers would love to sequence DNA directly from blood with as little time as possible between collection and sequencing; if Clive’s AGBT 2012 claim “that sequencing can be accomplished directly from blood” proves to be accurate then this may just be a matter of technology (mainly sample prep) maturation.

I agree that the future looks bright for MinION. ONT tried something quite different with the MAP; it was a risk, but one that seems to be paying off. Year two is likely to see many more publications from the large number of MAPpers.

Disclosure: I am a participant in the MinION Access Program.

PS: You can find a few MinIONs on the Google Map of NGS.

Thursday, 2 April 2015

BGI sells off Illumina HiSeq instruments

Looks like I missed out on a bargain: the BGI has sold all their HiSeq 2500 instruments on eBay, but the auction ended yesterday.


PS: Is the 454 on eBay for £7000 an April fool?

Tuesday, 31 March 2015

Book Review: Bang Wong's "Visual Strategies for Biological Data"

I've written before about how much I liked Nature Methods' “Points of View” columns by Bang Wong, and I created a public Mendeley group so you could access the papers. I'd also said that having the articles collected together in a hard-copy version would be great.

Now available is Nature Collections: Visual Strategies for Biological Data: "this e-book collects the Points of View columns published in Nature Methods through February 2015, providing practical advice on effective strategies for visualising biological data to researchers in the biological sciences."
 
Enjoy. 






Friday, 20 March 2015

Oxford Nanopore MinION for ctDNA sequencing


A great poster presented at AGBT by Boreal Genomics is available on the Nanopore wiki for MAPpers. In "A nanopore liquid biopsy", Patrick Davies describes their combination of the Boreal On-Target with ONT MinION sequencing to detect mutant allele fractions in ctDNA below 0.1%. I spoke briefly to Andre Marziali (Boreal founder & CSO) about the work and summarise the poster here.

Saturday, 14 March 2015

Fancy working in my lab?


I've currently got three positions open in my lab and thought I'd use this blog as another way to get the message out to prospective candidates. Two people recently moved on to new jobs, one to Inivata (the first spin-out from CRUK-CI) and one to Abcam, and another person was recently promoted. We're also busy, so we're recruiting for a six-month temporary contract to help out with the sequencing services.

If you want to see what the lab does, please take a look at our lab website; you may also have seen us on Twitter.

The posts:

The posts will all be involved in providing next-generation sequencing and library preparation services, including nucleic acid and library quantification (with KAPA); setting up, monitoring and troubleshooting Illumina HiSeq, NextSeq and MiSeq sequencers; and library prep using a diversity of methods, such as exome-seq, ChIP-seq and RNA-seq (we do a lot of RNA-seq and exomes). The senior post will be responsible for the day-to-day operational management of the NGS service, and will work alongside their counterpart running the library prep services.

The Genomics core has been operational for 8 years and we've focused on NGS for 7 of those; it is an experienced lab doing exciting work with a diverse set of users from across Cambridge, although our primary focus is on cancer research methods for scientists at the CRUK-funded Cambridge Institute.

Please follow the links for details on applications rather than contacting me directly.

Thanks.

James.

PS: Closing date for all posts is 27 March 2015.


Friday, 13 March 2015

A better way to sequence exomes?

I caught up with a new company on the target capture scene, Directed Genomics, at AGBT. Their approach is based on a simple idea: if you want to sequence exomes, why not capture only exons?

Most exome-seq methods (Illumina, Agilent, NimbleGen) use oligo baits to pull down adapter-ligated fragment libraries with fragments of 200-300bp. As exons average only about 170bp (80-85% of human exons are under 200bp; Zhu et al & Sakharkar et al), we sequence lots of near- or off-target bases. These can be used (cnvOffSeq, for instance), but are to some degree wasted sequencing.
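The "wasted bases" point is simple arithmetic, assuming the exon sits wholly inside the captured fragment; the lengths below are the illustrative figures from the text, not measurements:

```python
# Rough arithmetic behind the wasted-bases argument: when a capture pulls
# down fragments longer than the exon, the overhang is near-target
# sequence that gets sequenced anyway.

def on_target_fraction(exon_len, fragment_len):
    """Fraction of sequenced bases inside the exon, assuming the whole
    exon is contained within the captured fragment."""
    return min(exon_len, fragment_len) / fragment_len

# A 170bp exon in a 250bp fragment: roughly a third of bases are off-exon.
frac = on_target_fraction(170, 250)
```

At 300bp fragments the off-exon share rises further, which is why trimming capture to exon boundaries is attractive.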

The Directed Genomics approach: Like other exome capture companies, Directed Genomics uses probe hybridisation to targeted regions and/or exons, but applies it in a very different manner than we’re used to with standard exome capture. Two methods are presented in their recent posters: the first uses two probes, one at each end of the exon; the second uses a single probe hyb and a random 5’ end to create molecularly identifiable libraries. Current plans appear to be for custom panels, but hopefully they'll build out to a whole-exome panel over time.

Directed Genomics workflows

1: In their dual-probe method a short 50bp biotinylated oligo probe is hybridised to fragmented gDNA at the 3’ end of an exon; the sequence upstream of this is then enzymatically digested and the 3’ hairpin adapter ligated. Next, a second 50bp probe is hybridised to the 5’ end of the exon, the 5’ end is blunted and a 5’ adapter is ligated. Rather cleverly, the hairpin adapter ligated at the 3' end of the target links the target to the probe, allowing for a heat step in the second probe hybridisation without losing the target. Finally, the 3’ hairpin is cleaved, releasing products for PCR amplification and sequencing that contain only targeted exonic sequences. On-target rates of 97% were reported in their AGBT poster.

2: In their single-probe method a short 50bp probe is hybridised to fragmented gDNA at the 3’ end of an exon; the sequence upstream of this is then enzymatically digested and the 3’ adapter ligated. The probe is then extended to create the complementary strand, and a 5’ adapter is ligated to the blunt end. This creates a library with random 5’ ends, enabling a duplicate-filtering step that PCR-based approaches cannot offer.

The protocols are both same-day, taking 6-8 hours with around 1.5 hours hands-on time (according to the posters). Both allow some, or all, of the off-target sequence to be removed, reducing the amount of sequencing wasted. However, the variation in exon length means that some sequence is inevitably lost.

Molecular IDs in cell-free DNA: Their single-probe method creates libraries with a built-in molecular ID. The random nature of the 5’ end should allow removal of all PCR duplicates without affecting biological duplicates too much. Adding a molecular identifier to the 3’ probe would increase this even further, and would also bring molecular ID to the dual-probe method.
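A minimal sketch of duplicate filtering with random 5' ends: reads sharing the same probe (fixed 3' end) and the same 5' start position are collapsed as PCR duplicates, while distinct 5' starts count as distinct input molecules. The tuple representation and names below are hypothetical, not Directed Genomics' actual data model:

```python
# Sketch: dedup reads by (probe, 5' start). With a fixed 3' end set by
# the probe, the randomly sheared 5' end acts as a natural molecular
# identifier. Probe names and positions are made up for illustration.

def unique_molecules(reads):
    """reads: iterable of (probe_id, five_prime_pos) tuples.
    Returns the deduplicated set of inferred original molecules."""
    return set(reads)

reads = [
    ("EGFR_ex19", 1012),  # molecule 1
    ("EGFR_ex19", 1012),  # PCR duplicate of molecule 1
    ("EGFR_ex19", 1007),  # different 5' start, so a distinct molecule
    ("KRAS_ex2", 530),
]
```

Two biological molecules that happen to shear at the same position would still be collapsed together, which is why adding a true random tag to the 3' probe would tighten this further.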

These molecular IDs are likely to become increasingly important in methods to call low-frequency mutations in cell-free DNA applications, particularly ctDNA. Current methods make use of deep sequencing to call mutations just below 1% MAF (mutant allele frequency). However, simply sequencing deeper may not be enough to get under 0.1%: a MAF of 0.1% would require sequencing to >10,000x to have enough mutant allele reads, and PCR, clustering and sequencing errors all make the detection harder.
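The depth requirement can be sanity-checked with a Poisson approximation to the binomial sampling of mutant reads; a stdlib-only sketch, with thresholds that are illustrative rather than anyone's validated detection limit:

```python
# At depth D and mutant allele fraction f, the expected number of mutant
# reads is D*f, and the observed count is approximately Poisson(D*f).
import math

def expected_mutant_reads(depth, maf):
    return depth * maf

def prob_at_least(k, depth, maf):
    """P(at least k mutant reads) under a Poisson approximation."""
    lam = depth * maf
    p_less = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1 - p_less

# At 10,000x and 0.1% MAF we expect ~10 mutant reads, so seeing 5+ is very
# likely; at 1,000x we expect only ~1, indistinguishable from error noise.
```

This is the sampling limit only; PCR and sequencing errors add a background rate on top, which is exactly what molecular IDs help subtract.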

Adding a molecular identifier should allow us to develop better statistical methods to call lower and lower MAF. Ultimately we aim to get to a point where we are restricted more by the presence of mutant alleles in a sample than by the technology used to capture and sequence them.

Directed Genomics and cell-free DNA: The AGBT poster contained results from the Horizon Diagnostics Multiplex Reference Standard (link). Correlations of observed vs expected allele frequencies were >0.91. This is one of the first methods that can target mutant alleles with a single oligo, compared to the two used for PCR amplicon sequencing, e.g. TAM-seq. It should mean an increase in sensitivity, as more ctDNA molecules can be captured and amplified.

Directed Genomics expects to be launching later in 2015.

Thursday, 12 March 2015

Combining high-throughput CRISPR with in silico cancer drug development

In my last post I wrote about a computational screen of TCGA data and its use in repurposing approved drugs and/or finding new drug candidates for cancer patients. The work demonstrated the possibilities for finding novel treatments, but I also pointed to a cautionary Vemurafenib study that showed poor performance repurposing the drug in colorectal cancer. As it becomes easier to identify novel therapies in a high-throughput manner, we need to develop methods to test them that are equally high-throughput. CRISPR knock-out or mutation of cancer drivers in multiple cancer cell lines or in tumour xenografts is one possibility, but most groups have carried out only a handful of knock-out or genome-editing experiments.