CoreGenomics: March 2014

Thursday 27 March 2014

Illumina vs LifeTechnologies: the latest instrument comparison

A recent PLOS ONE paper is perhaps the latest battle in the Illumina vs LifeTech war. It presents a comparison of MiSeq and Proton PI sequencing to detect chromosome abnormalities in self-aborted foetuses: Chen et al: Performance Comparison between Rapid Sequencing Platforms for Ultra-Low Coverage Sequencing Strategy.

3TC-seq: differential expression from degraded RNA

Joakim Lundeberg at the Science for Life Laboratory, Stockholm, Sweden has published a nice paper in PLOS ONE on the impact of RNA degradation on the quality of RNA-seq and differential expression results: Sigurgeirsson, Emanuelsson & Lundeberg: Sequencing Degraded RNA Addressed by 3' Tag Counting. PLOS ONE 2014. They used high-quality cell line RNA degraded to specific RINs by metal hydrolysis to demonstrate that "RIN has systematic effects on gene coverage, false positives in differential expression and the quantification of duplicate reads" and provide a computational method for low RIN DGE analysis that, most importantly, keeps false positives low. Whilst they demonstrate pretty good sensitivity, most users are likely to be affected more by false positives in low-quality RNA experiments. They will almost certainly come to the experiment understanding there will be limitations, and I find sensitivity is usually traded for specificity.

5mC-PCR: preserving methylation status during polymerase chain reaction

Methylation analysis is hampered by the simple fact that PCR amplification removes methylation marks from native DNA. We came up with a simple idea: produce a thermo-stable DNA methyltransferase to preserve methylation status through PCR cycles, allowing amplification of DNA and simplified analysis. The first thing we did was approach some enzyme companies to see if anyone had something suitable on their books; they did not. But they did seem to think this was a good idea, so we designed a pretty simple experiment to test it. This involved going back to basics, to how PCR was performed before the application of Taq polymerase - we added Dnmt1 after each amplification cycle to copy methylation marks onto the daughter strands.

Some help with your stats

Stats: not everyone's favourite subject, but something we can't avoid, so understanding the basics is a very good idea. We're lucky in my Institute to have biostatistical support in our Bioinformatics core facility, and we try to have a statistician with us every time we design a genomics experiment. The same questions come up time and time again: how many samples, and how deep to sequence? We're only slowly getting answers, but the experience we're building up helps nearly every time. I also find other sources of information can be really helpful and have listed a couple of them below.

Can RNA-seq stop Tour de France dopers?

The BBC ran an article a few weeks ago on the possibility of performance enhancing genetics: think Team BMC Genomics! The piece has an interview with Dr Philippe Moullier from INSERM in Nantes, who was part of a group that published a paper describing "Neo-organ" gene therapy treatment of neuromuscular diseases by the introduction of the erythropoietin gene into mice.

For those of you that easily forget: EPO has a rather bad rap in cycling, just ask the UCI or Lance Armstrong!

How long can you store your NGS libraries: DNA adsorption demystified

We've been running Illumina NGS in my lab for six and a half years and have almost 8000 libraries in our -20 going back almost to the start, to number 300 in fact. Occasionally we have users requesting us to rerun these old libraries and we'd normally re-quantify these before putting them back on the sequencer.

More often we get users asking us for another lane a few weeks or months after the first run. In these instances, until recently, we'd still re-quantify if the last lane was more than a month ago, and that meant a lot of extra real-time PCR. Michelle in my group took libraries out of the freezer and compared qPCR results to the original quant. The results were pretty clear: quantification values do not change significantly over a period of up to a year.

Cheaper RNA-seq: but you might have to give up strandedness and splicing

I'm going to start this post by saying something that may be controversial: "splicing analysis is a niche, strandedness is useful in a minority of cases and most RNA-seq users are only interested in getting Affy-like 3' biased differential gene expression data".

I'm constantly looking at how we perform RNA-seq for differential gene expression (DGE) analysis, with the major goal of making it cheap enough that replication levels increase from the commonly used 3 replicates to a more powerful 6+ replicates. Obviously users are put off doubling the cost of their experiment when everyone's been running triplicate as the maximum level of replication for the last ten years or more, so ideally we'll keep experimental costs the same while trying to improve the quality.

Now I do believe RNA-seq has made a dramatic difference to the cost of performing DGE experiments and is way cheaper than Affy arrays used to be. But a side-effect of moving over to RNA-seq is that most scientists are very interested in science and read widely, so when they see papers exclaiming all the possibilities RNA-seq brings, they'd like to get as much as possible from their own RNA-seq experiments.

What can you do with RNA-seq: i) differential gene expression in an (almost) unbiased manner, ii) analyse splicing isoforms using splice-junction reads, iii) identify the full extent of the transcriptome including non-coding, antisense and over-lapping transcripts often making use of strand information.

The challenge with the first statement is that there is always bias in experiments. Just because RNA-seq does not suffer from the problems microarrays have/had (sensitivity, reliance on an annotated transcriptome, limitations in the design and use of oligonucleotide probes, etc.) does not mean RNA-seq isn't biased. The choice of time-point or cell-type in any experiment adds an immediate, although hopefully informed, bias; RNA extraction methodology will have an impact; and RNA-seq library preparation and sequencing will add other layers of bias on top.

Statement two gets many biologists salivating. We've all read some fantastic papers showing the biological importance of splicing and would like to be able to show similar results in the systems we're studying. However, splicing analysis is limited by the fact that today we can only sequence through a single splice junction in each read, meaning we need to infer which isoforms might be present and whether there is differential isoform usage. The competing methods of analysis work in quite different ways: Cufflinks' use of the Bowtie+TopHat spliced alignment versus DEXSeq's analysis of differential exon usage by counting reads in individual exons.
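To make the exon-counting idea concrete, here is a toy sketch of tallying reads per exon. It is not DEXSeq's actual implementation: the exon names, coordinates, reads and the `count_reads_per_exon` helper are all made up for illustration, and a real analysis would work from aligned reads and an annotation file rather than hand-written intervals.

```python
def count_reads_per_exon(exons, reads):
    """Count how many reads overlap each exon.

    exons: list of (name, start, end) tuples, half-open coordinates.
    reads: list of (start, end) tuples, half-open coordinates.
    A read overlapping two exons is counted once for each of them.
    """
    counts = {name: 0 for name, _, _ in exons}
    for read_start, read_end in reads:
        for name, exon_start, exon_end in exons:
            # Two half-open intervals overlap if each starts before the other ends.
            if read_start < exon_end and exon_start < read_end:
                counts[name] += 1
    return counts


# Hypothetical three-exon gene and six aligned reads.
exons = [("E1", 100, 200), ("E2", 300, 400), ("E3", 500, 600)]
reads = [(110, 160), (180, 230), (310, 360), (350, 400), (390, 440), (700, 750)]

exon_counts = count_reads_per_exon(exons, reads)
print(exon_counts)
```

Per-exon counts like these, collected across replicates and conditions, are the sort of input a differential exon usage test would then model; the statistics are the hard part and are not sketched here.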

Statement three has been massively impacted by the introduction of stranded RNA-seq protocols. Methods that preserve the information about which strand the transcript came from have allowed an even more detailed view of the transcriptome than we have ever had especially in the cases of transcripts that overlap in genome location and the extent and biological importance of antisense transcription.

What does this have to do with making RNA-seq cheaper? Unfortunately ii) and iii) add significant complexity to the design, cost and analysis of RNA-seq experiments. The impact of splicing analysis is relatively simple to explain to users: we need many more reads, this increases the cost of the experiment in a linear fashion, and users can decide if the splicing analysis is "worth" it. But the impact of getting strand information is less easy to explain (certainly for me in mammalian systems). Very few of us are strand-savvy and we're most likely to ignore the strand information other than making nicer figures for presentations to show how well the method worked. Strand information does not come free-of-charge: the most popular method for stranded RNA-seq uses dUTP and uracil-DNA glycosylase (UDG) to remove the second-strand cDNA after ligation of sequencing adapters. Strand-preserving methods cost more to manufacture than non-stranded methods and hence cost more for end users.

The cost of RNA-seq in 2014: I recommend 10-20M single-end 50bp reads per sample for DGE studies and that allows 10 samples per lane on a HiSeq 2500/2000. At about £600 per lane this makes sequencing £60 per sample. Library prep reagents and labour are about £35 each, making RNA-seq one of the few NGS applications where sequencing costs are lower than sample prep.
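The per-sample arithmetic above can be sketched in a few lines. The figures are the ones quoted in this post (about £600 per lane, 10 samples per lane, about £35 per library prep); the `per_sample_cost` function is just an illustrative helper, so swap in your own facility's prices.

```python
def per_sample_cost(lane_cost, samples_per_lane, prep_cost):
    """Return (sequencing cost per sample, total cost per sample) in pounds."""
    sequencing = lane_cost / samples_per_lane
    return sequencing, sequencing + prep_cost


# Figures quoted in the post: ~£600 per lane, 10 samples per lane, ~£35 prep.
seq_cost, total_cost = per_sample_cost(lane_cost=600, samples_per_lane=10, prep_cost=35)
print(f"sequencing: £{seq_cost:.0f} per sample, total: £{total_cost:.0f} per sample")

# Moving from 3 to 6 replicates in a two-condition experiment doubles the sample count.
for replicates in (3, 6):
    n_samples = 2 * replicates
    print(f"{replicates} replicates x 2 conditions: £{n_samples * total_cost:.0f}")
```

Changing `samples_per_lane` is a quick way to see how deeper multiplexing shifts the balance between sequencing and prep costs.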

Ideally we'd maintain a 10-fold difference between sequencing and library prep, but this would mean bringing prep down to under £15 or $10 per sample. Is it possible to squeeze oligo-dT mRNA enrichment, cDNA synthesis and an Illumina prep into this budget? I'm not sure this is possible outside of massive labs that can afford to make their own kits and buy enzymes in bulk.

We're thinking about how we might approach RNA-seq prep to keep costs as low as possible. Home-brew is certainly going to get us some way to $10 per prep, but some very creative thinking may be necessary. Would you give up strandedness for an RNA-seq kit that gave perfectly good DGE data but cost half as much as the leading stranded protocol?

Would it make much difference to your next experiment? Not if, once you think carefully about your experiment, you can clearly state that splicing and strandedness are not going to be primary areas for analysis. I do think most scientists don't have these uppermost in their minds when embarking on RNA-seq experiments, and a quick search in PubMed might back me up: there are 3148 hits in PubMed for RNA+Seq or 2359 hits for "RNA-seq", but only 3% (73) come up with the terms RNA+seq+strand+specific. So strandedness is not yet massive, which was a surprise to me given the reception that Josh Levin's 2010 Nat Methods paper, Comprehensive comparative analysis of strand-specific RNA sequencing methods, got.

So here's my plea to RNA-seq users: think about what you really want to get from your RNA-seq experiments and only aim for "expensive" splicing experiments with 20-50M reads per sample if you really need to; think about whether you'd give up other features to get faster, cheaper RNA-seq experiments; and save the money and effort for the experiment that needs it.

Wednesday 5 March 2014

in-situ RNA-seq for digital pathologists

An awesome demonstration of what's possible when people think outside the box: Highly Multiplexed Subcellular RNA Sequencing in Situ in Science this week demonstrates in situ RNA-seq of single cells. FISSEQ (fluorescent in situ RNA sequencing) was developed in the lab of George Church who's been involved in DNA sequencing for a very long time. Currently the technique is limited to a 30bp read and takes days to complete.

This video is a zoomed in view of primary fibroblasts for 30 sequencing cycles.

Thursday 27 March 2014

Illumina vs LifeTechnologies: the latest instrument comparison

Wednesday 26 March 2014

3TC-seq: differential expression from degraded RNA

Monday 24 March 2014

5mC-PCR: preserving methylation status during polymerase chain reaction

Friday 21 March 2014

Some help with your stats

Wednesday 19 March 2014

Can RNA-seq stop Tour de France dopers?

Tuesday 18 March 2014

How long can you store your NGS libraries: DNA adsorption demystified

Saturday 15 March 2014

Cheaper RNA-seq: but you might have to give up strandedness and splicing

Wednesday 5 March 2014

in-situ RNA-seq for digital pathologists