A recent PLOS ONE paper is perhaps the latest battle in the Illumina vs LifeTech war. It presents a comparison of MiSeq and Proton PI sequencing to detect chromosome abnormalities in spontaneously aborted foetuses: Chen et al: Performance Comparison between Rapid Sequencing Platforms for Ultra-Low Coverage Sequencing Strategy.
Some comments and analysis from the exciting and fast moving world of Genomics. This blog focuses on next-generation sequencing and microarray technologies, although it is likely to go off on tangents from time-to-time
Thursday, 27 March 2014
Wednesday, 26 March 2014
3TC-seq: differential expression from degraded RNA
Joakim Lundeberg at the Science for Life Laboratory, Stockholm, Sweden has published a nice paper in PLoS One on the impact of RNA degradation on the quality of RNA-seq and differential expression results: Sigurgeirsson, Emanuelsson & Lundeberg: Sequencing Degraded RNA Addressed by 3' Tag Counting. PLOS ONE 2014. They used high-quality cell line RNA degraded to specific RINs by metal hydrolysis to demonstrate that "RIN has systematic effects on gene coverage, false positives in differential expression and the quantification of duplicate reads" and provide a computational method for low RIN DGE analysis that, most importantly, keeps false positives low. They demonstrate pretty good sensitivity, but most users are likely to be affected more by false positives in low-quality RNA experiments. Those users will almost certainly come to the experiment understanding there will be limitations, and I find sensitivity is usually traded for specificity.
Monday, 24 March 2014
5mC-PCR: preserving methylation status during polymerase chain reaction
Methylation analysis is hampered by the simple fact that PCR amplification removes methylation marks from native DNA. We came up with a simple idea: produce a thermo-stable DNA methyltransferase to preserve methylation status through PCR cycles, allowing amplification of DNA and simplified analysis. The first thing we did was approach some enzyme companies to see if anyone had something suitable on their books. They did not, but they did seem to think this was a good idea, so we designed a pretty simple experiment to test it. This involved going back to basics, to how PCR was performed before the application of Taq polymerase: we added Dnmt1 after each amplification cycle to copy methylation marks onto the daughter strands.
Friday, 21 March 2014
Some help with your stats
Stats: not everyone's favourite subject, but something we can't avoid, so understanding the basics is a very good idea. We're lucky in my Institute to have biostatistical support in our Bioinformatics core facility, and we try to have a statistician with us every time we design a genomics experiment. The same questions come up time and time again: how many samples, and how deep to sequence? We're slowly getting answers, and the experience we're building up helps nearly every time. I also find other sources of information can be really helpful and have listed a couple of them below.
Wednesday, 19 March 2014
Can RNA-seq stop Tour de France dopers?
The BBC ran an article a few weeks ago on the possibility of performance-enhancing genetics: think Team BMC Genomics! The piece has an interview with Dr Philippe Moullier from INSERM in Nantes, who was part of a group that published a paper describing "Neo-organ" gene therapy treatment of neuromuscular diseases by the introduction of the erythropoietin gene into mice.
For those of you that easily forget: EPO has a rather bad rap in cycling, just ask the UCI or Lance Armstrong!
Tuesday, 18 March 2014
How long can you store your NGS libraries: DNA adsorption demystified
We've been running Illumina NGS in my lab for six and a half years and have almost 8000 libraries in our -20 going back almost to the start, to number 300 in fact. Occasionally we have users requesting us to rerun these old libraries and we'd normally re-quantify these before putting them back on the sequencer.
More often we get users asking us for another lane a few weeks or months after the first run. Until recently we'd still re-quantify in these instances if the last lane was more than a month ago, and that meant a lot of extra real-time PCR. Michelle in my group took libraries out of the freezer and compared qPCR results to the original quant. The results were pretty clear: quantification values do not change significantly over a period of up to a year.
Saturday, 15 March 2014
Cheaper RNA-seq: but you might have to give up strandedness and splicing
I'm going to start this post by saying something that may be controversial: "splicing analysis is a niche, strandedness is useful in a minority of cases, and most RNA-seq users are only interested in getting Affy-like 3' biased differential gene expression data".
I'm constantly looking at how we perform RNA-seq for differential gene expression (DGE) analysis, with the major goal of making it cheap enough that replication levels increase from the commonly used 3 replicates to a more powerful 6+ replicates. Obviously users are put off doubling the cost of their experiment when everyone's been running triplicate as the maximum level of replication for the last ten years or more, so ideally we'll keep experimental costs the same while trying to improve the quality.
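As a rough illustration of why 6+ replicates are "more powerful", here is a minimal sketch of how the standard error of a difference between two group means shrinks with replication. It assumes equal group sizes and a common standard deviation, which real RNA-seq count data only approximate; the function name and the value of `sd` are hypothetical and not taken from any experiment.

```python
import math


def se_of_mean_difference(sd: float, n_per_group: int) -> float:
    """Standard error of the difference between two group means.

    Assumes equal group sizes and a common standard deviation `sd`
    (a simplification; real DGE tools model counts differently).
    """
    return sd * math.sqrt(2.0 / n_per_group)


sd = 1.0  # hypothetical per-gene SD on a log2 scale, for illustration only

se3 = se_of_mean_difference(sd, 3)
se6 = se_of_mean_difference(sd, 6)

print(f"3 replicates: SE = {se3:.3f}")
print(f"6 replicates: SE = {se6:.3f}")
print(f"SE ratio (6 vs 3 replicates) = {se6 / se3:.3f}")
```

Under these assumptions, doubling replication shrinks the standard error by a factor of 1/sqrt(2), whatever the value of `sd`.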
Now I do believe RNA-seq has made a dramatic difference to the cost of performing DGE
experiments and is way cheaper than Affy arrays used to be. But a
side-effect of moving over to RNA-seq is that most scientists are very
interested in science and read widely so when they see papers exclaiming
all the possibilities RNA-seq brings they'd like to get as much as
possible from their own RNA-seq experiments.
What can you do with RNA-seq: i) differential gene expression in an (almost) unbiased manner, ii) analyse splicing isoforms using splice-junction reads, iii) identify the full extent of the transcriptome including non-coding, antisense and overlapping transcripts, often making use of strand information.
The challenge with the first statement is that there is always bias in experiments. Just because RNA-seq does not suffer from the problems microarrays have (or had) with sensitivity, reliance on an annotated transcriptome, limitations in the design and use of oligonucleotide probes, etc, does not mean RNA-seq isn't biased. The choice of time-point or cell-type in any experiment adds an immediate, although hopefully informed, bias. RNA extraction methodology will have an impact, and RNA-seq library preparation and sequencing will add other layers of bias on top.
Statement two gets many biologists salivating. We've all read some fantastic papers showing the biological importance of splicing and would like to be able to show similar results in the systems we're studying. However, splicing analysis is limited by the fact that today we can only sequence through a single splice junction in each read, meaning we need to infer which isoforms might be present and whether there is differential isoform usage. The competing methods of analysis work in quite different ways: Cufflinks' use of the Bowtie+TopHat spliced alignment versus DEXSeq's analysis of differential exon usage by counting reads in individual exons.
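To make the exon-counting idea concrete, here is a toy sketch of counting reads in individual exons. It is not DEXSeq's implementation, only an illustration of the general approach. The exon coordinates, the read alignments and the function name are all hypothetical, and in this sketch a read counts once towards every exon it overlaps.

```python
# Hypothetical exons for one gene: (name, start, end), half-open coordinates.
EXONS = [("exon1", 100, 200), ("exon2", 300, 400), ("exon3", 500, 600)]


def count_reads_per_exon(exons, reads):
    """Count reads overlapping each exon.

    `reads` is a list of reads; each read is a list of aligned blocks
    (start, end). A spliced read has two or more blocks.
    """
    counts = {name: 0 for name, _, _ in exons}
    for blocks in reads:
        hit = set()
        for block_start, block_end in blocks:
            for name, exon_start, exon_end in exons:
                # Two half-open intervals overlap if each starts before the other ends.
                if block_start < exon_end and exon_start < block_end:
                    hit.add(name)
        for name in hit:
            counts[name] += 1
    return counts


# Made-up alignments, for illustration only.
reads = [
    [(120, 170)],              # unspliced read inside exon1
    [(180, 200), (300, 330)],  # junction read exon1 -> exon2
    [(350, 400), (500, 510)],  # junction read exon2 -> exon3
    [(180, 200), (500, 530)],  # junction read exon1 -> exon3, skipping exon2
]

exon_counts = count_reads_per_exon(EXONS, reads)
print(exon_counts)
```

Comparing such per-exon counts between conditions, relative to the rest of the gene, is what lets an exon-based method flag differential exon usage without reconstructing full isoforms.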
Statement three has been massively impacted by
the introduction of stranded RNA-seq protocols. Methods that preserve
the information about which strand the transcript came from have allowed
an even more detailed view of the transcriptome than we have ever had
especially in the cases of transcripts that overlap in genome location
and the extent and biological importance of
antisense transcription.
What does this have to do with making RNA-seq cheaper: Unfortunately ii) and iii) add significant complexity to the design, cost and analysis of RNA-seq experiments. The impact of splicing analysis is relatively simple to explain to users: we need many more reads, this increases the cost of the experiment in a linear fashion, and users can decide if the splicing analysis is "worth" it. But the impact of getting strand information is less easy to explain (certainly for me in mammalian systems). Very few of us are strand-savvy and we're most likely to ignore the strand information other than making nicer figures for presentations to show how well the method worked. Strand information does not come free-of-charge: the most popular method for stranded RNA-seq uses dUTP and uracil-DNA glycosylase (UDG) to remove the second-strand cDNA after ligation of sequencing adapters. Strand-preserving methods cost more to manufacture than non-stranded methods and hence cost more for end users.
The cost of RNA-seq in 2014: I recommend 10-20M single-end 50bp reads per sample for DGE studies and that allows 10 samples per lane on a HiSeq 2500/2000. At about £600 per lane this makes sequencing £60 per sample. Library prep reagents and labour are about £35 each, making RNA-seq one of the few NGS applications where sample prep costs are of the same order as sequencing costs.
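The per-sample arithmetic can be written out as a short sketch. The figures are the ones quoted in this post (about £600 per lane, 10 samples per lane, about £35 per library prep); the variable names are just illustrative.

```python
# Figures quoted in the post (2014, GBP); names are illustrative.
LANE_COST_GBP = 600.0
SAMPLES_PER_LANE = 10
PREP_COST_GBP = 35.0  # library prep reagents and labour, per sample

sequencing_per_sample = LANE_COST_GBP / SAMPLES_PER_LANE
total_per_sample = sequencing_per_sample + PREP_COST_GBP
prep_share = PREP_COST_GBP / total_per_sample

print(f"Sequencing per sample: £{sequencing_per_sample:.2f}")
print(f"Total per sample:      £{total_per_sample:.2f}")
print(f"Prep share of total:   {prep_share:.0%}")
```

On these numbers library prep is a little over a third of the total per-sample cost.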
Ideally we'd maintain a 10 fold difference between sequencing and library prep, but with sequencing at £60 per sample this would mean bringing prep down to under £6 or $10 per sample. Is it possible to squeeze oligo-dT mRNA enrichment, cDNA synthesis and an Illumina prep into this budget? I'm not sure this is possible outside of massive labs that can afford to make their own kits and buy enzymes in bulk.
We're thinking about how we might approach RNA-seq prep to keep costs as low as possible. Home-brew is certainly going to get us some way to $10 per prep, but some very creative thinking may be necessary. Would you give up strandedness for an RNA-seq kit that gave perfectly good DGE data but cost half as much as the leading stranded protocol?
Would it make much difference to your next experiment: not if, once you think carefully about your experiment, you can clearly state that splicing and strandedness are not going to be primary areas for analysis. I do think most scientists don't have these uppermost in their minds when embarking on RNA-seq experiments and a quick search in PubMed might back me up: there are 3148 hits in PubMed for RNA+Seq or 2359 hits for "RNA-seq", but only 3% (73) come up with the terms RNA+seq+strand+specific. So strandedness is not yet massive, which was a surprise to me given the reception Josh Levin's 2010 Nat Methods paper: Comprehensive comparative analysis of strand-specific RNA sequencing methods got.
So here's my plea to RNA-seq users: think about what you really want to get from your RNA-seq experiments and only aim for "expensive" splicing experiments with 20-50M reads per sample if you really need to, think about whether you'd give up other features to get faster, cheaper RNA-seq experiments; and save the money and effort for the experiment that needs it.
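To put rough numbers on "expensive", here is a minimal sketch of how sequencing cost per sample scales linearly with read depth. It uses the £600 per lane figure from this post and assumes a lane yields about 200M reads, a figure implied by fitting 10 samples at up to 20M reads each but not stated outright, so treat it as an assumption. The function name is hypothetical.

```python
LANE_COST_GBP = 600.0            # per-lane cost quoted in the post
LANE_YIELD_MILLION_READS = 200.0  # ASSUMPTION: implied by 10 samples x 20M reads


def sequencing_cost_per_sample(reads_millions: float) -> float:
    """Sequencing cost per sample, scaling linearly with reads requested.

    Ignores the fact that samples per lane must be a whole number.
    """
    return LANE_COST_GBP * reads_millions / LANE_YIELD_MILLION_READS


for depth in (20, 50):
    cost = sequencing_cost_per_sample(depth)
    print(f"{depth}M reads per sample: about £{cost:.0f} sequencing")
```

Under that assumption a 50M-read splicing experiment costs two and a half times as much to sequence per sample as a 20M-read DGE experiment.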
Wednesday, 5 March 2014
in-situ RNA-seq for digital pathologists
An awesome demonstration of what's possible when people think outside the box: Highly Multiplexed Subcellular RNA Sequencing in Situ in Science this week demonstrates in situ RNA-seq of single cells. FISSEQ (fluorescent in situ RNA sequencing) was developed in the lab of George Church who's been involved in DNA sequencing for a very long time. Currently the technique is limited to a 30bp read and takes days to complete.
This video is a zoomed in view of primary fibroblasts for 30 sequencing cycles.