CoreGenomics: 3TC-seq: differential expression from degraded RNA

Joakim Lundeberg at the Science for Life Laboratory, Stockholm, Sweden has published a nice paper in PLOS ONE on the impact of RNA degradation on the quality of RNA-seq and differential expression results: Sigurgeirsson, Emanuelsson & Lundeberg: Sequencing Degraded RNA Addressed by 3' Tag Counting. PLOS ONE 2014. They used high-quality cell line RNA degraded to specific RINs by metal hydrolysis to demonstrate that "RIN has systematic effects on gene coverage, false positives in differential expression and the quantification of duplicate reads", and they provide a computational method for low-RIN DGE analysis that, most importantly, keeps false positives low. Whilst they demonstrate pretty good sensitivity, most users are likely to be affected more by false positives in low-quality RNA experiments. They will almost certainly come to the experiment understanding there will be limitations, and I find sensitivity is usually traded for specificity.

Why does RIN vary: anyone who's extracted RNA has probably run an Agilent Bioanalyzer, or a denaturing agarose gel if you're old enough or don't have a Bioanalyzer close by. When you look at RNA on a gel there is often variation in the intensity of the band/smear; the 18S and 28S peaks usually dominate in high-quality RNA. Agilent use 9 specific regions of their electropherogram to calculate the RNA Integrity Number (RIN) and this has become the default method for RNA quality assessment. Agilent Technologies used to have an RNA Integrity Database, a repository of Agilent 2100 Bioanalyzer runs which users could access for free to compare their own results to validated examples from over 650 total RNA runs including human, mouse, rat and plants. Unfortunately it seems to have disappeared; can anyone point me back to it?

Whilst we do try to use samples with high RIN, there are often times when this is not possible. Low-quality RNA is fine in many applications as long as the user is aware of some caveats: a 3' bias if using oligo-dT priming, and the difficulty of comparing sample groups with different RINs.

The experiment: Anyone who's tried to fragment RNA to a defined RIN may well have struggled. RNA is very labile and can easily be turned into RIN 3 or lower, but getting a nice distribution can be tough. The group used NEBNext Magnesium RNA Fragmentation reagents and different conditions to achieve RINs of 2, 4, 6, 8 and 10. Figure 2 in the paper shows how the 18S & 28S peaks decrease and small RNA products (fragmented RNA) increase as RNA degrades. They also showed that a degradation temperature of 74°C gave a more linear change in RIN than the higher temperatures more commonly used.

They made libraries using TruSeq and sequenced using paired-end 100bp sequencing (I'd point them to an earlier post from this month, although they did downsample to just 20M reads for DGE analysis). They marked but did not remove duplicate reads, and used HTSeq for counting reads and DESeq for differential expression analysis.

In their tag counting they used only a defined length at the 3' end of each transcript, set at 1500, 1000, 500 or 200bp. Without this, the more degraded RNAs lose gene counts because only their 3' ends are retained, compared to high-quality RNAs with full-length (or almost full-length) mRNAs. Length restriction reduces the number of genes labelled as expressed, i.e. it decreases sensitivity.
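To make the length restriction concrete, here is a minimal sketch of counting reads only within the last N bases of a transcript. It is not the authors' pipeline (they used HTSeq): the function names, the single-exon transcript model and the toy read positions are all my own assumptions for illustration.

```python
from typing import Iterable, Tuple


def three_prime_window(start: int, end: int, strand: str, length: int) -> Tuple[int, int]:
    """Return the half-open interval [lo, hi) covering the last `length` bases
    at the 3' end of a transcript spanning [start, end).

    Assumption: the transcript is treated as a single contiguous block
    (no introns), which is a simplification for illustration only.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if strand == "+":
        # On the plus strand the 3' end sits at the high coordinate.
        return max(start, end - length), end
    # On the minus strand the 3' end sits at the low coordinate.
    return start, min(end, start + length)


def count_tags(read_positions: Iterable[int], start: int, end: int,
               strand: str, length: int) -> int:
    """Count reads whose position falls inside the 3' window."""
    lo, hi = three_prime_window(start, end, strand, length)
    return sum(1 for pos in read_positions if lo <= pos < hi)


# Hypothetical 3000bp plus-strand transcript with made-up read positions.
reads = [100, 1400, 1600, 2500, 2900, 2999]
counts = {n: count_tags(reads, 0, 3000, "+", n) for n in (1500, 1000, 500, 200)}
print(counts)
```

The shorter the window, the fewer reads are counted per gene, which is why sensitivity falls as the restriction tightens.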

Comparison of degraded RNA: The data show very clearly that comparing RIN 10 to 8 results in large numbers of DGE calls. Of course we'd argue very strongly that users should never attempt to do this, especially if groups are confounded by RNA quality. However, the use of a defined transcript length (200-1500bp) in the tag counting allows them to demonstrate that these false-positive DGE calls can be almost entirely removed, maintaining specificity: e.g. RIN 10 vs 8 generated 4344 DEGs without length restriction, but DEGs drop to 10% of that figure when a 1500bp restriction is applied and to just 2 at 200bp. Sensitivity remains high until the 200nt length restriction (sensitivity here is the ability of the method to call genes as expressed; see the paper's Methods for the definition).
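As a rough illustration of the two numbers discussed here, the sketch below computes a sensitivity-style fraction and the approximate DEG count at 1500bp. The exact definition is in the paper's Methods; what follows is my own assumed version (the fraction of genes expressed in a reference set that are still called expressed after restriction), and the gene names, counts and `min_count` threshold are invented for the example.

```python
from typing import Dict, Set


def expressed_genes(counts: Dict[str, int], min_count: int = 1) -> Set[str]:
    """Genes with at least `min_count` reads (hypothetical threshold)."""
    return {gene for gene, n in counts.items() if n >= min_count}


def sensitivity(reference: Set[str], called: Set[str]) -> float:
    """Fraction of reference genes that are also in the called set."""
    if not reference:
        raise ValueError("reference set is empty")
    return len(reference & called) / len(reference)


# Made-up counts: full-length counting vs. a 200bp 3' restriction.
full_length = {"geneA": 120, "geneB": 35, "geneC": 8, "geneD": 3, "geneE": 0}
restricted_200bp = {"geneA": 40, "geneB": 6, "geneC": 0, "geneD": 1, "geneE": 0}

reference = expressed_genes(full_length)
called = expressed_genes(restricted_200bp)
sens = sensitivity(reference, called)

# From the post: 4344 DEGs unrestricted, dropping to 10% of that at 1500bp.
degs_unrestricted = 4344
degs_at_1500 = round(degs_unrestricted * 0.10)

print(sens, degs_at_1500)
```

In this toy data geneC is lost under the 200bp restriction, so three of the four reference genes are still called expressed.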

Beware of ribosomal reduction in low RIN: It should be obvious, but the group show that attempting to use ribosomal depletion methods like RiboMinus on degraded RNA is not generally a good idea. Because the ribosomal RNAs are degraded along with the mRNAs, only the rRNA fragments with homology to the depletion probes will be removed, leaving the other rRNA fragments behind.

Lastly, in their discussion the authors make a similar observation to mine, that "the majority of all archived RNA sequence data to date is derived from poly A selection": oligo-dT enrichment of mRNAs works, people understand it and it is a popular method.

Wednesday, 26 March 2014
