What do 48 replicates tell you about RNA-seq DGE analysis methods? That two of the most widely used tools, DESeq and edgeR, are probably the best tools for the job*. These two tools also top the rankings of RNA-seq methods as assessed by citations, with 1204 and 822. These are conclusions from what is probably the most highly replicated RNA-seq study to date**. The authors aimed to identify the correct number of replicates to use and concluded that we should be using ~6 replicates for standard RNA-seq, and that we should consider increasing this to ~12 when identifying DGE irrespective of fold-change.
The paper is a very good one and certainly makes comforting reading in a place where people are open to increasing replicates. It also supports two messages we’ve been giving out in our Tuesday afternoon experimental design clinics for many years: with n=3 you only need one replicate to drop out and you’re screwed, and more replicates but fewer reads per replicate = an acceptable level of cost increase!
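To put rough numbers on that second message, here is a minimal sketch of the arithmetic. Both prices are made-up placeholders (they are not quotes from any facility or kit supplier), and `experiment_cost` is a hypothetical helper written just for this illustration; it assumes the total is simply library prep per sample plus sequencing per read.

```python
# All prices are hypothetical placeholders, chosen only to make the sums easy.
LIBRARY_PREP_COST = 50.0       # assumed cost of one library prep
COST_PER_MILLION_READS = 5.0   # assumed sequencing cost per million reads


def experiment_cost(replicates_per_group, million_reads_per_replicate, groups=2):
    """Total cost = library prep for every sample + sequencing for every read."""
    samples = replicates_per_group * groups
    prep = samples * LIBRARY_PREP_COST
    sequencing = samples * million_reads_per_replicate * COST_PER_MILLION_READS
    return prep + sequencing


# n=3 per group at 20M reads each
baseline = experiment_cost(replicates_per_group=3, million_reads_per_replicate=20)
# n=6 per group, same depth: both prep and sequencing double
deeper = experiment_cost(replicates_per_group=6, million_reads_per_replicate=20)
# n=6 per group at half the depth: sequencing spend is unchanged, only prep grows
shallower = experiment_cost(replicates_per_group=6, million_reads_per_replicate=10)

print(baseline, deeper, shallower)
```

With these placeholder prices, doubling the replicates at half the depth adds only the extra library preps to the bill, whereas doubling the replicates at full depth doubles everything. How acceptable that increase is will depend on your real prep and sequencing prices.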
But the paper also got me wondering what made DESeq and edgeR so popular. Why does a bioinformatics tool sometimes end up dominating a field? Is it because the dominant tool has the best statistical approach? Is it easy to use? After talking to a couple of bioinformaticians here, the answer seems to be a combination of the following:
- The tool is actually quite good at what it was designed to do
- The tool is easy to use, is implemented well, can be configured for advanced use, is well documented, and is supported: ideally by the person who wrote it, but if it is made open access, support might come from the community, e.g. Bioconductor (described very comprehensively in this paper).
- The tool is published in a high-impact paper (not usually a methods paper)
If a bioinformatician develops a tool to help answer a biological question and that tool is published in a high-impact biology paper, then they have lost the first-author slot but potentially gained in many other ways.
*Researchers at the Queensland Brain Institute reached the same conclusion about the quality of DESeq and edgeR results in 2014.
**The data for the S. cerevisiae experiment are available on ENA: project ID PRJEB5348. Samples were processed in four batches of 24 samples, with 12 of each strain in each batch, using Illumina’s TruSeq (stranded) mRNA kit. Seven pools were created and each was run in a lane as single-end 51 bp. This is a goldmine for looking at batch effects, albeit in an almost perfect experimental situation. This is discussed in a paper published alongside this one by the same group, and you should check out Geoff Barton's Blog for more...
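If you want to picture that balanced layout, here is a minimal sketch. It only reproduces the counts given above (four batches of 24, 12 of each strain per batch); the strain labels are placeholders and the round-robin assignment is my assumption for illustration, not necessarily how the authors allocated samples.

```python
from collections import Counter

STRAINS = ("strain_A", "strain_B")  # placeholder labels, not the real strain names
REPLICATES_PER_STRAIN = 48
N_BATCHES = 4


def assign_batches():
    """Deal each strain's replicates round-robin across the batches."""
    layout = []
    for strain in STRAINS:
        for rep in range(REPLICATES_PER_STRAIN):
            batch = rep % N_BATCHES + 1  # batches numbered 1..4
            layout.append((f"{strain}_rep{rep + 1:02d}", strain, batch))
    return layout


layout = assign_batches()

# Count samples per (batch, strain) to confirm the design is balanced
per_batch = Counter((batch, strain) for _, strain, batch in layout)
for (batch, strain), n in sorted(per_batch.items()):
    print(f"batch {batch}: {strain} x {n}")
```

Because every batch holds the same number of samples from each strain, batch and strain are not confounded, which is what makes the dataset so clean for looking at batch effects.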