CoreGenomics: How do you count reads from a next-gen sequencer?

We’ve been asked a question many times over the years “should you count paired-end sequencing as one read or two?”

“Who cares” I hear you cry. But if you ask someone to give you back 200M reads for a sample and they give you 100M paired-reads who’s right? Again this may not be important to you but if you have to pay for 100M reads when you were expecting 200M you’re going to feel short-changed. And when those 100M might have cost £500 or more you care about the change!

My personal view is that it is better for us to count the number of molecules we sequenced from the initial library. This currency tells us something about how deeply we sequenced one sample compared to another. In the case of Illumina sequencing that means counting clusters (raw or PF is a whole other debate), for Ion Torrent I guess it would be positive wells (probably PF reads).

We get about 160-180M reads per lane on our HiSeq 2000, and we count a ‘pair’ of reads as a single data point. That is to say, a single end run or a paired end run with the same cluster density will give the same "read" count. This turns out to be a useful when we want to compare performance of single-end and paired-end runs. I’m happy to listen to the point of view that says their are actually two sequencing reads generated in the paired-end run but I find it adds to the confusion new users have. I know others do too as I’ve been to many talks where people have quoted the number of reads they get for an instrument and I know it was way too high to be possible (my suspicion being a paired-end run was quoted).

The nice simple metric number of clusters per lane works well for me in most cases. This also allows me and my users to compare between different instruments easily; GAIIx 40M, HiSeq 170M, MiSeq 15M, etc.

Unfortunately in trying to decide whether to upgrade to HiSeq 2500 and use rapid runs it gets confusing as I really need to consider the number of clusters per unit of time I can sequence to determine if the experiment will be cheaper on multiple rapid runs rather than one standard run! Instrument amortisation and maintenance charges are high so the more runs I can do in a week the better, I think. The life of a core lab manager is full of exciting stuff like this.

CoreGenomics

Pages

Monday, 14 October 2013

How do you count reads from a next-gen sequencer?

No comments:

Post a Comment