Wednesday, 21 May 2014

What should we call an overlapping read-pair from a paired-end fragment run?

HiSeq Rapid is going to be getting 2x250bp paired-end reads. Assuming you can sequence with a 500bp insert (+/-50bp), we can expect perhaps 200-300M pairs of overlapping reads that can be merged to create 200-300M 500bp sequences. I suspect this is going to have a transformative effect on genome assembly and transcript isoform detection, and that novel methods are likely to be driven by longer and longer reads on the Illumina platforms.
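The merging step can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not the method of any particular tool: it requires an exact match across the overlap, ignores base qualities, and assumes each fragment is at least as long as one read. Dedicated mergers such as FLASH or PEAR tolerate mismatches and make use of quality scores. The function names (`reverse_complement`, `expected_overlap`, `merge_pair`) and the 10bp default minimum overlap are hypothetical choices for illustration. The arithmetic is worth spelling out: with 250bp reads, a 450bp fragment gives a 50bp overlap, while a 500bp fragment leaves the two reads meeting end to end with no shared bases to merge on.

```python
from typing import Optional

# Hypothetical helpers for illustration only; exact-match overlaps, no qualities.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def expected_overlap(read_len: int, insert_size: int) -> int:
    """Bases shared by both reads of a pair; a value <= 0 means no overlap."""
    return 2 * read_len - insert_size


def merge_pair(read1: str, read2: str, min_overlap: int = 10) -> Optional[str]:
    """Merge read1 with the reverse complement of read2 on an exact end overlap.

    Overlap lengths are tried from longest to shortest; returns None if no
    overlap of at least min_overlap bases matches exactly.
    """
    r2 = reverse_complement(read2)  # put read2 on the same strand as read1
    max_overlap = min(len(read1), len(r2))
    for olen in range(max_overlap, min_overlap - 1, -1):
        # The 3' end of read1 must equal the 5' end of the flipped read2.
        if read1[-olen:] == r2[:olen]:
            return read1 + r2[olen:]
    return None
```

For example, splitting a 20bp fragment into two 12bp reads gives a 4bp overlap, and `merge_pair(read1, read2, min_overlap=4)` rebuilds the full fragment; with the default 10bp minimum the same pair is left unmerged.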

Geoff Smith hinted that significantly longer reads might be possible from clustering 1000-2000bp fragments. That opens the possibility of capturing most structural variation or splice isoforms in a single library. Library prep could be a problem, as you'll need high-quality nucleic acids, and there are lots of transcripts under 1000bp in length. But the possibility should get some of us thinking about doing things differently.

So what should we call these reads? Whilst at the Illumina Scientific Summit last week, I tweeted about this upcoming development and called the merged read pairs contigs. That started a Twitter conversation between Lex Nederbragt, Nick Loman and me about the terminology.

I do think having a term for these reads would make discussions easier. Anyone who remembers comparing read counts between Illumina (1 read per cluster) and Life Technologies (1 read per end of a fragment) will know that terminology is important.

So far the following terms are up for grabs: contig, mootig, μ-contigs, junctig (join), coagmentig (merge), partig (pair). I'll add microntig, sign off on this post and hand over to you!


  1. Ords. "300M reads should give you 150M ords". Non-overlaps can remain as 'reads' or be referred to as 'nords'. "300M reads gave me 120M ords, 60M nords."

  2. I had heard the term "sloptig" is already established?

  3. I've been loving my overlapping 2x250 MiSeq reads for a while now, and am glad HiSeq Rapid will get them too.

    I would avoid any naming with "tig" in it. That is best reserved for larger scale assembled sequences.

    The key terms are overlapping, joined, stitched and paired-end.

    Maybe "stitched read", "stitcher" or "ope" (overlapping paired end). I also like the "ord" and "nord" suggestions above.

    Or take the idea of the union of two mates: perhaps "married read", "offspring", "mated read" or "union read".

  4. Marreads, opes (overlapping paired end) and nopes (non-overlapping paired end). Keep the suggestions coming and let's see what appears in manuscript form!