CoreGenomics: Oxford Nanopore's Direct RNA-seq - the killer app for bacterial genomics?

I'll finish my write-up of the ONT London Calling event after the bank-holiday weekend, but I wanted to get this post out as I think we came up with a killer app for MinION, or even SmidgION on your iPhone: direct RNA-seq of bacterial 16S rRNA. This was an idea I proposed in response to a speaker's answer about what they were doing with nanopores (they were working on 16S amplicons). In the ensuing discussion we decided that sequencing the ribosomal RNA directly (see Clive's announcement below) would allow interrogation of phylogenetic relationships and environmental diversity quickly, and with close to zero bias from sample extraction and library prep.

ONT Updates at London Calling: In Thursday's keynote Clive Brown made a ton of exciting announcements, including the first look at direct RNA-seq on the MinION. The first release of the protocol is for poly-adenylated mRNAs and uses a modified ONT adapter with an oligo-dT tail; the motor protein and leader are incorporated pretty much as usual. With a bit of a twist this could be adapted to ribosomal RNA-seq, possibly by poly-adenylating all RNAs before starting (but ultimately we'd want a protocol without poly-adenylation).

As ribosomal RNAs make up 90%+ of the RNA in a cell there would be maximum signal with minimal effort; no need to select specific RNA species, no need for amplification, just crack cells open, add adapters to the RNA and sequence. Add a WIMP workflow on the end and you'll soon know if that kebab has salmonella or not!

16S rRNA-seq: The current gold-standard for 16S rRNA analysis looks to be rRNA amplicon-seq on MiSeq and bioinformatic assessment of diversity, providing reasonably quick and reliable confirmation of family, genus, or species in the majority of cases. All bacterial genomes carry at least one copy of the 23S, 16S and 5S rRNA genes. The conserved regions in the rRNA sequences allow "universal" PCR primers (see Klindworth et al) to be designed for amplicon sequencing. But there are potential biases - we don't know the full set of rRNA sequences so could be missing a whole chunk of the diversity, suboptimal primers can lead to under-representation of some species/taxa, and also the analysis can be difficult to interpret where complex mixtures of bacteria are analysed. Today nearly all studies make use of the 16S rRNA only - losing the added value of the 5S and 23S variation.

A 2015 review of bacterial genomics highlighted the poor resolution that 16S rRNA can have in closely related species, but the authors suggested that it is still an important and useful tool for understanding bacterial community diversity. Full bacterial genomes and metagenomes resolve the relationships between species better than 16S rRNA analysis does, but even with cheap genomes there may be reasons why developing a direct rRNA-seq method would be worth doing, especially if it can speed the detection of pathogenic bacteria.

The major limitations of 16S rRNA sequencing on NGS platforms so far have been the use of PCR and the read-length, which falls far short of the full-length 16S rRNA's 1500bp. Short reads can limit taxonomic resolution, so the possibility of full-length rRNA-seq from Oxford Nanopore is exciting. The current error rate of 10-15% may be too high, even with full-length sequences, but the R9 data presented at London Calling show that the error rate is headed into 1% territory sometime soon.
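To put those error rates in context, here is a rough back-of-the-envelope sketch of how many errors you would expect in a single full-length 16S read. It assumes errors are independent and spread evenly along the read, which is a simplification and not how nanopore errors actually behave; the function name is just for illustration.

```python
# Rough estimate only: assumes independent, uniformly distributed per-base errors.
FULL_LENGTH_16S_BP = 1500  # approximate full-length 16S rRNA, as quoted in the post


def expected_errors(read_length_bp, error_rate):
    """Expected number of erroneous bases in one read (hypothetical helper)."""
    return read_length_bp * error_rate


for rate in (0.15, 0.10, 0.01):
    errors = expected_errors(FULL_LENGTH_16S_BP, rate)
    print(f"error rate {rate:.0%}: about {errors:.0f} errors per full-length read")
```

At 10-15% that is well over a hundred errors per read, against something in the region of fifteen at 1%, which is why the R9 improvement matters so much for telling closely related species apart.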

Given that many clinically relevant micro-organisms, e.g. Mycobacterium tuberculosis, are difficult to culture, the relative simplicity of direct rRNA-seq should be attractive. Hopefully it would allow a very fast workflow - potentially minutes from sample to answer if only a quick RNA extraction (e.g. Epicentre or Zymo) is required.

Bacterial rRNA copy number variation: Another issue is that rRNA copy number varies between taxa, which can lead to over-estimation of some bacterial species. A paper in PLoS One aimed to use 16S amplicon data to determine whether related taxa had more similar copy numbers; whether the variability of 16S rRNA copies within a genome increased with copy number; and what impact this might have on 16S rRNA analysis of bacterial communities. They showed that 15% of bacterial genomes contained more than one 16S rRNA copy, and 50% of genomes had 5+ copies, but that within a species the copy number was maintained. They also showed that those copies were very similar - but not necessarily identical. The final figure in their paper (see below) showed how community abundance estimates changed depending on what was being used to do the estimation - "abundance estimates based on the 16S rRNA sequence counts...underestimate abundance of low 16S rRNA copy number taxa e.g. Acidobacteria and overestimates taxa with high 16S rRNA copy number e.g. Firmicutes."

From Větrovský & Baldrian PLoS One 2013

So what's the impact of copy-number? Well, if there is a mix of three bacteria, A, B and C, present in equal numbers but with different rRNA copy numbers, say 1, 3 and 6, then a simple interpretation of rRNA reads would suggest there were 6x more C than A and 2x more C than B; including copy number changes the interpretation of the bacterial community. As the 23S, 16S and 5S rRNA genes are all contained within a single cistron it may be that sequencing all three, versus focusing on 16S, will help resolve the impact of copy number. But even so the relationship between copy number and rRNA transcript level is unlikely to be 1:1.
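The A, B, C example can be sketched in a few lines of Python. This is a minimal illustration, not a published method: the read counts are made up to match the 1, 3, 6 copy numbers, the function name is hypothetical, and it assumes reads scale linearly with gene copy number, which is reasonable for amplicon data but, as noted above, unlikely to hold exactly for directly sequenced rRNA transcripts.

```python
def copy_number_corrected_abundance(read_counts, copy_numbers):
    """Estimate relative cell abundance by dividing reads by rRNA copy number.

    Hypothetical helper: assumes reads are proportional to gene copy number.
    """
    per_copy = {taxon: read_counts[taxon] / copy_numbers[taxon] for taxon in read_counts}
    total = sum(per_copy.values())
    return {taxon: value / total for taxon, value in per_copy.items()}


# Illustrative numbers only: equal cell counts, so reads track copy number.
copy_numbers = {"A": 1, "B": 3, "C": 6}
read_counts = {"A": 100, "B": 300, "C": 600}

total_reads = sum(read_counts.values())
naive = {taxon: count / total_reads for taxon, count in read_counts.items()}
corrected = copy_number_corrected_abundance(read_counts, copy_numbers)

print("naive read fractions:     ", naive)
print("copy-number corrected:    ", corrected)
```

The naive read fractions make C look dominant, while dividing by copy number brings all three taxa back to equal shares. The catch is that you need to know the copy number for each taxon in the first place.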

The future for direct RNA-seq: Clive suggested that the protocol would be out in users' hands this year. I'd certainly like to see this being taken up for 16S rRNA-seq, and will watch out for it on bioRxiv or at the next MinION community meeting in NYC. I'm excited about the prospects for RNA sequencing on MinION, and I think this is a real differentiator from Illumina, PacBio and other sequencing technologies. No-one ever did direct RNA-seq before, but in 12-18 months' time I believe it could be as widespread as single-cell RNA-seq is now. Possibly even Nature Methods: Method of the Year 2016?

5 comments:

Chris Cole, 31 May 2016 at 09:15
Slight correction. Direct RNA-seq (DRS) has been done before by Helicos BioSciences and is still available from seqll.com. We used it in several experiments and liked it a lot:
10.1038/nsmb.2345
10.1371/journal.pgen.1003867
10.1371/journal.pone.0094270
10.1016/j.jaci.2014.04.021

I don't disagree with you that DRS with nanopore will be a big USP.
GANIT labs, 31 May 2016 at 10:06
james, yes, this will be cool. ont will not be the first one to achieve this, however. others have tried to do direct rna seq long back, if you remember using the long lost sequencer called helicos. pat milos did this and published a paper in nature in 2009 (http://www.nature.com/nature/journal/v461/n7265/full/nature08390.html). binay
James@cancer, 31 May 2016 at 20:00
Helicos did release DRS but the method was never widely published, presumably because it generated only short reads (120 cycles) over 3 days and only 40,000 reads (only half of which aligned). The most recent data certainly look more interesting, with millions of reads per sample. I'd stopped following this a long time ago. Thanks for pointing out the papers.
Anonymous, 13 September 2016 at 17:08
Helicos did not do true DRS - it was sequencing by synthesis, which the ONT method is not.
James@cancer, 15 September 2016 at 10:58
But they did sequence RNA which was one small step forwards...maybe ONT's RNA-seq is "one giant leap"!



Monday, 30 May 2016
