My collaborator Nick Loman is lead author on a Nature Biotechnology paper released today: “Performance comparison of benchtop high-throughput sequencing platforms”. I have been involved in several performance comparisons in the past, and the one question people ask when they read papers like this is “which one should I buy?” The paper does not come down firmly one way or the other, so read it and make your own decision.
Nick talks about the new batch of laser-printer-sized “personal sequencing machines”: GS Junior from Roche, MiSeq from Illumina and PGM from Ion Torrent. There is a lot of competition in this market, mainly between Ion Torrent and Illumina, which Nick refers to as “lively”! In fact, at a meeting I hosted last week I asked if anyone would consider using 454 to develop new methods, and no-one was willing. It looks like both PGM and MiSeq are making life very difficult for Roche.
Roche have responded with paired-end multiplexed sequencing protocols, available now, and with soon-to-be-released longer read lengths and automated library-prep and ePCR. However, users have reported problems with the longer reads (>700bp).
Over the past four or five years the perfect mix of machines in many labs has been an Illumina GAIIx or HiSeq and a 454. Is this still the case with the personal genome sequencers?
You can read my summary of the paper at the end of this post. But first I thought I'd share some questions I asked Nick after reading the paper and his responses:
1. Assuming Illumina release 700bp reads this year, what is the impact on 454 and your kind of bacterial assembly?
I saw the Broad presentation about 1x300bp reads, which looks promising, and pairing those up could give an overlapping read of say 500-550bp, which would be very nice. Qualities looked to drop off markedly on the poster I saw, so that will need addressing if they are to be used for more than just scaffolding. Even with the long read kit I think 454 will find it increasingly tough to keep up unless they can increase the throughput and/or drop the run cost.
2. Is increasing to PE250 and 5Gb on MiSeq a big or small impact on your comparison?
For our applications (clinical & public health microbiology) we are interested in the per-strain cost, so greater throughput means higher multiplexing, which is good. However, library prep is increasingly the bulk of the costs, so some work needs to be done there on all the platforms. PE250 - if the qualities are high - should give assemblies very competitive with the 454 at a fraction of the cost.
3. How would the current PGM 318 chip with PE200bp (or even longer) reads compare to the data you used?
A 318 chip should give about 1Gb, or 3-4x what we got with the 316 chips, and make sequencing cheaper. With the PGM the maximum amplicon size you can use in the pipeline is still quite small, so running paired-end would mainly have the effect of improving read quality, as you are reading the same molecule twice. The 200bp kits had some quality issues when first released; see some discussion from Lex Nederbragt. I'd definitely like to repeat my comparison this summer with the latest and greatest kits available for all manufacturers.
4. Another big assumption, but if ONT get even 10kb reads and 1,000,000 per run, is there any hope for the three technologies you compared?
That's a very good question! There is a general feeling that all this faffing around with short reads may be completely pointless if a nanopore technology delivers on the promise. Certainly bacterial de novo assembly becomes a trivial problem if you can reliably get 10kb+ reads; in this E. coli strain that would cover all the repetitive regions. PacBio can already do great bacterial assemblies, but the cost of this instrument puts it out of the running for labs like ours. What's most interesting about Nanopore is the apparent lack of sample preparation and amplification, plus the tiny form factor in a USB stick. As I said at the time of the announcement, this could be a major game-changer for applications like near-patient testing in clinical and public health microbiology.
5. Thinking about workflow and assembly, have you considered using the Nextera library prep before?
Yes, Nextera is very interesting and speeds up the time taken to make libraries, plus with the obvious advantage you can do a bunch in a 96-well plate. The downside, it seems to us, is that you don't get tight size selection of fragments, which makes it less nice for paired-end runs for de novo assembly, where a fixed insert size is very helpful.
6. What are the things NGS companies need to work on to make your job easier? Is it just more data, longer reads and faster? Can you name three things that would be on your "desert-island discs" list for them to focus on?
I guess all the companies need to keep pushing in all directions, but probably the most important for us are: 1) workflow, making it as plug and play as possible to get from a clinical sample to a sequence; 2) cost, getting the per-sample price down as far as possible - we want the $10 bacterial genome, so that is going to mean cheaper library preparation, then throughput improvements; 3) read lengths, the longer the better!
7. Which one should people buy if they want to do bacterial genome sequencing?
Ha, the $1000 question - I'd point people at the paper and get them to figure out what was appropriate for their needs and budget.
So what does the paper say? The initial instrument comparison yields some simple observations: the 454 GS Junior produced the longest reads, the Ion Torrent PGM generated data fastest and the MiSeq produced the greatest amount of data. See table below.
Comparison analysis: In the paper Nick and his colleagues compare instrument performance by sequencing the O104:H4 E. coli isolate behind the 2011 food-poisoning outbreak in Germany. They set a benchmark for comparison by first generating an O104:H4 reference assembly on the GS FLX+ from long-fragment and 8-kb insert paired-end libraries, using Titanium chemistry reads (~800bp). This produced a very high-quality draft genome assembly at 32-fold coverage.
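For readers less used to thinking in “fold coverage”, here is a minimal sketch of the arithmetic in Python. It is not taken from the paper: the 5,000,000 bp genome size is a round illustrative assumption rather than the real size of the O104:H4 genome, and the function names are my own.

```python
import math


def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def reads_needed(target_coverage: float, genome_size: int,
                 mean_read_length: float) -> int:
    """Number of reads of a given mean length needed to reach a target depth."""
    if mean_read_length <= 0:
        raise ValueError("mean_read_length must be positive")
    return math.ceil(target_coverage * genome_size / mean_read_length)


# Illustrative values only: an assumed 5 Mb genome and ~800bp reads.
ASSUMED_GENOME_SIZE = 5_000_000
print(reads_needed(32, ASSUMED_GENOME_SIZE, 800))
```

The same calculation, run with each platform's read length and per-run yield, gives a rough feel for how many genomes fit on one run.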
They used this reference to compare de novo assemblies from each instrument. Contigs obtained from the 454 GS Junior data aligned to the largest proportion of the reference, with 3.72% of the reference unmapped, compared to 3.95% for MiSeq and 4.6% for Ion Torrent PGM. None of the instruments produced a single-contig, 100% accurate genome, and for each technology there is a trade-off between advantages and disadvantages.
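A figure like “percent of the reference unmapped” can be computed from the reference coordinates of the contig alignments. The sketch below shows the general idea only and is not the paper's pipeline: it assumes you already have alignments as half-open (start, end) intervals on a single reference sequence, and the function name is hypothetical.

```python
def percent_unmapped(reference_length, aligned_intervals):
    """Percentage of reference bases not covered by any aligned contig.

    aligned_intervals: iterable of (start, end) pairs, 0-based and
    half-open, in reference coordinates. Overlaps are merged so that
    no base is counted twice.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")

    covered = 0
    current_start = current_end = None
    for start, end in sorted(aligned_intervals):
        # Clamp each interval to the bounds of the reference.
        start, end = max(start, 0), min(end, reference_length)
        if start >= end:
            continue
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous merged block.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching interval: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start

    return 100.0 * (reference_length - covered) / reference_length
```

For example, on a toy 1,000 bp reference with alignments at (0, 400), (300, 600) and (700, 900), 800 bases are covered and the function returns 20.0.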
As the paper is based on data generated several months ago, it is certain to be out of date. Roche continue to improve 454 chemistry, Ion Torrent aim for 400bp reads, and MiSeq is about to get PE250bp and has been reported as generating a near-700bp read. How would the comparison look with these improvements?
The impact on public health: Moving NGS into public health labs is not going to be easy, so it had better come with real advantages. It is also likely that NGS data will need to compare well to current typing methods. In the paper Nick used the NGS assemblies to generate multi-locus sequence typing (MLST) profiles; MiSeq performed best at this, but the other platforms also worked quite well.
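To make the MLST step concrete, here is a toy sketch of how a profile can be read off an assembly: look for a known allele of each locus in the contigs, then look up the resulting combination of allele numbers. Everything in it is hypothetical (locus names, sequences, allele numbers and sequence-type labels are invented for illustration), and it uses exact string matching, which is a simplification and not necessarily what was done in the paper.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def call_alleles(contigs, allele_db):
    """Return {locus: allele_id or None} using exact matches on either strand."""
    profile = {}
    for locus, alleles in allele_db.items():
        profile[locus] = None
        for allele_id, allele_seq in alleles.items():
            rc = reverse_complement(allele_seq)
            if any(allele_seq in c or rc in c for c in contigs):
                profile[locus] = allele_id
                break
    return profile


def sequence_type(profile, st_table, loci):
    """Look up the sequence type for a profile; None if it is not in the table."""
    return st_table.get(tuple(profile[locus] for locus in loci))


# Hypothetical toy data, not a real MLST scheme.
TOY_LOCI = ("locusA", "locusB")
TOY_ALLELES = {
    "locusA": {1: "ATGGCTAAC", 2: "ATGGCTTAC"},
    "locusB": {1: "GGCATTCCA", 2: "GGCATACCA"},
}
TOY_ST_TABLE = {(1, 1): "ST-toy-1", (2, 1): "ST-toy-2"}
TOY_CONTIGS = ["TTTATGGCTTACGGG", "CCTGGAATGCCAA"]

toy_profile = call_alleles(TOY_CONTIGS, TOY_ALLELES)
print(toy_profile, sequence_type(toy_profile, TOY_ST_TABLE, TOY_LOCI))
```

The exact-match requirement is what makes MLST a reasonable test of assembly accuracy: a single wrong base inside a locus leaves that locus uncalled or calls a different allele.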
The paper asked three questions in the discussion, one of which was “how much one should have to rely on human insight rather than automated analyses and pipelines”. There is a lot of discussion around clinical NGS and the need for highly automated reporting back to clinicians; perhaps we should make sure this question stays at the front of those discussions.
The challenge from ONT: I’d be surprised if anyone reading this has not heard of Oxford Nanopore. There is a sense that their technology will render the “short read” platforms (and in this instance I’ll include 454) obsolete. Generating an assembly from 100kb reads is game-changing, especially if no sample prep is required. You can read a great article by Michael Eisenstein in the latest issue of Nature Biotechnology.