Comparison papers help.
As do beers at AGBT!
The latest sequencer comparison paper: Mike Quail's group at the Sanger published a comparison of PGM, MiSeq and PacBio (an interesting choice of third platform). They sequenced several small genomes that varied massively in GC content. It was interesting to me that these are the routine test genomes for Mike's group; most of us would shudder if a user asked us to sequence something with 20% GC on HiSeq!
Table 1 is excellent reading and should help people making purchasing decisions. Each institute will need to collect this information for itself, as prices can vary quite widely. But the table as it stands should allow anyone to make basic comparisons and to see what is missing that they might need to put greater effort into. In the paper they say that although the raw error rate is significantly different for the instruments compared, the effect on SNP calling is negligible given sufficient coverage; 15x appeared fine for the genomes tested. I'd prefer to have seen this in the table as well, to act as a counter to claims around error rates from sales people! They compared most of the things you would want to when deciding what to buy (see the table for everything). The sequencing costs differ significantly per Gb at $500, $1000 and $2000 for MiSeq, PGM 318 and PacBio respectively. This compares to about $50 per Gb on HiSeq.
Table 1 from the paper
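To get a feel for what the per-Gb prices quoted above mean for a single small genome, here is a rough back-of-envelope sketch. The prices are the ones quoted in this post; the 5 Mb genome size and the function names are my own assumptions for illustration, not figures from the paper, and the calculation ignores the fact that you usually have to pay for a whole run or chip.

```python
# Approximate per-Gb sequencing costs quoted in this post (US dollars).
COST_PER_GB = {"MiSeq": 500, "PGM 318": 1000, "PacBio": 2000, "HiSeq": 50}


def genome_cost(genome_size_bp, coverage, cost_per_gb):
    """Reagent cost of sequencing one genome to a given mean coverage.

    Assumes cost scales linearly with bases sequenced, which ignores
    minimum run sizes and library prep.
    """
    gigabases = genome_size_bp * coverage / 1e9
    return gigabases * cost_per_gb


def cost_table(genome_size_bp, coverage=15):
    """Cost per platform for one genome at the given coverage."""
    return {
        platform: genome_cost(genome_size_bp, coverage, price)
        for platform, price in COST_PER_GB.items()
    }


if __name__ == "__main__":
    # Hypothetical 5 Mb bacterial genome at 15x (an assumed size).
    for platform, cost in cost_table(5_000_000).items():
        print(f"{platform}: ${cost:.2f}")
```

At these sizes the per-genome differences are small in absolute terms, which is why the cost of a whole run and of library prep tends to matter more than the headline per-Gb figure.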
How did they do the comparison: Genomes sequenced included Bordetella pertussis (68% GC), Salmonella Pullorum (52% GC), Staphylococcus aureus (33% GC) and Plasmodium falciparum (19% GC). They made PCR-free and PCR-amplified libraries for MiSeq PE150bp runs or HiSeq PE75bp lanes, allowing a direct comparison of the impact of PCR. Additionally they prepared Nextera libraries from three of the genomes (Bp, Sa & Pf), and whilst two produced “remarkably even” data the Pf genome was very biased. They made PGM libraries with the Ion Xpress kits using either physical shearing or “Fragmentase” digestion, and showed the two to be comparable. These were run on 316 chips for 65 cycles, generating mean read lengths of 120 base pairs. Standard PacBio libraries were prepared and sequenced using C1 chemistry on multiple SMRT cells (how many?).
What did they find: PGM struggled with the very AT-rich Pf genome, and the bias appeared to be partly in the library prep. By tweaking the protocol and swapping the polymerase for a better one they demonstrated a significant improvement in results. Why don't all companies do this kind of testing before releasing products on us users? Using the best polymerase or ligase available can make a huge difference.
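Since GC content is the axis along which these test genomes were chosen, here is a minimal sketch of how you might compute it for a sequence, overall and in windows. The function names and the window approach are my own illustration, not anything taken from the paper.

```python
def gc_content(seq):
    """Fraction of G/C among the unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = gc + sum(seq.count(base) for base in "AT")
    return gc / total if total else 0.0


def windowed_gc(seq, window):
    """GC fraction for consecutive non-overlapping windows of seq.

    The last window may be shorter than the others.
    """
    return [
        gc_content(seq[start:start + window])
        for start in range(0, len(seq), window)
    ]


if __name__ == "__main__":
    toy = "ATATATATGCGCGCGC"  # made-up sequence, half AT and half GC
    print(gc_content(toy))
    print(windowed_gc(toy, 8))
```

Plotting coverage against windowed GC is the usual way to see the kind of bias described for the Pf genome.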
Error rates were best for MiSeq, no surprise to Illumina users there. But this had no impact on true-SNP calling: PGM did best at 15x genome coverage, although it also produced more incorrect SNP calls. PGM and MiSeq correctly called 82% and 76% of SNPs and produced 1800 and 1300 incorrect SNP calls respectively. Within Illumina, MiSeq made more correct SNP calls than HiSeq or GAIIx, and Nextera library prep worked as well as the standard protocol. The built-in variant calling on both MiSeq and PGM was inadequate; MiSeq Reporter called 7% and Torrent Suite called 1.5% of variants. SNP calling for PacBio was hampered by a lack of tools, as most are designed for short-read data.
A word of caution: The paper is outdated, as are all comparisons, and the authors are happy to acknowledge this. It takes time to perform an experiment like this, analyse it and finally write it up. PacBio has since moved to C2 chemistry and a new method has been described for magnetic loading of chips. The PGM error rate has improved. MiSeq now has 500bp kits available, with an upgrade being rolled out now for more and longer reads. To be fair to the non-Illumina platforms, MiSeq is based on a pretty mature technology whilst Ion and PacBio should be given some time to catch up (and perhaps overtake); some of the issues with the PGM and PacBio might be resolved by evolution.
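The two numbers quoted for each platform (percentage of true SNPs found, and count of incorrect calls) come from comparing a call set against a trusted set of known SNPs. A minimal sketch of that comparison is below; the function name and the toy data are hypothetical, and this is not the pipeline used in the paper.

```python
def evaluate_snp_calls(truth, called):
    """Compare called SNPs against a trusted truth set.

    Both arguments are sets of (position, alt_base) tuples.
    Returns the fraction of true SNPs recovered plus the number of
    incorrect and missed calls.
    """
    true_positives = truth & called
    false_positives = called - truth
    missed = truth - called
    sensitivity = len(true_positives) / len(truth) if truth else 0.0
    return {
        "sensitivity": sensitivity,
        "false_positives": len(false_positives),
        "missed": len(missed),
    }


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    truth = {(10, "A"), (20, "C"), (30, "G"), (40, "T")}
    called = {(10, "A"), (20, "C"), (30, "T"), (99, "G")}
    print(evaluate_snp_calls(truth, called))
```

Reporting both numbers matters: a platform can find more of the true SNPs while also making more incorrect calls, which is the pattern described above for PGM versus MiSeq.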
GenomeWeb had comments from Ion, Illumina and PacBio. Ion and Illumina both said the comparison was fair. Ion qualified this by saying that the data showed what was possible in 2011 but that its error rate was now just 0.4%. Whilst Illumina … Loman et al presented.
Mike also spoke to GenomeWeb and said that the same test genomes are still being run and that the results were as valid today as back in 2011. Significant improvements had come from the PGM 200-cycle kits and the C2 chemistry for PacBio.
I am confident there will be more of these comparisons in the next few months. Expect at least one AGBT presentation and lots more discussion over beers.
See you at the bar perhaps?