Friday 10 August 2012

Battle of the benchtops part II (you'll need a strong bench for one of these!)

After the furore around the Loman et al. paper it is interesting to read another comparison of NGS platforms. Let's face it, most of us want to know either what we should be buying next, or if we bought the right thing in the first place.

Comparison papers help.

As do beers at AGBT!

The latest sequencer comparison paper: Mike Quail's group at the Sanger published a comparison of PGM, MiSeq and PacBio (interesting choice of the third platform). They sequenced several small genomes that varied massively in GC content. It was interesting to me that these genomes are the routine test genomes for Mike's group; most of us would shudder if a user asked us to sequence something with 20% GC on HiSeq!

Table 1 is excellent reading and should help people in making purchasing decisions. Collecting all this information together needs to be done by each individual institute as prices can vary quite widely. But the table as it stands should allow anyone to make basic comparisons and also see what is missing that they might need to put greater effort into. In the paper they say that although the raw error rate is significantly different for the instruments compared, the effect on SNP calling is negligible given sufficient coverage; 15x appeared fine for the genomes tested. I’d prefer to have seen this in the table as well, to act as a counter to claims around error rates from sales people! They compared most of the things you would want to when deciding what to buy (see the table for everything). The sequencing costs differ significantly per Gb at $500, $1000 and $2000 for MiSeq, PGM 318 and PacBio respectively. This compares to about $50 per Gb on HiSeq.
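To put those per-Gb prices into per-genome terms, here is a minimal sketch. The prices and the 15x coverage are the figures quoted above; the 5 Mb genome size is a hypothetical bacterial-scale number I picked for illustration and is not taken from the paper.

```python
# Rough sequencing-only cost of covering one genome at a given depth.
# Prices per Gb are the figures quoted above; the genome size used in
# the example run is an assumption, not a number from the paper.
PRICE_PER_GB = {"MiSeq": 500, "PGM 318": 1000, "PacBio": 2000, "HiSeq": 50}


def cost_for_coverage(genome_size_bp, coverage, price_per_gb):
    """Return the sequencing cost in dollars for `coverage`-fold depth."""
    gigabases_needed = genome_size_bp * coverage / 1e9
    return gigabases_needed * price_per_gb


if __name__ == "__main__":
    genome_size_bp = 5_000_000  # assumed 5 Mb genome
    for platform, price in PRICE_PER_GB.items():
        cost = cost_for_coverage(genome_size_bp, 15, price)
        print(f"{platform}: ${cost:.2f} for 15x")
```

This deliberately ignores library prep, failed runs and the fact that you buy whole runs rather than fractions of one, so treat it as a back-of-envelope comparison only.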

Table 1 from the paper
Most people considering PGM or MiSeq are after a fast sequencer and both will deliver. As we get used to sub-24 hour run times our users will notice how long library prep takes. As costs for sequencing continue to fall we’ll also spend more time questioning the costs of library prep. The paper talked about the push from all companies to make library prep as simple as possible. However there was no mention of the cost of library prep. The genomes sequenced require only ?-? Gb of data and so ??? libraries would be needed per run. At $100 per sample the cost is X?times more than the sequencing. This is still an unmet need of the community: $1 sample prep for $10 genomes.

How did they do the comparison: Genomes sequenced included Bordetella pertussis (68% GC), Salmonella Pullorum (52% GC), Staphylococcus aureus (33% GC) and Plasmodium falciparum (19% GC). They made PCR-free or PCR amplified libraries for MiSeq PE150bp runs, or HiSeq PE75bp lanes, allowing a direct comparison of the impact of PCR. Additionally they prepared Nextera libraries from three of the genomes sequenced (Bp, Sa & Pf) and whilst two produced “remarkably even” data the Pf genome was very biased. They made PGM libraries using physical shearing and “Fragmentase” digestion using the Ion Xpress kits and showed both to be comparable. These were run on 316 chips for 65 cycles, generating mean read lengths of 120 base pairs. Standard PacBio libraries were prepared and sequenced using C1 chemistry on multiple SMRT-cells (how many?).

What did they find: PGM struggled with the very AT rich Pf genome, and the bias appeared to be partly in the library-prep. By tweaking the protocol and swapping the polymerase for a better one they demonstrated a significant improvement in results. Why don’t all companies do this kind of testing before releasing products on us users? Using the best polymerase or ligase available can make a huge difference.

Error rates were best for MiSeq, no surprise to Illumina users there. But the higher error rates had no impact on true-SNP calling, with PGM doing best at 15x genome coverage although it did produce more incorrect SNP calls. PGM and MiSeq correctly called 82% and 76% of SNPs and produced 1800 and 1300 incorrect SNP calls respectively. Among the Illumina platforms, MiSeq made more correct SNP calls than HiSeq or GAIIx, and Nextera library prep worked as well as the standard protocol. The built-in variant calling of both MiSeq and PGM was inadequate; MiSeq Reporter called 7% and Torrent Suite called 1.5% of variants. SNP calling for PacBio was hampered by a lack of tools as most are designed for short-read data.

A word of caution: The paper is outdated, as are all comparisons, and the authors are happy to acknowledge this. It takes time to perform an experiment like this, analyse it and finally write it up. C2 chemistry is now available for PacBio and a new method has been described for magnetic loading of chips. MiSeq now has 500bp kits available and even more reads. PGM’s error rate has improved. MiSeq has an upgrade being rolled out now for more and longer reads. To be fair to the non-Illumina platforms, MiSeq is based on a pretty mature technology whilst Ion and PacBio should be given some time to catch up (and perhaps overtake); some of the issues with the PGM and PacBio might be resolved by evolution.

GenomeWeb had comments from Ion, Illumina and PacBio. Ion and Illumina both said the comparison was fair. Ion clarified this by saying that the data showed what was possible in 2011 but that error rate was now just 0.4%. Whilst IlluminaLoman et al presented.

Mike also spoke to GenomeWeb and said that the same test genomes are still being run and that the results were as valid today as back in 2011. Significant improvements had come from PGM 200 cycle kits and the C2 chemistry for PacBio.

I am confident there will be more of these comparisons in the next few months. Expect at least one AGBT presentation and lots more discussion over beers.

See you at the bar perhaps?

What do celebrities think about science and what do scientists think of celebrity genomes?

Sense About Science is an organisation that tries to provide expert advice on scientific matters to whoever needs it. They monitor the papers and news and produce an annual “Celebrities and Science” round up of the best and worst comments from people in the public eye. The organisation behind Sense About Science has come under some criticism for being pro-GM and a bit radical and certainly not everyone is a fan. But I enjoyed reading through their annual reviews and wanted to share a few of my favourite comments. The best for me was from Nicole ‘Snooki’ Polizzi who said “the oceans were salty because of all the whale sperm”! See the bottom of this post for a selection from the last three years’ round-ups.

Sense About Science scan many publications looking for comment; of course celebs and politicians don’t always get it wrong but it is far easier to pick up on the crazies out there. There appear to be fewer celebrities who deny evolution or suggest “fossil fuels” aren’t running out, whilst some politicians’ careers appear to be built on such claims.

If you want to help out then you can sign up or email them with examples of bad science. 

What do scientists think of celebrity genomes? Jeff Barrett's web page at the Sanger has coverage of a debate on the value of celebrity genomes between Ewan Birney and Paul Flicek. This was part of a series of events at the Sanger Institute looking at the relationship between society and personal genomics.

Jeff chaired the debate, and before it started the room was evenly split between those who agreed, disagreed or were undecided on the statement “celebrity genomes are a useful contribution to science and society”.

The debate focused on how useful genomes from celebrities were in creating a dialogue between scientists and the public. Paul argued that celebrity genomes are no more important than non-celebrity genomes, so what makes celebrities qualified to speak about genomics? Ewan argued that celebrity genomes have contributed to science, even if only a little. At the end of the debate Jeff asked the audience to judge what impact they thought celebrity genomes had on science: 33% said positive, 62% said negative and 4% were undecided. He also asked the audience if they thought celebrity genomes had had an impact on society: 41% said positive, 51% negative and 8% were undecided.

During the debate Paul talked about the impact celebrities can have as patient advocates, using Michael J Fox and Parkinson’s as an example. Celebrities have as much chance of developing cancer as any of us, and as they get their cancer genomes sequenced and see a benefit from the “treatment” they are uniquely placed to talk about the impact in a way that is going to get across to more people than coverage of a Nature paper on the BBC six o’clock news ever will.

We should be trying to engage with this as much as possible, shouldn’t we?

PS: If you are a celebrity (why wouldn’t they be reading my blog?) and need some advice then help is just a phone call away: call Sense About Science on +44(0)20 7478 4380. I can’t promise they can say how many reads you’ll need for your next exome sequencing experiment!

PPS: If you want your celebrity genome sequenced there are plenty of labs in LA.

My pick of the best and worst from the annual round-ups.

Bonnie Tyler when questioned about trying acupuncture said “I lost some weight but I was also on a more sensible diet at the same time which, if I’m cynical, is more likely the reason for the weight loss.” And Natascha McElhone’s comments about tetanus after a visit to Angola: “It’s completely preventable if you’re inoculated against it.”


Heather Mills “meat sits in your colon for 40 years and putrefies, and eventually gives you the illness you die of. And that is a fact.”

Roger Moore “eating foie gras can lead to Alzheimer’s, diabetes and rheumatoid arthritis. In short, eating foie gras is a tasty way of getting terminally ill.” I don’t eat foie gras on compassionate grounds but it is unlikely to be the cause of so many diseases, and I am not sure any of those Sir Roger listed are actually terminal?

Alex Reid gave out a horrible message about unprotected sex saying “it’s actually very good for a man to have unprotected sex as long as he doesn’t ejaculate” and “semen has a lot of nutrition. A tablespoon of semen has your equivalent of steak eggs, lemons and oranges.” Irresponsible nutter if you ask me!

Julia Sawalha doesn’t get inoculated or take anti-malarials but uses “homeopathic alternatives, called ‘nosodes’” and said “I’m the only one who never goes down with anything.”

Joanna Lumley, her AbFab co-star, put the increase in cancer down to “the growth hormones in the food we eat, that try to make all the chickens, sheep and cows, more productive”.

Sarah Palin, whose autobiography “Going Rogue” says that she “didn’t believe in the theory that human beings — thinking, loving beings — originated from fish that sprouted legs and crawled out of the sea or from monkeys who eventually swung down from the trees.” Yikes, how can such strong anti-evolution views be held by someone who (from a UK news coverage perspective) holds some power in the USA?

Michele Bachmann, member of the US House of Representatives and Republican Presidential Candidate, told journalists that a woman had told her that her daughter suffered mental retardation after receiving the HPV vaccine, and that this vaccination program has dangerous consequences. What is the likelihood she is a right-wing, pro-Christian, pro-guns, anti-abortion Republican?

These last two particularly disturb me. The first highlights how nuts some politicians are. The second disturbs me because, as the UK MMR scare showed, bad science can become mainstream fact and affect us all in a very negative way.

We shouldn’t believe everything we hear in the press, but politicians surely have an obligation to be careful about what they say.

Wednesday 8 August 2012

What happened to Illumina’s single molecule sequencing or do you remember Solexa’s SMA-seq?

Eight years is a long time in NGS. I recently re-read a 2004 article in Pharmacogenomics, and also found an EBI presentation from Clive Brown and Ewan Birney. Both of these were from a small company based in Cambridgeshire called Solexa. At the time of publication they had only just identified their first alpha-test site and the presentation talked about a prototype instrument ready for the end of 2004.

Prototype GA1
Trademarks mentioned in the paper such as SMA-seq and TotalGenotyping have not, I suspect, been heard of by most Illumina sequencing users (including myself). 

The paper describes where Solexa came from (Shankar Balasubramanian and David Klenerman's patents of 1998 spun out of Cambridge University Department of Chemistry). It mentions Solexa's demonstration version of “a system that will allow rapid, base-by-base comparison of genomic DNA sequences” and that this will produce “four or five orders of magnitude improvement over conventional sequencing”. Read lengths of just 25-30bp are proposed, and a nice graph illustrates how just over 80% of the Human genome is uniquely mappable with these incredibly short reads.

Simon Bennett, business development director of Solexa at the time and author of the Pharmacogenomics paper, suggests that Solexa will achieve the $1000 genome within the next ten years. That leaves us two more years to get to $1000 genomes. It does not seem unreasonable that we’ll get there, although more discussion today is about the cost of bioinformatics analysis!

What happened to single molecule sequencing: There is an overview of the Solexa Single Molecule Array™ technology that the paper suggests can analyse a Human genome in a single experiment. As described there were just 100,000 DNA molecules per cm² compared to 100M per cm² today. The basic chemistry description is unchanged from current SBS, although only 25 bases were being sequenced at the time of publication.

It is only towards the end of the paper that Solexa’s acquisition of Manteia’s solid surface bridge-amplification technology is mentioned; this is the clustering we know and love today. Up until this point Solexa had been focusing on single molecule sequencing. Without the acquisition of Manteia perhaps Solexa would have continued to chase single molecule sequencing and ended up like Helicos or Pacific Biosciences. As it stands clustering and SBS chemistry have been the bedrock of next-gen sequencing for the past five years.

Personally I’d bet Illumina are still putting lots of effort into single molecule approaches, and not just by investing in companies like ONT. I’d like to know if it would be possible to sequence single molecules on a HiSeq with a more sensitive camera (massive oversimplification I know)? Imagine 1000M single molecule reads! This might not be what we ultimately use for single-molecule but I think we can be certain there is a lot more coming for next-gen in the next eight years.

PS: Would SOLiD have been the dominant technology if Agencourt had bought Manteia instead? Perhaps we should have a genomics version of Marvel’s “What if” comic books from the 80’s?

PPS: The Illumina history lesson also taught me that we share half our genes with bananas!

Friday 3 August 2012

Is visual QC of NGS libraries needed anymore?

I have been using the Bioanalyser since its introduction in 1999. Originally intended for QC analysis of total RNA for microarray studies it quickly became a standard tool for many labs. Over the past few years we have run almost as many NGS libraries on DNA 1000 assays as we have RNA chips.

I think we are going to stop using it for all but a small proportion of libraries by next year.

The Bioanalyser has been a great tool for quality control of NGS libraries. Users can clearly see if they have prepared a high-quality library, if there is lots of adapter-dimer present and if the insert size is what they expected. Unfortunately running the Bioanalyser is a bit of a pain once you have more than 12 or 24 libraries.

In my lab we are now preparing 24, 48 and 96 libraries in each batch. QC of these has become too much work using current methods so we looked at alternatives. This included the Caliper LabChip GX, Shimadzu MultiNA, Agilent ScreenTape, Qiagen QIAxcel and Advanced Analytical’s Fragment Analyser (see the bottom of this post for a full list of features).

From our analysis of the system features we asked for demonstrations of the Caliper and Advanced Analytical instruments. These two both appeared to give us the throughput and sensitivity we need. Both systems worked well and I know of several labs using these instruments very successfully. However we decided not to invest in a high-throughput Bioanalyser.

Why not, and what do we want from library QC? Most users want sequence results as soon as possible and are happy with some libraries failing, so for some the QC is seen as a bar that gets in the way of their science. My lab wants to satisfy all users and return the highest percentage possible of high quality sequencing runs. Generating 40M reads of a poor library is no use to anyone.

With the introduction of 96 and 384 index kits from companies like Bioo Scientific and with Illumina finally catching up with the TruSeq HT kits I think we are ready to ditch gel-based analysis. Instead we will start using a QC pipeline that will use the data from a single lane analysis of up to 96 libraries. We can look at computed insert-size, verify quantification by checking pooling ratios, screen for adapter-dimer or contamination with other genomes and make sure duplication rates are not too high. Even with 96 samples we should get around 1-2M reads each, and some readers of this blog may remember when 1 M reads was considered enough for ChIP-seq analysis, let alone QC! There are also some hints that 1M reads might be acceptable for basic differential gene expression analysis of highly expressed transcripts.
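As a rough idea of what the per-library checks in such a pipeline could look like, here is a minimal sketch. Every name and threshold in it (`LibraryMetrics`, `qc_flags`, the 500,000 read minimum, the 25% insert-size tolerance and so on) is a hypothetical placeholder of mine rather than a validated cut-off, and contamination screening is left out because it needs an aligner.

```python
# Flag libraries from a shallow multiplexed QC lane.
# All thresholds below are illustrative assumptions, not validated cut-offs.
from dataclasses import dataclass


@dataclass
class LibraryMetrics:
    name: str
    reads: int                 # reads assigned to this index
    median_insert_bp: float    # insert size computed from aligned pairs
    adapter_dimer_frac: float  # fraction of reads that are adapter-dimer
    duplication_rate: float    # fraction of reads marked as duplicates


def qc_flags(lib, expected_insert_bp, min_reads=500_000,
             insert_tolerance=0.25, max_dimer=0.05, max_dup=0.30):
    """Return a list of QC problems for one library (empty list means pass)."""
    flags = []
    if lib.reads < min_reads:
        flags.append("too few reads")
    allowed_shift = insert_tolerance * expected_insert_bp
    if abs(lib.median_insert_bp - expected_insert_bp) > allowed_shift:
        flags.append("unexpected insert size")
    if lib.adapter_dimer_frac > max_dimer:
        flags.append("adapter-dimer")
    if lib.duplication_rate > max_dup:
        flags.append("high duplication")
    return flags


if __name__ == "__main__":
    lib = LibraryMetrics("lib_01", 1_500_000, 310, 0.01, 0.10)
    problems = qc_flags(lib, expected_insert_bp=300)
    print(lib.name, "PASS" if not problems else "FAIL: " + ", ".join(problems))
```

Returning a list of reasons rather than a bare pass/fail means the user can see why a library was held back, which is most of what the Bioanalyser trace was telling them anyway.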

We’ll be slowly retiring the Bioanalyser type analysis of libraries and using the qPCR quantification as a simple QC tool for pass/fail decisions. We might even get to a point that we only quantify the final pool after mixing equal volumes of all 96 libraries, such that cluster density is spot-on. Then we can use the sequence demultiplexing to indicate the actual balance of indexes to re-pool for the final high read number sequencing.
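The re-pooling step can be sketched as follows. This assumes that the number of reads a library yields scales linearly with the volume added to the pool, which is my simplification, and the function name `repooling_factors` is hypothetical. Libraries that gave no reads at all are dropped rather than given an infinite volume.

```python
def repooling_factors(read_counts):
    """Relative volume multiplier per library to aim for an even pool.

    `read_counts` maps index/library name to the demultiplexed read count
    from the equal-volume QC pool. A factor of 2.0 means "add twice the
    volume used in the QC pool". Assumes reads scale linearly with volume.
    """
    usable = {name: n for name, n in read_counts.items() if n > 0}
    if not usable:
        raise ValueError("no library produced any reads")
    total = sum(usable.values())
    target = total / len(usable)  # reads per library if perfectly balanced
    return {name: target / n for name, n in usable.items()}


if __name__ == "__main__":
    counts = {"lib_A": 2_000_000, "lib_B": 1_000_000, "lib_C": 1_000_000}
    for name, factor in repooling_factors(counts).items():
        print(f"{name}: use {factor:.2f}x the QC-pool volume")
```

In practice you would probably cap the factors, since a library needing ten times the volume is more likely a failed prep than a pipetting imbalance.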

High Throughput Bioanalyser Platform Features
Caliper - Labchip GX
  • High throughput bioanalyser with 96 and 384 well compatibility
  • Assesses RNA quality and gives exact sizing and quantification of DNA fragments.
  • Can analyse 96 samples in less than 1 hour
  • RNA metrics are used to calculate the RQS value (RNA quality score) which has been validated to correlate with the Agilent Bioanalyser RIN score. This would be beneficial since users are already familiar with a RIN value for assessing RNA quality.
  • Resolution down to 5bp and sensitivity of 0.1 ng/ul
  • Can visualise the results on electropherogram or gel view similar to Agilent 2100.
  • Data can be viewed in tabular form which can be easily exported/uploaded onto our LIMS system.
  • High sensitivity kit also available
  • There is a barcode reader for sample tracking which would be important when running large numbers of samples.
Shimadzu Biotech – MCE-202 MultiNA
  • This is a microchip electrophoresis system for DNA/RNA analysis.
  • Reusable microchips are used which could reduce running and consumable costs.
  • 120 samples can be run simultaneously across 4 separate microchips with 80 seconds per sample processing speed.
  • It can also perform automatic or manual reanalysis of the samples as seen with the Agilent Bioanalyser and can export the results in CSV format.
Lab901 / Agilent - ScreenTape
  • The Lab901 ScreenTape system is a fully automated system for gel electrophoresis. The ScreenTape instrument loads, separates, images and analyses both DNA and RNA samples. It does this by loading each sample onto a ScreenTape, each of which contains 16 microgels which align to built-in electrodes and an imaging system.
  • Only 1 ul of sample is required and analysis takes 1 minute per sample. It is fully automated with prepacked reagents so there is no gel preparation or chip priming.
  • Different ScreenTapes are available for DNA and RNA analysis.
  • For RNA analysis, quality is displayed as the ScreenTape degradation value (SDV)
Qiagen- QIAxcel system
  • A microcapillary electrophoresis system, which is fully automated and can process up to 96 samples per run. Separation is performed in the capillaries of a precast gel cartridge, which is reusable.
  • Sensitivity of 0.1 ng/ul and resolution down to 3-5 bp.
  • Sample consumption is less than 0.1ul, although the minimum sample volume to load for analysis is 10ul.
  • 96 samples can be processed in approximately 1 hour.
  • The data can be viewed as electropherogram or gel images.
Advanced Analytical –Fragment Analyser
  • A fluorescence-based capillary electrophoresis instrument for both sizing and quantifying nucleic acids (DNA and RNA).
  • Can run either 12 samples or 96 samples at a time
  • The instrument provides space for up to six 96-well plates
  • Can be used to quantify and qualify NGS fragments, RNA and genomic DNA, and also for mutation detection and microsatellite (SSR) analysis.
  • Various capillary lengths can be used, depending on the application, required resolution and desired speed of analysis. Longer arrays provide resolution down to 2 bp for fragments under 300 bp in length. Shorter arrays still provide good resolution with run times as fast as 15 minutes
  • PROSize™ software is used to analyse the data and this can be viewed as a gel view, electropherogram or a results table.
  • The data is exportable and can be linked to the LIMS.