Monday, 30 January 2012

Nanopore sequencing: is the hype about to end?

A follow-up post to this one explains what nanopores are and how they can be used for DNA sequencing.

There has been a lot of hype around Nanopore sequencing for a number of years. The promise of very long single molecule sequencing with nucleotide modifications being directly read out is the holy grail of sequencing technology. The fact is that it has been hard to translate a relatively simple concept into reality.

For the past three years attendees at AGBT have been waiting for Oxford Nanopore to speak. To date they have been very reluctant to say much publicly, and although nothing has been confirmed, the rumour mill is hot with speculation that this is finally the year they will talk.

But what will they talk about and is the Next Next-Gen (N2GS) just around the corner?

What am I hoping to hear about: I am hoping for a genome; PhiX would be OK (any nanopore sequence would be better than nothing), but I'd admit to being disappointed if that was all. It would be great to see a more complex genome presented, yeast or C. elegans perhaps. What I really hope is that they will talk about sequencing a Human genome, and if I allow my imagination to run away, I am looking forward to details of a long-read, single-molecule 1000 genomes project.

What do Oxford Nanopore offer: Oxford Nanopore Technologies (ONT) are developing nanopore-based technologies; the most interesting for this post is DNA sequencing.

ONT’s sequencing that has been discussed publicly uses an α-haemolysin pore coupled with an exonuclease. Rather than feeding an intact DNA strand through the pore and reading out the bases, the exonuclease-sequencing approach cleaves each base from the end of the DNA strand; those bases translocate through the pore, are detected, and the DNA sequence is read out. Many thousands of these nanopores need to run in parallel to sequence a genome, and part of that parallelisation comes in the form of the GridION system, where multiple sequencing chips can be run together.

Strand-sequencing has not been dropped by ONT though, and Hagan Bayley has modified the pore to improve discrimination of bases. Others have also demonstrated methods to slow the translocation of DNA through the pore by coupling it to a polymerase, which ‘ratchets’ the DNA through a base at a time.

The GridION system was ‘reviewed’ in a post over at Genomes Unzipped by Luke Jostins almost exactly one year ago. And whilst that post had scant details on the sequencing itself, to be fair very little more has been revealed in the following 12 months. Luke's post does describe the compute-cluster-like architecture that ONT have developed to house future sequencers. A user could buy one node or one thousand and do as much or as little sequencing as they need. What the impact will be on service providers like BGI will be interesting to see.

“Run until” technology: ONT’s exonuclease or strand-sequencing approaches, packaged in the GridION format, will allow scientists to load the instrument and continue to sequence until ‘enough’ data has been generated. How much is enough is determined by the user ahead of the run and the real-time analysis and monitoring will mean only the required data are generated.

It is theoretically possible to make use of this approach, or something similar, today. By running highly multiplexed samples and performing analysis between runs it would be possible to generate almost exactly the required depth of sequencing per sample. Once a sample has been shown to be of high quality, counting its indexed reads is all that is needed to calculate how much more sequencing is required. This could be implemented on the other sequencing platforms, but I suspect it is too difficult for most labs today, even if the efficiency gains could be worthwhile.
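To make the idea concrete, here is a minimal sketch of that between-run calculation in Python. The sample names, read counts and the 5M-read target are all hypothetical illustrations, not numbers from any real run.

```python
# A sketch of the 'run until' idea applied to a multiplexed run: given the
# indexed reads seen per sample so far, work out how much more sequencing
# each sample needs. All numbers below are hypothetical.

def top_up_needed(reads_so_far, target_reads):
    """Return the extra reads required per sample (0 if the target is met)."""
    return {sample: max(target_reads - n, 0) for sample, n in reads_so_far.items()}

# Hypothetical QC counts from a first, shallow pass over four indexed samples
first_pass = {"S1": 2_100_000, "S2": 950_000, "S3": 4_800_000, "S4": 1_600_000}

for sample, reads in top_up_needed(first_pass, target_reads=5_000_000).items():
    print(f"{sample}: {reads / 1e6:.1f}M more reads needed")
```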

ONT and Illumina: In 2009 Illumina invested $18M in ONT and bought an exclusive license to "market, sell, distribute, and service BASE (BAyley SEquencing) technology". What this means for ONT and Illumina today is less clear, and I am not certain whether other milestone payments have been made. Illumina certainly want to stay ahead of the competition (LifeTech, Complete and Roche); ONT might be their not-so-secret weapon.

What does the future hold: Dr Sanghera (ONT CEO) was quoted as saying that the $1,000 genome will be possible “within three to five years” in a March 2011 interview with The Economist. Things have sped up faster than ever in the last year and the $1,000 genome looks like it is already here. Can ONT deliver us the $100 or $10 genome? I am certainly looking forward to them piling the heat on Illumina and Life Tech.

Lastly, back in 2009 when Clive Brown (ONT CTO) was interviewed in Bio-IT World he said "before launching a product, you have to run it in house for months, doing genome-centre type things". Hopefully we are about to find out exactly what those genome-centre type things are.

How does a nanopore sequencer work?

You may have come here from my other Nanopore post; if not, this is a follow-up to that and is meant to outline nanopore sequencing technologies.

How does a nanopore sequencer work: A nanopore is a very small hole, generally under 1nm in width, in a membrane of some kind. It can be made from a biological molecule or ‘punched’ into a solid surface using an electron beam. Nanopore sequencing has a very simple basic principle: DNA strands or single nucleotides are driven through the nanopore electrophoretically, and as each nucleobase passes through the pore the current is affected; this change allows the sequence to be read out. Each base has a characteristic change in current and, perhaps just as importantly, a specific dwell time in the pore. One of the first publications of the idea was a 1995 patent from George Church (Church et al), a year or two before Shankar Balasubramanian and David Klenerman invented the SBS chemistry and formed Solexa. Nanopore sequencing has been around as an idea for a while!

The earliest demonstrations of the technology used α-haemolysin or Mycobacterial porin A (MspA) biological nanopores. Biological and solid-state pores have now been demonstrated and hybrid systems have also been discussed.

Biological nanopores: cells are very good at making biological nanopores like α-haemolysin, and they make many other similar molecules; it is possible to design pores with specific characteristics using site-directed mutagenesis, and these can be checked at the atomic level with X-ray crystallography to verify their structure. α-haemolysin pores have been widely used as the hole in the middle is only wide enough for single-stranded DNA to pass through. Unfortunately DNA moves through these pores very rapidly, making detection of each base almost impossible, and slowing its translocation has been a goal of nanopore research. DNA can be coupled to enzymes like DNA polymerase and “ratcheted” through the pore. Oxford Nanopore have been working on “sequencing-by-digestion”, where an exonuclease sits above the pore cleaving individual bases from a strand of DNA; these pass through the pore, allowing the sequence to be read out. Biological pores also offer the promise of detecting not just DNA sequence but also interacting DNA:protein molecules and possibly protein sequence. A big challenge for biological nanopores is that they are often embedded in fragile lipid bilayers and can be affected by physical conditions such as temperature and pH. It can also be difficult to make large arrays of nanopores, and we would need many thousands of pores to sequence a Human genome.

Solid-state nanopores: Many groups are working on solid-state systems using ion- or electron-beam sculpting of pores in silicon nitride membranes. Making the pores and the membranes is challenging, as spacing and thickness need to be carefully controlled, but these systems are much less affected by physical conditions and may also be coupled to electronic or optical read-out systems. The latest advance appears to be the use of graphene, a two-dimensional sheet of carbon atoms, as the membrane of choice. The use of solid-state nanopores is not as advanced as the biological approach, and the similar challenge of controlling the speed of DNA translocation is as yet unmet.

How will my sequencing be affected: Nanopores offer the promise of sequencing a base per millisecond with high accuracy and detection of base modifications like methyl-C. At these speeds one million base pairs can be read out in under 20 minutes, and with 1000 pores in an array a Human genome might be sequenced in an hour or so.
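The arithmetic behind those claims is easy to check. A quick back-of-envelope in Python, using the speeds quoted above (everything here is approximate and assumes the pores run flat out in parallel):

```python
# Back-of-envelope nanopore throughput, using the figures quoted above.
bases_per_second_per_pore = 1000   # one base per millisecond
pores = 1000                       # pores in a hypothetical array
genome_bases = 3e9                 # a one-fold Human genome

seconds_per_mb = 1e6 / bases_per_second_per_pore
print(f"1Mb per pore: ~{seconds_per_mb / 60:.0f} minutes")              # ~17 minutes

genome_seconds = genome_bases / (bases_per_second_per_pore * pores)
print(f"3Gb across {pores} pores: ~{genome_seconds / 3600:.1f} hours")  # under an hour
```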

Nanopores also offer a major advantage over current methods even if speeds don’t quite approach this: no labelling is required and single molecules are analysed. This means few reagents and possibly no sample prep other than extracting DNA, so it should be easier and cheaper than current methods!

Church et al. Characterization of individual polymer molecules based on monomer-interface interactions. US Patent 5,795,782 (1995).
Venkatesan & Bashir. Nanopore sensors for nucleic acid analysis. Nature Nanotechnology (2011).
Branton et al. The potential and challenges of nanopore sequencing. Nature Biotechnology (2008).

Wednesday, 25 January 2012

Roche to buy Illumina?

Looks like Roche want a chunk of Illumina. There are press releases on both companies' websites (Illumina and Roche) about Roche's offer. There is little in the Illumina release except confirmation of the unsolicited acquisition proposal at $44.50 per share; this amounts to $5.7Bn according to Roche and is a 64% premium on the current stock price.

Roche speak about the combination of the two companies and a strengthening of their diagnostics potential. They also state that they will move the combined headquarters to Illumina's San Diego site (Illumina incidentally just built nice new offices there).

The press release quotes Roche's CEO Severin Schwan as saying "It is our strong preference to enter into a negotiated transaction with Illumina, and we remain willing to engage in a constructive dialogue". Whether Illumina are so eager is another thing entirely.

Dear Jay... the press release finishes with a letter from Franz Humer (Roche Chairman) to Jay Flatley (Illumina CEO). It says a lot about the unwillingness of Illumina to engage and how great the merger would be for both companies.


I had also heard that Qiagen were hovering over Illumina. Illumina have been incredibly successful over the past five or six years, so I am not surprised someone is willing to pay a lot of money to buy them. Is $44.50 good enough? I am not sure, especially given the price was nearly $80 last Summer.

MiSeq: possible growth potential part 3

This post (and others like it) is pure speculation from me. I have no insider knowledge and am trying to make some educated guesses as to where technologies like this might go.

This is number three in a series of MiSeq posts focusing on what we might get out of the system as it evolves. I originally suggested in my very first post that a 25GB output might be possible, and revised this with some comments on very long 1000bp runs. Our last flowcell generated 2.24GB from a PE151 run and we continue to get very high-quality data and good yields (when the instrument works). Recently Illumina announced improvements to the MiSeq and suggest 7GB will now be possible using a combination of more imaging and longer reads (PE250bp).

I have created a little spreadsheet that allows you to play with read numbers and how much flowcell gets imaged. I am sticking with my assumption that a MiSeq flowcell is around one third of a HiSeq lane, but of course Illumina might increase the width of the MiSeq lane and further increase data output that way.

The spreadsheet gets 7GB from a 250bp paired-end run so confirms what Illumina announced (I'll take this as a sign that the spreadsheet works!)

Using the default 300-cycle run (PE150) we should get 2, 4 and up to 12GB from a run as more of the surface area gets imaged.

If we move to 250bp paired-end runs (500bp) and increase clusters to 10M over the current 7M average then these numbers jump quite considerably, but not beyond reason. It looks like 5-30GB could be possible with additional tweaks. The 30GB output would be some 80M reads. Quite a lot from a little box!
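The spreadsheet logic is nothing more sophisticated than clusters multiplied by bases read. Here is a toy version; the cluster counts are my guesses for illustration, not Illumina specifications:

```python
# A toy version of the spreadsheet: yield = clusters x bases read per cluster.
# Cluster counts below are assumptions, not Illumina specifications.

def run_yield_gb(clusters_m, read_length, paired=True):
    """Run yield in Gb from clusters (millions) and read length (bp)."""
    bases_per_cluster = read_length * (2 if paired else 1)
    return clusters_m * 1e6 * bases_per_cluster / 1e9

print(run_yield_gb(7, 150))    # ~2.1Gb: close to our 2.24GB PE151 flowcell
print(run_yield_gb(14, 250))   # ~7Gb: the announced PE250 output
print(run_yield_gb(10, 250))   # ~5Gb: 10M clusters at PE250
print(run_yield_gb(60, 250))   # ~30Gb: the top end needs ~60M clusters
```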

I'd like to see 1000bp reads, even in the form of "mini-contigs" from PE550bp.

What would you do with 60M 1000bp reads? The Human Genome Project used almost 6M whole-genome shotgun reads of similar length.

Monday, 23 January 2012

HiSeq 2500 – what's in the upgrade and first impressions of the new flowcell

Last week I posted on the press releases from Illumina and Life Technologies. I wanted to follow up in this post with some more details after talking to Illumina. I have to say that the details on the new flowcell are not directly from Illumina. I'll try to get a better image than my drawing below and some confirmation, then I'll follow up.

I’ll kick off with a little HiStory (excuse the pun).

HiSeq Jan 2010-Jan 2012: The release of HiSeq in January 2010 made a big splash in labs like mine; it meant we could start to run whole genome sequencing projects and not just leave these to the big boys (Sanger, Broad, WashU, etc). Instrument capacity had become something of a hindrance to our aspirations.

The leap from the GAIIx and 25-30GB to HiSeq's 100GB per flowcell, coupled with a decrease in run time (14 days down to 10), effectively meant we could aim to sequence 80 genomes a year at 30-fold coverage. The cost per genome also dropped precipitously, from around $35,000 to $9,000 (see the bottom of this page for my calculations).

The v3 chemistry released last Summer, which increased data volumes to 300GB per flowcell, brought another 3x drop in costs; we can now sequence almost 250 genomes per year at 30x coverage on a single HiSeq, with a single genome costing around $3,000 in reagents.
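Those genomes-per-year figures follow from simple arithmetic; a minimal sketch, assuming back-to-back runs with perfect yields and a 3Gb genome:

```python
# Genomes per year per instrument, assuming back-to-back, perfect runs.

def genomes_per_year(gb_per_flowcell, flowcells_per_run=2, run_days=10, coverage=30):
    gb_per_genome = 3 * coverage            # a 3Gb genome at the given coverage
    runs_per_year = 365 / run_days
    return runs_per_year * flowcells_per_run * gb_per_flowcell / gb_per_genome

print(f"{genomes_per_year(100):.0f}")   # ~81: original HiSeq, ~80 genomes/year
print(f"{genomes_per_year(300):.0f}")   # ~243: v3 chemistry, almost 250/year
```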

Illumina do appear to be holding back on releasing the improvements that would allow 1TB runs. These were presented at AGBT '11 using v3 chemistry and PE150bp runs, and I also saw a slide late last year at a MiSeq roadshow where 1TB had been completed in 10 days using PE100 and more clusters. 1TB runs would bring the cost down to around $1,500 per 30x genome.

HiSeq 2500, what’s new: The HiSeq 2500 is available as either a new instrument or an upgrade to an existing HiSeq 2000 (more on that later). It will support the current v3 chemistry delivering 600Gb per run, as well as the new “MiSeq on steroids” reagent format, which uses a new flowcell that is clustered onboard, producing a 120Gb output and the much-vaunted “genome in a day”.

The “genome in a day” runs will most likely cost around 3-5 times more per genome than a 600Gb run, which currently works out at about $30 per Gb. The information released so far suggests a reagent cartridge for the “MiSeq on steroids” runs, which would include cluster generation, sequencing and cluster regeneration reagents. Running the current v3 chemistry on the HiSeq 2500 will still require the use of a cBot.

Instrument changes include alterations to the flowcell stage, fluidics, software and other minor modifications. Illumina have said that HiSeq 2500s and recently purchased, upgraded HiSeq 2000s will be able to run more quickly than older upgraded HiSeq 2000 systems. Those of us who were early into the switch from GA to HiSeq will lose out on this, but the additional run time is likely to be less than half a day, so I can’t see a major impact on my lab. Of course a “genome in a day and a half” is not nearly so nice a marketing message.

The new 2500 flowcell:
I did get a little information on the new flowcell. Apparently it has two lanes, coupled to four ports at either end. This allows more reagent to flow through, and possibly faster sequencing chemistry, keeping quality high whilst speeding up cycle times. The two-lane format also keeps imaging time low. It does mean that users wanting high-coverage amplicon sequencing projects on the new format will need to multiplex to 96+ samples per lane to get the best performance out of the run. If you want whole Human genomes quickly though, it should deliver nicely.
Should I upgrade or buy a Proton? The new machine from Ion Torrent looks like real competition for the HiSeq and should certainly keep Illumina on their toes. It has a much lower price tag, and you could conceivably buy five Protons for one HiSeq. However it is not clear what additional costs will be involved, and the Proton still uses ePCR, something few people talk about without wincing. If Ion push paired-end sequencing out past 400bp on both ends then this really will put pressure on Illumina.

However the HiSeq 2000-2500 upgrade is substantially cheaper than even a single Proton sequencer and should keep HiSeq competitive as far as output in 24 hours goes. For labs that already have a HiSeq, the excitement of trying a new technology will be balanced in many core directors' heads by the difficulty of running two very different systems side-by-side. I expect many HiSeq labs to upgrade rather than buy a Proton. It will be interesting to see what percentage of HiSeq instruments in the big genome centres get upgraded, and whether smaller labs with only two or three instruments upgrade them all.

An issue many facilities may face is funding in our current economic climate. $50,000 of upgrade buys a couple of very nice sequencing projects and possibly some publications. Things appear to be very tight for almost everyone when it comes to capital expenditure. I don’t expect this upgrade to be a simple “I’ll take one, when can you deliver”.  And I’m certainly expecting to compete for the funds with other projects.

Do people need lots of genomes so rapidly? This is a question still to be answered in most labs. Whilst I am sure many users would immediately say yes to the decrease in turnaround, they probably would not choose to take advantage of it too often. In a research setting the nine-day difference between a 120Gb and a 600Gb run will almost always lose out to the very significant cost differential. Most users can simply wait a little longer for their data.

I am sure many other core facility managers have been asked to rush samples only to find out the data sat around for a couple of weeks to get analysed. I even offered a “platinum” service for Affymetrix array processing in 2003/4 that promised data in 3 days from delivery of RNA samples. It cost twice as much as normal and no one was willing to pay to jump the queue.

Clinical amplicons, exomes and/or genomes may well require a quick turnaround but there don’t seem to be so many centres that will make use of the very fast, but more expensive whole genomes. Fast amplicons and exomes are likely to be transformative in the next couple of years, but is an additional 9 days too long to wait for Genomes? This will certainly be interesting to watch.

One email I received after my last post said I may have to eat my words: “who realistically needs their genome back in 24 hours, not a lot of users!”. If I do have to eat them, I’d prefer them battered, with chips and lots of ketchup!



PS: All calculations assume perfect cluster density on all runs and no instrument down time.

Tuesday, 17 January 2012

$1000 genomes need $1 sample prep

The recent announcements from Life Tech and Illumina, and the expected announcements from the competing technologies are giving us a hint of what it will be like to live in the $1000 genome world. From a genomics perspective the reduction in sequencing costs is very welcome and is likely to be reflected in significantly larger projects (from groups already sequencing), in more projects (from groups not yet sequencing) or from both.

All of these people are going to have to make sequenceable libraries, and many will find there are challenges in generating something that gives high-quality data for analysis. There are many methods for making libraries: kits can be bought from sequencing instrument providers (e.g. Illumina) or from third-party vendors (e.g. NEB, Agilent, Beckman), or home-brew versions can be made (e.g. Enzymatics). We have used all of these at some point over the past four years in my lab and all have been successful. It used to be that the cost was lowest with home-brew, higher with third-party suppliers and highest from the instrument vendor; competition has seen that shift, so there is very little in it for most small to medium sized labs. The cost today is around $50 for a DNA sequencing library, which compared to 18 months ago is great.

However the increases in sequencing capacity and the corresponding drop in costs mean we can run many more samples than before. When a genome cost $250,000, or even $20,000, then $500 or even $100 for a library prep was fine. Now that we can process multiple genomes per sequencer run, the sample-prep cost starts to be something we can no longer ignore.

As soon as we move away from whole Human genome sequencing, library prep can be a significant factor in total project costs. Anyone sequencing bacterial or model organism genomes who wants to run 100-1000 genomes in a single run is confronted with a large bill at $50+ per prep. Sometimes the sample prep costs more than the actual sequencing.

The same issue faces projects using differential RNA-seq, where only around 5-10M reads, and perhaps as few as 1M, may be comparable to microarrays. At this read depth 30-300 samples can be processed in a single lane using $1000 of sequencing reagents. At $50 per sample, the $1,500-15,000 for library prep is not going to be sustainable long term.
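A quick sketch of that arithmetic (I have assumed ~300M reads per lane, consistent with the 300-samples-at-1M figure; the costs are those quoted above):

```python
# Samples per lane and the prep-vs-sequencing cost split for RNA-seq.
lane_reads = 300e6    # reads per lane (my assumption, consistent with the text)
lane_cost = 1000      # $ of sequencing reagents per lane (figure quoted above)
prep_cost = 50        # $ per library

for reads_per_sample in (10e6, 5e6, 1e6):
    n = int(lane_reads // reads_per_sample)
    print(f"{reads_per_sample / 1e6:.0f}M reads/sample: {n} samples/lane, "
          f"${n * prep_cost:,} prep vs ${lane_cost:,} sequencing")
```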

Hopefully this issue will push labs to develop new methods, similar to Epicentre/Illumina’s Nextera transposome technology, or microfluidic methods that use vanishingly small amounts of reagent and sample. A $1 bacterial genome prep or a $10 transcriptome prep are likely to be well received.

If there is anyone working on this yet please do get in touch!

Thursday, 12 January 2012

Targeted Cancer sequencing methods

In a previous post I discussed Illumina and LifeTech's amplicon sequencing products: TruSeq Custom Amplicon and AmpliSeq. Here I want to expand on the methods available and discuss some of the issues faced by people wanting to run amplicon sequencing projects.

Below I briefly discuss how many samples can be analysed in a run, and cover multiple amplicon-targeting methods, including: Fluidigm, RainDance, Halo Genomics, MIPs, multiplex PCR, long-range PCR with Nextera, and custom capture.

I hope you find this a useful summary and feel free to suggest other methods to add to the list.
Why sequence amplicons? Amplicon sequencing is big news in Cancer medicine, and next-gen sequencing allows us to target multiple amplicons in a single test. It is not immediately clear which amplicons to resequence and many questions remain:
  • Are small panels for particular cancers better than larger generalised panels?
  • What are the logistical and cost implications of small vs large panels?
  • Can we amplify and sequence the larger panel but only report the tests doctors order (and delete the rest of the information)?
  • Should genes where a mutation is not currently actionable be included for RUO?
  • How should we return data to doctors/patients?

An amplicon panel that covers all 11 exons of TP53 is very cheap to manufacture but it costs the same to process the samples in the lab as a test covering many more genes.

Using COSMIC as a guide we can start to think about how many mutations might be covered by panels of different sizes. Only a few genes in COSMIC account for the majority of mutations: JAK2, TP53, KRAS, BRAF and EGFR alone account for 54%. The top 10, 20, 30 and 100 most mutated genes account for 65%, 77%, 80% and 85% of mutations respectively. A core panel of all currently actionable mutations plus the top five genes comes to around 100 exons/amplicons (an average of ~17 exons per gene, as JAK2, BRAF and EGFR all have lots of exons).

Across the genome the average is under 10 exons per gene and 80% of exons are less than 200bp in length. As such most genes should be targetable with fewer than ten 250bp amplicons.


How many samples can you analyse in a run: This is very much an “it depends” question. The sequencing platform used, the coverage required, the number of samples pooled and the number of amplicons sequenced all affect the final number it is possible to run on a given system. The same factors determine how much the sequencing will cost per sample, though this can become incredibly cheap on a per-sample or per-amplicon basis.
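One way to frame the “it depends” is to divide a platform's read yield by the reads each sample needs (amplicons multiplied by per-amplicon depth). A minimal sketch; the read yields and the 500x depth are my assumptions, not platform specifications:

```python
# Samples per run = platform reads / (amplicons per sample x depth per amplicon).
import math

def samples_per_run(platform_reads, amplicons_per_sample, reads_per_amplicon=500):
    return math.floor(platform_reads / (amplicons_per_sample * reads_per_amplicon))

print(samples_per_run(7e6, 100))      # ~7M MiSeq-scale reads, 100-amplicon panel: 140
print(samples_per_run(300e6, 100))    # one HiSeq lane (~300M reads): 6000
print(samples_per_run(300e6, 2000))   # same lane, a 2000-amplicon panel: 300
```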





[Table: costs per sample for amplicon sequencing on different platforms]


What amplicon sequencing methods are available: There is a lot of choice for anyone wanting to sequence amplicons, and it is not easy deciding which approach to go for. However once a choice is made it can be difficult to move to an alternative technology, as the primers are generally a one-off purchase, and changing your mind about which amplicons to target or which primers to use can be difficult and expensive. It is also generally difficult to move amplicon panels from one technology platform to another without making a new set. Most technologies allow users to run on almost any sequencing platform with a small change in protocols; we have run Fluidigm, RainDance and lrPCR in my lab on Illumina, and Fluidigm on Ion PGM.

Fluidigm's Access Array: The Access Array system (including disposable chips, loader, thermal cycler and harvester) uses a microfluidic chip to partition very small 9nl PCR reactions from panels of samples and PCR primer pairs. The standard chip allows 48 DNA samples to be amplified with 48 primer pairs, producing 2,304 amplicons. Locus-specific primers are tailed with sequences that allow a second round of PCR to add barcoded sequence adapters for all sequencing systems. It is possible to multiplex primer pairs with careful design, and I am aware of one group that has tried 20-plex with some success, allowing a theoretical 46,000 amplicons. A single chip can be processed very easily by anyone who has done a PCR, using a multichannel pipette. The hardware and chips are only available from Fluidigm. Primers can be designed by anyone and there are no special requirements other than the addition of specific 5’ tail sequences to the forward and reverse primers. Any sequencing platform can be used after the second round of PCR, using platform-specific primers that include a 10bp barcode sequence. We have run up to 384 samples per Illumina flowcell lane, although 48 has been more common. The cost of a chip is £250 and reagents are about £150 (a set of 48 primers will also be around £500 as a one-off, up-front cost), making the per-sample cost about £7.50.

[Image: the Access Array hardware]



RainDance's ThunderStorm: The ThunderStorm system uses emulsion PCR to separate single molecules of DNA and specific primer pairs for PCR. Up to 96 samples can be processed in an unattended two-day run for between 500 and 20,000 amplicons. The ThunderStorm allows up to eight different amplicon panels to be processed across the 96 samples, so a mix of projects is quite possible. Primer pairs are designed, synthesised and then pooled; each pair is dropletised as an emulsion and the emulsions are mixed into a final primer set. This primer set and a DNA/PCR master mix are loaded into a microfluidic chip, where droplets of each combine to form individual emulsion PCR reactions. RainDance have some great videos on their website which explain the system far better than I do. The final amplicon emulsions are pooled, cleaned and further amplified with ten cycles of PCR to add platform-specific adapters.

The production of amplicon panels is costly, but around 2,000 samples can be run from one oligo pool. Currently RainDance have a $12,000 design fee, which covers production of a 500-amplicon pool; additional amplicons are charged at $12 each. A 20,000-amplicon panel costs $252,000 to manufacture but only $126 per sample to use in a large project. The microfluidic chips are disposable and cost $125 per sample.


[Image: the RainDance ThunderStorm]



Halo Genomics: The HaloPlex PCR kits allow up to 2,000 amplicons to be analysed in a single reaction. Kits are available for 48- or 96-sample processing and amplification is complete in 24 hours. They offer predesigned panels, e.g. BRCA1 & 2.

800ng of gDNA is digested in eight different reactions (each using 100ng of input DNA), each containing a pair of restriction enzymes, in a “controlled fragmentation” that creates a predictable digestion of the genome and makes the method very specific. Multiple restriction fragments cover the targeted sequence. The digests are denatured and pooled for further processing. Restrictions can be done in 96-well plates and plates combined for a simple processing workflow.

Halo probes are designed with 20bp of complementarity at either end to restriction fragments containing target DNA. Hybridisation of probes and DNA produces a circular molecule that can be ligated. Probes are biotinylated, and unbound DNA is removed with a streptavidin-bead cleanup. The probes also contain the sequences for PCR amplification to add Illumina adapters and barcodes, producing sequencing-ready libraries.

You capture the region of interest by sequencing all the restriction fragments that cover it.

95% of target bases are covered by more than one digestion fragment, and 60% of bases sequenced in a PE100bp run are on target.

Halo currently use 96 barcodes but there is essentially no limit and they could also use the dual-indexing approach recently launched by Illumina with Nextera kits.


[Image: the eight digests cover the target region multiple times; the light blue fragments will be sequenced]


MIPs: A Molecular Inversion Probe or Padlock Probe is an oligonucleotide, between 100-300bp in length, that targets a genomic locus. The ends of the probe are complementary to the outer ends of the target locus, and the gap between them after hybridisation is about 100-500bp. Probes are hybridised to DNA, a polymerase extends the probe between the two ends, and a ligase joins the ends to create a circularised copy of the target region. All probes in a set are then amplified with universal primers targeting the common (non-target) probe sequence in a single PCR reaction. Platform-specific sequencing adapters can be added as part of the MIP or in a secondary PCR with “tailed” universal primers. Several companies produce MIPs, and they can be ordered from any oligo synthesis company to your own design.


Illumina TruSeq custom amplicon: Illumina’s TSCA uses the GoldenGate assay to target up to 384 loci. GoldenGate is a multiplex extension/ligation assay that is highly specific, and the use of universal PCR primers removes many of the PCR artifacts seen in traditional multiplex PCR. Illumina have a very nice design wizard; I reviewed the one for custom capture last year. Up to 384 amplicons are processed in a single-tube reaction, allowing 36,864 amplicons to be analysed in a single run (96 x 384). The universal PCR reaction uses 20 oligos (8 forward and 12 reverse), allowing up to 96 samples to be processed and sequenced using Illumina’s dual-index barcoded sequencing.


Standard multiplex PCR: At least two companies are selling kits that use a standard multiplex PCR. Primer sets are carefully designed to reduce primer interactions and prevent spurious sequencing artifacts.

Life Technologies AmpliSeq: LifeTech sell the AmpliSeq kit in 48- or 480-amplicon sizes and suggest just 10ng of input DNA. There is a design tool available, but although you can submit a design you won't get results until March, when it launches! They aim to cover regions of 1kb to 1Mb with up to 1,536 200bp amplicons per pool, and 150bp amplicons will be available for FFPE samples. It would not be surprising if the kits move to longer amplicons as the 400bp chemistry gets rolled out.

Life Tech have a cancer demo kit that targets 700 mutations in 190 amplicons and generates 500x coverage on a 314 chip. The costs are about £75-100 per sample for library prep.

Multiplicom: I saw a presentation from this company at a meeting in London late last year and wanted to add them to this post. I have no other information than what is on their website and what was in the talk, but as soon as I get in touch for more info I'll follow up. The MASTR (Multiplex Amplification of Specific Targets for Resequencing) assays are multiplex PCR panels designed using a proprietary algorithm. Multiplexes are generally around 50 amplicons per panel in single-tube, standard PCR reactions, and PCR optimisation is based on adjusting primer concentrations. Their designs have focused on 454 to date, and they do offer a protocol based on concatenation and fragmentation for short-read sequencing.

Multiplicom’s BRCA 1&2 panel targets all the coding sequence using 93 amplicons and just 50ng of DNA. The panel was designed for 454 sequencing but with the increase in read length on both PGM and MiSeq (http://core-genomics.blogspot.com/2012/01/agbt-previews.html) this kind of panel might be analysed on other platforms.


Long-range PCR combined with Nextera: A method demonstrated by Illumina combines simple long-range PCR with the Nextera library prep technology to produce sequence-ready libraries from multiple genes fairly easily, in any lab, with little additional hardware or expertise. Long-range PCR primers are designed against regions of interest. Multiple lr-PCRs can be run on a sample and pooled before Nextera library prep. A final PCR allows dual-index barcoding from 20 oligos, and up to 96 samples can be pooled for sequencing.


Custom capture: Of course it is possible to use the same approach as exome sequencing for a much smaller set of targets. Whilst this is not amplicon sequencing per se, it can give similar results. One of the drawbacks is having to make a sequencing library first, but combining capture with Nextera library prep makes this much easier and produces a pretty ideal workflow.

Sequencing libraries can be made in a very high-throughput manner using Nextera, and dual indexing allows up to 96 samples to be run in a single lane. Whilst no one has published this workflow yet, the trick seems to be to include blocking oligos against the Nextera sequences in the capture pulldown.

This way, library prep for 12 x 96 samples can be done in a week by one person with a multichannel pipette. Capture can be done as 12-plex reactions, so in theory a plate could be pooled along its rows and 12 plates of library prep run in a single plate of capture. This would allow amplicons in 1,152 samples to be sequenced very quickly and, with enough barcodes, in a single HiSeq lane, making the per-sample sequencing cost about $1.
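That last figure is simple division; a sketch, with a hypothetical round number for the lane cost (not a quote):

```python
# Per-sample sequencing cost for a fully multiplexed HiSeq lane.
lane_cost = 1500           # $ per HiSeq lane: a hypothetical round figure
samples_per_lane = 1152    # 12 plates x 96 dual-indexed samples

print(f"~${lane_cost / samples_per_lane:.2f} per sample")   # ~$1.30
```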


How should I choose which one to use? This is a difficult question for anyone to answer, and everyone will have their own ideas on the pros and cons of the different systems. For most it will be the project costs that make the biggest impact; for others DNA input requirements might be the key factor.

Personally I see the multiplex PCR based systems, or Nextera plus capture protocol being adopted in a clinical setting. Both of these are easy to work with in standard 96 well plates with no additional hardware. Libraries are ready to run and barcoded and could be run fast on a system like PGM or MiSeq, or slow and highly multiplexed to make them extremely cheap on HiSeq.

As I have said before the cost per sample is going to make one of these the dominant technology and $10 per sample rather than $100 is what we want.

Updates on methods not already covered:
Kailos Genetics: The "Target Rich" system uses Nested Patch PCR, not a method I had come across before. It uses two rounds of target-specific enrichment, and as it relies on four oligonucleotide hybridisations per locus, versus the two in a standard PCR, it should show higher specificity.

The first Target Rich kit assays the following genes: BRAF, EGFR, FLT3, JAK2, KIT, KRAS, PIK3CA, PTEN, TP53 and VEGFA. The kit requires only 250ng of DNA, exactly the same as TSCA, which includes a ligation as well. The amount of DNA required for these methods is higher than the 10ng for AmpliSeq, for instance, but the higher specificity will be necessary for some targets.

Tuesday, 10 January 2012

AGBT previews...

Illumina and Life Technologies (formerly ABI) have been battling it out for supremacy in next-gen sequencing for the last four years. Traditionally the lead-up to the AGBT (Advances in Genome Biology and Technology) conference has seen the major sequencing companies announce their latest developments. It is always exciting, and the last four years have seen the GAIIx, HiSeq and MiSeq from Illumina, as well as Helicos, Ion Torrent and PacBio on the hardware front.

Announcements from both Life Tech and Illumina helped push their stock prices a little higher. I was surprised by both announcements but had been expecting them to release something before AGBT. Moving MiSeq technology onto HiSeq was not unexpected and many users had been asking if it would be possible.

These announcements certainly made a lot of noise and there are at least 100 news articles out at the time I write this.

The announcements: Proton vs HiSeq 2500
Life Technologies' Proton: The Ion Proton sequencer will give a $1000 genome in one day (nothing about the coverage of this genome though, or whether it will use the new 400bp chemistry). The instrument will cost $150,000, which will certainly make it affordable for many labs. The chips look about the same size as the PGM ones; there is a nice picture of Jonathan Rothberg holding one here. You can get more information over at Nick Loman's blog, "The Chip is (not) the machine".

Illumina’s HiSeq 2500: It did not take long for Illumina to catch up today (the press releases were only hours apart), and I’d say outgun Life Technologies. The HiSeq 2500 is a significant step forward for the platform and looks to give it another year or two of life. The big news in the press release was 120Gb in 27 hours! It sounds like the upgrade is going to include a few hardware changes ($50,000 has to go on something); this appears to mean goodbye to the cBot and may also allow faster fluidics and imaging. Right now I’m not sure and will be digging for more information.

600Gb is the standard v3 output using paired-end 100bp runs (although my lab has not quite made it that high so far; we are currently at just over 260Gb on our best flowcell). This run takes 10 days for a pair of flowcells, and Illumina have already discussed 1Tb runs from more clusters on the lanes. Perhaps this will be the upgrade held in reserve until the end of the year?

600Gb across two flowcells costs under £20,000 and gives 20 genomes at 10-fold coverage in 10 days; this is equivalent to two genomes per day at under £1,000 per genome. OK, so you have to wait ten days to get the genomes out the other end, but who realistically needs their genome back in 24 hours? Not a lot of users! And on the HiSeq 2500 you have the choice to do it either way.

The ability to generate 120Gb of data in 27 hours suggests the chemistry has been significantly sped up and that imaging will be limited to a few tiles per lane for this type of output. That will make the per-Gb cost around five times higher than a 600Gb run, but data will be ready in just over a day.

The announcements: PGM vs MiSeq:
Life Technologies: There was no announcement on the PGM, although I believe some users are now testing the 400bp chemistry which is a nice improvement.

Illumina: The MiSeq output has doubled (either another tile or dual-surface imaging of the flowcell) and a new PE250bp kit configuration allows even longer runs. I know a group that has run some PE300 already by topping up the cartridges, and we have done some single-end 300bp sequencing on amplicons. The quality is good enough, but of course it always depends on your requirements.
Faster cycle times should allow the PE250 run to be completed in around 27 hours, keeping turnaround short.
It looks like 4-5Gb might come out of the instrument, compared to the 1-2Gb from PGM.
A PE250 run will allow a 450-500bp amplicon to be completely sequenced.

Which company's machine(s) should I buy? I’d certainly agree that the HiSeq has a high instrument cost compared to the Proton, and many labs struggle with the $750,000 capital cost. However the cost of the genomes appears pretty comparable with Ion Torrent's Proton, and we should not forget that the capital cost of the instrument is only one consideration. Most users will be canny enough to weigh this carefully against all the other variables when making their choices.

It’ll be great when the users comparing Illumina and Ion Torrent publish some of their results. It is difficult to judge performance, and which platform might be best, when things are developing as fast as they are, but this is the one piece of information users want when making purchasing decisions. I’d guess a couple of high-impact publications one way or the other might swing sales of hundreds of units.

PS: If BGI upgrade all 137 of their HiSeq instruments they could theoretically generate over 16Tb of data in just over one day using the fast-run technology.

PPS: We should not forget Oxford Nanopore Technologies. Nearly everyone I know is expecting ONT to finally announce that they have sequenced a genome. I hope it's a Human one, but I'd be happy with PhiX!