Thursday, 12 January 2012

Targeted Cancer sequencing methods

In a previous post I discussed Illumina's and Life Technologies' amplicon sequencing products: TruSeq Custom Amplicon and AmpliSeq. Here I want to expand on the methods available and discuss some of the issues faced by people wanting to run amplicon sequencing projects.

Below I briefly discuss how many samples can be analysed in a run, and then cover multiple amplicon targeting methods including Fluidigm, RainDance, Halo Genomics, MIPs, multiplex PCR, long-range PCR with Nextera and custom capture.

I hope you find this a useful summary and feel free to suggest other methods to add to the list.
Why sequence amplicons? Amplicon sequencing is big news in cancer medicine, and next-gen sequencing allows us to target multiple amplicons in a single test. It is not immediately clear which amplicons to resequence, and many questions remain:
  • Are small panels for particular cancers better than larger generalised panels?
  • What are the logistical and cost implications of small vs large panels?
  • Can we amplify and sequence the larger panel but only report the tests doctors order (and delete the rest of the information)?
  • Should genes where a mutation is not currently actionable be included for research use only (RUO)?
  • How should we return data to doctors/patients?

An amplicon panel that covers all 11 exons of TP53 is very cheap to manufacture, but processing the samples in the lab costs the same as for a test covering many more genes.

Using COSMIC as a guide we can start to think about how many mutations might be covered by panels of different sizes. Only a few genes in COSMIC account for the majority of mutations: JAK2, TP53, KRAS, BRAF and EGFR together account for 54% of mutations. The top 10, 20, 30 and 100 most mutated genes account for 65%, 77%, 80% and 85% of mutations respectively. A core panel of all currently actionable mutations plus the top five genes is around 100 exons/amplicons (an average of 17 exons per gene, as JAK2, BRAF and EGFR all have many exons).

Across the genome the average gene has fewer than 10 exons, and 80% of exons are less than 200bp long, so most genes should be targetable with fewer than ten 250bp amplicons.
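As a rough check on that claim, here is a minimal sketch that counts how many amplicons a gene needs when every exon is tiled end to end. It assumes 20-base primers sitting outside the sequence each amplicon covers (so a 250bp amplicon reads 210bp) and ignores the overlap you would want between tiles on long exons; the exon lengths are hypothetical.

```python
import math

def amplicons_for_exon(exon_len, amplicon_len=250, primer_len=20):
    """Amplicons needed to cover one exon end to end.

    Assumes primers sit outside the sequence they cover, so each amplicon
    reads amplicon_len - 2 * primer_len bases; overlap between tiles is ignored.
    """
    usable = amplicon_len - 2 * primer_len
    return math.ceil(exon_len / usable)

def amplicons_for_gene(exon_lengths, **kwargs):
    """Sum the amplicons needed for every exon of a gene."""
    return sum(amplicons_for_exon(n, **kwargs) for n in exon_lengths)

# Hypothetical gene: eight exons, seven of them under 200bp
exons = [120, 180, 95, 150, 199, 160, 140, 320]
print(amplicons_for_gene(exons))  # 9
```

Every exon under 210bp costs one amplicon, so a typical gene stays under ten; only the occasional long exon adds extra tiles.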

How many samples can you analyse in a run: This is very much an “it depends” question. The sequencing platform used, the coverage required, the number of samples pooled and the number of amplicons sequenced all affect the final number it is possible to run on a given sequencing system. The same factors determine how much the sequencing will cost per sample, but this can become incredibly cheap on a per-sample or per-amplicon basis.

Costs per sample for amplicon sequencing on different platforms
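The calculation behind the table is simple: each sample needs (number of amplicons × required coverage) reads, so the maximum number of samples per lane is the reads per lane divided by that figure. A minimal sketch, assuming every read is on target and reads are spread evenly across amplicons and samples (real runs need headroom for both); the example figures are the ones a reader uses in the comments (a 1M-read GS Junior run and a 250M-read HiSeq lane).

```python
def max_samples_per_lane(reads_per_lane, amplicons_per_sample, coverage):
    """Upper bound on samples per lane: reads / (amplicons x coverage).

    Assumes every read is on target and reads are spread evenly across
    amplicons and samples; real runs need headroom for both.
    """
    reads_per_sample = amplicons_per_sample * coverage
    return reads_per_lane // reads_per_sample  # round down: partial samples don't count

print(max_samples_per_lane(1_000_000, 48, 50))       # 416
print(max_samples_per_lane(250_000_000, 384, 1500))  # 434
```

Rounding down gives 416 rather than 417 for the GS Junior example, since a partial sample cannot be run.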

What amplicon sequencing methods are available: There is a lot of choice for anyone wanting to sequence amplicons, and it is not easy deciding which approach to go for. However, once a choice is made it can be difficult to move to an alternative technology, as the primers are generally a one-off purchase and changing your mind on which amplicons to target or which primers to use can be difficult and expensive. It is also generally difficult to move amplicon panels from one technology platform to another without making a new set. Most technologies allow users to run on almost any sequencing platform with a small change in protocol; in my lab we have run Fluidigm, RainDance and lrPCR on Illumina, and Fluidigm on Ion PGM.

Fluidigm's Access Array: The Access Array system (including disposable chips, loader, thermal cycler and harvester) uses a microfluidic chip to partition very small 9nl PCR reactions from panels of samples and PCR primer pairs. The standard chip allows 48 DNA samples to be amplified with 48 primer pairs, producing 2304 amplicons. Locus-specific primers are tailed with sequences that allow a second round of PCR to add barcoded sequencing adapters for all sequencing systems. It is possible to multiplex primer pairs with careful design, and I am aware of one group that has tried 20-plex with some success, allowing a theoretical 46,000 amplicons. A single chip can be processed very easily with a multichannel pipette by anyone who has done a PCR. The hardware and chips are only available from Fluidigm. Primers can be designed by anyone, and there are no special requirements other than the addition of specific 5’ tail sequences to forward and reverse primers. Any sequencing platform can be used after the second round of PCR, using platform-specific primers that include a 10bp barcode sequence. We have run up to 384 samples per Illumina flowcell lane, although 48 has been more common. A chip costs £250 and reagents about £150 (plus a one-off, up-front cost of around £500 for a set of 48 primers), which makes the per-sample cost about £8.

The Access array hardware
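The Access Array numbers multiply out simply: amplicon count is samples × primer wells × plex level, and per-sample consumable cost is the chip plus reagents divided by 48. A minimal sketch using the prices quoted above; spreading the one-off primer set over 500 samples is a hypothetical project size, not a figure from the post.

```python
def access_array_amplicons(samples=48, primer_pairs=48, plex=1):
    """Every sample is amplified with every primer well, so counts multiply."""
    return samples * primer_pairs * plex

def per_sample_cost(chip_cost, reagent_cost, samples_per_chip,
                    primer_cost=0.0, total_samples=None):
    """Consumables per sample, optionally spreading a one-off primer set
    over every sample it will be used for."""
    cost = (chip_cost + reagent_cost) / samples_per_chip
    if total_samples:
        cost += primer_cost / total_samples
    return cost

print(access_array_amplicons())         # 2304
print(access_array_amplicons(plex=20))  # 46080
# Prices in £ as quoted above
print(round(per_sample_cost(250, 150, 48), 2))  # 8.33
print(round(per_sample_cost(250, 150, 48, primer_cost=500, total_samples=500), 2))  # 9.33
```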

RainDance's ThunderStorm: The ThunderStorm system uses emulsion PCR to separate single molecules of DNA and specific primer pairs for PCR. Up to 96 samples can be processed in an unattended two-day run, for between 500 and 20,000 amplicons. The ThunderStorm system allows up to 8 different amplicon panels to be processed across the 96 samples, so a mix of projects is quite possible. Primer pairs are designed, synthesised and then pooled. Each pair is then dropletised as an emulsion, and the emulsions are mixed into a final primer set for PCR. Sample DNA is combined with PCR master mix, and both are loaded into a microfluidic chip where droplets of DNA/master mix and primers combine to form individual emulsion PCR reactions. RainDance have some great videos on their website which explain the system far better than I do. The final amplicon emulsions are pooled, cleaned and further PCR’d for ten cycles to add platform-specific adapters.

The production of amplicon panels is costly, but around 2000 samples can be run from one oligo pool. Currently RainDance have a $12,000 design fee, which covers production of a 500-amplicon pool; additional amplicons are charged at $12 each. A 20,000-amplicon panel costs $252,000 to manufacture but only $126 per sample to use in a large project. The microfluidic chips are disposable and cost $125 per sample.
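The RainDance economics depend heavily on how much of the oligo pool you actually use. A minimal sketch that amortises the one-off panel cost over the samples run and adds the per-sample chip cost, using the $252,000 panel and $125 chip figures above; the 96-sample case is a hypothetical small project for comparison.

```python
def amortised_cost_per_sample(panel_cost, samples_run, chip_cost_per_sample):
    """Spread the one-off panel cost over the samples actually run, then add
    the disposable chip cost that every sample incurs."""
    return panel_cost / samples_run + chip_cost_per_sample

# 20,000-amplicon panel ($252,000) with $125 chip cost per sample
print(amortised_cost_per_sample(252_000, 2_000, 125))  # full pool used: 251.0
print(amortised_cost_per_sample(252_000, 96, 125))     # one 96-sample run: 2750.0
```

The same panel is roughly ten times more expensive per sample if only a single 96-sample run is done, which is why this approach suits large projects.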


Halo Genomics: The Haloplex PCR kits allow up to 2000 amplicons to be analysed in a single reaction. Kits are available for 48 or 96 sample processing and amplification is complete in 24 hours. They offer predesigned panels e.g. BRCA 1&2.

A total of 800ng of gDNA is digested in eight different reactions (each using 100ng of input DNA), each containing a pair of restriction enzymes. This “controlled fragmentation” creates a predictable digestion of the genome, which makes the method very specific. Multiple restriction fragments cover the targeted sequence. The digests are denatured and pooled for further processing. Digests can be set up in 96-well plates, and plates can be combined for a simple processing workflow.

Halo probes are designed so that each end has 20bp of complementarity to one end of a restriction fragment containing target DNA. Hybridisation of a probe to its fragment produces a circular molecule that can be ligated. Probes are biotinylated, and unbound DNA is removed with a streptavidin-bead cleanup. The probes also contain the sequences for PCR amplification to add Illumina adapters and barcodes, producing sequencing-ready libraries.

The region of interest (ROI) is captured by sequencing all the digest fragments that cover it.

95% of target bases are covered by more than one digestion fragment, and 60% of bases sequenced in a 100bp paired-end run are on target.
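To see why several digests give multiple coverage, here is a minimal in-silico digest sketch. Each digest is a tuple of recognition sites (Haloplex uses enzyme pairs; the toy example uses one site per digest to keep it checkable), and each enzyme is assumed to cut at the first base of its site. GATC and CCGG are just example 4-base sites, not the enzymes Halo actually uses. The size window is a hypothetical stand-in for which fragments the probes can capture; without it every digest would tile the whole ROI.

```python
import re

def fragments(seq, sites):
    """Restriction fragments (start, end) for one digest.

    Simplification: each enzyme is assumed to cut at the first base of its
    recognition site; real cut offsets and overhangs are ignored.
    """
    cuts = {m.start() for site in sites for m in re.finditer(f"(?={site})", seq)}
    bounds = sorted(cuts | {0, len(seq)})
    return list(zip(bounds, bounds[1:]))

def roi_depth(seq, digests, roi, min_len, max_len):
    """Per-base count of captured fragments over the ROI, across all digests.

    Only fragments in the [min_len, max_len] window are treated as captured;
    the window is a hypothetical stand-in for what the probes can pull down.
    """
    start, end = roi
    depth = [0] * (end - start)
    for sites in digests:
        for a, b in fragments(seq, sites):
            if min_len <= b - a <= max_len and a < end and b > start:
                for pos in range(max(a, start), min(b, end)):
                    depth[pos - start] += 1
    return depth

def fraction_multiply_covered(depth):
    """Share of ROI bases covered by more than one fragment."""
    return sum(d >= 2 for d in depth) / len(depth)

# Toy sequence with one GATC and one CCGG site; two single-enzyme "digests"
seq = "A" * 10 + "GATC" + "A" * 10 + "CCGG" + "A" * 10
digests = [("GATC",), ("CCGG",)]
depth = roi_depth(seq, digests, roi=(5, 30), min_len=12, max_len=100)
print(fraction_multiply_covered(depth))  # 0.8
```

Here the 10bp fragment from the GATC digest falls outside the window, so the first five ROI bases are covered only once.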

Halo currently use 96 barcodes but there is essentially no limit and they could also use the dual-indexing approach recently launched by Illumina with Nextera kits.

The 8 digests cover the target region multiple times; the light blue fragments will be sequenced

MIPs: A Molecular Inversion Probe (MIP), or Padlock Probe, is an oligonucleotide between 100 and 300 bases long that targets a genomic locus. The ends of the probe are complementary to the sequences flanking the target locus, and the gap between the ends after hybridisation is about 100-500bp. Probes are hybridised to DNA, a polymerase extends the probe across the gap between the two ends, and a ligase joins the ends to create a circularised copy of the target region. All probes in a set are then amplified in a single universal PCR, using primers that bind the non-target probe sequence. Platform-specific sequencing adapters can be added as part of the MIP, or in a secondary PCR using “tailed” universal primers. Several companies produce MIPs, and they can also be ordered to your own design from any oligo synthesis company.
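The MIP geometry is easy to get backwards, so here is a minimal sketch of designing a probe against a target gap and simulating the gap fill and ligation. It assumes the probe anneals to the + strand, uses exact 20-base arms, and represents the backbone (where the universal primer sites would go) with a placeholder string; the reference is a toy sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]

def design_mip(ref, start, end, arm_len, backbone):
    """Probe to capture ref[start:end], written 5'->3'.

    The probe anneals antiparallel to the + strand: the 5' ligation arm sits
    just upstream of the gap and the 3' extension arm just downstream.
    """
    ligation_arm = revcomp(ref[start - arm_len:start])
    extension_arm = revcomp(ref[end:end + arm_len])
    return ligation_arm + backbone + extension_arm

def fill_and_ligate(probe, ref, start, end):
    """Gap fill: polymerase extends the 3' arm across the gap (copying the
    target) and ligase closes the circle onto the 5' arm. The circle is
    returned opened at the probe's 5' end."""
    return probe + revcomp(ref[start:end])

# Toy 400bp reference, 150bp target gap, 20-base arms, placeholder backbone
ref = "ACGTTGCAAC" * 40
start, end, arm = 100, 250, 20
backbone = "T" * 60  # stands in for the universal primer sites
probe = design_mip(ref, start, end, arm, backbone)
circle = fill_and_ligate(probe, ref, start, end)
print(len(probe), len(circle))  # 100 250
```

Reading the circle from the extension arm round to the ligation arm gives one contiguous copy of the arms plus the target, which is what the universal PCR then amplifies.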

Illumina TruSeq Custom Amplicon: Illumina’s TCSA uses the GoldenGate assay to target up to 384 loci. GoldenGate is a highly specific multiplex extension/ligation assay, and the use of universal PCR primers removes many of the PCR artifacts seen in traditional multiplex PCR. Illumina have a very nice design wizard; I reviewed the one for custom capture last year. Up to 384 amplicons are processed in a single-tube reaction, allowing 36,864 amplicons (96 × 384) to be analysed in a single run. The universal PCR reaction uses 20 oligos (8 forward and 12 reverse), allowing up to 96 samples to be processed and sequenced using Illumina’s dual-index barcoded sequencing.
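Dual indexing gets 96 unique sample labels from only 20 oligos because every forward index is combined with every reverse index. A minimal sketch laying this out on a 96-well plate, assuming one forward index per row and one reverse index per column; the F1..F8 and R1..R12 labels are placeholders, not Illumina's index names.

```python
from itertools import product

ROWS = "ABCDEFGH"    # 8 rows of a 96-well plate -> 8 forward indices
COLS = range(1, 13)  # 12 columns -> 12 reverse indices

def dual_index_plate():
    """Map each well to a (forward, reverse) index pair.

    Placeholder labels: F1..F8 for rows A..H, R1..R12 for columns 1..12.
    """
    return {f"{row}{col}": (f"F{r + 1}", f"R{col}")
            for (r, row), col in product(enumerate(ROWS), COLS)}

plate = dual_index_plate()
print(len(plate))         # 96 uniquely indexed samples from 8 + 12 oligos
print(plate["C7"])        # ('F3', 'R7')
print(len(plate) * 384)   # 36864 amplicons in one run at 384-plex
```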

Standard multiplex PCR: At least two companies are selling kits that use a standard multiplex PCR. Primer sets are carefully designed to reduce primer interactions and prevent spurious sequencing artifacts.
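One of the simplest checks in multiplex primer design is screening for 3' complementarity between primers, since a primer whose 3' end can anneal to another primer is a candidate for primer-dimers. A minimal sketch, assuming exact matching of the last 5 bases only; real design tools use thermodynamic models, and the primers here are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]

def three_prime_dimer(p1, p2, k=5):
    """True if the last k bases at p1's 3' end could anneal within p2,
    i.e. their reverse complement appears in p2 (exact match only)."""
    return revcomp(p1[-k:]) in p2

def screen_pool(primers, k=5):
    """Return every pair (including self-pairs) with a 3' complementarity hit."""
    names = list(primers)
    hits = []
    for i, a in enumerate(names):
        for b in names[i:]:
            pa, pb = primers[a], primers[b]
            if three_prime_dimer(pa, pb, k) or three_prime_dimer(pb, pa, k):
                hits.append((a, b))
    return hits

# Hypothetical primers; gene1_F's 3' end (CCAAC) pairs with GTTGG in gene2_F
pool = {
    "gene1_F": "ACCACAACCAAC",
    "gene1_R": "CACCAACACCCA",
    "gene2_F": "TTAGTTGGATCA",
}
print(screen_pool(pool))  # [('gene1_F', 'gene2_F')]
```

A flagged pair would normally be redesigned or split into separate pools.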

Life Technologies AmpliSeq: Life Technologies sell the AmpliSeq kit in 48- or 480-amplicon sizes and suggest just 10ng of input DNA. There is a design tool available, but although you can submit a design now, you won't get results until it launches in March! They aim to cover regions of 1kb to 1Mb with up to 1536 200bp amplicons per pool, and 150bp amplicons will be available for FFPE samples. It would not be surprising if the kits move to longer amplicons as the 400bp chemistry gets rolled out.
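With a cap of 1536 amplicons per pool, larger regions need several pools. A minimal sketch estimating amplicon and pool counts for a contiguous region, assuming the amplicon length is all usable coverage, no overlap between neighbouring amplicons, and no extra rules about which amplicons may share a pool.

```python
import math

def ampliseq_pools(region_bp, amplicon_bp=200, max_per_pool=1536, overlap_bp=0):
    """Estimate amplicons needed to tile a contiguous region and the number
    of primer pools, assuming simple back-to-back tiling."""
    step = amplicon_bp - overlap_bp
    amplicons = math.ceil(region_bp / step)
    pools = math.ceil(amplicons / max_per_pool)
    return amplicons, pools

print(ampliseq_pools(1_000))                     # (5, 1)
print(ampliseq_pools(1_000_000))                 # (5000, 4)
print(ampliseq_pools(100_000, amplicon_bp=150))  # (667, 1), FFPE-sized amplicons
```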

Life Tech have a cancer demo kit that targets 700 mutations in 190 amplicons and generates 500x coverage on a 314 chip. The costs are about £75-100 per sample for library prep.

Multiplicom: I saw a presentation from this company at a meeting in London late last year and wanted to add them to this post. I have no information other than what is on their website and what was in the talk, but as soon as I get in touch for more info I'll follow up. The MASTR (Multiplex Amplification of Specific Targets for Resequencing) assays are multiplex PCR panels designed using a proprietary algorithm. Multiplexes are generally around 50 amplicons per panel in single-tube standard PCR reactions, and PCR optimisation is based around adjustment of primer concentrations. Their design has focused on 454 to date, and they do offer a protocol based on concatenation and fragmentation for short-read sequencing.

Multiplicom’s BRCA 1&2 panel targets all the coding sequence using 93 amplicons and just 50ng of DNA. The panel was designed for 454 sequencing, but with the increase in read length on both PGM and MiSeq this kind of panel might be analysed on other platforms.

Long-range PCR combined with Nextera: A method demonstrated by Illumina combines a simple long-range PCR with the Nextera library prep technology to produce sequence ready libraries from multiple genes fairly easily in any lab with little additional hardware or expertise. Long-range PCR primers are designed to regions of interest. Multiple lr-PCRs can be run on a sample and pooled before Nextera library prep. A final PCR allows dual-index barcoding from 20 oligos. Up to 96 samples can be pooled for sequencing.
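When several long-range PCR products of different lengths are pooled, mixing them in equal molar amounts (rather than equal mass) should give roughly even per-base coverage after tagmentation; the post only says the products are pooled, so equimolar pooling is an assumption here. A minimal sketch using the common approximation of about 660 g/mol per base pair; product names, concentrations and the 50 fmol target are hypothetical.

```python
def fmol_per_ul(conc_ng_per_ul, length_bp, g_per_mol_per_bp=660):
    """Convert a dsDNA concentration from ng/µl to fmol/µl (~660 g/mol per bp)."""
    return conc_ng_per_ul * 1e6 / (length_bp * g_per_mol_per_bp)

def equimolar_volumes(products, fmol_each):
    """µl of each product needed so every long-range PCR contributes fmol_each."""
    return {name: fmol_each / fmol_per_ul(conc, length)
            for name, (conc, length) in products.items()}

# Hypothetical lrPCR products: name -> (concentration in ng/µl, length in bp)
products = {"lrPCR_A": (50.0, 10_000), "lrPCR_B": (30.0, 4_000)}
for name, vol in equimolar_volumes(products, fmol_each=50).items():
    print(f"{name}: {vol:.2f} µl")
```

This prints 6.60 µl for the 10kb product and 4.40 µl for the 4kb product.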

Custom capture: Of course it is possible to use the same approach as exome sequencing for a much smaller set of targets. Whilst this is not amplicon sequencing per se, it can give similar results. One of the drawbacks is having to make a sequencing library first, but combining capture with Nextera library prep makes this much easier and produces a pretty ideal workflow.

Sequencing libraries can be made in a very high-throughput manner using Nextera, and dual indexing allows up to 96 samples to be run in a single lane. Whilst no-one has published this workflow yet, the trick seems to be to include blocking oligos against the Nextera sequences in the capture pull-down.

This way, library prep for 12 × 96 samples can be done in a week by one person with a multichannel pipette. Capture can be done as 12-plex reactions, so in theory each plate could be pooled along its rows and 12 plates of library prep could be run in a single plate of capture. This would allow amplicons in 1152 samples to be sequenced very quickly and, with enough barcodes, in a single HiSeq lane, making the per-sample sequencing cost about $1.
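The row-pooling scheme above works out neatly: each row of 12 libraries becomes one 12-plex capture, so a library plate fills 8 captures and 12 plates fill a 96-well capture plate. A minimal sketch of one possible layout, assuming row R of library plate P goes to well R+P of the capture plate (so each capture column corresponds to one library plate); this mapping is illustrative, not a published protocol.

```python
from collections import defaultdict

ROWS = "ABCDEFGH"
COLS = range(1, 13)

def capture_layout(n_plates=12):
    """Group libraries into 12-plex captures.

    Hypothetical mapping: all 12 wells in row R of library plate P are pooled
    into capture well R+P, e.g. plate 3 row B -> capture well "B3".
    """
    layout = defaultdict(list)
    for plate in range(1, n_plates + 1):
        for row in ROWS:
            for col in COLS:
                layout[f"{row}{plate}"].append((plate, f"{row}{col}"))
    return dict(layout)

layout = capture_layout()
print(len(layout))                                 # 96 capture reactions
print(sum(len(libs) for libs in layout.values()))  # 1152 libraries
print(layout["B3"][:2])                            # [(3, 'B1'), (3, 'B2')]
```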

How should I choose which one to use? This is a difficult question for anyone to answer, and everyone will have their own ideas on the pros and cons of the different systems. For most it will be the project costs that make the biggest impact; for others DNA input requirements might be the key factor.

Personally, I see the multiplex PCR-based systems, or the Nextera plus capture protocol, being adopted in a clinical setting. Both of these are easy to work with in standard 96-well plates with no additional hardware. Libraries are ready to run and barcoded, and could be run fast on a system like PGM or MiSeq, or slowly and highly multiplexed to make them extremely cheap on HiSeq.

As I have said before, the cost per sample is going to make one of these the dominant technology, and $10 per sample rather than $100 is what we want.

Updates on methods not already covered:
Kailos Genetics: The "Target Rich" system uses a Nested PatchPCR, not a method I had come across before. It uses two rounds of target-specific enrichment, and as it uses four oligonucleotide hybridisations per locus, versus the two in a standard PCR, it should show higher specificity.

The first Target Rich kit assays the following genes: BRAF, EGFR, FLT3, JAK2, KIT, KRAS, PIK3CA, PTEN, TP53 and VEGFA. The kit requires 250ng of DNA, exactly the same as TCSA, which also includes a ligation. This is more DNA than the 10ng needed for AmpliSeq, for instance, but higher specificity will be necessary for some targets.


  1. Great post! As you point out, price per sample is a key factor. Would be great if you could add some rough estimate of costs/sample for all of those technologies!

    By the way, the volume of the Fluidigm Access Array reaction chambers is 35nl. (I think 9 nl is true for their analytical arrays.)

  2. Excellent compilation. Thanks James..

    Bharani Kumar

  3. Hi James, great posts and excellent education for me! I am just a little confused by the equation you used for the amplicon sample number calculation: what does 'sample' on the upper left side stand for? I somehow could not add up the numbers from your examples in the table based on the equation.

    Thanks a lot!

  4. Sorry I should have been clearer.
    Multiply the number of amplicons by the required coverage, then divide the number of reads per lane by the result, and you get the maximum number of samples you can squeeze into a lane of sequencing.

  5. Thanks for the quick reply! Then I guessed it correctly. But my remaining question is about the calculation for the Roche GS Junior and HiSeq: for example, if I use 1M reads on the Junior with 48 amplicons per sample and 50X coverage, should I expect 1M/(48*50) = 417 samples? And with HiSeq at 250M reads/lane, 384 amplicons and 1500X coverage, I should get 250M/(384*1500) = 434 samples in one lane?
    Sorry if I missed something; am new in the field!
    And also sorry for 'Anonymous' but I don't have those other profiles listed.
    Thanks for your help!

  6. Your way of writing on this blog is awesome. I'm really impressed with this post. It's good, and I really feel happy when I want to read something good and find such an informative blog post...

    1. Hi, and thank you for your post, it is very interesting. I'm planning a research project in which we are going to analyse about 15 genes (genomic coding regions). I think that amplicon-based sequencing is the best approach, but I can't decide which multiplexing amplification methodology is best for us. We don't have RainDance or other sophisticated technologies for the multiplexing, but we do have access to NGS platforms like HiSeq or 454. Could you help me with this?

  7. Sorry, I can't advise on specific projects from this blog. You'll have read on other posts that there are many methods out there. A simple one to get started with is high-throughput PCR, but this may be unwieldy for you. Take a look at something like Science Exchange and see if someone can offer you a service perhaps?