Thursday, 1 March 2012

Choosing between exome-arrays and exome-seq

Illumina and Affymetrix both recently released Exome microarrays.

The "death" of Gene expression microarrays: This is an interesting development considering there has been a lot of debate over the “death of arrays” in the last year or so.

It does look like 3’ expression arrays are fading fast due to the cost and quality of RNA-seq data and improvements in RNA-seq analysis, although we’ve not quite stopped using GX arrays yet.

SV-seq vs snpCGH: The relative merits of SV-seq vs snpCGH for copy number and LOH are less clear. One million (or more) high-quality genotype calls can be obtained cheaply and quickly using arrays, and the intensity information allows sensitive copy number detection. Whilst sequencing is ultimately more sensitive to structural variation, as we assay the DNA structure directly, there are real limitations in per-genotype quality at low to medium coverage. It is this that holds sequencing back and, I think, keeps a market open for snpCGH arrays. The costs and time for medium to high coverage SV-seq can’t yet compete with arrays, where you can run 1000 samples in a couple of days for a few hundred dollars each.

Exome-arrays vs Exome-seq: Now exome arrays are vying to compete with exome-seq and to be honest I was quite surprised that array companies are bothering. Especially when one of them happens to be the leading next-gen sequencing company!

GenomeWeb covered the announcement of the products at last year's ASHG.

Affy and Illumina aim to offer a fast and economical method to assess exomic variants. They suggest we might use these to follow up exome-seq or to complement GWAS in large cohorts to achieve statistical power in exome-centred studies. A huge amount of content has been generated for microarrays over the past few years of sequencing, and variants in the exome are likely to be functionally relevant (although we should make sure regulatory regions are equally well assayed).

Affy’s exome array targets around 320,000 SNPs, InDels and other variants. It is being offered at $70 to customers.

Illumina’s exome array comes in three versions: the HumanExome targets around 250,000 SNPs, the OmniExpressExome targets 950,000 and the Omni5Exome targets 4.75M SNPs. According to the company's press release the 250,000 exonic SNPs were identified from an analysis of over 12,000 sequenced genomes. It was being offered at $45 to early customers. The HumanExome chip is being offered at $80.

A difference of 70,000 exonic SNPs sounds quite large but I’m not sure what the real impact will be on research projects. Whilst Affymetrix have not revealed the scale of interest, Illumina have said they are expecting to process over 1M samples on their Exome-chips.

The costs quoted appear to be chip only prices and one of the issues facing researchers as chips drop in cost is the relatively static price of the processing. It looks like Illumina’s OmniExpressExome for instance will cost about £150-200 to run once service costs are incorporated.

At around £200-250, Exome-seq is slightly more expensive but currently much more labour intensive. Illumina recently dropped the price of an exome to $50; if you include a TruSeq library prep at $50 and one-sixth of a lane of PE75bp sequencing, the total Exome-seq cost is £200-250. The major limitation compared to arrays is more likely to be time than cost, as it takes about 7 days to sequence 96 exomes on a HiSeq 2000.
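The figures quoted above can be turned into a rough back-of-envelope comparison. The sketch below is only illustrative: it takes the midpoints of the per-sample ranges from this post and the "96 exomes in about 7 days" figure, and assumes (my assumption, not a quoted number) that sequencing batches run back to back with no downtime. The function names and the example budget are hypothetical.

```python
import math

# Per-sample costs taken as midpoints of the ranges quoted in the post (GBP).
ARRAY_COST_GBP = (150 + 200) / 2   # OmniExpressExome including service costs
SEQ_COST_GBP = (200 + 250) / 2     # Exome-seq including prep and sequencing

# Throughput quoted in the post: about 7 days for 96 exomes on a HiSeq 2000.
SEQ_BATCH_SIZE = 96
SEQ_BATCH_DAYS = 7


def samples_for_budget(budget_gbp, cost_per_sample):
    """Number of whole samples that fit in a budget."""
    return int(budget_gbp // cost_per_sample)


def seq_days(n_samples, instruments=1):
    """Days of sequencing, assuming full batches run back to back
    (an assumption) and spread evenly over the available instruments."""
    batches = math.ceil(n_samples / SEQ_BATCH_SIZE)
    rounds = math.ceil(batches / instruments)
    return rounds * SEQ_BATCH_DAYS


if __name__ == "__main__":
    budget = 100_000  # hypothetical project budget in GBP
    n_array = samples_for_budget(budget, ARRAY_COST_GBP)
    n_seq = samples_for_budget(budget, SEQ_COST_GBP)
    print(f"Arrays:    {n_array} samples")
    print(f"Exome-seq: {n_seq} samples, about {seq_days(n_seq)} days on one instrument")
```

On these numbers the gap in samples per budget is modest; the sequencing time per batch is what grows quickly as the cohort gets larger.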

Which might you use? I think the arrays are going to fill a real need for researchers who have large sample collections and want to access exome content. Many more samples can be run in a given time for a given budget on arrays than on a sequencer. I think smaller projects are likely to be run on a sequencer though. And improvements to the Exome-seq workflows might make chips seem clunky to use.

Can we get rid of a two day hyb and still generate high quality exome data?

When might we start using newer library prep technologies like Nextera to access low DNA samples for Exome-seq?

Can we analyse FFPE exomes? Both chips and sequencing suffer here but this is a huge potential market.

Arrays RIP? My career in genomics started over ten years ago when I set up an Affy lab processing expression arrays with just 8000 probes. I have run lots of arrays since then, including one project of 2500 HT12s which we completed in just five weeks; running this number of samples on a sequencer is tough. Whether arrays will completely disappear or not will be a discussion for a while yet, I think.


  1. The exome array vs exome seq issue is an interesting and subtle one. The key factor here is that sequencing is now cheap enough to use for variant discovery experiments extending up into the low thousands of samples, but those sample sizes are still inadequate for well-powered association studies for rare variants (say 1% frequency or less). Since a lot of us are gambling that there is some very interesting biology hidden in that frequency range, we need a cheaper assay than exome seq that can be cost-effectively run on tens of thousands of samples. Hence exome-chips.

    Whether exome arrays will prove effective at nailing down rare complex disease risk variants remains to be seen (although I'm optimistic), but the demand is clear: Illumina claims to have on the order of a million pre-orders for their array.

    It's worth noting that the Illumina exome array targets relatively few short indels - the Affy chip appears to be doing a better job when it comes to this important class of coding variation.

    It's also worth noting that both chips come with the ability to add custom content. Affy's chip has capacity for 100,000 additional markers; from memory, ILMN's chip has space for about 25K or so.

    (Disclaimer: I played a peripheral role in variant selection for both of these chips.)

  2. While reading this piece, I was really surprised to find the cost of exome-seq that low. Most of the companies out there are offering their services at a cost of (at least) $1000 per sample (compared to $400 mentioned here).

    And is the drop in the price of Illumina's TruSeq Enrichment kit due to its poor performance?