Friday, 18 November 2011

MiSeq: possible growth potential part 2

This post (and others like it) is pure speculation on my part. I have no insider knowledge and am trying to make some educated guesses as to where technologies like this might go. In many ways this is part of my job description, as I need to know where we might invest in new technologies in my lab and also when to drop old ones (like the GA).

A while ago I posted about MiSeq's potential and suggested we might get to 25Gb per flowcell. That was my first post on this new blog, and I am sorry to say I forgot to divide the HiSeq output by two (it runs two flowcells), so my 25Gb should really have been closer to 12Gb. Consider it revised (until the end of this post).

In October 2007 the first GAI was installed in our lab. It was called the GAI because the aim was to deliver 1Gb of sequence data. It was a pain to run, fiddly, and the early quality was dire compared to where we are today. I remember thinking 2% error at 36bp was a good thing!
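
For context, a per-base error rate maps onto the Phred quality scores used in today's run metrics via Q = -10 log10(P_error). Below is a minimal Python sketch of that conversion. It is only indicative: the MiSeq "Q30" metric is the fraction of bases at or above Q30, not a mean error rate, so it is not a like-for-like comparison with the old 2% figure.

```python
import math

def phred_from_error(p_error: float) -> float:
    """Phred quality score: Q = -10 * log10(P_error)."""
    return -10 * math.log10(p_error)

def error_from_phred(q: float) -> float:
    """Inverse conversion: P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

early_q = phred_from_error(0.02)   # ~2% error per base, early GA days
q30_error = error_from_phred(30)   # error probability at the Q30 threshold

print(f"2% error -> Q{early_q:.1f}")
print(f"Q30 -> {q30_error:.2%} error per base")
print(f"Q30 means ~{0.02 / q30_error:.0f}x fewer errors per base")
```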

Now MiSeq is giving me twice the GAI yield "out-of-the-box".

Here is our instrument:
[Photo: MiSeq at CRI]

And here is the run performance screenshot, taken at about 13:00 today after the run started at 15:30 yesterday. If you look closely you can see we are already at cycle 206!
[Screenshot: MiSeq installation run]
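
As a rough sanity check, the two timestamps give an overall average time per cycle. This is a minimal sketch. It assumes "today" is the post date, so the run started on 17 November. The average folds in everything that happened before cycle 1 (such as on-instrument cluster generation), so it should come out above the 5-minute chemistry-plus-imaging cycle discussed below.

```python
from datetime import datetime

# Assumed dates: only the 15:30 -> 13:00 next-day gap matters.
start = datetime(2011, 11, 17, 15, 30)
snapshot = datetime(2011, 11, 18, 13, 0)
cycles_done = 206

elapsed_min = (snapshot - start).total_seconds() / 60
avg_min_per_cycle = elapsed_min / cycles_done  # includes any pre-cycling time

print(f"Elapsed: {elapsed_min / 60:.1f} h, ~{avg_min_per_cycle:.2f} min per cycle overall")
```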

MiSeq in MyLab: So now I can update you on our first run, which has progressed far enough to report the in-run yield prediction and other metrics. This is a PE151 PhiX installation run.

MiSeq installation run metrics:
    Cluster density: 905K/mm²
    Clusters passing filter: 89.9%
    Estimated yield: 1913.5Mb (I think this means about 6.4M read pairs)
    Q30: 90.9%  
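
The read count can be backed out of the yield figure. This minimal sketch assumes the estimated yield covers both 151-cycle reads of passing-filter clusters only, and it ignores index reads. On those assumptions it gives about 6.3M read pairs, close to the 6.4M guessed above, and roughly 7M clusters before filtering.

```python
def clusters_from_yield(yield_mb: float, read_len: int, paired: bool = True) -> float:
    """Clusters implied by a total yield in megabases."""
    bases_per_cluster = read_len * (2 if paired else 1)
    return yield_mb * 1e6 / bases_per_cluster

estimated_yield_mb = 1913.5   # "Estimated yield" from the run metrics
pf_fraction = 0.899           # clusters passing filter

pf_clusters = clusters_from_yield(estimated_yield_mb, read_len=151)
raw_clusters = pf_clusters / pf_fraction  # assumes yield counts PF clusters only

print(f"PF read pairs: {pf_clusters / 1e6:.2f}M")
print(f"Raw clusters:  {raw_clusters / 1e6:.2f}M")
```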

What is the potential:
So our first run has already delivered double the output Illumina quoted at release.

The Broad has also performed a 300bp single-end run. If some extra reagents could be squeezed into the cartridge (perhaps with reconfigured, slightly fatter tubes), then PE300 would be possible for anyone willing to run for two days. Based on my current run, this would yield about 4Gb.
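
The PE300 arithmetic can be sketched the same way. This assumes the 5-minute cycle (4 min chemistry + 1 min imaging) described below and the read pairs implied by this run's 1913.5Mb over 2 x 151 cycles. It ignores cluster generation and index reads. It lands at about 50 hours and 3.8Gb, in line with the rounded two days and 4Gb above, and shows the roughly 3x step needed to reach 12Gb.

```python
CYCLE_MIN = 5                       # 4 min chemistry + 1 min imaging
READ_PAIRS = 1913.5e6 / (2 * 151)   # PF clusters implied by this run's yield

def projected_run(read_len, read_pairs=READ_PAIRS, paired=True):
    """Cycling time (h) and yield (Gb) for a read length; cycling time only."""
    cycles = read_len * (2 if paired else 1)
    return cycles * CYCLE_MIN / 60, read_pairs * cycles / 1e9

hours, gb = projected_run(300)
print(f"PE300: ~{hours:.0f} h of cycling, ~{gb:.1f} Gb")
print(f"Increase needed to reach 12Gb: ~{12 / gb:.1f}x")
```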

We only need a 3x increase in yield to hit my revised 12Gb estimate. Read on...

At the recent Illumina UK User Group Meeting (UGM) we had a discussion in one of the open-floor sessions about what we wanted from an instrument like MiSeq. The Illumina team discussed options such as reducing read quality to allow faster runs, which would be achieved by making the chemistry steps even shorter. Currently chemistry takes 4 minutes and imaging takes 1 minute, for a combined 5-minute cycle time.
Reducing chemistry time would shorten the combined cycle time and allow longer runs to be performed, but it would impact quality (by how much is not known, and Illumina would not say). Adjusting imaging time works the other way: more imaging increases yield but makes run times longer.
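
A minimal model of this trade-off treats each cycle as one chemistry step plus one imaging step and ignores everything else in the run. The sketch below shows how the cycling time for the current PE151 configuration would shrink as chemistry time is cut, holding imaging at 1 minute. The shorter chemistry times are hypothetical and say nothing about the resulting quality.

```python
PE151_CYCLES = 2 * 151  # the installation run: two 151-cycle reads

def cycling_hours(cycles, chem_min, imaging_min=1.0):
    """Time spent cycling only: each cycle = one chemistry step + one imaging step."""
    return cycles * (chem_min + imaging_min) / 60

# 4 min is today's chemistry step; the shorter values are hypothetical.
for chem in (4, 3, 2, 1):
    hours = cycling_hours(PE151_CYCLES, chem)
    print(f"{chem} min chemistry + 1 min imaging: PE151 in {hours:.1f} h")
```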

If you play with chemistry and imaging cycle times you can generate a graph like this one.
In this I have kept the total run time constant but varied chemistry and imaging times. The results are pretty dramatic. The peak in the middle of the table represents a 1 min chemistry / 1 min imaging run, giving the same number of clusters as today (nearly 7M in my case) on a staggering 720bp run. This may be achievable using the standard reagent cartridge if less reagent is actually used per cycle (I just don't know about this). If you are happy to increase run times to two days, then a low-quality (maybe Q20) 1400bp (PE700) run would be pretty cool.
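
The 720bp and 1400bp figures follow from simple division if each cycle takes 1 min chemistry + 1 min imaging. This minimal sketch assumes a fixed run time of 24 or 48 hours (an assumed reading of "a day" and "two days") and the same ~7M clusters as today. It also assumes no time is lost to cluster generation or read turnaround. Yields are raw, before any filtering.

```python
CLUSTERS = 7e6  # "nearly 7M" clusters, as on today's run (assumed unchanged)

def cycles_and_yield(run_hours, chem_min=1.0, imaging_min=1.0):
    """Whole cycles that fit in run_hours, and the resulting raw yield in Gb."""
    cycles = int(run_hours * 60 // (chem_min + imaging_min))
    return cycles, cycles * CLUSTERS / 1e9

for hours in (24, 48):
    cycles, gb = cycles_and_yield(hours)
    print(f"{hours} h at 1 min chemistry + 1 min imaging: {cycles} cycles, ~{gb:.1f} Gb")
```

The 48-hour case gives 1440 cycles, slightly more than the PE700 quoted above, so the round numbers are consistent with this simple model.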

Even if this is a step too far, dialling in quality and playing with imaging could allow some really cool methods to be developed. What about a strobe sequencing application that gave high-quality data at the start, middle and end of a 1000bp fragment for haplotyping, but did not collect images in between? The prospects are interesting.
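
How much a strobe scheme would save depends on whether the dark stretches can skip chemistry as well as imaging. This minimal sketch assumes that every position still needs a chemistry step, because SBS adds one base per cycle, and that only imaging is skipped between windows. The three 100-cycle windows are purely illustrative.

```python
def strobe_run_minutes(total_cycles, windows, chem_min=4.0, imaging_min=1.0):
    """Run time when only cycles inside the [start, end) windows are imaged.

    Every cycle still pays for chemistry; 'dark' cycles skip imaging.
    """
    imaged = set()
    for start, end in windows:
        imaged.update(range(max(start, 0), min(end, total_cycles)))
    return total_cycles * chem_min + len(imaged) * imaging_min

windows = [(0, 100), (450, 550), (900, 1000)]  # hypothetical start/middle/end windows
for chem in (4.0, 1.0):
    full = strobe_run_minutes(1000, [(0, 1000)], chem_min=chem)
    strobe = strobe_run_minutes(1000, windows, chem_min=chem)
    print(f"{chem:.0f} min chemistry: image every cycle {full / 60:.1f} h, strobe {strobe / 60:.1f} h")
```

Under these assumptions the time saving is modest with 4-minute chemistry and grows as chemistry gets faster. The other gain is 70% fewer images to store and process.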

As I said at the start, this is speculation on my part, and the reality may never get quite as far as 1400bp on SBS chemistry. We can keep our fingers crossed, and I hope that exactly this kind of speculation drives people to invent the technologies that will deliver it. After all, if Solexa had not tried to build a better sequencer we would not be where we are today.

I thought I might trade in my remaining GAs for a HiSeq, but perhaps I'd be better off asking for two more MiSeqs instead?

Who knows? HiSeq2000 at 600Gb (2 flowcells), HiSeq1000 at 300Gb (1 flowcell), MiSeq at 35Gb (equivalent to 1 lane)?
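
Those speculative totals line up on a per-lane basis. This minimal sketch assumes 8 lanes per HiSeq flowcell and a single-lane MiSeq flowcell. The Gb figures are the guesses above, not specifications.

```python
# Speculative totals from the post (Gb), with the number of lanes each covers.
instruments = {
    "HiSeq2000": (600, 2 * 8),  # 2 flowcells x 8 lanes
    "HiSeq1000": (300, 1 * 8),  # 1 flowcell x 8 lanes
    "MiSeq": (35, 1),           # single-lane flowcell
}

per_lane = {name: gb / lanes for name, (gb, lanes) in instruments.items()}
for name, gb_per_lane in per_lane.items():
    print(f"{name}: {gb_per_lane:.1f} Gb per lane")
```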

Competitive by nature: Helen (our FAS) would not let me try to max out the flowcell loading, but I do feel a little competitive about getting the highest run yield so far. Did you know Ion offers a $5000 prize for a record-breaking run each month? Their community is actually quite a good forum, and I hope they don't kick me off!

3 comments:

  1. James, assuming that it will be possible to get 35Gb out of a MiSeq run, it will take some development time to get to that point. Are you getting the sense from Illumina that during that time there won't be any more improvements on the HiSeq platform?

  2. Any thought as to how insert length affects cluster formation? I would imagine at some point (perhaps >1kb) clusters would become difficult to discern. I guess you could always decrease your density on the flowcell, but then you're sacrificing output.

  3. Shawn: I hope HiSeq is not dead yet! There are likely to be improvements, but we have seen such a ramp-up in data volumes from the instrument since its release that I guess we should not get too greedy. If some of the quality improvements can be ported over from MiSeq, then PE150 and 1Tb become possible outside of Illumina labs.
    Can't you shed any light on this?

    djschlesinger: You would sacrifice read numbers but would get higher physical coverage of the genome of interest. 1kb inserts are essentially mate-pair without the mate-pair protocol. Yes, clusters would be larger and less intense, and this may impact %PF rates as well. Something to think about though.
