Thursday 19 June 2014

V4some: 1TB here we come…

Our HiSeq 2500 v4 validation runs are just about to finish and I thought I’d share some details. Ideally I’d give you access to the runs so you can dig around yourselves, but until Illumina makes this possible on a per-lane basis in BaseSpace you’ll have to make do with my plots.

We now have two HiSeq 2500 1TB machines running v4 SBS chemistry, and we’re hoping to see the performance gains promised by Illumina and seen by others: more reads, longer reads and more genomes, exomes, RNA-seq & ChIP-seq. We’ve yet to run the full PE125 but these should go on early next week. The validation data presented here is all from a pair of paired-end 50x25bp dual-index runs with a mix of different samples.

The first thing we noticed is that v4 clustering takes just 2.5 hours, almost twice as fast as the previous chemistry. This was a surprise as it was not mentioned in any of the release notes I’d seen, but for my lab staff this and the automated PE turnover may be the most welcome improvements in sequencing workflows. Other than this, setting up a v4 run is about the same as v3 except for clustering density. We decided to titrate loading concentrations for a couple of multi-lane libraries to verify the optimal loading concentration. A recent support notification from Illumina highlighted the need for careful loading, as v4 may be more prone to over-clustering than v3, although the effect on data quality for many applications is likely to be small.

The runs: We ran such a funny pair of flowcells mainly because of the delay in v4 shipments. The PE50x25 format allowed us to run a pair of paired-end flowcells in high-output mode to check the fluidics and the PE turnover, all with a single 250-cycle SBS kit.

Cluster densities were good to great with an average of 280M reads non-PF for LT libraries loaded at 18pM. However, our HT libraries consistently under-clustered, which is something we’ve seen for a long time, so we’ll load these at 24pM. We saw 95% Q30 and a 0.13% error rate on PhiX lanes. This suggests we might be able to push densities, but until we’ve run a few more flowcells, and some PE ones too, we’ll stick on the cautious side. Better to return 250M reads than none!

Highly multiplexed sequencing: several of the lanes had multiple samples, and with such a high density of reads and high-quality pooling we are looking to get 23M reads per sample from a multiplex of 48 RNA-seq samples run on four lanes. This means a whole plate of RNA-seq can be run on one flowcell, with library prep and sequencing potentially completed in one week! V4some indeed!
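For anyone who wants to check the 23M figure, here is a rough sketch of the arithmetic. It assumes perfectly even pooling and uses the 280M average reads per lane quoted above; the function name and the eight-lane, 96-sample plate case are just my illustration, and real pools will never be this tidy.

```python
def reads_per_sample(reads_per_lane: float, lanes: int, samples: int) -> float:
    """Expected reads per sample, assuming perfectly even pooling (an idealisation)."""
    if lanes <= 0 or samples <= 0:
        raise ValueError("lanes and samples must be positive")
    return reads_per_lane * lanes / samples


# Average reads per lane from the validation runs described above.
READS_PER_LANE = 280e6

# 48 RNA-seq samples pooled across four lanes.
half_plate = reads_per_sample(READS_PER_LANE, lanes=4, samples=48)

# A full 96-sample plate across an eight-lane flowcell (illustrative case).
full_plate = reads_per_sample(READS_PER_LANE, lanes=8, samples=96)

print(f"48 samples on 4 lanes: {half_plate / 1e6:.1f}M reads per sample")
print(f"96 samples on 8 lanes: {full_plate / 1e6:.1f}M reads per sample")
```

Both cases come out at about 23M reads per sample, so the per-sample depth holds when the plate is spread across the whole flowcell.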

New RTA: The latest RTA also brings the improvements in low-diversity sequencing previously available on MiSeq. We ran single amplicons with good success on MiSeq and will immediately test v4 for RRBS libraries, which suffer from very low diversity in the first three bases. The current RRBS protocol requires a template read with dark cycles, and it is a pain to run and flaky in our hands. If v4 chemistry and the new RTA allow us to run RRBS as a standard sample then we’ll be over the moon.

What’s the outlook for PE125: If we keep consistently high clustering and the error rate is acceptable to our users then our first 1Tb run will hopefully be achieved on Sunday the 3rd of July after a 6-day run. I think many users are going to see real benefits from v4 in terms of experimental timescales, costs and quality. There is a bit of a question mark over how we’ll run exomes, as Illumina do not sell a v4 150-cycle SBS kit. We’ll need to decide between 3x50 kits, 1x250 and throwing the leftover reagents away, or perhaps a 2x75 + 1x50 flowcell from a single 250-cycle kit!
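As a back-of-envelope check on the 1Tb figure, the sketch below multiplies it out. The assumptions are mine: the cautious 250M per lane mentioned earlier is treated as clusters that each give two 125bp reads, a high-output flowcell has eight lanes, and both flowcell positions on the instrument are used. The function name is only for illustration.

```python
def run_yield_bases(
    clusters_per_lane: float,
    read_length: int,
    reads_per_cluster: int = 2,   # paired-end: two reads per cluster
    lanes_per_flowcell: int = 8,  # high-output flowcell
    flowcells: int = 2,           # assumption: both flowcell positions in use
) -> float:
    """Total bases for one run under the stated assumptions."""
    return (
        clusters_per_lane
        * reads_per_cluster
        * read_length
        * lanes_per_flowcell
        * flowcells
    )


# Cautious per-lane figure from the validation runs, at PE125.
total = run_yield_bases(clusters_per_lane=250e6, read_length=125)
print(f"Estimated yield: {total / 1e12:.2f} Tb")
```

On those numbers 250M per lane is right at the 1Tb mark, so any lane that under-clusters would pull the run below it.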

What’s the outlook for rapid-runs: My lab does not plan to run another rapid flowcell unless we get a specific request from a user. Rapid run was a useful innovation, but in hindsight it quadrupled the amount of work my team have to do when clustering, sequencing and washing the HiSeq (and the number of kits Illumina had to produce, package and ship). On-board clustering was nice but potentially suffers from the same carryover issues as MiSeq, so some of our users simply waited to fill a high-output flowcell. In a lab like mine that processes large numbers of lanes, rapid run has been useful, but all the same we might be glad to see the back of it. We did like being able to run stuff faster and not have users waiting around or playing matchmaker, so we’ll watch this space before ditching rapid entirely.

What’s the outlook for BaseSpace: We are connected and both these flowcells were uploaded to BaseSpace. The main attraction for me right now is the ability to run core apps like RNA-seq and use these for very fast application-specific QC. This will allow us to check that samples cluster sensibly and that replicates are not obviously different from one another on several technical and biological metrics. If we have issues with library prep then we should be able to see this and fix the problem before we’ve run the next batch. As we aim to process perhaps 192 RNA-seq samples per month, this is hugely important to doing the best we can as a core facility. Eventually the raw data goes to the group or our Bioinformatics core for a formal analysis using the latest tools and project-specific parameters. Maybe one day some of the final analysis will go directly from BaseSpace to publication?

1 comment:

  1. Hi James
    Great blog
    Is your loading conc based on Kapa qPCR kit calculations?
    From the table it seems you compare RNA-seq HT with DNA LT. Lower output from RNA-seq is quite common, I’ve heard, but do you see the same pattern with DNA HT/HS and DNA LT/LS? If yes, what do you think is the most reasonable explanation for this difference?

