CoreGenomics: (almost) everything you wanted to know about @illumina HiSeq 4000...and some stuff you didn't

The HiSeq 4000 was Illumina's way of making the patterned flowcell technology available to non-X Ten customers, and of opening up patterned flowcells to applications other than human genomes. The list of supported library preps is still relatively small: TruSeq Nano, TruSeq PCR Free, Nextera Rapid Capture, TruSeq mRNA stranded, TruSeq total RNA stranded, TruSeq RNA Access - basically TruSeq! However, customers are running many unsupported library types after a bit of internal testing. For some applications HiSeq 2500 is likely to remain the platform of choice - particularly for the libraries which fall into HiSeq 4000's "classes of evil".

BTW: this is a big post - sorry! I'm going to follow up with a post about the "ExAmp" duplicates and how they may be a problem for some applications. I'll also follow up as we get some real data comparisons between HiSeq 4000 and 2500 v4.

The HiSeq 4000 is pretty much an X One in all but name. It has a new camera, a wider field of view, and more powerful lasers, which together allow scanning of just two swaths and result in an almost two-fold increase in scan speed vs. 2500. A new 12x2 core server PC brings more compute to manage the significant increase in data, and new versions of the RTA software process more of this data in memory, moving further away from raw data ever being stored on the instrument PC (anyone remember re-processing GA images?).

Lower project sequencing costs: Because the HiSeq 4000 has an increased maximum read-length (PE150) and increased cluster density (312M clusters vs HiSeq 2500's 250M), users can expect to see lower costs for sequencing. As a guide, expect to run the following numbers of samples per application:

Genomes - 6 human genomes (30x coverage) per flowcell in just 3 days.
Exomes - 90 Nextera exomes (4Gb per exome) per flowcell in under 2 days.
RNA-seq - 125 mRNA-seq DGE (20M reads per sample) per flowcell in under 2 days.
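
As a rough sanity check on the RNA-seq figure, the sketch below works out samples per flowcell from cluster counts. It assumes the 312M figure is clusters per lane and that a flowcell has 8 lanes; neither is stated explicitly above, and the function name is purely illustrative.

```python
# Back-of-envelope samples-per-flowcell estimate.
# Assumptions (not stated in the post): 312M clusters is a per-lane
# figure, and a flowcell has 8 lanes.
CLUSTERS_PER_LANE = 312e6
LANES_PER_FLOWCELL = 8


def samples_per_flowcell(reads_per_sample,
                         clusters_per_lane=CLUSTERS_PER_LANE,
                         lanes=LANES_PER_FLOWCELL):
    """Return how many samples fit on one flowcell at a given read depth."""
    total_clusters = clusters_per_lane * lanes
    return total_clusters / reads_per_sample


rnaseq_samples = samples_per_flowcell(20e6)
print(f"mRNA-seq DGE at 20M reads: ~{rnaseq_samples:.0f} samples per flowcell")
```

Under those assumptions the estimate comes out just under 125 samples, in line with the guide figure; real numbers will depend on %PF and how evenly the pool is balanced.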

The big change is of course the introduction of the X Ten patterned flowcell technology. I believe this works by coating the two halves of the flowcell with the lawn oligos; the halves are then machined/polished to remove oligo from the top of the nanowell silicon, and finally the flowcell is assembled. This goes along with the new clustering chemistry...

ExAmp cluster generation: clustering on the HiSeq 4000 and X Ten platforms uses Illumina's latest "Exclusion Amplification" chemistry. ExAmp is designed so that only a single template molecule binds and forms a cluster: cluster amplification is almost instantaneous and excludes other molecules from binding. Most wells have a single-molecule cluster; some wells are polyclonal, but these are excluded by the chastity filter and will not appear in your %PF reads. A few nanowells will be empty. Some will be "ExAmp duplicates", which form because a single molecule that has formed one cluster is able to hybridise to another nanowell nearby and create a second "ExAmp duplicate" cluster.

Want to know more? Then watch the ExAmp Cluster Amplification Workflow training video from Illumina. If you are clustering and want to see the cBot bit then look at 3.5 "Prepare ExAmp Reaction" and "Tips and Tricks". The ExAmp EPX1, EPX2 & EPX3 mastermix needs to be prepared in order and mixed without vortexing until it is a uniform cloudy solution. EPX3 is particularly viscous - make sure you're doing the right sort of pipetting! Aspirate and dispense just below the meniscus, aspirate at 90 degrees and dispense at 45 degrees, pause after dispensing, and don't blow out your pipette tip while mixing!

Patterned flowcell "Classes of Evil": The new flowcells contain billions of nanowells with a structured pattern (hence "patterned" flowcells - see this post from 2013 where I first described them). They provide uniform cluster spacing and size and hopefully will be released with higher and higher loading densities over time. Each nanowell is coated with the P5 & P7 oligo lawn very similarly to "normal" flowcells, but the clustering chemistry is completely new.

Whilst the patterned flowcell should mean that the yield per lane remains pretty stable and loading concentration is not quite as important as it is on HiSeq 2500, there are some library issues users need to be aware of. HiSeq 4000 has three "Classes of Evil":

1. Intolerance to variable insert sizes: the original clustering chemistry developed by Solexa (Manteia) has come a long way, and is pretty good at clustering any template up to and above 1000bp. Clustering has always favoured smaller molecules, which can make getting yields spot on complicated, but the 2500 is relatively easy to get right. However, the new ExAmp chemistry is much less tolerant, and this has been a major reason why library prep types and recommended insert sizes are so strict.

If a library with a wide size distribution is clustered then a significant imbalance of reads may be the result, with the majority coming from the smaller molecules in the library. People running PCR amplicons, or using preps that result in multiple library peaks, will have to be careful and run some tests before moving over to HiSeq 4000.

You may also see variability in the number of reads from the PhiX spike-in. We aim to add 1% but because the spike may be a different size from your library it will cluster more or less efficiently (usually better) and so the final percentage can be quite different.

2. Higher susceptibility to adapter contamination: HiSeq 4000 has higher adapter contamination rates compared to 2500 due to ExAmp favouring smaller molecules. Shorter library molecules have always clustered more efficiently, but this effect is amplified on patterned flowcells due to the rate of diffusion in the ExAmp chemistry. This is a big problem if you have adapter contamination over 1% visible in your BioAnalyser trace, and Illumina recommend keeping adapter contamination below 0.5%. Use an additional cleanup if necessary. The same problem (diffusion rate of small vs large libraries) affects projects where two populations of fragment length are expected (e.g. PCR amplicons). A 1-2-1 pM ratio will end up unbalanced due to the ExAmp favouring the smallest fragments. If you are mixing your libraries with PhiX, bear in mind that this is a 200-30bp library and so may cluster more efficiently than your library.

In the worst cases a 5% adapter contamination can lead to 60% of reads being from adapter molecules; even a 1% contaminant can result in over 5% adapter reads. If you see adapter dimers in your BioAnalyser trace, perform an additional 1:1 SPRI bead cleanup, or use a gel extraction if you need to.
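
To get a feel for how strong a size bias those figures imply, here is a minimal sketch using a simple weighted-competition model. The model itself is an assumption for illustration (each species clusters in proportion to its molar fraction times a relative efficiency); it is not Illumina's description of ExAmp, and the function names are hypothetical.

```python
# Toy model (an assumption, not Illumina's): each species seeds clusters in
# proportion to molar_fraction x relative clustering efficiency.

def expected_read_fraction(molar_fraction, relative_efficiency):
    """Fraction of clusters from a contaminant, given its molar fraction and
    how many times more efficiently it clusters than the library."""
    weighted = molar_fraction * relative_efficiency
    return weighted / (weighted + (1.0 - molar_fraction))


def implied_efficiency(molar_fraction, read_fraction):
    """Invert the model: the relative efficiency implied by an observed
    read fraction for a known input molar fraction."""
    odds_in = molar_fraction / (1.0 - molar_fraction)
    odds_out = read_fraction / (1.0 - read_fraction)
    return odds_out / odds_in


# The two cases quoted above: 5% input -> 60% of reads, 1% input -> 5% of reads.
worst_case = implied_efficiency(0.05, 0.60)
milder_case = implied_efficiency(0.01, 0.05)
print(f"worst case: adapters cluster ~{worst_case:.1f}x more efficiently")
print(f"milder case: adapters cluster ~{milder_case:.1f}x more efficiently")
```

Under this toy model the two quoted cases imply quite different advantages for the adapter molecules, which fits with the 60% figure being a worst case rather than the norm.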

3. Increased duplication rate: HiSeq 4000 has higher duplicate rates compared to 2500 due to "ExAmp duplication". This arises from the original seeding molecule re-seeding a second cluster (or several) nearby. These duplicates reduce genome coverage, but expected coverage levels are achievable even with this slightly elevated duplicate rate due to the higher yield of 4000, so most users need not worry. There are tools to mark and remove duplicates, Picard Tools being the prime example; however, Picard marks all duplicates in the same way. I am not aware of any parameter, or other tool, that will mark PCR duplicates separately from "optical duplicates" (two clusters called as one, and unique to non-patterned flowcells), or from these new "ExAmp duplicates" (duplicates caused by re-seeding, and unique to patterned flowcells). I think this is something we will need to consider, as there are different ways to fix duplication depending on where it comes from.
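
One possible starting point for telling these apart, sketched below: Illumina read names carry lane, tile and x/y coordinates, so a pair of duplicate reads that sit close together on the same tile is a candidate re-seeding event, while duplicates from different tiles or lanes are more likely to be PCR duplicates. The distance threshold here is an arbitrary placeholder, the read names are made up, and the function names are hypothetical; this is not a validated method.

```python
from collections import namedtuple

ReadPos = namedtuple("ReadPos", "flowcell lane tile x y")


def parse_read_name(name):
    """Parse instrument:run:flowcell:lane:tile:x:y from an Illumina read name."""
    # Drop the leading '@' and anything after the first space (the comment field).
    fields = name.lstrip("@").split()[0].split(":")
    if len(fields) < 7:
        raise ValueError(f"unexpected read name: {name!r}")
    flowcell, lane, tile, x, y = fields[2:7]
    return ReadPos(flowcell, int(lane), int(tile), int(x), int(y))


def is_nearby_duplicate(name_a, name_b, max_distance=2500):
    """True if two duplicate reads share a tile and lie within max_distance
    of each other in both x and y (max_distance is a placeholder value)."""
    a, b = parse_read_name(name_a), parse_read_name(name_b)
    if (a.flowcell, a.lane, a.tile) != (b.flowcell, b.lane, b.tile):
        return False
    return abs(a.x - b.x) <= max_distance and abs(a.y - b.y) <= max_distance


# Made-up read names for illustration only.
read_1 = "@K001:12:HABC:3:1101:1000:2000 1:N:0:1"
read_2 = "@K001:12:HABC:3:1101:1500:2100 1:N:0:1"
read_3 = "@K001:12:HABC:3:2205:1500:2100 1:N:0:1"
print(is_nearby_duplicate(read_1, read_2))  # same tile, close together
print(is_nearby_duplicate(read_1, read_3))  # different tile
```

The pairs fed to a check like this would come from whatever has already flagged them as duplicates by alignment position; the sketch only adds the "where on the flowcell" question.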

Working with HiSeq 4000 in the lab: You'll need to be quick and work without interruptions as the ExAmp chemistry needs to be on the cBot within 30 minutes of mixing. A "Do Not Disturb" sign might be required.

The patterned flowcells are shipped dry (like NextSeq flowcells) and must be stored at 4°C. Leave them to come to room temperature for 30 minutes before opening, and then use them within 4 hours. After clustering the flowcell can be stored at 4°C for up to 48 hours. Unfortunately the flowcell is not provided with a barcode for the tube you'll store it in after clustering (Illumina do ship barcodes with their library prep kits; hopefully they'll start doing this with the flowcells).

We will continue to add 1% PhiX as a control, but may no longer add 5% to lane 8 (flowcells cannot be mis-oriented). Rather than add 0.5ul of 3nM PhiX to 49.5ul 3nM library we add 5ul of 300pM PhiX to 45ul 3nM library as this is more accurate, however if you want lots of PhiX to help with low-diversity libraries please talk to us beforehand as this may not work quite so well. We're also looking into SeqMatics indexed PhiX - although why Illumina's control does not have an index is quite simply beyond me!
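
The arithmetic behind the two spike-in recipes can be checked with a short sketch like the one below; it relies only on the volumes and concentrations quoted above (µl x nM gives fmol), and the function name is illustrative.

```python
def spike_in_fraction(spike_ul, spike_nM, library_ul, library_nM):
    """Molar fraction of the spike-in in the final pool."""
    spike_fmol = spike_ul * spike_nM        # µl x nM = fmol
    library_fmol = library_ul * library_nM
    return spike_fmol / (spike_fmol + library_fmol)


# 0.5 µl of 3 nM PhiX into 49.5 µl of 3 nM library
direct = spike_in_fraction(0.5, 3.0, 49.5, 3.0)
# 5 µl of 300 pM (0.3 nM) PhiX into 45 µl of 3 nM library
diluted = spike_in_fraction(5.0, 0.3, 45.0, 3.0)

print(f"direct:  {direct:.2%} PhiX")
print(f"diluted: {diluted:.2%} PhiX")
```

Both recipes add the same 1.5 fmol of PhiX. The diluted version uses slightly less library, so the PhiX fraction lands a touch above 1% and the pool ends up a little under 3 nM, but the volume being pipetted is ten times larger.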

HiSeq 4000 sequencing: The latest sequencing chemistry is 3x faster than 2500 V4, with an improved polymerase. You can make life a little easier by thawing your sequencing and paired-end turnover reagents at 4°C overnight rather than at room temp in a water bath on the day.

RTA now runs in memory: The move to RTA running in memory does speed everything up but comes at a price as when it fails the run cannot be rescued. There is no ability to pause a run which will make life in the lab a little more complicated than before. The changes also mean we'll stop looking at the first base reports and wait for cycle 25 and the analysis software to provide the run metrics. First base used to be important when cycle times were an hour each, but HCS provides more useful and more accurate run metrics and is now quick enough for most labs. The new RTA has a different template generation method (not sure what this will mean for low diversity libraries) and an empirical phasing correction. There is also a totally new bcl2fastq (cleverly called bcl2fastq2) which improves the processing of data.

Washing your HiSeq 4000: Make your Maintenance Wash Solution (Tween/Proclin) starting from 10% Tween (add 25 ml Tween20 to 225 ml lab grade water). Add 1.5 ml of Proclin 300 and 750 ml of lab grade water, and stir with a magnetic stir bar until thoroughly mixed. Finally add 4000 ml of lab grade water and mix. This can be stored at room temperature for up to 30 days. You can reuse your wash bottles up to 3 times.
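
For anyone scaling the recipe, this small sketch works out the final concentrations from the volumes given above. It assumes volumes are simply additive, which is close enough for a wash solution.

```python
# Volumes from the recipe above, in ml (assumes volumes are additive).
tween20_ml = 25.0
water_for_10pct_tween_ml = 225.0
proclin_ml = 1.5
water_ml = 750.0 + 4000.0

total_ml = tween20_ml + water_for_10pct_tween_ml + proclin_ml + water_ml
tween_pct = 100 * tween20_ml / total_ml
proclin_pct = 100 * proclin_ml / total_ml

print(f"total volume: {total_ml:.1f} ml")
print(f"Tween 20:     {tween_pct:.2f}% v/v")
print(f"Proclin 300:  {proclin_pct:.2f}% v/v")
```

That works out to roughly 5 litres at about 0.5% Tween 20 and 0.03% Proclin 300.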

Enjoy! We're just getting started on our instruments and I'm sure we'll learn a lot over the next few weeks and months. Please do let me know about inaccuracies in this post, or about your own HiSeq 4000 experiences. Feel free to contact me directly: james.hadfield@cruk.cam.ac.uk.

7 comments:

Unknown, 19 January 2016 at 08:56
This comment has been removed by a blog administrator.
Anonymous, 20 January 2016 at 00:11
PhiX v2 is barcoded. In the "old" days, users couldn't spike in PhiX if their libraries were barcoded. With enough complaints, Illumina came out with PhiX v3 (without barcodes). PhiX v2 had index #2 of the 6bp adapters, I believe.
Nandita, 16 July 2016 at 15:03
Hi James, I enjoy reading your blog and deconvolution of NGS tech. I was looking for your opinion/experiences on automation solutions for NGS.
James@cancer, 18 July 2016 at 19:04
Hi Nandita, I am available for consulting if that is what you are after? Contact me on LinkedIn.
James@cancer, 18 July 2016 at 19:05
PhiX without barcodes is a problem. Replacing barcoded PhiX with a non-barcoded version was dumb.
Anonymous, 27 July 2016 at 13:47
Any plans for Illumina to provide lower cluster pattern density for larger insert libraries (2-3 times) - 600-1000bp?

Monday, 18 January 2016
