ENCODE was a mammoth endeavor, and one that is helping to shape our understanding of biology, but the project required a large multi-national collaboration to generate the thousands of ChIP-seq and RNA-seq libraries. Last week Duncan Odom’s research group at the Cambridge Institute published an automated ChIP-seq pipeline in Genome Biology capable of generating 96 ChIP-seq libraries with just two hours of hands-on time, making the lab-work for projects of ENCODE scale possible in just a few weeks. With all the samples on a new higher-density patterned flowcell, perhaps?
The paper references a much earlier methods publication, Schmidt et al: ChIP-seq: using high-throughput sequencing to discover protein-DNA interactions, and the authors admit that the standard ChIP-seq method is laborious. We have five or six groups in the building performing ChIP-seq, and an awful lot of time is spent on library production (a little less on QC, sometimes). In fact we have an experiment going in my lab right now to look at bringing in a higher-throughput method (although not the fully automated one yet) that dispenses with the gel size-selection, making the workflow much easier.
The amounts of tissue and antibody required for ChIP-seq are discussed by Aldridge et al, and the group spent time working out the minimum number of cells that can be used. They ran a titration of H3K4me3 ChIPs in two cell lines, including HepG2, using from 10 million down to 100 cells. At 1 million cells they demonstrate a high degree of specificity but reduced sensitivity, and suggest this as a minimum for “normal” experiments. This is exactly what you’d hope for from an experiment: as input nucleic acid decreases you are less likely to find a particular result, but the results you do find are real! Interestingly, the AHT-ChIP-seq protocol only uses 10 µl of the 30 µl eluted library for PCR amplification. Whilst this allows experiments to be repeated should PCR go wrong, it probably lowers sensitivity two- to three-fold.
An obvious problem for anyone who has done ChIP-seq is that the automation solution presented does not start until you’ve got sonicated cells or tissue. Sonication in many labs is still performed one sample at a time, although that sample may be used in multiple ChIPs. There are systems for 96-well sonication, but these are generally very expensive, potentially making the prep for an automated ChIP-seq run the daunting part of the experiment. In the paper the group sonicated livers and split the DNA across multiple ChIPs, making things a little easier. Also, as many users run 4–8 ChIP-seqs at a time, they get feedback during the experiment as to the overall quality. With an automated method you might find out your sonicator was not performing to spec only after completing two HiSeq flowcells of ChIP-seq!
Does automated ChIP-seq work as well as manual methods? The paper gives a resounding “yes”. Although manual ChIP-seq is modestly better quality in some cases, the comparison was not direct: previous manual experiments used more tissue and antibody, so the comparison is more “Cox’s to Russets” than “apples to oranges” – not perfect, but certainly good enough.
How much does automated ChIP-seq cost? The paper quotes a final price of £7.50 per ChIP-seq library prep, almost 10-fold cheaper than Illumina’s TruSeq ChIP-seq kit; hopefully they’re reading this part of the blog and will start to think about dropping prices on sample-prep kits! However, the group used around 30M reads per sample, which even on HiSeq high-output mode today requires 1/6th of a lane of sequencing. Using single-end 50bp sequencing this equates to about £100 in my lab. ChIP-seq for £110 is pretty good!
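The back-of-the-envelope sums behind that £110 figure can be sketched quickly. The lane price below is an assumed, illustrative number chosen so that 1/6th of a lane comes out at roughly the £100 quoted above; it is not a vendor price.

```python
# Rough per-sample cost for automated ChIP-seq, using the figures in this
# post. The lane cost is an assumption for illustration, not vendor pricing.
library_prep_gbp = 7.50     # automated library prep, per sample (from the paper)
reads_per_sample = 30e6     # reads used per ChIP-seq sample
reads_per_lane = 180e6      # so 30M reads is ~1/6th of a HiSeq SE50 lane
lane_cost_gbp = 600.0       # hypothetical lane price implying ~GBP 100 per 1/6 lane

sequencing_gbp = lane_cost_gbp * reads_per_sample / reads_per_lane
total_gbp = library_prep_gbp + sequencing_gbp
print(f"sequencing: GBP {sequencing_gbp:.2f}, total: GBP {total_gbp:.2f}")
```

At these numbers the prep is almost a rounding error next to the sequencing, which is really the point: the remaining cost lever is reads, not library prep.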
The automation described in the paper was completed on the Agilent Bravo NGS robot. The group used a couple of other platforms for some steps, but almost certainly could have run everything on one instrument (practicalities, no doubt). Don’t underestimate the money and space you might need to find to get up and running. Speaking to one of the authors, they said their experience had changed their perception of robots, and they were more keen on automation now. Although my personal experience with robots has not been great, I’m sure I should be more positive about robots and the impact they can have.
Run a qPCR before pooling libraries: The paper describes the use of Kapa’s SYBR Fast Illumina Kit (link) to quantify the 96-library pool before clustering. However, no qPCR was done on the individual libraries; they were simply pooled based on volume. We’d always perform qPCR before pooling to try to get the best possible balance. Instead, the group used a MiSeq run to rebalance the pool before HiSeq sequencing. With 96 samples on a MiSeq run the cost is only about £5 per sample, so not a lot more than qPCR. I’d use MiSeq more for this rebalancing if there were a MiSeq QC app that did the calculations for us and a MiSeq kit that brought the cost down to £1 per sample. Would anyone else use a £100 MiSeq kit, PE25 and 5M reads?
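For what it’s worth, the calculation such a MiSeq QC app would need to do is simple: scale each library’s volume in the re-pool by the ratio of its target read fraction to the fraction it actually achieved on the MiSeq run. A minimal sketch, with made-up library names and read counts:

```python
# Rebalance a library pool from MiSeq read counts: over-represented
# libraries get less volume in the re-pool, under-represented get more.
# All names and numbers here are illustrative, not from the paper.
miseq_reads = {"lib01": 80_000, "lib02": 40_000, "lib03": 120_000}
initial_volume_ul = 5.0                   # volume of each library in the first pool

total_reads = sum(miseq_reads.values())
target_fraction = 1 / len(miseq_reads)    # aim for an even pool

new_volumes = {
    lib: initial_volume_ul * target_fraction / (reads / total_reads)
    for lib, reads in miseq_reads.items()
}
for lib, vol in sorted(new_volumes.items()):
    print(f"{lib}: {vol:.2f} ul")
```

Here lib02, which came up at half its target share, gets double the volume next time round; lib03, which was over-represented, gets cut back.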
Who runs PE75bp ChIP-seq? No-one else, as far as I am aware. In the paper the group used paired-end 75bp sequencing, with no explanation of why. Almost all published ChIP-seq has been short-read and single-end. The only papers that have used paired-end sequencing have usually done so because they were looking in highly repetitive regions and needed to anchor reads in non-repetitive regions of the genome, or because the organism being studied does not have a high-quality genome for alignment. The cost of ChIP-seq almost doubles using PE75 instead of SE50. It is not clear whether the authors trimmed the reads to compare back to earlier data, or whether the decision to run PE75 was taken for any particular reason.
How easy is it to bring automation in-house? Illumina’s website now has an automation page and lists vendors who have programs for their instruments. Of course Illumina are building their own automation around ALL, but whether this will be rolled out for ChIP-seq remains to be seen. I’m not a fan of robots; I prefer people. My experience is that robots take time to program and verify, and that by the time a protocol is ready for use the likelihood is that it has been updated: NGS moves fast. I think we’re reaching a more stable period in the sample-prep development cycle, so perhaps now is the time to put in the effort and get our Tecan doing a bit more. Unfortunately they are one of two vendors with nothing released for Illumina workflows! Time for me to go and write an email to their development team!