AGBT had lots of presentations on both clinical and single-cell work last February. On Wednesday, February 12th, Aviv Regev of the Broad Institute described her group's experience with the Fluidigm C1 system and their early work on understanding cell-to-cell communication in the immune system. The results were published in Nature a couple of days ago. A more exciting story for me scientifically was published by Aviv Regev and Bradley Bernstein in Science, where they describe the use of single-cell sequencing to understand tumour heterogeneity.
The big headlines for me were that just 1M RNA-seq reads were enough to get high-quality gene expression estimates, and that single cells are great, but we’re going to need lots of them to delve deeply into biological systems. Both papers showed a massive loss of data at QC stages: around a third of cells were lost and only 30% of reads mapped to the transcriptome. Hopefully both of these are things we can improve in the next few years to make single-cell mRNA-seq even more powerful.
|The Fluidigm C1 mRNA-seq workflow|
Regev Nature 2014: After sequencing 2785 primary mouse bone-marrow-derived dendritic cells with three stimuli (LPS, PAM, PIC) at several time-points (0, 1, 2, 4 and 6 h), they were able to show that, rather than the expected normal distribution of gene expression in individual cells within the population, a few “precocious” cells are early responders to pathogens that coordinate interferon-mediated paracrine signalling to the general population. Without single-cell analysis these precocious cells would have been swamped by the rest of the population, making their early response undetectable.
They link the work to disease via bacterial quorum sensing (an old friend of mine worked on this for his PhD), pointing out that the threshold for activation needs to be just right if an “inappropriate immune response” is to be avoided, as in e.g. lupus, rheumatoid arthritis or ulcerative colitis.
The paper has been picked up by a couple of blogs: NextGenSeek describes it alongside another single cell paper, and the RNASeqBlog describes the main conclusions. It also ranks quite highly on Altmetric.
Bernstein Science 2014: This paper uses the same C1 strategy, but to sequence primary glioblastomas from five patients by dissociating and flow sorting cells into populations for C1 chips (Figure A). They demonstrated the inherently variable expression of established glioblastoma subtype classifiers across single cells, which may have prognostic implications.
They were able to detect variability in expression of splice variants with the oncogenic mutant EGFRvIII in one patient sample. This showed 7% of cells with wild-type EGFR splicing, 19% with EGFRvIII and 25% with a novel oncogenic deletion variant; splicing was almost always mutually exclusive (Supplemental figure 8b).
The group also reported a very nice copy-number analysis from the RNA-seq data, which showed good correlation with SNP arrays (Supplemental figure 23). This was achieved by averaging relative expression across windows of 100 genes and then calling copy-number. The approach works for large regions of the genome and could be applied to other data sets. As with copy-number calling from exomes, there are likely to be biases in the final CNV calls, but if CNV data can be had for free many will ask for it as part of their standard analysis pipeline.
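To make the windowing idea concrete, here is a minimal Python sketch of a moving average over genes sorted by chromosomal position. It is only an illustration of the general technique, not the method from the paper: the function names, the input format (one relative-expression value per gene, e.g. a log ratio against a reference) and the gain/loss thresholds are all my own assumptions.

```python
def windowed_relative_expression(rel_expr, window=100):
    """Moving average over genes that are already sorted by genomic position.

    rel_expr: list of relative expression values, one per gene
              (assumed format, e.g. log ratio versus a reference).
    Returns one averaged value per full window of `window` genes.
    """
    if window < 1 or window > len(rel_expr):
        raise ValueError("window must be between 1 and the number of genes")
    # Keep a running sum so each step only adds one gene and drops one gene.
    running = sum(rel_expr[:window])
    smoothed = [running / window]
    for i in range(window, len(rel_expr)):
        running += rel_expr[i] - rel_expr[i - window]
        smoothed.append(running / window)
    return smoothed


def call_copy_number(smoothed, gain=0.3, loss=-0.3):
    """Label each window using hypothetical thresholds (not from the paper)."""
    calls = []
    for value in smoothed:
        if value >= gain:
            calls.append("gain")
        elif value <= loss:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Toy chromosome: 200 unchanged genes followed by 200 over-expressed genes.
toy = [0.0] * 200 + [1.0] * 200
smoothed = windowed_relative_expression(toy, window=100)
calls = call_copy_number(smoothed)
print(calls[0], calls[-1])
```

Averaging over 100 neighbouring genes swamps gene-level noise, which is also why only large regions can be resolved this way.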
|Figure A from Patel et al Science 2014.|
|Supplemental figure 8b from Patel et al Science 2014.|
|Supplemental figure 23 from Patel et al Science 2014.|
Single-cell sequencing: today there are three main choices for people interested in sequencing single cells: manual/automated picking (pipette or flow-sorting), laser-capture microdissection or microfluidics (C1). Whilst it may seem that the C1 is the natural choice (it is an awesome piece of kit), I hope people are not put off trying to do single-cell analysis using other methods. Most of us have a flow cytometer/sorter and these can be set up to put single cells into microtitre plate wells. Oftentimes the most important thing is to do the experiment early; low-hanging fruit are the easiest to pick off, and rather than wait for technology to be easy to use we should be encouraged by this paper to dive into single-cell analysis with whatever tools we have available. Other groups have published some lovely research using flow sorted cells, here and here.
The Fluidigm C1 uses the SMART-seq method from Rickard Sandberg’s lab (link to paper), which converts poly(A)+ RNA to full-length cDNA using oligo(dT) priming and SMART template switching technology (Clontech website). As the work was completed before the C1 was available, the group had to resort to manual picking of individual cells. SMART-seq2 is a refined protocol using improved reverse transcription, template switching and preamplification with off-the-shelf reagents that increases cDNA yield from single cells, improves sensitivity, and reduces both technical bias and variability.
Whilst almost 3000 single-cell SMART-seq libraries were prepared for Regev Nature 2014, over 1000 were discarded after QC. They also removed 537 ‘cluster-disrupted’ dendritic cells, which were described as "an artifact of isolation and culturing". In Bernstein Science 2014 over a third of cells were lost at QC (242 of 672 cells) and only 30% of paired-end 25 bp reads aligned to the transcriptome in each cell.
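A quick back-of-the-envelope calculation shows how these two losses compound for the Science paper. The cell counts and the 30% figure are the ones quoted above; the assumptions that every cell received the same number of reads and that the 30% alignment rate applies equally to the retained cells are mine, so treat the final number as a rough guide only.

```python
# Figures quoted above for Bernstein Science 2014.
total_cells = 672
cells_lost_at_qc = 242
fraction_reads_aligned = 0.30  # reads aligning to the transcriptome per cell

cells_retained = total_cells - cells_lost_at_qc
fraction_cells_lost = cells_lost_at_qc / total_cells
fraction_cells_kept = cells_retained / total_cells

# Assumption: equal sequencing depth per cell, and the same alignment
# rate in the cells that passed QC.
usable_read_fraction = fraction_cells_kept * fraction_reads_aligned

print(f"cells retained: {cells_retained}")
print(f"cells lost at QC: {fraction_cells_lost:.1%}")
print(f"reads that are both from a retained cell and aligned: {usable_read_fraction:.1%}")
```

Under those assumptions only around a fifth of the sequencing ends up as transcriptome-aligned reads from cells that pass QC.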
Replicates not reads: The Nature paper describes the use of around 4.5M reads (possibly PE 25 bp, but maddeningly they don't say what sort of read was used). They report a comparison of high to low read-depth analysis, with very high correlations seen when comparing 30M reads with just 1M. The power of this study came from the large number of replicates used.
Even in very low read-depth analysis they reported a very low false-positive rate, which had a negligible effect on results. The main issue was that a very small number of genes can be estimated as present in a sub-sample of the cells due to erroneous mapping. During her presentation at AGBT Aviv Regev argued (very convincingly) that the number of sample replicates is more important than the number of reads per replicate, and that 250,000 reads allow you to infer expression at a similar quality to 30M reads!
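If you want to try the depth comparison on your own data, the general idea can be sketched as a down-sampling experiment: draw a subset of reads from a cell's per-gene counts and correlate the result with the full-depth counts. This is a toy illustration rather than the analysis from the paper; the function names and the made-up count vector are assumptions, and real data would have thousands of genes.

```python
import math
import random
from collections import Counter


def downsample_counts(counts, depth, rng):
    """Draw `depth` reads without replacement from a per-gene count vector."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the number of available reads")
    # Expand to one gene index per read, then sample reads uniformly.
    reads = [gene for gene, n in enumerate(counts) for _ in range(n)]
    sampled = Counter(rng.sample(reads, depth))
    return [sampled.get(gene, 0) for gene in range(len(counts))]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


# Made-up full-depth counts for ten genes (hypothetical data).
full_depth = [1000, 500, 200, 100, 50, 20, 10, 5, 2, 1]
rng = random.Random(0)  # fixed seed so the run is repeatable
shallow = downsample_counts(full_depth, depth=200, rng=rng)
print("correlation with full depth:", pearson(full_depth, shallow))
```

Repeating this at several depths, and across many cells, gives a feel for how quickly the correlation saturates and where lowly expressed genes start to drop out.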