Tuesday, 1 July 2014

Single cell extravaganza

AGBT had lots of presentations on both clinical and single-cell work last February, and on Wednesday, February 12th, Aviv Regev of the Broad Institute described her group's experience with the Fluidigm C1 system and their early work on understanding cell-to-cell communication in the immune system. The results were published in Nature a couple of days ago. A more exciting story for me scientifically was published by Aviv Regev and Bradley Bernstein in Science, where they describe the use of single-cell sequencing to understand tumour heterogeneity.

The big headlines for me were that just 1M RNA-seq reads were enough to get high-quality gene expression estimates, and that single cells are great, but we’re going to need lots of them to delve deeply into biological systems. Both papers showed a massive loss of data at QC stages: around a third of cells were lost and only 30% of reads mapped to the transcriptome. Hopefully both of these are things we can improve in the next few years to make single-cell mRNA-seq even more powerful.



http://www.fluidigm.com/c1system.html
The Fluidigm C1 mRNA-seq workflow


Regev Nature 2014: After sequencing 2785 primary mouse bone-marrow-derived dendritic cells with three stimuli (LPS, PAM, PIC) at several time-points (0, 1, 2, 4 and 6h), they were able to show that, rather than the expected normal distribution of gene expression in individual cells within the population, a few “precocious” cells are early responders to pathogens that coordinate interferon-mediated paracrine signalling to the general population. Without single-cell analysis these precocious cells would have been swamped by the rest of the population, making their early response undetectable.



They link the work to disease via bacterial quorum sensing (an old friend of mine worked on this for his PhD), pointing out that the threshold for activation needs to be just right if “inappropriate immune response” is to be avoided, as in e.g. lupus, rheumatoid arthritis or ulcerative colitis.

The paper has been picked up by a couple of blogs: NextGenSeek describes it alongside another single cell paper, and the RNASeqBlog describes the main conclusions. It also ranks quite highly on Altmetric.



Bernstein Science 2014: This paper uses the same C1 strategy, but to sequence primary glioblastomas from five patients by dissociating and flow sorting cells into populations for C1 chips (Figure A). They demonstrated the inherently variable expression of established glioblastoma subtype classifiers across single cells, which may have prognostic implications.

They were able to detect variability in expression of splice variants with the oncogenic mutant EGFRvIII in one patient sample. This showed 7% of cells with wild-type EGFR splicing, 19% with EGFRvIII and 25% with a novel oncogenic deletion variant; splicing was almost always mutually exclusive (Figure 8b).

The group also reported a very nice copy-number analysis from the RNA-seq data which showed good correlation with SNP-arrays (Figure 23). This was achieved by averaging relative expression across windows of 100 genes and calling copy-number. This works for large regions of the genome and the method could be applied to other data sets. Similar to copy-number calling from exomes, there are likely to be biases in the final CNV calls, but if CNV data can be got for free many will ask for it as part of their standard analysis pipeline.
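To make the windowing idea concrete, here is a minimal sketch of how averaging relative expression across 100-gene windows might look. It is not the code from Patel et al: the function names, the assumption that the input is log2 expression relative to a reference with genes already ordered by genomic position, and the gain/loss thresholds of ±0.3 are all my own placeholders for illustration.

```python
def windowed_means(rel_expr, window=100):
    """Sliding-window mean of relative expression.

    rel_expr: per-gene values (assumed log2 tumour/reference), ordered by
    genomic position along one chromosome.
    """
    if len(rel_expr) < window:
        raise ValueError("need at least one full window of genes")
    total = sum(rel_expr[:window])
    means = [total / window]
    # Slide the window one gene at a time, updating the running total.
    for i in range(window, len(rel_expr)):
        total += rel_expr[i] - rel_expr[i - window]
        means.append(total / window)
    return means


def call_copy_number(means, gain=0.3, loss=-0.3):
    """Label each window; the thresholds are hypothetical, not from the paper."""
    calls = []
    for m in means:
        if m >= gain:
            calls.append("gain")
        elif m <= loss:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Toy chromosome: 150 genes at baseline followed by 150 genes with raised expression.
rel_expr = [0.0] * 150 + [1.0] * 150
means = windowed_means(rel_expr)
calls = call_copy_number(means)
print(calls[0], calls[-1])
```

Averaging over many neighbouring genes is what smooths out gene-level expression noise, which is also why this only resolves large regions of the genome.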

Figure A from Patel et al Science 2014.
Supplemental figure 8b from Patel et al Science 2014.
Supplemental figure 23 from Patel et al Science 2014.

Single-cell sequencing: today there are three main choices for people interested in sequencing single cells: manual/automated picking (pipette or flow-sorting), laser-capture microdissection, or microfluidics (C1). Whilst it may seem that the C1 is the natural choice (it is an awesome piece of kit), I hope people are not put off trying to do single-cell analysis using other methods. Most of us have a flow cytometer/sorter and these can be set up to put single cells into microtitre plate wells. Oftentimes the most important thing is to do the experiment early; low-hanging fruit are the easiest to pick off, and rather than wait for technology to be easy to use we should be encouraged by this paper to dive into single-cell analysis with whatever tools we have available. Other groups have published some lovely research using flow sorted cells, here and here.
The Fluidigm C1 uses the SMART-seq method from Rickard Sandberg’s (link to paper) lab, which converts poly(A)+ RNA to full-length cDNA using oligo(dT) priming and SMART template switching (Clontech website) technology. As that work was completed before the C1 was available, the group had to resort to manual picking of individual cells. SMART-seq 2 is a refined protocol using improved reverse transcription, template switching and preamplification with off-the-shelf reagents that increases cDNA yield from single cells, improves sensitivity, and reduces both technical bias and variability.
Whilst almost 3000 single-cell SMART-seq libraries were prepared for Regev Nature 2014, over 1000 were discarded after QC. They also removed 537 ‘cluster-disrupted’ dendritic cells which were described as "an artifact of isolation and culturing". In Bernstein Science 2014 over a third of cells were lost at QC (242 of 672 cells) and only 30% of Paired-End 25bp reads aligned to the transcriptome in each cell.
Replicates not reads: The Nature paper describes the use of around 4.5M reads (possibly PE25bp but maddeningly they don't say what sort of read was used). They report a comparison of high to low read depth analysis, with very high correlations seen when comparing 30M reads with just 1M. The power of this study came from the large numbers of replicates used.

Even in very low read-depth analysis they reported a very low false-positive rate that had a negligible effect on results. The main issue was that a very small number of genes can be estimated as present in a sub-sample of the cells due to erroneous mapping. During her presentation at AGBT Aviv Regev argued (very convincingly) that the number of sample replicates is more important than the number of reads per replicate, and that 250,000 reads allow you to infer expression at a similar quality to 30M reads!
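If you want to get a feel for the deep-versus-shallow comparison yourself, the sketch below simulates it at toy scale. Everything here is an assumption for illustration: the per-gene counts are drawn from a made-up log-normal distribution rather than taken from either paper, the shallow library is generated by drawing reads in proportion to the deep counts (sampling with replacement, which only approximates true read subsampling), and the 30:1 depth ratio simply mirrors the 30M versus 1M comparison.

```python
import math
import random
from collections import Counter


def downsample(counts, n_reads, rng):
    """Draw n_reads reads, each assigned to a gene in proportion to its count."""
    genes = range(len(counts))
    drawn = Counter(rng.choices(genes, weights=counts, k=n_reads))
    return [drawn.get(g, 0) for g in genes]


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2014)  # fixed seed so the toy example is repeatable

# Simulated "deep" library: 2000 genes with log-normally distributed counts.
deep = [int(rng.lognormvariate(5, 2)) for _ in range(2000)]

# "Shallow" library at one thirtieth of the depth.
n_shallow = sum(deep) // 30
shallow = downsample(deep, n_shallow, rng)

# Compare on a log scale, as is usual for expression estimates.
log_deep = [math.log2(c + 1) for c in deep]
log_shallow = [math.log2(c + 1) for c in shallow]
r = pearson(log_deep, log_shallow)
print(f"deep reads: {sum(deep)}, shallow reads: {n_shallow}, r = {r:.3f}")
```

Try lowering the depth ratio further and watch where the correlation starts to fall apart; it is the low-expressed genes that suffer first.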

7 comments:

  1. Really nice post. It was very interesting from the perspective of the value of replicates over read depth in single-cell RNA-seq.

    Just a note on your comment "possibly PE25bp but maddeningly they don't say what sort of read was used".

    It looks like the results on "effect of shallow depth on expression estimates" (Nature paper ext. Figure 2) were based on data from an earlier study from the group. http://www.nature.com/nature/journal/v498/n7453/full/nature12172.html
    The previous study data were 101 bp PE. Hopefully the results are from the 101bp PE data. (the Nature paper has 125bp PE and the Science paper has 25bp PE data :-))

  2. I used "maddeningly" for a reason: this is important information to have in the paper, and certainly very clearly in the supplementals. Both papers have oodles of supplementary data, but digging through this to find out how I might repeat the experiment was a little too hard for my liking.

  3. Totally agree with you given the importance of how it will be useful to future single-cell RNA-seq studies.

  4. It's really an informative and well described post. I appreciate your topic for blogging. Thanks for sharing such a useful post.

  5. Good post, but I don't think the Science paper was done with Fluidigm C1. Supp Data says "MoFlo XDP or Astrios high speed flow sorter" and "Strict singlets were selected for using pulse area/pulse width gates".

    For the read length, the Nature paper was 2x125bp (the new standard on HiSeq2500) and the Science paper was with 2x25bp (which I cannot remember having seen before, at least not since 2008). It's probably fine for expression, I am not so sure about splicing. The figure of EGFR seems convincing, the figure 1E much less...

  6. The Science paper used flow to sort tumour cells from normal cells (presumably to increase the number of tumour cells in the final analysis) which were then introduced into the C1 for SMART-seq. If flow-sorting had not been used then more normal cells may have had to be discarded, but there is undoubtedly an expression change introduced by sorting that will be a technical artifact. This is unlikely to be a problem if all samples are processed the same way, but in a C1 experiment the sorting will be confounded with date of processing as only one C1 chip can be run per day. Designing experiments is tough!

  7. I love your blog. This is a cool site and I wanted to post a little note to tell you, good job! Best wishes!!!
