AGBT had lots of presentations on both clinical and single-cell work last February, and on Wednesday, February 12th Aviv Regev of the Broad Institute described her group's experience with the Fluidigm C1 system and their early work in understanding cell-to-cell communication in the immune system. The results were published in Nature a couple of days ago. A more exciting story for me scientifically was published by Aviv Regev and Bradley Bernstein in Science, where they describe the use of single-cell sequencing to understand tumour heterogeneity.
The big headlines for me were that just 1M RNA-seq reads were enough to get high-quality gene expression estimates, and that single cells are great, but we’re going to need lots of them to delve deeply into biological systems. Both papers showed a massive loss of data at QC stages: around a third of cells were lost and only 30% of reads mapped to the transcriptome. Hopefully both of these are things we can improve in the next few years to make single-cell mRNA-seq even more powerful.
The Fluidigm C1 mRNA-seq workflow
Regev Nature 2014: After sequencing 2785 primary mouse bone-marrow-derived dendritic cells with three stimuli (LPS, PAM, PIC) at several time-points (0, 1, 2, 4, 6h), they were able to show that, rather than the expected normal distribution of gene expression in individual cells within the population, a few “precocious” cells are early responders to pathogens that coordinate interferon-mediated paracrine signalling to the general population. Without single-cell analysis these precocious cells would have been swamped by the rest of the population, making their early response undetectable.
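To make the "swamped" point concrete, here is a toy calculation in Python. Every number in it (1000 cells, 5 early responders, a 200-fold induction) is made up for illustration and is not taken from the paper.

```python
import numpy as np

# Hypothetical population: all values below are illustrative assumptions.
n_cells = 1000
n_precocious = 5                 # assumed number of early responders
baseline, induced = 1.0, 200.0   # assumed expression units per cell

expression = np.full(n_cells, baseline)
expression[:n_precocious] = induced

# A bulk (population-level) measurement only sees the average.
bulk_mean = expression.mean()
bulk_fold_change = bulk_mean / baseline

# Single-cell data lets us count the cells that are strongly induced.
fraction_high = np.mean(expression > 10 * baseline)

print(f"bulk fold change: {bulk_fold_change:.2f}")
print(f"fraction of strongly induced cells: {fraction_high:.3f}")
```

In this made-up case a 200-fold response in a handful of cells shows up as roughly a two-fold change in the bulk average, which is easy to dismiss as noise.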
They link the work to disease via bacterial quorum sensing (an old friend of mine worked on this for his PhD), pointing out that the threshold for activation needs to be just right if an “inappropriate immune response” is to be avoided, as in e.g. lupus, rheumatoid arthritis or ulcerative colitis.
The paper has been picked up by a couple of blogs: NextGenSeek describes it alongside another single cell paper, and the RNASeqBlog describes the main conclusions. It also ranks quite highly on Altmetric.
Bernstein Science 2014: This paper uses the same C1 strategy, but to sequence primary glioblastomas from five patients by dissociating and flow sorting cells into populations for C1 chips (Figure A). They demonstrated the inherently variable expression of established glioblastoma subtype classifiers across single cells, which may have prognostic implications.
They were able to detect variability in expression of EGFR splice variants, including the oncogenic mutant EGFRvIII, in one patient sample. This showed 7% of cells with wild-type EGFR splicing, 19% with EGFRvIII and 25% with a novel oncogenic deletion variant; splicing was almost always mutually exclusive (Supplemental figure 8b).
The group also reported a very nice copy-number analysis from the RNA-seq data, which showed good correlation with SNP-arrays (Supplemental figure 23). This was achieved by averaging relative expression across windows of 100 genes and calling copy-number. This works for large regions of the genome and may be a method that can be applied to other data sets. As with copy-number calling from exomes, there are likely to be biases in the final CNV calls, but if CNV data can be got for free many will ask for it as part of their standard analysis pipeline.
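A minimal sketch of the windowing idea, assuming a genes-by-cells matrix of log expression with the genes already sorted by genomic position. The function name, the centring on the across-cell mean and the simulated data are all my assumptions for illustration; this is not the authors' pipeline.

```python
import numpy as np

def windowed_relative_expression(log_expr, window=100):
    """Moving average of relative expression over `window` consecutive genes.

    log_expr: genes x cells array, rows sorted by genomic position.
    Returns an array with (n_genes - window + 1) rows, one per window.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    # Relative expression: centre each gene on its mean across all cells
    # (an assumption; a normal-tissue reference could be used instead).
    relative = log_expr - log_expr.mean(axis=1, keepdims=True)
    # Moving average along the gene axis via cumulative sums.
    padded = np.vstack([np.zeros((1, relative.shape[1])), relative])
    csum = np.cumsum(padded, axis=0)
    return (csum[window:] - csum[:-window]) / window

# Simulated data: 500 genes, 20 cells; cells 0-9 carry a gain over genes 200-399.
rng = np.random.default_rng(1)
log_expr = rng.normal(loc=5.0, scale=1.0, size=(500, 20))
log_expr[200:400, :10] += 1.0

profile = windowed_relative_expression(log_expr, window=100)
# Windows starting at genes 200..300 lie entirely inside the simulated gain.
gain_signal = profile[200:301, :10].mean()
normal_signal = profile[200:301, 10:].mean()
print(f"gain cells: {gain_signal:+.2f}, other cells: {normal_signal:+.2f}")
```

Averaging over 100 genes smooths out gene-level noise, which is also why the approach only resolves large regions: any event much smaller than the window gets diluted.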
Figure A from Patel et al Science 2014.
Supplemental figure 8b from Patel et al Science 2014.
Supplemental figure 23 from Patel et al Science 2014.
Single-cell sequencing: today there are three main choices for people interested in sequencing single cells: manual/automated picking (pipette or flow-sorting), laser-capture microdissection or microfluidics (C1). Whilst it may seem that the C1 is the natural choice (it is an awesome piece of kit), I hope people are not put off trying to do single-cell analysis using other methods. Most of us have a flow cytometer/sorter and these can be set up to put single cells into microtitre plate wells. Oftentimes the most important thing is to do the experiment early: low-hanging fruit are the easiest to pick off, and rather than wait for technology to be easy to use we should be encouraged by this paper to dive into single-cell analysis with whatever tools we have available. Other groups have published some lovely research using flow-sorted cells, here and here.
The Fluidigm C1 uses the SMART-seq method from Rickard Sandberg’s lab (link to paper), which converts poly(A)+ RNA to full-length cDNA using oligo(dT) priming and SMART template-switching technology (Clontech website). As the work was completed before the C1 was available, the group had to resort to manual picking of individual cells. SMART-seq 2 is a refined protocol using improved reverse transcription, template switching and preamplification with off-the-shelf reagents that increases cDNA yield from single cells, improves sensitivity, and reduces both technical bias and variability.
Whilst almost 3000 single-cell SMART-seq libraries were prepared for Regev Nature 2014, over 1000 were discarded after QC. They also removed 537 ‘cluster-disrupted’ dendritic cells, which were described as "an artifact of isolation and culturing". In Bernstein Science 2014 over a third of cells were lost at QC (242 of 672 cells) and only 30% of paired-end 25bp reads aligned to the transcriptome in each cell.
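These losses matter when planning an experiment. As a rough back-of-the-envelope sketch, the Python below uses the Science paper's figures quoted above (242 of 672 cells lost, 30% of reads aligned) to work out how many cells and raw reads you would need to start with. The function names and the targets are hypothetical, and it assumes your own losses would be similar, which they may well not be.

```python
from fractions import Fraction
import math

def cells_to_capture(target_cells, qc_loss):
    """Cells to capture so that `target_cells` survive QC at the given loss rate."""
    return math.ceil(Fraction(target_cells) / (1 - Fraction(qc_loss)))

def raw_reads_needed(target_mapped_reads, mapping_rate):
    """Raw reads per cell needed to end up with `target_mapped_reads` aligned."""
    return math.ceil(Fraction(target_mapped_reads) / Fraction(mapping_rate))

# Figures quoted in the post for the Science paper.
qc_loss = Fraction(242, 672)
mapping_rate = Fraction(30, 100)

print(f"QC loss: {float(qc_loss):.1%}")
print("cells to capture for 430 usable cells:", cells_to_capture(430, qc_loss))
print("raw reads for 1M aligned reads:", raw_reads_needed(1_000_000, mapping_rate))
```

Exact fractions are used instead of floats so that the rounding up does not overshoot by one because of floating-point error.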
Replicates not reads: The Nature paper describes the use of around 4.5M reads (possibly PE25bp, but maddeningly they don't say what sort of read was used). They report a comparison of high to low read-depth analysis, with very high correlations seen when comparing 30M reads with just 1M. The power of this study came from the large number of replicates used.
Even in very low read-depth analysis they reported a very low false-positive rate that had a negligible effect on results. The main issue was that a very small number of genes can be estimated as present in a sub-sample of the cells due to erroneous mapping. During her presentation at AGBT Aviv Regev argued (very convincingly) that the number of sample replicates is more important than the number of reads per replicate, and that 250,000 reads allow you to infer expression at a similar quality to 30M reads!
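If you want to try this sort of depth comparison on your own data, the usual trick is to down-sample a deeply sequenced library and correlate the expression estimates. The Python below is a sketch on simulated counts only: the log-normal abundances, the gene number and the helper names are my assumptions, and it does not model the drop-out seen in real single cells, so the correlations it prints say nothing about what the paper found.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical transcriptome: 2000 genes with log-normal relative abundances.
n_genes = 2000
abundance = rng.lognormal(mean=0.0, sigma=2.0, size=n_genes)
abundance /= abundance.sum()

def thin(counts, target_depth, rng):
    """Binomial thinning: keep each read with probability target/total."""
    keep = target_depth / counts.sum()
    return rng.binomial(counts, keep)

def log_cpm(counts):
    """log2 counts-per-million with a pseudocount of 1."""
    return np.log2(counts / counts.sum() * 1e6 + 1)

# A simulated "deep" library of 30M reads, then down-sampled versions of it.
deep = rng.multinomial(30_000_000, abundance)
shallow = thin(deep, 1_000_000, rng)
very_shallow = thin(deep, 250_000, rng)

r_1m = np.corrcoef(log_cpm(deep), log_cpm(shallow))[0, 1]
r_250k = np.corrcoef(log_cpm(deep), log_cpm(very_shallow))[0, 1]
print(f"Pearson r, 30M vs 1M reads:   {r_1m:.3f}")
print(f"Pearson r, 30M vs 250k reads: {r_250k:.3f}")
```

With real data you would replace the simulated `deep` vector with per-gene counts from a deeply sequenced cell and repeat the thinning several times to see how stable the correlation is.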
I used "maddeningly" for a reason: this is important information to have in the paper, and certainly very clearly in the supplementals. Both papers have oodles of supplementary data, but digging through this to find out how I might repeat the experiment was a little too hard for my liking.
Totally agree with you given the importance of how it will be useful to future single-cell RNA-seq studies.
Good post, but I don't think the Science paper was done with Fluidigm C1. Supp Data says "MoFlo XDP or Astrios high speed flow sorter" and "Strict singlets were selected for using pulse area/pulse width gates".
For the read length, the Nature paper was 2x125bp (the new standard on HiSeq2500) and the Science paper was with 2x25bp (which I cannot remember having seen before, at least not since 2008). It's probably fine for expression, I am not so sure about splicing. The figure of EGFR seems convincing, the figure 1E much less...
The Science paper used flow to sort tumour cells from normal cells (presumably to increase the number of tumour cells in the final analysis) which were then introduced into the C1 for SMART-seq. If flow-sorting had not been used then more normal cells may have had to be discarded, but there is undoubtedly an expression change introduced by sorting that will be a technical artifact. This is unlikely to be a problem if all samples are processed the same way, but in a C1 experiment the sorting will be confounded with date of processing as only one C1 chip can be run per day. Designing experiments is tough!