Friday, 29 January 2016

FlowCNV-seq: an almost novel method for single-cell copy number analysis

I recently presented at the Festival of Genomics on a proof-of-concept experiment we ran in my lab to generate single-cell copy number profiles. The experimental work was done in 2014 and we're now looking to port the method over to Fluidigm's C1 chips, possibly using the smallest IFCs to capture nuclei rather than cells. Although there are some wonderful systems out there for single-cell analysis, most labs already have flow cytometry facilities conveniently to hand, and sorting cells into plates is reasonably simple – even into 384-well plates. We showed that coupling flow sorting to low-volume library prep, using Rubicon's PicoPlex kit, produced good results at reasonable cost.

Our FlowCNV-seq method: we have previously shown that copy-number is a driving event in breast cancer (figure below, take a look at the METABRIC paper from the Caldas lab at CRUK-CI). More recently we've been pushing a pipeline for low-coverage WGS of pre-capture exome libraries to generate high-quality CNV calls (see previous post). Ideally we would look at tumour heterogeneity by analysing single cells, hence our experiment. We flow-sorted single cells that were live, in G1, and in the population of interest (e.g. Luminal). For genome amplification we used Rubicon's PicoPlex kit, which has a very simple 3-step workflow (A above) and has previously been shown to outperform MDA in preserving copy-number concordance. Sequencing on a HiSeq 2500 generated 1-5M single-end 50bp reads with similar read profiles across all samples (B). Analysis was performed using QDNAseq. Our MCF7 results were very similar to previously published work (C), bulk DNA and normal cell line controls worked as expected (D), and clustering of our MCF7 single cells was pretty good (not shown).
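To make the read-depth idea behind this kind of analysis concrete, here is a minimal Python sketch of counting reads in fixed-width bins and converting the counts to log2 ratios against the median bin. It is an illustration only and not the QDNAseq algorithm: the function names, the pseudocount and the toy coordinates are all assumptions made for the example, and a real pipeline would also correct for GC content and mappability and segment the profile.

```python
import math
from statistics import median


def bin_counts(read_starts, chrom_length, bin_size):
    """Count reads per fixed-width bin from 0-based read start positions."""
    n_bins = math.ceil(chrom_length / bin_size)
    counts = [0] * n_bins
    for pos in read_starts:
        if not 0 <= pos < chrom_length:
            raise ValueError(f"read start {pos} is outside the chromosome")
        counts[pos // bin_size] += 1
    return counts


def log2_ratios(counts, pseudocount=0.5):
    """Log2 ratio of each bin against the median bin count.

    The pseudocount (an assumption of this sketch) avoids log2(0)
    for bins that received no reads.
    """
    med = median(counts)
    return [math.log2((c + pseudocount) / (med + pseudocount)) for c in counts]


if __name__ == "__main__":
    # Toy example: a 40 bp "chromosome" split into 10 bp bins.
    starts = [1, 2, 11, 12, 21, 22, 23, 24, 31, 32]
    counts = bin_counts(starts, chrom_length=40, bin_size=10)
    print(counts)
    print([round(r, 2) for r in log2_ratios(counts)])
```

Bins with a log2 ratio well above zero hint at a gain and bins well below zero at a loss, which is the signal that segmentation and calling then work on.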

We're now looking to port this method onto the Fluidigm C1 using their script builder. This should allow us to process many more cells much more quickly and cost-effectively, and provide an off-the-shelf protocol for users to play with.

Improvements in single-cell analysis: There have been huge strides in single-cell analysis, particularly around gene expression, where low read counts (less than 1M reads per cell) can give high-quality information on transcript abundance. However, copy-number sequencing generally requires higher read depth per single cell, particularly if there is an expectation of focal amplifications/deletions in the samples being studied. For copy-number analysis single-end reads are fine: paired-end reads will not add to the CNV analysis, and you’re unlikely to get far with SV analysis given the low read depth. SNS (described below) suggested using 2 million single-end 50bp reads. We generate about the same number, although we get better data when using 5-20M (which is our standard for tumour exomes).

There are multiple methods now published for single-cell CNV analysis: Single-nucleus sequencing (SNS), published in 2012, was one of the first methods and uses a similar workflow to the one we implemented. The authors also described their flow sorting strategy in some detail and I’d recommend getting hold of this paper if you are considering doing the same. They make the point that whilst you can carefully sort for diploid cells, if you are working on cancer samples then aneuploidy needs to be taken into consideration and your flow gating strategy carefully considered. A paper published in PNAS in 2013 described the analysis of CNV in single circulating tumour cells from lung cancer patients.

In 2014 Christopher Walsh’s group at Boston Children’s Hospital and Harvard Medical School used flow cytometry to sort single nuclei, and compared MDA with GenomePlex (a PCR-based method). They also used a microarray QC metric, MAPD (median absolute pairwise difference), to estimate noise, and suggested using the MAPD score to compare different single-cell CNV methods - something we may well look into here.
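For anyone wanting to try the MAPD comparison, the metric is simple to compute: it is the median of the absolute differences between the log2 ratios of neighbouring bins (or probes) along the genome. The sketch below is a minimal Python version under my own assumptions, not code from the paper: the function name is hypothetical, the input is assumed to be one list of log2 ratios already ordered by genomic position, and it ignores chromosome boundaries.

```python
from statistics import median


def mapd(log2_ratios):
    """Median absolute pairwise difference between adjacent bins.

    Expects log2 copy-number ratios ordered by genomic position.
    Lower values indicate a less noisy profile.
    """
    diffs = [abs(b - a) for a, b in zip(log2_ratios, log2_ratios[1:])]
    if not diffs:
        raise ValueError("need at least two bins to compute MAPD")
    return median(diffs)


if __name__ == "__main__":
    quiet_cell = [0.00, 0.05, -0.05, 0.00, 1.00, 1.05, 0.95]
    noisy_cell = [0.00, 0.60, -0.50, 0.30, 1.00, 0.20, 1.40]
    print(mapd(quiet_cell), mapd(noisy_cell))
```

Because it uses the median of neighbour-to-neighbour differences, a genuine copy-number step (such as the jump to 1.0 in the toy data) contributes only one large difference and barely moves the score, which is why it works as a noise estimate even in rearranged genomes.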

In 2015 Timour Baslan (an author on the 2012 SNS paper) published a paper describing their optimisation of low-coverage WGS for single-cell CNV. In the paper they describe a modified DOP-PCR method for simple library prep, and multiplexing to generate 2 million reads per sample. They demonstrated that this was sufficient to generate robust CNV profiles at 50k bin resolution. Reducing resolution to 20k or even 5k bins still gave good results, although some focal amplifications were lost; this was with up to a theoretical limit of 500 single cells on a HiSeq lane, which works out at a cost per cell of just $30!
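A back-of-the-envelope way to see why fewer bins tolerate fewer reads is to look at the average number of reads landing in each bin. The Python sketch below does that arithmetic under several assumptions of mine: that "50k", "20k" and "5k" refer to the number of bins across the genome, that every read maps, that reads are spread evenly, and that counting noise is roughly Poisson. Real amplified single-cell data will be noisier than this.

```python
import math


def mean_reads_per_bin(total_reads, n_bins):
    """Average reads per bin if reads were spread evenly over all bins."""
    return total_reads / n_bins


def poisson_cv(mean_count):
    """Coefficient of variation of a Poisson count: 1 / sqrt(mean)."""
    return 1.0 / math.sqrt(mean_count)


if __name__ == "__main__":
    reads_per_cell = 2_000_000  # figure quoted in the post
    for n_bins in (50_000, 20_000, 5_000):
        mean = mean_reads_per_bin(reads_per_cell, n_bins)
        print(f"{n_bins} bins: {mean:.0f} reads/bin, CV ~ {poisson_cv(mean):.2f}")
```

The trade-off is the usual one: larger bins give smoother counts per bin but cannot resolve events much smaller than a bin, which fits the observation that some focal amplifications were lost.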

A 2015 comparison of GenomePlex, MDA and MALBAC, covering amplification bias, amplification uniformity and reproducibility, suggested that overall MALBAC and GenomePlex performed better than MDA for CNV detection. The authors generated around 30 million reads per sample. Also in 2015, the G&T-seq paper described low-coverage genome sequencing of flow-sorted single cells amplified with either MDA or PicoPlex (the same method we used). They reported that PicoPlex outperformed MDA for CNV analysis and chose it as their preferred method. They went on to sequence to 33x coverage on X-Ten, much higher than every other single-cell CNV analysis.

The most recent paper, published in Genome Research this year, assessed the performance of single-cell sequencing for CNV detection. The authors report their analysis of the sensitivity and specificity of different approaches for megabase-scale CNV detection in single-cell sequencing data. They concluded that CNVs of greater than 5Mb could be detected in single cells sequenced at just 0.1X coverage.
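Since the papers above quote depth sometimes as read counts and sometimes as fold coverage, a quick conversion helps when comparing them. The Python sketch below uses coverage = reads × read length / genome size. The genome size of 3.1 Gb is my assumption for a haploid human genome, and the calculation assumes every read maps uniquely with no duplicates, so real coverage will be somewhat lower.

```python
import math

HUMAN_GENOME_BP = 3_100_000_000  # assumed approximate haploid genome size


def coverage(n_reads, read_length, genome_size=HUMAN_GENOME_BP):
    """Mean fold coverage from a number of single-end reads."""
    return n_reads * read_length / genome_size


def reads_for_coverage(target_coverage, read_length, genome_size=HUMAN_GENOME_BP):
    """Number of single-end reads needed to reach a target fold coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    print(f"2M x 50bp reads: {coverage(2_000_000, 50):.3f}X")
    print(f"Reads for 0.1X at 50bp: {reads_for_coverage(0.1, 50):,}")
```

On these assumptions 2 million single-end 50bp reads is roughly 0.03X, and 0.1X needs a little over 6 million such reads.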

Low-coverage WGS for single-cell CNV is likely to become an ever more popular tool.


  1. Have you had any issues with the doublet thing on the C1?

  2. We have not seen this issue ourselves, but we're planning our experiments more carefully and will be working with Fluidigm as closely as possible to resolve this in future experiments.