This was not a planned post, but it follows on nicely from today's other one about exomes. This time I'm writing about Fluidigm's new single-cell exome-seq protocol. Yup, that's right, whole exomes from 96 single cells! The C1 is an amazing piece of kit (wish I had one) and we've used it a little bit for mRNA-seq. The ability to sequence single-cell genomes and exomes means you can pretty much do whatever you want with a single cell now. So how do the exomes look?
C1 on YouTube: Fluidigm have a video presentation from their R&D scientist Keith Szulwach, who walks through the data. They prepared exomes from the breast cancer/normal CRL-2338/2339 cell lines. These are part of the ICGC-TCGA DREAM Genomic Mutation Calling Challenge, an international effort to create standard methods for identifying cancer-associated mutations in whole-genome sequencing data. This global competition aims to find the most accurate mutation calling techniques and hopefully allow other groups to adopt standardised methods.
The exomes were sequenced to 27x coverage, and it looked like about 70% of the exome was covered (they say 90-95%, but it does not look like that to me on the graph!). SNV concordance was 92% and allelic dropout was 14%; both of these seem pretty good considering there is only one genome in the cell. I'd say it's pretty likely that the whole genome is not captured in the library, and even more will be lost in the amplification and exome hybridisation.
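For anyone wanting to put numbers like these on their own data, here is a minimal sketch of one common way to compute SNV concordance and allelic dropout against a bulk reference. The video does not say exactly how Fluidigm calculated their figures, so the definitions here are my assumptions: concordance is the fraction of sites called in both samples where the genotypes match, and allelic dropout is the fraction of bulk heterozygous sites that look homozygous in the single cell. The function name and the toy genotypes are made up for illustration.

```python
def concordance_and_ado(bulk, cell):
    """Compare single-cell genotypes against a bulk reference.

    bulk, cell: dicts mapping a site label to a genotype string
    ('0/0', '0/1' or '1/1'). Only sites called in both are compared.
    """
    shared = [site for site in bulk if site in cell]
    if not shared:
        raise ValueError("no sites called in both samples")

    # Concordance (assumed definition): matching genotypes / shared sites.
    matching = sum(1 for site in shared if bulk[site] == cell[site])
    concordance = matching / len(shared)

    # Allelic dropout (assumed definition): bulk heterozygous sites that
    # appear homozygous in the single cell.
    het_sites = [site for site in shared if bulk[site] == "0/1"]
    dropped = sum(1 for site in het_sites if cell[site] in ("0/0", "1/1"))
    ado = dropped / len(het_sites) if het_sites else float("nan")

    return concordance, ado


# Toy, made-up genotypes purely to exercise the function.
bulk = {"chr1:100": "0/1", "chr1:200": "0/1", "chr2:50": "1/1", "chr3:10": "0/1"}
cell = {"chr1:100": "0/1", "chr1:200": "1/1", "chr2:50": "1/1", "chr3:10": "0/1"}

concordance, ado = concordance_and_ado(bulk, cell)
print(f"concordance={concordance:.2f}, allelic dropout={ado:.2f}")
```

In the toy data one of the three heterozygous sites has lost an allele, so dropout is one in three and three of the four shared sites agree.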
Fluidigm claim that you can "reduce exome enrichment time by 12x", but this does not make sense to me. Our current workflow is 24 hours, but with most hybridisation capture systems being completed in 2-4 days I'd say the reduction is more like 2-4x.
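The arithmetic behind that is simple enough to check. This is a back-of-envelope sketch, assuming the 24-hour workflow is the one being set against a 2-4 day hybridisation capture; the variable and function names are mine.

```python
def fold_speedup(baseline_hours, new_hours):
    """How many times faster the new workflow is than the baseline."""
    return baseline_hours / new_hours


workflow_hours = 24                      # the 24-hour workflow from the text
capture_hours = [2 * 24, 4 * 24]         # typical capture: 2 to 4 days

speedups = [fold_speedup(h, workflow_hours) for h in capture_hours]
print(speedups)                          # [2.0, 4.0]

# For a 12x claim to hold against a 24-hour workflow, the baseline
# being compared would have to take this many days (derived, not quoted):
implied_baseline_days = 12 * workflow_hours / 24
print(implied_baseline_days)             # 12.0
```

So under that assumption a 12x reduction would need a comparison workflow of around 12 days, which is well outside the 2-4 day range.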
Cell line heterogeneity: Fluidigm demonstrate the ability to detect mutations in single cells in a population, and can easily cluster tumour from normal. The data may shed light on cell line heterogeneity. In fact it opens up the question "how heterogeneous are cell lines?" I wonder if it is possible to use the 50x coverage data on the ICGC data portal for the NCI60 and CCLE cell lines to interrogate the heterogeneity of each line? We've recently been working with Horizon Diagnostics, who produce single-cell-derived homogeneous cell lines for their genome-engineered controls. They've gone to a lot of effort to get isogenic lines, but I've not seen any published work demonstrating whether the rest of us have a significant problem or not. Could we use whole-genome data to look for heterogeneity in cell lines?