Yesterday Nature published what I think will become the definitive molecular classification of breast cancer. Curtis et al., "The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups", or METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) as we have referred to it over the past five years, is an integrated analysis of gene expression and copy-number data in almost 2500 patient samples, funded by Cancer Research UK (link).
Other groups have published big studies, and the mother of them all (ICGC) is proceeding very nicely (I have been involved in the Prostate and Oesophageal projects at CRUK). A recent paper on triple-negative breast cancer showed that no two tumours were the same. Even when we try to subcategorise carefully we still don’t get to a single disease. There is also an awful lot of intra-tumour heterogeneity that we have to get to grips with.
The METABRIC project was led by Carlos Caldas, a senior group leader at the CRI (where I work), and Sam Aparicio in Vancouver, Canada. They looked at copy number and gene expression, and their interaction, in tumours with high-quality clinical information. They were able to look at what influenced survival or things like age at diagnosis. The real power of the study came from the large numbers used; because of this they were able to detect new sub-groups in breast cancer, bringing the number from five up to at least ten. Each disease has its own molecular fingerprint that might be used to help diagnosis and treatment.
A specific aim of the project was to identify at least four groups of patients by molecular classification:
- Patients without lymph node metastasis at very low risk of relapse who might be spared chemotherapeutic treatment.
- Patients with ER+ lymph node metastasis who might only need hormone therapy.
- Patients with ER- lymph node metastasis who might have a better prognosis with drugs other than hormones.
- Patients with more aggressive disease who are likely to relapse and may well benefit from intensive preventative therapy and follow-up.
Breast Cancer sub-classification: We have known for over twenty years that there are at least five breast cancer subtypes. Initial observations on oestrogen and progesterone receptor status gave us the first three: ER+, PR+ or double-negative (ER/PR-). The discovery that Her2 over-expression had a major impact on a proportion of breast cancers gave us the next two: Her2+ or triple-negative (ER/PR- and Her2-normal). Most people have heard of Her2 because the drug Herceptin is given to patients with the amplification (and my earliest work as a researcher after university was developing a test for Her2 amplification). Tests based on histopathology for ER, PR and Her2, or ones like PAM50 (also used in the METABRIC paper), can classify tumours, but even so some women respond better or worse than their classification would suggest. This is probably because our understanding of breast cancer biology is incomplete, and METABRIC gives us a huge leg-up in that understanding.
The analysis team, led by Christina Curtis at USC, performed a sub-classification of tumours based on 2477 DNA copy-number measurements on Affymetrix SNP6 arrays and 2136 gene expression measurements on Illumina HT12 arrays, as well as the matched clinical data. This was the first map of CNVs in breast cancer and is the largest study of its kind for either CNV or gene expression (GX). It was the clustering of these data that led to the discovery of at least ten clinical sub-types. These clusters contained hotspots of activity suggesting drivers of breast cancer at these locations, for instance Her2 amplifications in the Her2+ patients. They also contained genes never linked to breast cancer before, for which there are drugs available for use in other cancers. This opens the door to new treatments for some patients, hopefully in the next three to five years.
Figure: BrCa clustering figure from the paper
Figure: Prognosis of the ten clusters
Incorporating the data from METABRIC into databases like the NCBI’s International Standards for Cytogenomic Arrays (ISCA) database, which contains data on over 30,000 CNV tests, will help clinicians improve patient treatment. The ISCA database provides the first rating scale for CNVs, from 0 (no evidence in disease) to 3 (evidence of clinical significance), and should improve as more data are added.
What did my lab do for the METABRIC project: When I went to the interview for my current job about seven years ago, the lead author on METABRIC spoke about his ideas and hopes for the project. It was an inspiring one for me, as I was primarily a microarray expert at the time (no NGS then) and the prospect of generating so much data on a single project was exciting.
My lab collaborated on some preliminary work to decide the best way to extract DNA and RNA from the samples. We also helped on a project to determine which microarray to use for copy-number analysis, and got early access to the Illumina HT12 arrays, allowing high-throughput gene expression analysis.
It took almost a year to process 5,000-6,000 sectioned tumour samples for DNA and RNA. Because of tumour heterogeneity we used a complicated SOP where each tumour was sectioned for H&E, DNA and RNA multiple times. Duplicate nucleic acid preps were processed in batches using Qiagen DNeasy and miRNeasy kits. We also QC’d every sample on gels, Bioanalyzer and NanoDrop. A similar amount of work went on in the other major collaborators’ labs in Vancouver, Canada.
The 2136 HT12 arrays were processed in my lab over just five to six weeks. We did not use robots to do any of the processing and just one person prepared the labelled cRNA for array hybridisation. This was done to minimise the technical variation in the samples. We also used a very carefully designed sample allocation to particular wells in the plates and on the final arrays. Again this allowed us to account for technical variables in the final data analysis.
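The actual allocation design is in the supplementary methods, so purely as an illustration (this is not the METABRIC scheme), here is a minimal Python sketch of one common approach: spread a known variable evenly across chips, then randomise positions within each chip. The function name `allocate_samples`, the `group` variable (for example collection site) and the sample IDs are all hypothetical; the only thing taken from the real platform is that an HT12 chip holds 12 samples.

```python
import random
from collections import defaultdict


def allocate_samples(samples, chip_size=12, seed=0):
    """Assign samples to chips so each chip gets a mix of groups.

    samples: list of (sample_id, group) tuples; `group` is a hypothetical
    stratification variable such as collection site.
    Returns a list of (sample_id, group, chip_number, position) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the layout can be reproduced

    # Collect sample IDs per group and shuffle within each group.
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)
    for ids in by_group.values():
        rng.shuffle(ids)

    # Interleave the groups round-robin so neighbouring samples differ in group.
    queues = [(group, ids) for group, ids in sorted(by_group.items())]
    ordered = []
    while any(ids for _, ids in queues):
        for group, ids in queues:
            if ids:
                ordered.append((ids.pop(), group))

    # Cut the interleaved list into chips and randomise position within a chip.
    layout = []
    for start in range(0, len(ordered), chip_size):
        chip = ordered[start:start + chip_size]
        rng.shuffle(chip)
        chip_number = start // chip_size + 1
        for position, (sample_id, group) in enumerate(chip, start=1):
            layout.append((sample_id, group, chip_number, position))
    return layout


# Hypothetical example: 24 samples from two sites, laid out on two chips.
example_samples = [(f"S{i:03d}", "siteA" if i % 2 else "siteB") for i in range(24)]
example_layout = allocate_samples(example_samples, chip_size=12, seed=1)
```

Recording the chip and position for every sample is what later lets the analysis model those technical variables rather than confuse them with biology.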
The copy-number arrays were all processed by Aros in Denmark, who did a fantastic job.
The discussions leading up to the lab work were intensive, to say the least. I’d really recommend reading the supplementary methods for details of sample collection, pathology review, nucleic acid extraction and QC, and the microarray processing. Of course there was all the work that went on in the Bioinformatics lab as well, but that is a story I’ll leave someone else to tell.
PS: Please don't comment on how much better this may have been if we had used NGS! It was not even available when we started and processing 2000+ RNA-seq and SV-seq samples would have been almost impossible in a lab with just four sequencers today!
Curtis, C. et al. (2012). The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups
Sam Aparicio's group published their research on triple-negative breast cancer last week:
Shah et al. (2012). The clonal and mutational evolution spectrum of primary triple-negative breast cancers.