Tuesday, 24 November 2015

Communicating with our users: status updates via Twitter

The rest of the Genomics Core team (past and present) and I put quite a bit of effort into communicating with our users. This ranges from simple updates about when a project will be complete or a sample might be run, to more complex discussions about experimental design. The group website had a major update two years ago and we started a group blog; we have an email HelpDesk; and we have just added a new Twitter account specifically to tell users how long our Illumina sequencing queue is: @CRUKgenomecore puts out live Tweets directly from our Genologics Clarity LIMS.

Friday, 20 November 2015

Ten X updated...rare disease analysis made easier?

We're about to start a pilot project with 10X Genomics, so I put together an update to share internally and thought it would make a good post here too. I've been thinking about how 10X might be used in cancer and rare disease, as well as in non-human genomes. The NIST results described below are particularly interesting because the same sample has been run on all of the major long-read technologies: 10X Genomics, Moleculo, CGI LFR, PacBio, BioNano Genomics and Oxford Nanopore, as well as on both Illumina and SOLiD mate-pair. Enjoy.

Monday, 16 November 2015

Comparing reference standards for NGS cancer genomics

Cancer genomics is rapidly moving away from pure discovery to a much more translational perspective. An important area to consider, as the tools being used and developed in research labs are deployed to a more clinical setting, is an understanding of the reproducibility, sensitivity and specificity of those tools. A big help here is the availability of high-quality reference materials, which are likely to form an important part of any clinical lab QA programme. Several companies sell reference standards, but the most interesting in the context of this post are those from Horizon Discovery, AcroMetrix and Seracare, each of which is briefly reviewed below.

Horizon Diagnostics' approach is one of several

Wednesday, 4 November 2015

How many liquid biopsies per year by 2020: in the UK it might be well over 1 million

I've been doing some work looking at CRUK's cancer incidence statistics, and that's what sparked the idea for this post. There are just under 350,000 new cancer cases in the UK each year. Although ctDNA can be found in many/most tumour contexts (Bettegowda et al 2014), it may not be used universally. I'll assume that by 2020, in all common cancers, ctDNA sequencing - a "liquid biopsy" (not sure how Cynvenio got there first with the trademark - smart move) - will be used as a first-line and/or follow-up test run at least once per year per patient on average. Assuming that patients live for five to ten years with their disease, this works out at 1.75 million to 3.5 million tests per year. I'm sure people are doing much better modelling than me, but by any measure this is a lot of sequencing tests to run!

Sequencing itself can seem like the hard part, but to deliver all those tests someone's got to make 1,750,000+ libraries! An automation platform capable of processing 96 samples in 24 hours would need to run for fifty years to clear a single year's demand - or fifty such labs would be required just to keep up. Even at just £25 per sample for an amplicon-based test (no-one is offering a test at this price today) the costs would be almost £50M-£100M per year. This assumes that the cost of collecting samples is zero, and I'll totally ignore bioinformatics, which I think will disappear for simple analyses by 2020*.
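The arithmetic behind those figures is easy to check. A minimal sketch, using the post's own numbers (350,000 new UK cases per year, one test per patient per year, £25 per test, a 96-sample/day robot); the five-to-ten-year prevalent-pool multipliers are what reproduce the quoted 1.75M-3.5M range:

```python
# Back-of-envelope check of the liquid-biopsy numbers above.
NEW_CASES_PER_YEAR = 350_000   # new UK cancer cases per year
POOL_YEARS = (5, 10)           # years patients live with disease (assumed multiplier)
COST_PER_TEST_GBP = 25         # hypothetical amplicon-test price

tests_low = NEW_CASES_PER_YEAR * POOL_YEARS[0]    # 1,750,000 tests/year
tests_high = NEW_CASES_PER_YEAR * POOL_YEARS[1]   # 3,500,000 tests/year

# One platform doing 96 samples every 24 hours, running every day:
years_for_one_robot = tests_low / 96 / 365        # ~50 years

print(f"{tests_low:,} to {tests_high:,} tests per year")
print(f"one 96/day platform needs ~{years_for_one_robot:.0f} years")
print(f"annual cost: ~£{tests_low * COST_PER_TEST_GBP / 1e6:.0f}M "
      f"to ~£{tests_high * COST_PER_TEST_GBP / 1e6:.0f}M")
```

The same logic scales to the worldwide figure: 14 million new cases per year with a ten-year pool gives roughly 140-150 million tests.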

Right now there are no library prep methods that truly allow you to go from sample to sequence in one day. Innovation here is going to be vital, and I suspect will become more and more the focus for investment. The company that can come up with a fast and robust method, and sell it to someone like the NHS in large volume is going to come out on top. Might we even get to a similar situation as with forensics where only a couple of tests are internationally recognised, making the sharing of data much easier?

NIPT is being rapidly adopted partly because the tests have been rapidly defined - the coverage required for a specified sensitivity/specificity is known. Somatic sequencing is tougher because of a lack of clarity about the sensitivity required to have a clinical impact. Clinical trials are happening now, but it may remain an open question for a while whether you should swap to a different drug (e.g. an EGFR inhibitor) or combination when, say, T790M reaches 1%, 10% or 50% mutant allele frequency.

Worldwide there are over 14 million new cancers per year. If the logic above translates, the number of tests climbs fast - maybe 150M liquid biopsies per year.

*Whilst the bioinformatics challenges are huge right now I do believe that we'll have well developed pipelines that can call mutations and generate clinical reports with minimal effort for "simple" cancer panels by 2020.

Monday, 2 November 2015

1000 citations and counting

I updated my Google Scholar account today and saw that a paper I co-authored has had over 1,000 citations: The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups! This puts it up near the top of Nature's paper mountain, although still behind Watson and Crick's 1953 paper. The main reason I am writing about this here is not self-promotion (there were 33 co-authors, and I was part of the METABRIC group - itself a collection of around 80 other co-authors), but rather the chance to discuss how we might approach this with RNA-seq as opposed to the Illumina microarrays we used.

METABRIC: The 2012 Nature METABRIC paper (Molecular Taxonomy of Breast Cancer International Consortium) described the analysis of over 2,000 breast cancers using copy-number profiling (on Affymetrix SNP6.0 arrays) and gene expression (on Illumina HT12 arrays). It highlighted the molecular complexity of breast cancer in a way not previously revealed and stratified cases into one of ten distinct molecular clusters (IntClusts) with different outcomes. Read the paper, or the CRUK blog, for more info.

Microarrays not RNA-seq: The METABRIC paper was published in 2012, but the groundwork began in 2006; in fact this project was a deciding factor in my taking on the role of Genomics core facility head at CRUK's Cambridge Institute. Back in 2006 only a very small number of people were working with NGS, and no one was using it for RNA-seq (the first RNA-seq papers did not appear until 2008, e.g. Nagalakshmi et al, Science). Microarray gene expression analysis was the state of the art, and we had invested in Illumina's BeadArray technology rather than the more common Affymetrix chips.

We planned the microarray processing incredibly carefully; it took months to decide how samples would be randomised across plates, how extractions would be performed, and so on. In the end we processed around 2,000 arrays in the first round. To increase the quality and comparability of data across the project, the whole lot was processed in one six-week batch (we did no other array work during this time), and all the samples were prepped as labelled cDNA by one person*. An additional 900 arrays were completed as follow-up: some repeats, some new samples.

Illumina's HT12 array made the microarray processing much simpler: 12 samples were hybridised on each array, with a full 96-well plate being run each day. All arrays were run as a batch and scanned overnight. I'm sure we could have got away with an HT24 format, and if that had been available the costs might still be competitive with RNA-seq.

Doing it all again with RNA-seq: We are just completing a 600-sample RNA-seq project using our Agilent Bravo and Illumina TruSeq stranded mRNA-seq kits. The processing is pretty straightforward, and we can go from RNA to a normalised pool of 96 samples (actually 88, but don't ask why) ready for sequencing in about a week. Our normal project size has increased each year, and although 600 is still large, we usually process a batch of RNA-seq samples every week or two.

If we were to run METABRIC again I'd like to increase the amount of automation and the number of indexes to 384 or even 1,536. With a kit cost of about £30 ($50) per sample, and 20M single-end 50bp reads per sample costing about £50 ($80), the whole project could be completed pretty quickly and for just £160,000 ($250k).
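That headline figure falls straight out of the per-sample costs quoted above; a minimal sketch, using the post's £30 kit and £50 sequencing estimates for a 2,000-sample METABRIC re-run:

```python
# Costing a METABRIC-scale RNA-seq project from the per-sample figures above.
SAMPLES = 2_000
KIT_COST_GBP = 30   # TruSeq stranded mRNA kit, per sample (approx.)
SEQ_COST_GBP = 50   # ~20M single-end 50bp reads, per sample (approx.)

total = SAMPLES * (KIT_COST_GBP + SEQ_COST_GBP)
print(f"£{total:,}")   # £160,000
```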

But what really amazes me is that we can sequence 2000 samples for high-quality and sensitive mRNA gene expression in a little over three weeks on our HiSeq 2500, probably even quicker on the newer HiSeq 4000. We'd also be able to go back to specific samples, or sub-groups, and sequence deeper to look into alternative splicing.
RNA-seq continues to develop, with newer kits, better analysis of non-coding transcripts, and new methods to call mutations and isoforms. Recent advances like the Sequel from PacBio offer the chance to directly count isoforms rather than infer them from short reads. And who knows how soon RNA-seq on the MinION will be a reality for the masses?

Microarrays not CNV-seq: We processed all the samples for METABRIC copy-number analysis through Aros in Denmark using Affymetrix SNP6.0 arrays. But we have been using low-coverage whole-genome sequencing for a number of years now as part of our exome capture pipeline: pre-capture libraries are pooled for very low coverage sequencing (about 20M reads) and analysed with QDNAseq (Scheinin et al, Genome Research). We spent probably over £1M on Affy SNP6.0 arrays, versus about £100,000 for sequencing. The sequencing workflow is similar to RNA-seq, and we'd be able to zoom into specific samples, or sub-groups, to get a better handle on those driven by copy-number changes. We'd also be able to detect other structural variants.
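For a sense of just how "low coverage" 20M reads is, here is a rough depth calculation; the 50bp read length and 3.1Gb genome size are my assumptions for illustration, not figures from the post:

```python
# Approximate genome-wide depth for the pre-capture low-coverage WGS above.
READS = 20e6          # ~20M reads per sample (from the post)
READ_LEN_BP = 50      # assumed single-end 50bp
GENOME_BP = 3.1e9     # human genome, approx.

depth = READS * READ_LEN_BP / GENOME_BP
print(f"~{depth:.2f}x coverage")   # ~0.32x
```

Tools like QDNAseq are designed to call copy-number from exactly this kind of sub-1x read-count data by binning reads across the genome.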
Big projects can present many challenges, and I am sure the authors who prepared the METABRIC paper had to defend the use of arrays just as RNA-seq was starting to gain traction. It can be difficult to decide which technology to use; I am certain we chose the right things for METABRIC, but that choice was made after lots of discussion. If you're planning a big project, or even a very small one, take the time to think about the methods you're using, their biases, and yours, before starting the lab work.

*Many thanks should go to Michelle, who did all the sample prep. She was pregnant at the time and left shortly afterwards to have her baby - and for a well-earned rest! Michelle left the lab in 2014 and now works at Inivata.

Tuesday, 20 October 2015

Why do you read science blogs: a research project funded by Experiment

This is not wholly a direct request for feedback on the Core Genomics blog! Dr. Paige Jarreau from LSU is surveying science blog readers about their social media habits and their perceptions of the science blogs they read. Her project is funded partly through Experiment: the Kickstarter of the experimental world. Her project brought this platform to my attention, and it looks pretty cool (see the latter part of this post).


Tuesday, 6 October 2015

X Ten: now available for non-Human

Finally. Today Illumina announced that X Ten users can perform whole-genome sequencing of non-human species. Does this mean exomes and RNA-seq are on their way to X Ten? Or that an X One is going to be announced at JP Morgan? I might just hold off on buying that shiny new HiSeq 4000 until this is a bit clearer.

For small to medium labs the X Ten and even X Five were the stuff of dreams and nightmares. Dreams of what would be possible with this technology; nightmares of what might happen to smaller labs.

The dreams turned out to be true and the nightmares have pretty much gone away. But the rapid developments from Illumina continue to be difficult to keep up with. I only wrote about the performance of HiSeq 2500 V4 chemistry a little over a year ago, and it looks like we'll ditch it in favour of HiSeq 4000 in 2016.

The link on Illumina's webpage does not appear to be live just yet but I'm sure we'll hear more in the next couple of days.

PS: If your wheat, horse, whale (etc., etc.) genome sequencing grant just got approved, you can look forward to a nice slush fund. Time to buy that Apple Watch maybe?

Friday, 2 October 2015

Pub-Bed: beds, not papers

Would you stay at the home of another academic you had some loose connection with? Could the Airbnb model be successfully applied to help find accommodation for scientists travelling to meetings, visiting another lab, or even for longer sabbatical stays? I'm not sure but Pub-Bed was born from an idea I cooked up on the train.

Thursday, 1 October 2015

The new Pacific Biosciences sequencer

PacBio announced a baby RSII yesterday, which should be in the shops just in time for Christmas! The Sequel System (the name sounds like SequalPrep, Thermo's PCR clean-up kit) sounds like a big advance on the enormous RSII, although most aspects of the sequencing workflow are unchanged. Sequel has been developed as part of the collaboration with Roche to build a diagnostics instrument; milestone payments of $20-40M are expected on the back of this.
  • $350,000 for Sequel (versus $1,000,000 for RSII)
  • Seven times more reads than RSII
  • One third the size and weight (so only 2,000 or so MinIONs will fit inside)
  • New SMRT cells with 1,000,000 ZMWs compared to RSII's 150,000
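As a quick sanity check, the ZMW counts in the list above line up with the throughput claim:

```python
# The jump in ZMWs per SMRT cell roughly explains the read-count claim.
SEQUEL_ZMWS = 1_000_000   # per new SMRT cell
RSII_ZMWS = 150_000       # per RSII SMRT cell

ratio = SEQUEL_ZMWS / RSII_ZMWS
print(f"{ratio:.1f}x more ZMWs")   # 6.7x - consistent with "seven times more reads"
```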

Tuesday, 29 September 2015

Do you have a HiSeq 2500 V4 to sell?

My lab is looking to put an additional HiSeq 2500 V4 in place, but Illumina do not have any refurbs to sell in the US or Europe, so I thought I'd post here to see if any Core Genomics readers have an instrument they are looking to retire. If we can offer you more than Illumina's trade-in on a HiSeq 4000, we might be able to come to some arrangement!

Drop me an email at James.Hadfield@cruk.cam.ac.uk if you might be able to help.