Tuesday, 30 September 2014

Blogger's spell checker tries to fix "blog"

Why does Blogger's spell check try to correct 'blog', and why does it suggest 'bog' or 'blag'? Are they telling me I'm writing s**t or trying to get something for free?

Please fix this one, blogger.com.

Monday, 29 September 2014

Thanks for reading

This morning someone made the 500,000th page view on the CoreGenomics blog. It amazes me that so many people are reading this, and the last couple of years of writing have been really good fun. I've met many readers and some fellow bloggers, and received lots of feedback by way of comments on posts, as well as at meetings. I've even had people recognise my name because of my blogging; surreal! But the last few years have seen some big changes in how we all use social media like blogs, Twitter, etc. I don't think there is a K-index for scientific bloggers, perhaps Neil can look at that one next ;-)

Question: What do you see?

Sunday, 28 September 2014

Making BaseSpace Apps in Bangalore

I'm speaking at the BaseSpace Apps developers conference in Bangalore tomorrow. It's my first App and my first time in India, so I'm pretty excited about the whole thing.

Tuesday, 23 September 2014

Welcome to a new company built around ctDNA analysis: Inivata

Inivata is a new company spun out of Nitzan Rosenfeld's research group at the CRUK Cambridge Institute (where I work). His group developed and published the TAm-seq method for circulating tumour DNA amplicon sequencing. The spin-out aims to develop blood tests measuring circulating tumour DNA (ctDNA) for use as a "liquid biopsy" in cancer treatment. Inivata has been funded by Cancer Research UK's technology arm CRT, Imperial Innovations, Cambridge Innovation Capital and Johnson & Johnson Development Corporation; initial funding has raised £4 million.


Inivata is currently based in the Cambridge Institute and the start-up team includes the developers of the TAm-seq method: Nitzan Rosenfeld (CRUK-CI), Tim Forshew (now at UCL Cancer Institute), James Brenton (CRUK-CI) and Davina Gale (CRUK-CI).

The research community has really taken hold of cell-free DNA and developed methods that are surpassing expectations. Cell-free DNA is having its largest impact outside of cancer, in the pre-natal diagnostics market, but it has also been shown to be useful in many types of cancer. The use of ctDNA to follow tumour evolution was one of the best examples of what's possible I've seen so far, and it's been exciting to be involved in some of this work. Inivata are poised to capitalise on the experience of the founding team and I'll certainly be following how they get on over the next couple of years.

If you fancy working in this field then they are currently hiring for molecular biologist and computational biologist posts.

This is likely to become a crowded market as people pick up the tools available and deploy them in different settings. ctDNA is floating around in blood plasma and is ripe for analysis. I expect there is still lots of development space for new methods, and ultimately I hope we'll be able to use ctDNA as a screening tool for early detection of cancer.

If we can enrich for mutant alleles using technologies like Boreal or Ice-Cold PCR then detection (not quantitation) may be possible far earlier than can be achieved today.

Monday, 15 September 2014

Are PCR-free exomes the answer?

I'm continuing my exome posts with a quick observation. I've seen several talks recently where people present genome and exome data and highlight the drop-out of genomic regions, primarily due to PCR amplification and hybridisation artefacts. They make a compelling case for avoiding PCR when possible, and for sequencing a genome to get the very best quality exome.

A flaw with this is that we often want to sequence an exome not simply to reduce the costs of sequencing, but more importantly to increase the coverage to a level that would not be economical for a genome, even on an X Ten! For studies of heterogeneous cancer we may want to sequence the exome to 100x or even 1000x coverage to look for rare mutant alleles. Unfortunately this is exactly the kind of analysis that might be messed up by those same PCR artefacts, namely PCR duplication (introducing allele bias) and base misincorporation (introducing artefactual variants).
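A rough back-of-the-envelope (my own numbers, not from any of the talks) shows why depth matters so much here. Treating deduplicated reads at a site as independent draws, the chance of sampling a rare allele at least a few times is binomial:

```python
from math import comb

def p_detect(depth, vaf, min_reads=3):
    """Probability of sampling at least `min_reads` mutant reads at a
    site with true variant allele fraction `vaf`, assuming independent
    (i.e. properly deduplicated) reads ~ Binomial(depth, vaf)."""
    p_fewer = sum(comb(depth, k) * vaf**k * (1 - vaf)**(depth - k)
                  for k in range(min_reads))
    return 1 - p_fewer

# A 1% mutant allele is essentially invisible at 30x, marginal at
# 100x, and near-certain to be sampled at 1000x:
for depth in (30, 100, 1000):
    print(depth, round(p_detect(depth, 0.01), 3))
```

This is also why PCR duplicates are so damaging at depth: duplicate reads inflate the apparent depth without adding independent observations, so the real detection power is lower than the raw coverage suggests.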

PCR-free exomes: In my lab we are running Illumina's rapid exomes, so PCR is a requirement to complete the Nextera library prep. But if we were to use another method then in theory PCR-free exomes would be possible. Even if we stick to Nextera (or Agilent QXT) then we could aim for very low-cycle PCR libraries. The amount of exome library we are getting is huge, often hundreds of nanomoles, when we only need picomoles for sequencing.

Something we might try testing is a PCR-free or PCR-lite (pardon the American spelling) exome to see if we can reduce exome artefacts and improve variant calling. If anyone else is doing this please let me know how you are getting along and how far we can push this.

Thursday, 4 September 2014

The newest sequencer on the block: Base4 goes public

I've heard lots of presentations about novel sequencing technologies: many have never arrived, some have come and gone, and all have been pretty neat ideas; but so far not one has arrived that outperforms the Illumina systems many readers of this blog are using.

Base4's pyrophosphorolysis sequencing technology

The latest newcomer is Base4's single-molecule microdroplet sequencing technology. The picture above explains the process well: a single molecule of double-stranded DNA is immobilised in the sequencer, and single bases are cleaved at a defined rate from the 3' end by pyrophosphorolysis (the new Pyrosequencing, perhaps?). As each nucleotide is cleaved it is captured into a microdroplet, where it initiates a cascade reaction that generates a fluorescent signal unique to each base. Because microdroplets are created at a faster rate than DNA is cleaved at the 3' end, the system generates a series of droplets that can be read out by the sequencer (a little like the fluorescent products being read off a capillary electrophoresis instrument).
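As I understand the scheme, the readout reduces to filtering empty droplets out of an ordered stream; a toy reconstruction of that idea (mine, not Base4's software) looks like this:

```python
# Toy model of the microdroplet readout described above (my own
# reconstruction, not Base4 code). Droplets are generated faster than
# bases are cleaved, so most droplets are empty; the sequence is
# recovered by reading the fluorescent calls of the non-empty
# droplets, in order.
def decode_droplets(droplet_stream):
    """droplet_stream: iterable of per-droplet fluorescence calls,
    None for an empty droplet, or one of 'A', 'C', 'G', 'T'."""
    return "".join(base for base in droplet_stream if base is not None)

stream = [None, 'A', None, None, 'C', None, 'G', None, None, 'T']
print(decode_droplets(stream))  # → ACGT
```

The appeal of generating droplets faster than bases are cleaved is that two consecutively cleaved bases should never share a droplet, so order is preserved by construction.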

Base4 are talking big about what their technology can deliver. They say it will be capable of sequencing 1M bases per second with low systematic error rates. The single molecules mean no amplification, and read-lengths should be long. Parallelisation of the technology should allow multiple single molecules to be sequenced at the same time. How much it will cost, and when, will have to wait a little longer.

I've been speaking to Base4 over the past few years after meeting their founder Cameron Frayling in a pub in Cambridge. Over the past two years Base4 has been developing their technology and recently achieved a significant milestone by demonstrating robust base-calling of single nucleotides in microdroplets. They are still small, with just 25 employees, and are based outside Cambridge. I hope they'll be growing as we start to get our hands on the technology and see what it's capable of.

Low-diversity sequencing: RRBS made easy

Illumina recently released HCS v2.2.38 for the HiSeq. The update improves cluster definition significantly and enables low-diversity sequencing. It’s a great update and one that’s making a big impact on a couple of projects here.

Thursday, 28 August 2014

SEQC kills microarrays: not quite

I've been working with microarrays since 2000, and ever since RNA-seq came on the scene the writing has been on the wall. RNA-seq has so many advantages over arrays that we've been recommending it as the best way to generate differential gene expression data for a number of years. However, the cost, and the lack of maturity in analysis, meant we still ran over 1,000 arrays in 2013; but it looks like 2014 might be the end of the line. RIP: microarrays.

Thursday, 21 August 2014

FFPE: the bane of next-generation sequencing? Maybe not for long...

FFPE makes DNA extraction difficult; DNA yields are generally low, quality can be affected by fixation artefacts, and the number of amplifiable copies of DNA is reduced by strand-breaks and other DNA damage. Add on top of this almost no standardisation in the protocols used for fixation, and widely different ages of samples, and it's not surprising FFPE causes a headache for people who want to sequence genomes and exomes. In this post I'm going to look at alternative fixatives to formalin, QC methods to assess the suitability of FFPE samples for NGS methods, some recent papers, and new methods to fix FFPE damage.
Why do we use formalin fixation: The ideal material to work with for molecular studies is fresh-frozen (FFZN) tumour tissue, as nucleic acids are of high quality. But many cancer samples are fixed in formalin for pathological analysis and stored as Formalin-Fixed Paraffin-Embedded (FFPE) blocks, preserving tissue morphology but damaging nucleic acids. The most common artefacts are C>T base substitutions, caused by deamination of cytosine bases converting them to uracil and generating thymines during PCR amplification, and strand-breaks. Both of these reduce the amount of correctly amplifiable template DNA in a sample, and this must be considered when designing NGS experiments.
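One practical consequence for variant calling is that C>T calls (and the complementary G>A calls) at low allele fraction are the prime suspects for deamination artefacts. A crude filter, purely as a sketch (the threshold is illustrative, not validated):

```python
def flag_possible_deamination(ref, alt, allele_fraction, max_af=0.10):
    """Flag variant calls that look like FFPE cytosine-deamination
    artefacts: C>T (or G>A, the same event seen on the other strand)
    at low allele fraction. Real artefacts are usually present in only
    a small fraction of templates, so genuine germline heterozygotes
    (~50% AF) are not flagged. The 10% threshold is illustrative only;
    a real filter should be tuned against matched fresh-frozen data."""
    is_deamination_change = (ref, alt) in {("C", "T"), ("G", "A")}
    return is_deamination_change and allele_fraction < max_af

print(flag_possible_deamination("C", "T", 0.04))  # likely artefact
print(flag_possible_deamination("A", "G", 0.04))  # not a deamination change
print(flag_possible_deamination("C", "T", 0.48))  # likely a real heterozygote
```

Enzymatic UDG treatment (discussed further down) attacks the same artefact at the bench rather than in software.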
Molecular fixatives: Our Histopathology core recently published a paper in Methods: Tissue fixation and the effect of molecular fixatives on downstream staining procedures. In it they demonstrated that, overall, molecular fixatives preserved tissue morphology as well as formaldehyde for most histological purposes. They presented a table listing the molecular-friendly fixatives and reporting the read-lengths achievable from DNA & RNA (median read-lengths 725 & 655 respectively). All the fixatives reviewed have been shown to preserve nucleic acid quality, by assessment of qPCR Ct values or through RNA analysis (RIN, rRNA ratio, etc). But no one has performed a comparison of these at the genome level, and the costs of sequencing probably keep these kinds of basic tests beyond the limits of most individual labs.

The paper also presents a tissue-microarray of differently fixed samples, which is a unique resource that allowed them to investigate the effects of molecular fixatives on histopathology. All methods preserved morphology, but there was wide variation in the results from staining. This highlights the importance of performing rigorous comparisons, even for the most basic procedures in a paper (sorry to any histopathologists reading this, but I am writing from an NGS perspective).

The first paper describing a molecular fixative (UMFIX) appeared back in 2003; in it the authors describe the comparison of FFZN to UMFIX tissue for DNA and RNA extraction, with no significant differences between UMFIX and FFZN tissues by PCR, RT-PCR, qPCR, or expression microarrays. Figure B from their paper shows how similar the RNA Bioanalyser profiles were from UMFIX and FFZN.

UMFIX (top) and FFZN (bottom)


Recent FFPE papers: A very recent and really well written paper (May 2014) by Hedegaard et al compared FFPE and FFZN tissues to evaluate their use in exome and RNA-seq. They used two extraction methods for DNA and three for RNA, with different effects on quality and quantity. Only 30% of exome libraries worked, but with 70% concordance (FFZN:FFPE). They made RNA-seq libraries from 20-year-old samples with 90% concordance, and found a set of 1,500 genes whose expression differences appear to be due to fixation. Their results certainly make NGS analysis of FFPE samples seem much more possible than previous work did. Interestingly, they made almost no changes to the TruSeq exome protocol, so some fiddling with library prep, perhaps adding more DNA to reduce the impact of strand-breaks for instance, would help a lot (or fixing FFPE damage - see below). The RNA-seq libraries were made using RiboZero and ScriptSeq. Figure 2 from their paper shows the exome variants with percentages of common (grey), FFZN-only (white) and FFPE-only (red) calls; there are clear sample issues due to age (11, 7, 3 & 2 years in storage) but the overall results were good.

Other recent papers looking at FFPE include: Ma et al (Apr 2014): they developed a bioinformatics method for gene fusion detection in FFPE RNA-seq. Li et al (Jan 2014): they investigated the effect of molecular fixatives on routine histopathology and molecular analysis. They achieved high-quality array results with as little as 50ng of RNA. Norton et al (Nov 2012): they manually degraded RNA in 9 pairs of matched FFZN/FFPE samples, and ran both Nanostring and RNA-seq. Both gave reliable gene expression results from degraded material. Sinicropi et al (Jul 2012): they developed and optimised RNA-seq library prep and informatics protocols. And most recently Cabanski et al published what looks like the first RNA-access paper (not open access and unavailable to me). RNA-access is Illumina's new kit for FFPE that combines RNA-seq prep from any RNA (degraded or not) with exome capture (we're about to test this, once we get samples).

QC of FFPE samples: It is relatively simple to extract nucleic acids from FFPE tissue and get quantification values to see how much DNA or RNA there is, but tolerating a high failure rate in subsequent library prep, due to low quality, is likely to be too much of a headache for most labs. Fortunately several groups have been developing QC methods for FFPE nucleic acids. Here I'll focus mostly on those for DNA.

Van Beers et al published an excellent paper in 2006 on a multiplex PCR QC for FFPE DNA. Developed for CGH arrays, it produces 100, 200, 300 and 400bp fragments from non-overlapping target sites in the GAPDH gene from the template FFPE DNA. Figure 2 from their paper (reproduced below) demonstrates results from a good (top) and a bad (bottom) FFPE sample.

Whilst the above method is very robust and generally predictive of how well an FFPE sample will work in downstream molecular applications, it is not high-throughput. Other methods generally use qPCR as the analytical method, as it is quick and can be run at very high throughput. Illumina sell an FFPE QC kit which compares a control template to test samples and uses a deltaCq method to determine whether samples are suitable for arrays or NGS. LifeTech also sell a similar kit, but for RNA (Arcturus sample QC), using two β-actin probes and assessing quality via their 3'/5' ratio. Perhaps the ideal approach would be a set of exonic probes multiplexed as 2-, 3-, or 4-colour TaqMan assays. This could be used on DNA and RNA and would bring the benefits of the Van Beers and LifeTech methods to all sample types.
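The deltaCq idea is simple enough to sketch. A toy version (my own; the pass threshold here is made up for illustration, not Illumina's):

```python
def ffpe_qc_delta_cq(sample_cq, control_cq, max_delta=2.0):
    """Toy deltaCq QC: compare the qPCR Cq of an FFPE sample to a
    high-quality control template amplified with the same assay.
    Each extra Cq cycle represents roughly a 2-fold drop in
    amplifiable template (assuming ~100% PCR efficiency), so a large
    delta flags a degraded sample. The 2-cycle threshold is
    illustrative only."""
    delta = sample_cq - control_cq
    amplifiable_fraction = 2 ** (-delta)
    return {"delta_cq": delta,
            "amplifiable_fraction": amplifiable_fraction,
            "passed": delta <= max_delta}

good = ffpe_qc_delta_cq(sample_cq=25.1, control_cq=24.5)   # passes
bad = ffpe_qc_delta_cq(sample_cq=29.8, control_cq=24.5)    # ~40-fold less template
print(good["passed"], bad["passed"])
```

The same arithmetic underlies the 3'/5' ratio approach for RNA: two probes on the same transcript, with the difference in their Cq values reflecting how far reverse transcription gets along degraded templates.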

Fixing FFPE damage: Another option is to fix the damage caused by formalin fixation. This is attractive as there are literally millions of FFPE blocks, and many have long-term follow-up data. A paper in Oncotarget in 2012 reported the impact of using uracil-DNA glycosylase (UDG) to reduce C>T artefacts caused by cytosine deamination to uracil. They also showed that this can be incorporated into current methods as a step prior to PCR, something we've been doing for qPCR for many years. There are no strong reasons not to incorporate this as a step in any NGS workflow, as there is little impact on high-quality templates.

NEB offer a cocktail of enzymes in their PreCR kit, which repairs damaged DNA templates. It is designed to work on modified bases, nicks and gaps, and blocked 3' ends. They had a poster at AGBT demonstrating the utility of the method, showing increased library yields and success rates with no increase in bias in sequencing data.

Illumina also have an FFPE restoration kit; restoration is achieved through treatment with DNA polymerase, DNA repair enzyme, ligase, and a modified Infinium WGA reaction, see here for more details.

These cocktails can almost certainly be added to: MUTYH works to fix 8-oxo-G damage; CEL1, used in TILLING analysis to create strand-breaks in mismatched templates, could be included; and many other DNA repair enzymes could be added to a mix to remove nearly all compromised bases. It may be possible to go a step further and repair compromised bases rather than just neutralise their effect.

Whatever the case it looks very much like FFPE samples are going to be processed in a more routine manner very soon.

Monday, 18 August 2014

$1000 genomes = 1000x coverage for just £20,000

It strikes me that if you can now sequence a genome for $1,000, then you could buy 1000x coverage for not much more than a 30x genome cost a couple of years ago! Using a PCR-free approach I can imagine that this would be the most sensitive tool to determine tumour, or population, heterogeneity. I’m sure that sampling statistics might limit the ability to detect low-prevalence alleles, but I’m amazed by the possibility nonetheless.
  • 1 X-Ten run costs $1,000
  • 1000x requires 33 X-Ten runs at 30x each (33 × 30x ≈ 1000x)
  • $33,000 = £20,625
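Spelling out the arithmetic (assuming the ~$1.60/£ exchange rate implied by the figures above):

```python
# Back-of-the-envelope for the figures above; the $1.60/£ rate is
# my assumption, inferred from the post's own numbers.
cost_per_30x_run_usd = 1_000
runs_for_1000x = -(-1000 // 30)   # ceil(1000 / 30) = 34 for a full 1000x
usd_per_gbp = 1.60

# The post rounds down to 33 runs (~990x coverage):
total_usd = 33 * cost_per_30x_run_usd
print(total_usd)                  # → 33000
print(total_usd / usd_per_gbp)    # → 20625.0
```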
If you’re running a ridiculously high-coverage human genome project on X-Ten, do let me know!