Monday 29 October 2012

Millionaire's shortbread

We have a tradition in the lab that you bring cakes in on your birthday. Home-made is always encouraged and one of my favourite party bakes is this Millionaire's shortbread. Enjoy.

Mmmm, sweeet!!!
  1. Make shortbread:
    • 200g/7oz butter
    • 115g/4oz caster sugar
    • 285g/10oz plain flour
    • Cream the butter and sugar, beat till fluffy, add flour to make a dough. Spread out in an even layer in a 22x30cm/9x12inch tin. Bake in a pre-heated oven at 180C/350F/gas mark 4 for 20 minutes or until golden-brown. Allow to cool to room temperature.
  2. Make caramel filling:
    • 115g/4oz butter
    • 115g/4oz caster sugar
    • 400g/14oz can of condensed milk
    • Put all the ingredients into a saucepan and gently melt the butter and dissolve the sugar. Once dissolved, turn up the heat and boil for five minutes, stirring constantly. Do not touch this as it most closely resembles sweet napalm! Cool for one minute in a sink filled with cold water. Pour over the cool shortbread and allow to cool and set at room temperature.
  3. Make the topping:
    • 115g/4oz plain chocolate
    • Melt the chocolate and pour over the cooled and set caramel. Mark into fingers but do not cut until the chocolate is cooled and set.
  4. Cut into fingers or squares and serve.

Recipe from Leiths Baking Bible; buy it and never look back!


Wednesday 17 October 2012

Foundation Medicine's cancer genomics test

In a few years hopefully every cancer patient in the UK will be screened for the most common somatic mutations. CRUK's Stratified Medicine Project has already tested over 5000 patients as part of a programme to roll out more uniform testing in the NHS.

There has been an explosion of interest in genomic medicine, driven by the release of next-generation sequencing instruments like MiSeq and PGM, as well as the development of methods to assay small numbers of loci at very low cost and with fast turnaround. Many academic centres are working on NGS tests using amplicons or capture of gene panels, or even exomes.

We have been working with the Fluidigm AccessArray system for a number of years, and using it with a MiSeq it is possible to sequence 48 patients, in duplicate, at 48 loci of 150-250bp to over 1000x coverage for just £20 each. This is a very low cost compared to other medical tests, and the potential of somatic screening is so great that I think it has to be made available to as many patients as possible.
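As a rough back-of-envelope check that this fits comfortably on a single run, here is a sketch with assumed numbers (the MiSeq yield in particular is an assumption for illustration, not a real run metric):

    # Sketch: does 48 patients x 48 amplicons, in duplicate, at 1000x fit on a MiSeq?
    # All figures are illustrative assumptions, not real run metrics.
    samples = 48            # patients on one AccessArray chip
    amplicons = 48          # loci per patient
    replicates = 2          # each patient run in duplicate
    target_coverage = 1000  # reads wanted per amplicon

    reads_needed = samples * amplicons * replicates * target_coverage
    miseq_yield = 15_000_000  # assumed usable reads from one MiSeq run

    print(f"Reads needed: {reads_needed:,}")               # ~4.6M
    print(f"Headroom: {miseq_yield / reads_needed:.1f}x")  # roughly 3x spare capacity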

Of course sequencing more can often be better, and with technologies like Nextera capture, AmpliSeq, HaloPlex, etc., an awful lot can be sequenced pretty easily nowadays. One company that seems to be putting all the right pieces together is Foundation Medicine...

Foundation Medicine: a relatively new company making big strides in cancer genomic testing. They are trying to use recent advances in our understanding of cancer genomics to inform patient treatment, and aim to "improve day-to-day care for patients by serving the needs of clinicians, academic researchers and drug developers to help advance the science of molecular medicine in cancer". Find out more on their website www.foundationmedicine.com.

About six months ago they released the Foundation One cancer genome profiling test. This targets all known cancer drivers, as well as genes somatically altered in human cancer that have known therapeutic impact. The test uses an NGS capture-based assay and reports on genomic alterations including substitutions, insertions, deletions, copy number changes and some rearrangements. It can be run on any solid tumor with an input requirement of just 50ng of FFPE DNA, and is run in their own CLIA labs.



The Foundation One test: Foundation Medicine are using in-solution capture of NGS libraries. They prepare sequencing libraries from very low amounts of starting material, and it is not clear to me which technology they are using (TruSeq, Rubicon, Nextera, or something else). Libraries are captured using Agilent SureSelect and sequenced on HiSeq.
The poster: there is a very nice poster from the 2012 ASCO meeting available on their website. This presents the results from the first 304 commercial cases run on the assay, which sequences all exons of 182 cancer genes (over 3000 exons) plus introns from 14 commonly rearranged cancer genes to >500x coverage.

The poster has a nice graph showing how penetrant each gene is in cancer. It reads like the usual Top 10 list, with TP53, KRAS and PIK3CA in the top three.
Commonly altered genes in cancer identified by FoundationOne

A recent paper in European Urology presented Foundation One test results from 45 prostate cancer patients, using 50ng of FFPE DNA in hybrid capture and sequencing of targeted loci to over 900x coverage. They found alterations in AR, TMPRSS2:ERG fusions, and loss of or mutation in PTEN, TP53, RB, MYC and PIK3CA. They also found alterations in the key DNA repair genes BRCA2 and ATM, which they suggested could be targets for PARP inhibitors, and discovered an actionable rearrangement involving BRAF. In an earlier paper in Nature Medicine, they found ALK and RET mutations when testing 64 colorectal or non–small cell lung cancers (this is also available as a poster on their website).

Interestingly, the earlier paper described a test analysing just over 2500 exons in 145 cancer-relevant genes, with the same number of introns (37) from 14 commonly rearranged cancer genes. This suggests to me that Foundation Medicine fully understand the need to keep the test current and are watching the literature for new candidates to add.

It does make me wonder, though, whether smaller panels might be the best way to go for a first round of general screening of all patients.

The reports: There is not a copy of a Foundation One report available on their website, but you can get a good idea of what is covered by watching the video there. The test is supported by a bioinformatics pipeline that sits on top of a curated database of research and clinical publications, as well as previous cases. The mutational profile is reported along with suggestions for targeted therapy and clinical trial opportunities.
A FoundationOne report

One of the challenges discussed in rolling cancer genome analysis out to the clinic has been teaching people how to interpret the data. Foundation Medicine is not the first to offer genomic data backed up by published research (my 23andMe profile was pretty clearly explained), but they have obviously seen that to make the test easy to adopt they need to make it easy to understand.

Monday 15 October 2012

AGBT 2013: lottery tickets available tonight!

The deadline for submitting your registration and abstract to AGBT closes tonight! As I said a few weeks ago, this year the organisers are trying harder than ever to make sure the meeting reaches as wide an audience as possible. There are always grumblings from us, but they really are working hard to please us!

You have until midnight tonight to register and send in your abstract if you want to present a talk or poster. After this everyone will have to wait until December 1st to see if they got a place. Then you can book your flights and buy some suncream.

Good luck to you. I am not going this year due to family commitments, so I'll be reading blogs and listening (is that what you do?) to Tweets to keep up with all the news.

Have fun.

Tuesday 9 October 2012

The 2500 flowcell: is this Illumina's razor blade?

I finally saw an image of a 2500 flowcell so you can stop looking at the one I mocked up based on discussions with Illumina personnel just before AGBT.

One thing immediately struck me: how much a HiSeq 2500 flowcell looks like an old-fashioned razor blade. OK, you may have to squint to see the similarity!

I am sure many of you have heard of the razor-and-blades business model: the hardware (the HiSeq 2500 in this analogy) is sold off pretty cheaply while the consumables (flowcells and SBS reagents) are marked up to bring in the real profits.

Why then does a HiSeq 2500 cost $700,000?

PS: I can't be the only person looking forward to real "genomes-in-a-day"? 20x coverage is for wimps! ;-)

Wednesday 3 October 2012

How old can a PCR machine be whilst still being useful?

PCR was invented way back in 1985 and the first thermal cyclers were released a few years later.

Before this, PCRs were done in tubes placed in separate water baths, and polymerase had to be added at each new cycle, until someone worked out that Taq (discovered in 1976 by University of Cincinnati researcher Alice Chien) might be a good alternative to the standard DNA polymerase. The first PCR machine, "Mr Cycle", was produced by Cetus in 1988 but still required fresh enzyme after each cycle. In 1988 Perkin Elmer released the first automated thermal cycler, on which all the instruments we know and love today are based.

Mr Cycle
You can see some really old instruments on the Life Technologies website, where they are giving away a free Veriti PCR machine to entrants in their competition. You can see old Hybaid, MJR, Perkin Elmer and other instruments. Look hard for the one you first used; mine was the MJR PTC-100. Work got a lot easier with the release of the Tetrad machines because fights no longer broke out over who had, or had not, booked the machine before going home!
Screenshot from Life Tech competition website

PS: If you fancy running 230,400 PCRs in one go, give the Soellex a try: a water-bath PCR machine that holds 600 384-well plates.

How to do better NGS experiments

Design, replication, multiplexing.
 
These are the three things I spend most of my time discussing in experimental design meetings. In my institute we hold three 30-minute design sessions every week where users talk to my group and our bioinformatics core about how best to run a particular experiment. I do believe our relatively short discussions have a big impact on the final experiments and the most common issues are the three I listed at the start.
 
Of course we spend lots of time talking about the pros and cons of different methods, RNA-seq vs arrays or mRNA-seq vs Ribo-Zero RNA-seq for instance, but the big three get discussed even if it is clear which method should be used.
 
I'd encourage everyone to think about experimental design as much as possible. Simply thinking about the next step in your experimental process is not enough. Take time to plan where your experiments are going and what the most logical steps are to get there. Then make sure each experiment is planned to make the best use of your available resources. Even cheap experiments can end up being expensive in lost time. Don't save experimental design for the more costly array- or sequencing-based projects!

Design: This is important because it suggests that an experiment has had more than one person think about it more than once. Even “simple” experiments often have confounding factors that need to be considered; or require assumptions to be made about steps in the experiment where real data might turn out to be sorely lacking.
 
Designing an experiment often means sitting down and listing all the issues that might affect the results, and highlighting the things that can, or can't, be done to mitigate or remove these issues. This can be done by anyone with sufficient experience of the experiment being performed. We find it is best done together over a cup of tea or lunch in an informal discussion, just like our design sessions!
 
Replicates: Replication is vital in almost all experiments. Only if an experiment truly can be done just once should replication be ignored. Most people can come up with a multitude of reasons why more replicates are a bad idea. However, when confronted with data showing how increasing replicate numbers makes experiments more robust and more likely to find significant differences, many users are persuaded to add in four or even more replicates per group.
 
Biological replication is king and technical replicates are often a waste of time. Be wary of pooling samples to make replicates appear tighter: you lose information about the biological distribution of your data that might be meaningful.
 
We find four replicates is the minimum to consider for almost all experiments. Three works well, but if one sample fails to generate results a whole experiment can be rendered useless. Four gives a big step up in the ability to detect differences between groups, and five adds even more power, but beyond six replicates the additional power starts to tail off for many experiments (see the sketch after the footnote below). Unfortunately it is difficult to predict the number of replicates needed to get the best £:power ratio. It is easily done after the experiment is complete, and I have yet to go to a statistics seminar that does not put Fisher's "statistical post mortem" quote* up at some point to ram this home!

* Currently at number 3 in the famous statistician quotes chart!
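To make the replicate argument concrete, here is a minimal power sketch assuming a simple two-group t-test and an arbitrary effect size; real count-based RNA-seq power calculations need more specialised tools, so treat the numbers as illustrative only:

    # Illustration of how power climbs, with diminishing returns, as replicates increase.
    # The effect size and alpha are arbitrary assumptions picked for this sketch.
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    effect_size = 1.5   # assumed standardised difference between groups
    alpha = 0.05

    for n in range(2, 9):  # replicates per group
        power = analysis.power(effect_size=effect_size, nobs1=n, alpha=alpha, ratio=1.0)
        print(f"{n} replicates/group -> power {power:.2f}")

The diminishing return beyond five or six replicates per group is exactly the tailing-off described above.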

Multiplexing: For me this is the one people seem to forget to think about. I think I am convincing many that the correct number of samples to multiplex in a single NGS run is all of your samples. Rather than running 4, 12 or 24 samples per lane and always sticking to this, I prefer to argue that having all the samples in one pool and running multiple lanes makes the experiment more robust to any sequencing issues. If a lane has problems, there is still likely to be lots of data from all the other lanes in the run.

There are also some issues with demultiplexing low-plex pools on Illumina, as the software struggles to identify clusters correctly if the index sequences are too similar. We have had users submit libraries for sequencing with just two or three samples pooled. These have failed to generate the usual yield of data and demultiplexing has been horrible. There is nothing we can do, and it has been frustrating explaining to users that their four carefully pooled libraries have all failed when, if they had just mixed all the samples together in one super-pool and run four lanes, everything would have been fine!
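To illustrate why very low-plex pools cause trouble, here is a minimal sketch of the kind of per-cycle check worth doing before pooling only two or three libraries. The channel grouping (A/C on one laser, G/T on the other) reflects my understanding of the four-colour chemistry, and the index sequences are just examples:

    # Check that every index cycle has signal in both laser channels.
    # The A/C vs G/T grouping is a simplification of the four-colour chemistry.
    def check_index_balance(indexes):
        for cycle in range(len(indexes[0])):
            bases = {idx[cycle] for idx in indexes}
            ok = (bases & {"A", "C"}) and (bases & {"G", "T"})
            print(f"cycle {cycle + 1}: {''.join(sorted(bases))} ->",
                  "OK" if ok else "WARNING: single channel")

    # A two-plex pool that struggles at several cycles:
    check_index_balance(["ATCACG", "CGATGT"])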

Putting it all together: When we plan experiments now, I try first to ask how many reads might be needed per sample for a specific application. Once this is fixed we can decide on replicate numbers for the experiment. Finally, we can work out how many lanes are likely to be needed given the variability of Illumina sequencing. If an extra lane turns out to be needed later, there is already enough data to start the analysis; but often we don't need more data and the sequencing is as efficient as possible.
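Here is a sketch of that arithmetic with made-up yields and a safety margin rather than real run metrics; the point is the order of the decisions, not the specific numbers:

    # Reads per sample -> replicates -> lanes, with an allowance for lane variability.
    # Yields and margin below are illustrative assumptions.
    import math

    reads_per_sample = 20_000_000   # e.g. a typical mRNA-seq target
    samples = 2 * 4                 # two groups x four replicates
    lane_yield = 180_000_000        # assumed usable reads per lane
    safety_margin = 0.8             # allow for lane-to-lane variability

    total_reads = samples * reads_per_sample
    lanes = math.ceil(total_reads / (lane_yield * safety_margin))
    print(f"{samples} samples need {total_reads:,} reads -> {lanes} lane(s)")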

PS: feel free to comment on any aspect of design discussed or missed here. Please don't ask me for help designing your projects though!

Monday 1 October 2012

Anyone fancy trying to “read DNA”? It goes something like this…01110010 01100101 01100001 01000100 00100000 01000100 01001110 01100001

George Church is one of the "godfathers of genomics"*. In one of his latest publications, Next-Generation Digital Information Storage in DNA, he demonstrates how to use DNA as an information storage medium. He's not the first to do this, and the supplemental information to the paper lists ten other references, but his is the best example so far.

*George Church, along with 15 others including Walter Gilbert, Leroy Hood and John Sulston, was one of the attendees at the 1984 Alta conference where the Human Genome Project was conceived. He published the first method for sequencing of methylation sites in 1984, Genomic sequencing, in PNAS. George is very open access; see his unauthorised autobiography. In fact he is so open-access I wonder if he could be a godfather of that too!
 
The paper describes how the text, images and a JavaScript program from the book Regenesis were converted to DNA in a readily amplifiable and readable form. This is not something just anyone can read though; in the paper they used 170M PE100bp reads from a HiSeq lane. This makes it a very expensive book in a format not compatible with a Kindle!

How do you turn a book into a library: George used Agilent's programmable eArrays to make the DNA version of the book. After synthesis the oligos were cleaved from the array into a pool that was PCR amplified with Illumina-compatible primers ready for sequencing. You can buy an 8x60k eArray for the equivalent of about £100 per book.

How much sequencing do you need to do: The sequencing was to 3000-fold coverage, and geivn the aibtily of Hmunas to raed smrlcbaed text I suspect that level of redundancy is massive overkill. Reducing read lengths to PE75 and using slightly longer fragments (150 vs 115bp) would decrease the costs of sequencing. George used 54,898 115bp oligos, each carrying an address and 12x8-bit sequences; increasing this to 16x8 bits would result in a 151bp oligo and only require around 41,000 fragments. Even low coverage sequencing could be completed on a MiSeq or PGM.
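For anyone who wants to play with the idea before downloading Bits2DNA.pl, here is a toy sketch of the one-bit-per-base scheme described in the paper (0 as A or C, 1 as G or T) plus the fragment-count arithmetic above; the binary message echoes the post title and the 5.27 megabit book size is from the paper, everything else is my own choice for illustration:

    # Toy one-bit-per-base encoder/decoder (0 -> A/C, 1 -> G/T) and fragment counts.
    import math
    import random

    def bits_to_dna(bits):
        # choose randomly between the two bases for each bit to break up homopolymers
        return "".join(random.choice("AC") if b == "0" else random.choice("GT") for b in bits)

    def dna_to_bits(seq):
        return "".join("0" if base in "AC" else "1" for base in seq)

    message = "reaD DNA"
    bits = "".join(f"{ord(c):08b}" for c in message)
    dna = bits_to_dna(bits)
    assert dna_to_bits(dna) == bits
    print(dna)

    # ~5.27 megabits of book at 96 data bits per oligo is ~55,000 fragments;
    # at 128 bits (16x8) it drops to ~41,000.
    book_bits = 5_270_000
    for payload_bits in (96, 128):
        print(payload_bits, "bits/oligo ->", math.ceil(book_bits / payload_bits), "oligos")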



"Encoding and decoding" DNA from the paper
 
As DNA read-lengths increase, especially out to the 100kb reads Oxford Nanopore presented, reading becomes a matter of only a few reads. George's book could be read with just 52x100kb ONT reads. Perhaps combining the oligo production with Craig Venter's artificial life methods would be the way to go?

Fancy giving it a try yourself? The code is available here: Bits2DNA.pl, and some of you have sequencers ready to run in the lab.

PS: George Church did the experiments himself. His supplementary information is excellent, probably the best I have read for being able to actually repeat the experiment. It also appears that George has written like this for most of his research career; the methods in his 1984 paper are just as comprehensive and concise. I wish everyone (myself included) wrote at this level of detail so succinctly.

PPS: if George Church is reading this then please accept an open invitation to coffee next time you are in Cambridge, UK.