Wednesday 18 December 2013

Win a free MiSeqV3 600 cycle run: the Core Genomics Christmas Competition

Last month I posted a logical puzzle to see if people knew how much different next-gen sequencing applications cost. The responses have been varied!

Now Illumina have very kindly donated a MiSeq run and 250 iCredits as the prize for a festive competition. You can win a MiSeq 600 cycle run, which I'll put on one of the MiSeqs in my lab and deliver the data via BaseSpace, where you'll be able to use your free iCredits to analyse data using the BaseSpace apps.

Monday 16 December 2013

Will anyone get his or her genome sequenced at AGBT this year?

Given HiSeq 2500 and Ion Proton it is entirely possible for someone to sequence their genome in the Illumina or Life-Tech booths at AGBT, so will anyone attempt it?

Saturday 7 December 2013

"A bridge too far" for consumer genomics?

It’s amazing what is being done with DNA sequencing. Cancer genetics and personalised medicine make headlines, consumer genomics has been in the news and Genomics England are going to sequence 100,000 NHS patients. But all that glitters is not gold!

Thursday 5 December 2013

23andMe vs Lisa Casey

Updated after reading Dale Yazuki's blog pointing to this post by Lukas Hartmann, which I've summarised next to Shaheen Pasha's below.

Poor old 23andMe; first the FDA and now Lisa Casey, can they survive? And what would their failure mean for personal genomics?

Saturday 30 November 2013

How much does NGS cost: a logical puzzle

At our recent institute symposium I added an additional graphic to try and get people thinking about how much their sequencing costs.

Can you work it out: Below are six pairs of circles representing genome, methylome, RNA-seq (GX), ChIP-seq, Exomes and amplicons, size is proportional to cost and the two colours represent library preparation and sequencing.
  • Can you determine which application is which?
  • Which colour is library prep?
  • Which colour is sequencing?
  • How much do you think each application costs?

Answers in the form of comments or drop me an email. The results here surprised me, I thought people had a better idea of the actual costs but it turns out people have some pretty wild ideas!

Friday 29 November 2013

Genome-infographics: making sense of what we do

We recently had our Institute symposium; lots of talks about the science going on in the building, mostly unpublished work and most of it very exciting. I am always amazed at how lucky I am to be working in a place that uses genomics so widely; this year we had talks from many of our non-genomics groups that still included some work performed in my lab. Genomics is getting (almost) everywhere!

For the poster session this year I wanted to use infographics to create some nice visualisations of what we've done over the past seven years. I thought I'd share these and point you to the resources I used to create them.

Thursday 28 November 2013

Myriad: school bully, or just sticking up for themselves?

Can someone explain Myriad's strategy? Other than using bullying tactics to kill competition, I don't get it. Is there any real prospect of the law siding with their point of view? I hope not. And since the US market has some pretty wonderful advertising (surely you all remember the PGM vs MiSeq ad series), surely Myriad will end up damaging their own reputation with consumers, thereby shooting themselves in the foot?

Monday 25 November 2013

Sanger-seq is dead: If you only read one paper this month read this one...

Foundation Medicine published a wonderful paper in Nature Biotechnology last month. "Development and validation of a clinical cancer genomic profiling test based on massively parallel DNA sequencing" describes their approach to personalised cancer genomics.

Foundation Medicine's "Foundation 1" test allows NGS analysis of 287 genes from FFPE samples for SNPs, InDels, CNAs and fusions. They describe the design and testing of this panel along with results from over 2000 clinical samples, finding "clinically actionable alterations in 76% of tumours", 3x more than they say are found with current methods.

Is Sanger-seq dead:

Thursday 21 November 2013

Nanopore sir? should be delivered in 2014 or perhaps 2030!

Many of the readers of this blog will have seen the announcements from Oxford Nanopore Technologies on their MinION early access program. The people I talk to are almost universally excited, if a little sceptical about how quickly we’ll be getting rid of our HiSeqs and Protons.

Wednesday 20 November 2013

RIP Fred Sanger

Fred Sanger died today aged 95. A sad day for science but one where we can remember the phenomenal impact his work has had on us all. Of course being a genomics lab makes his work all the more important, but almost everything done in biology makes use of his advances in the form of sequence data.

We don't use Sanger sequencing as much as before; nearly all the biology I'm involved in uses next-generation sequencing. But I am sure Shankar Balasubramanian, David Klenerman, Jonathan Rothberg and George Church will all be raising their glasses to one of modern science's greats.

The Cambridge News and The Telegraph both have very good obituaries.

Goodbye Fred. I wish I'd met him!

Monday 18 November 2013

Can it hold a pipette?

Don't we all get tired of being in the lab and wish we could work from home sometimes? Office workers do it fine and now they have a robot to help them keep in touch with colleagues in the office.

I'm not exactly sure how we might adapt this to hold a pipette, and I don't know of many labs in bungalows!

96 ChIPs? That’ll fit on one of Illumina's new patterned flowcells

ENCODE was a mammoth endeavor, and one that is helping to better shape our understanding of biology, but the project required a large multi-national collaboration to generate the 1000s of ChIP-seq and RNA-seq libraries. Last week Duncan Odom’s research group at the Cambridge Institute published an automated ChIP-seq pipeline in Genome Biology capable of generating 96 ChIP-seq libraries with just 2 hours hands-on time, making the lab-work for projects of ENCODE scale possible in just a few weeks. With all the samples on a new higher-density patterned flowcell perhaps?

Friday 15 November 2013

How to access BaseSpace forums

If you are using Illumina's BaseSpace then you probably run into some of the same frustrations as myself and other users, however Illumina do provide a feedback mechanism and a forum to suggest ideas for development. Finding this can be a bit difficult so in order to remind myself next time I want to find it, and to help others who may be looking, I wrote this post.

Thursday 14 November 2013

$100million for the Broad: what have they done for genomics?

When the Prime Minister announced £100M and the creation of Genomics England we thought we had made the big time here in the UK. But compare that to the Broad's latest gift of $100M and our national effort suddenly doesn't look so big!

Wednesday 13 November 2013

Dr Evil's exome sequencing services

Sequencing service providers are popping up everywhere and offering some great deals on genomes, exomes and RNA-seq. How can this sequencing be so cheap I hear some of my users saying? The costs are usually dependent on a certain volume of work and are likely to bring in lots of the same sort of sequencing at once, while the promotion is on. Running RNA-seq in 96-well plates and across multiple short-read flowcells is very efficient so big savings can be made this way and providers are keen to pass those savings onto you to get your business and hopefully keep it. I think we’ve almost reached the point where NGS reads are a commodity and people just want them as fast and cheaply as possible, just like Sanger sequencing.

Thinking about this made an idea pop into my head, not one I’m going to pursue but one I thought readers of this blog would like to hear about.

Welcome to Dr Evil’s Sequencing Centre:

Tuesday 12 November 2013

PubMed commons: how will we use it?

PubMed commons is hoping to create somewhere for researchers to "share their opinions about scientific publications", it is going to be for "open and constructive criticism and discussion of scientific issues" and will "depend on the scientific quality of the interchange". Your comments are made available under a CC license so everyone can reuse them and comments are moderated (I already saw one that had been removed).

Friday 8 November 2013

Personal Genome Project UK and Dr Evil’s frame-up

George Church started something great back in 2005, now Stephan Beck at the UCL Cancer Centre has kicked off the UK’s own Personal Genome Project. The idea has always been a simple one, get data from willing participants, make genome sequencing free and make the data available on a free-to-access model. The PGP has always aimed to be clear that there is little in direct benefit to participants; except of course the warm and fuzzy feeling that only your genome being sequenced can give you!

Dr Evil is back: The PGP website lists some of the benefits to science, I won’t dwell on those here. They also talk about some of the risks and one in particular caught my attention; that data might allow someone to “make synthetic DNA corresponding to the participant and plant it at a crime scene”.

Wednesday 23 October 2013

MinION early access program

Update from GenomeWeb at the bottom!

Get ready for millions of minions! ONT have announced an early-access program for the MinION (GridION later) and it is almost free to access. I'm sure they will see huge demand from eager NGS users. But who will have projects best suited to the MinION technology and who will be first to publish?

Let's not forget ONT's technology promises long-reads, how long is not completely clear but some applications will benefit more than others.

$1,000 buys you a MinION system, free flowcells (to an undisclosed limit), free sample prep and free sequencing reagents. Of course there is no such thing as a free lunch and ONT will require users to sign their End User License Agreement "to allow Oxford Nanopore to further develop the utility of the products, applications and customer support while also maximising scientific benefits for MAP participants". And the press release does give a lot of hope that ONT don't want to restrict your right to publish.

"MAP participants will be the first to publish data from their own samples. Oxford Nanopore does not intend to restrict use or dissemination of the biological results obtained by participants using MinIONs to analyse their own samples. Oxford Nanopore is interested in the quality and performance of the MinION system itself."

I've signed up already!

You can too by visiting the ONT contact page and selecting the box marked 'Keep me informed on the MinION Access programme'.

Update: Some more details came from GenomeWeb a few minutes after I posted. According to their coverage read-lengths may be up to 100kb but the number of pores could be as low as 500. This is exactly the kind of detail we are going to need to determine the best applications to run tests on.

Tuesday 22 October 2013

Bioinformatics at the top

A few years ago one of our junior group leaders made an interesting appointment; he recruited a bioinformatician into a research assistant role. Every lab has someone, or several people, who keep the lab running. They are the people making sure cells get cultured, supporting post-docs and PhD students, the nuts and bolts of most labs. But this recruitment stood out as the person was being appointed to look after "Big Data", not PCR & gels!

Now we have embedded bioinformaticians and bioinformatic research assistants in many of the groups, especially those heavily using Genomics technologies.

Computational groups seem to have changed too and all those in our Institute now have wet-lab scientists as part of their team. I think this is definitely the way to go and makes it much easier for groups to direct their research in a particular direction.

Computational biologists of all sorts are rapidly cropping up at the top end of the career ladder. In February 2013 Professor Simon Tavare FRS, FMedSci was appointed as Director of the Cancer Research UK Cambridge Institute (the place I work), and earlier this week BBSRC announced that Dr Mario Caccamo has been appointed the Genome Analysis Centre's (TGAC's) new Director.

With statisticians and computational biologists leading the way as Directors of research institutions and not just as group leaders I wonder if we'll see a slightly different angle on some of our research?

Monday 21 October 2013

Genomics England is go

Genomics England is steaming ahead to sequence 100,000 genomes from NHS patients. Today Genomics England and Illumina announced their intention to start the 1st 10,000 genomes as part of a sequencing contract run by Illumina.
Set up by the Department of Health and announced by the PM (when visiting my lab) in December 2012, the Genomics England project has some lofty goals. If the team can deliver then the NHS and the UK population really could benefit from the advances in molecular medicine. I for one would be glad to see the NHS take the lead on the world stage as we've had some pretty big milestones so far in the UK:

1953 and the structure of DNA was discovered in the UK.
1977 Sanger DNA sequencing invented in the UK.
1997 Solexa sequencing invented in the UK.
2020 UK NHS 1st to screen all cancer patients with NGS?

The £100 million so far pledged by the UK government will (according to the Genomics England website):
  • train a new generation of British genetic scientists to develop life-saving new drugs, treatments and scientific breakthroughs;
  • train the wider healthcare community to use the technology;
  • fund the initial DNA sequencing for cancer and rare and inherited diseases; and
  • build the secure NHS data linkage to ensure that this new technology leads to better care for patients.
See the Science working group report if you'd like to know more about where they are going.

This morning Genomics England and Illumina announced their intention to start a 3 year programme of sequencing genomes. The 1st 10,000 genomes will be for rare diseases and this has real potential to impact many patients; ideally with treatments for their disease, but at the very least a hope that a causal mutation is discovered.

This is the first step for the NHS to develop the infrastructure required to bring WGS into routine clinical practice. But the UK is likely to need a big and shiny new sequencing space if we are going to do what David Cameron said and do all the sequencing in England. Note that is very carefully stated: the sequencing will be in England; not China, not the US, and not Scotland!

Whether we will realistically sequence whole genomes from 100,000 patients is not clear. The infrastructure to do this in a timely fashion does not exist in the UK (yet). And as the technologies for sequencing improve, clinical exomes, amplicon panels and whole genomes will all need to be considered to find the best fit for different groups of patients.

With Synapdx in the US releasing an Autism Spectrum Disorder test using RNA-seq it is clear that genomes are just the tip of the iceberg.

Friday 18 October 2013

How good is your NGS multiplexing?

Here’s a bold statement: "I believe almost all NGS experiments would be better off run as a single pool of samples across multiple lanes on a sequencer." 

So why do many users still run single-sample per-lane experiments or stick to multiplexes that give them a defined number of reads per lane for each sample in a pool? One reason is the maths is easy: if I need 10M reads per sample in an RNA-seq experiment then I can multiplex 20 samples per lane (assuming 200M reads per lane). But this means my precious 40 sample experiment now has an avoidable batch effect as it is run on two lanes which could be two separate flowcells on different instruments at different times by different operators in different labs…not so good now is it!
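
The arithmetic behind both approaches can be sketched in a few lines of Python. This is only an illustration using the figures quoted above (10M reads per sample, 200M reads per lane, 40 samples); the function names are mine and are not part of any Illumina tool.

```python
import math

def lanes_needed(n_samples, reads_per_sample, reads_per_lane):
    """Minimum number of lanes so that every sample reaches its target depth."""
    return math.ceil(n_samples * reads_per_sample / reads_per_lane)

def reads_per_sample_single_pool(n_samples, reads_per_lane, n_lanes):
    """Expected reads per sample when ONE balanced pool is run on every lane."""
    return reads_per_lane * n_lanes / n_samples

# Figures from the text: 40 samples, 10M reads each, 200M reads per lane.
lanes = lanes_needed(40, 10_000_000, 200_000_000)
depth = reads_per_sample_single_pool(40, 200_000_000, lanes)
print(lanes, depth)
```

The number of lanes is the same either way; the difference is that with a single pool every sample is present on every lane, so lane, flowcell and run effects are shared by all samples rather than confounded with a subset of them.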

And why doesn’t everyone simply multiplex the whole experiment into one pool in the first place? When I talk to users the biggest concern has been, and remains, the ability to create accurate pools. A poorly balanced large pool is probably worse than multiple smaller pools, as with the latter you can simply do more sequencing on perhaps one of the sub-pools to even out the sequencing in the experiment.

We have pretty well agreed standards on quality (>Q30) and coverage (>10x for SNP calling), but nothing for what the CV of a pool of barcoded libraries should be. What’s good and what’s bad is pretty much left up to individuals to decide.
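
For anyone wanting to put a number on their own pools, here is a minimal sketch of the calculation, assuming you already have demultiplexed read counts per barcode. The example counts are invented for illustration and are not data from my lab, and I use the sample standard deviation, which is only one of the possible conventions.

```python
import statistics

def pool_cv(read_counts):
    """Coefficient of variation (%) of per-barcode read counts in one pool.

    Uses the sample standard deviation (n - 1 denominator).
    """
    mean = statistics.mean(read_counts)
    return 100 * statistics.stdev(read_counts) / mean

# Hypothetical demultiplexed read counts (millions) for two 6-plex pools.
balanced = [10.2, 9.8, 10.1, 9.9, 10.3, 9.7]
unbalanced = [4.0, 18.5, 9.0, 2.5, 15.0, 11.0]
print(round(pool_cv(balanced), 1), round(pool_cv(unbalanced), 1))
```

What threshold counts as "good" is exactly the open question; the function only makes pools comparable.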

Here are some examples from my lab: pools 1, 2 & 3 are not great; 4 is very good.

Robust library quantification is the key: What do Illumina et al do to help? The biggest step forward in the last few years has been the adoption of qPCR to quantify libraries. Most people I speak to are using the Kapa kit or a similar variant. Libraries are diluted and compared to known standards. When used properly the method allows very accurate quantification and pooling however it has one very large problem; you need to know the size of your library to calculate molarity.

The maths once you have size is pretty simple: 
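
As a hedged sketch of the usual mass-to-molarity conversion: it assumes an average of 660 g/mol per base pair of double-stranded DNA, and the function names and example values are illustrative rather than taken from any kit's software.

```python
# Assumption: average molar mass of one base pair of dsDNA, in g/mol.
AVG_BP_MASS = 660.0

def ng_per_ul_to_nm(conc_ng_per_ul, mean_size_bp):
    """Convert a dsDNA mass concentration (ng/ul) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_size_bp)

def undiluted_conc(measured_conc, dilution_factor):
    """Scale a reading taken on a diluted library back to the stock."""
    return measured_conc * dilution_factor

# Hypothetical library: 2 ng/ul with a mean fragment size of 300 bp.
print(round(ng_per_ul_to_nm(2.0, 300), 2))
# The same library if the size had been mis-estimated as 350 bp.
print(round(ng_per_ul_to_nm(2.0, 350), 2))
```

Because size sits in the denominator, any error in the size estimate feeds straight through to the molarity, which is why sizing matters so much.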

We find dilutions of 1:10,000 and 1:100,000 are needed to accommodate the concentrations of most of the libraries we get. We do run libraries in triplicate and qPCR both dilutions. It’s a lot of qPCR but the results are pretty good.
Unfortunately accurate sizing is not trivial and it can be a struggle to get this right. Running libraries on a gel or Bioanalyser is time consuming and some libraries are difficult to determine a very accurate size for, e.g. amplicons & Nextera. Some users don’t bother, they just trust that their protocol always gives them the same size. The Bioanalyser is not perfect either, read this post about Robin Coope’s iFU for improved Bioanalyser analysis. Get the sizing wrong and the yield on the flowcell is likely to be affected.

Even with accurate QT pooling is still a complex thing to get right: Illumina try to help by providing guidelines to allow users to make pools of just about any size. However these are a headache to explain to new users without the Illumina documentation. And the pooling always has a big drawback in that you may need to sequence a couple of libraries again and this can be impossible if they are not compatible.

We run MiSeq QC on most of the library preps completed in my lab. This is very cost effective if we are sequencing a whole plate of RNA-seq or ChIP-seq, at just £5 per sample. However if we only have 24 RNA-seq samples then we’ll only want 2 lanes of HiSeq SE50bp data, this means MiSeq QC is probably a waste of time and we’ll just generate the experimental data. Unfortunately the only way to know for sure that the barcode balance is good is to perform a sequencing run!

Mixing pools-of-pools to create "Superpools": We’ve been thinking about how we might handle pools-of-pools (Superpools) on HiSeq 2500, the instrument has a two-lane flowcell that requires a $400 accessory kit if you want to put a single sample on each lane. The alternative is to run two lanes, or a superpool of libraries from different users. We’ve tested this in our LIMS and can create the pools, the samplesheet and do the run but in thinking about the process we’ve come up with a new problem. What do you do when the libraries you want to superpool are different sizes?

We can accurately quantify library concentration (if you can accurately size your libraries) but the clustering process favours small molecules. Consider the following scenario: in a superpool of two experiments on one HiSeq 2500 flowcell we have an RNA-seq library (275bp) and a ChIP-seq library (500bp). These are equimolar pooled and sequenced. When demultiplexed the RNA-seq library accounts for 80% of the run and the ChIP-seq 20%; consequently the RNA-seq user has too much data and the ChIP-seq user has too little. And all because the smaller RNA-seq library clustered more efficiently. How do you work that one out!

We’ve not empirically tested this but I think we will soon on our MiSeq.
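
Until we have real data, here is one way to think about it, as a sketch only. It assumes each library's share of clusters is proportional to its molar fraction multiplied by a relative clustering efficiency, which is a simplifying assumption of mine and not something we have measured. The 4:1 efficiency is simply back-calculated from the 80%/20% scenario above.

```python
def expected_read_share(molar_fractions, cluster_efficiency):
    """Predicted share of clusters per library under a simple linear model."""
    weighted = {name: molar_fractions[name] * cluster_efficiency[name]
                for name in molar_fractions}
    total = sum(weighted.values())
    return {name: w / total for name, w in weighted.items()}

def compensated_molar_fractions(target_share, cluster_efficiency):
    """Molar fractions to pool at so the model predicts the target share."""
    raw = {name: target_share[name] / cluster_efficiency[name]
           for name in target_share}
    total = sum(raw.values())
    return {name: r / total for name, r in raw.items()}

# Hypothetical relative efficiencies implied by the 80/20 scenario.
efficiency = {"RNA-seq 275bp": 4.0, "ChIP-seq 500bp": 1.0}
equimolar = {"RNA-seq 275bp": 0.5, "ChIP-seq 500bp": 0.5}

print(expected_read_share(equimolar, efficiency))
print(compensated_molar_fractions(equimolar, efficiency))
```

Under this model the fix is to pool in inverse proportion to clustering efficiency, but the efficiencies themselves would have to come from a test run, which is the experiment we still need to do.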

Top tips for accurate pooling:
  1. Perform robust QT
  2. Mix libraries with high volume pipetting (~10ul)
  3. Run MiSeq QC
PS: writing this post has got me thinking of better ways to confirm pooling efficiency than sequencing. Watch this space!

Tuesday 15 October 2013

Hacking MiSeq updated and now hacking your BioAnalyser too!

In a post earlier this week I talked about the hacking of a MiSeq run by MadsAlbertsen, one comment on the post drew my attention to another paper I'd missed where the authors hacked their MiSeq to perform 600bp reads (PE300). Considering this was a year before Illumina sold us kits I'd say that's quite an achievement!

The Genome Sciences Centre, British Columbia Cancer Agency in Vancouver, did the sequencing for the Spruce genome paper (1). One of the authors is Robin Coope (Group Leader, Instrumentation BCCA Genome Sciences Centre) and he has been behind some pretty cool engineering in the genome sciences. In the Spruce paper his group demonstrated how to crack open a MiSeq cartridge and replace the insides with a larger reagent reservoir so kits can be mixed allowing much longer runs than Illumina intended (at the time of publishing).

The image below is from their supplementary data, I don't recommend you do this at home!

I met Robin when he was speaking at European Lab Automation 2013 in Hamburg last June. He gave an excellent talk on Automation Challenges in Next Generation Sequencing; we also had excellent Wiener schnitzel and dark bier once the conference finished. He spoke about the problems of quantifying NGS libraries on Bioanalyser and qPCR; we want molarity but get DNA concentration and these are not the same thing! Current methods allow you to use a simple calculation to convert between the two but this is heavily reliant on library size estimation. It is pretty much impossible to get the size right in the first place without measuring it and most people use the BioAnalyser. This is where Robin's talk really got interesting for me...

Unfortunately I can't share the slides (it was a commercial conference) but you could email Robin and ask him for a copy (or to hurry up and publish). Basically he described the deficiencies of the Bioanalyser software and introduced the concept of intelligent Fluorescent Units (iFU) to change the way the Bioanalyser does its analysis.

The Bioanalyser does a reasonable job of calculating size and molarity that works well on “tight” libraries, equally a visual estimation of mean insert size gives good results and cluster errors are more likely to be from mass quantification errors than insert size estimation errors. However for wider library distributions like Nextera or amplicons, iFU improves cluster density prediction and reduces cluster density error by 60% in the set (n=28) of amplicons he presented.

Of course none of this would be needed if we were using probe-based assays or digital PCR to count library molecules, but that is a whole other post!

Finally Robin went on to describe his group's work on the Barracuda, a robot for 96-sample gel size selection.

Monday 14 October 2013

Detecting trisomies in twin pregnancies: now available from Verinata

Illumina acquired Verinata earlier this year and their verifi prenatal test is a non-invasive one that detects foetal aneuploidies as early as 10 weeks. Many others are developing or selling similar tests and the real excitement for me (as I already have kids and don't plan on any more) is the impact that developments in foetal medicine have for Cancer diagnosis and prognosis.

A press release on Illumina's website today announces the development of the verifi test for use in twin pregnancies. A twin pregnancy means the allele fraction from each twin in maternal blood is lower than in a single pregnancy, making detection harder. They have verified the test in over 100 twin pregnancies and achieved 100% detection of aneuploidy for Down, Edwards and Patau syndromes, trisomies 21, 18 & 13 respectively.

This shows how much development is still on-going in non-invasive testing by NGS.

Does this mean we can expect tests that will detect multiple cancers from ctDNA? Perhaps if we can improve sensitivity and can distinguish cancers based on specific patterns of mutation.

How do you count reads from a next-gen sequencer?

We’ve been asked a question many times over the years “should you count paired-end sequencing as one read or two?”

“Who cares” I hear you cry. But if you ask someone to give you back 200M reads for a sample and they give you 100M paired-reads who’s right? Again this may not be important to you but if you have to pay for 100M reads when you were expecting 200M you’re going to feel short-changed. And when those 100M might have cost £500 or more you care about the change!

My personal view is that it is better for us to count the number of molecules we sequenced from the initial library. This currency tells us something about how deeply we sequenced one sample compared to another. In the case of Illumina sequencing that means counting clusters (raw or PF is a whole other debate), for Ion Torrent I guess it would be positive wells (probably PF reads).

We get about 160-180M reads per lane on our HiSeq 2000, and we count a ‘pair’ of reads as a single data point. That is to say, a single end run or a paired end run with the same cluster density will give the same "read" count. This turns out to be useful when we want to compare performance of single-end and paired-end runs. I’m happy to listen to the point of view that says there are actually two sequencing reads generated in the paired-end run but I find it adds to the confusion new users have. I know others do too as I’ve been to many talks where people have quoted the number of reads they get for an instrument and I know it was way too high to be possible (my suspicion being a paired-end run was quoted).
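
To make the convention concrete, here is a tiny sketch of counting clusters rather than individual sequences. The function name is my own and the numbers are just the 200M vs 100M example from earlier in the post.

```python
def count_clusters(n_sequences, paired_end):
    """Clusters (library molecules) represented by n_sequences individual reads.

    In a paired-end run each cluster yields two sequences, so they are halved.
    """
    return n_sequences // 2 if paired_end else n_sequences

# 200M individual sequences from a paired-end run vs 100M from a single-end run.
print(count_clusters(200_000_000, paired_end=True))
print(count_clusters(100_000_000, paired_end=False))
```

Both runs sampled the same number of library molecules, which is the quantity I care about when comparing depth between samples.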

The nice simple metric number of clusters per lane works well for me in most cases. This also allows me and my users to compare between different instruments easily; GAIIx 40M, HiSeq 170M, MiSeq 15M, etc.
Unfortunately in trying to decide whether to upgrade to HiSeq 2500 and use rapid runs it gets confusing as I really need to consider the number of clusters per unit of time I can sequence to determine if the experiment will be cheaper on multiple rapid runs rather than one standard run! Instrument amortisation and maintenance charges are high so the more runs I can do in a week the better, I think. The life of a core lab manager is full of exciting stuff like this.

Thursday 10 October 2013

Hack your MiSeq and get $400 off a 600bp run

I’ll start off by saying not quite, but you can read on to get an idea on how to increase read length of a MiSeq 500cycle v2 kit to get 600bp of data.

MadsAlbertsen posted on a SEQanswers thread about their protocol to squeeze a little more out of the MiSeq. They are using a hacked MiSeq Reagent kit v2 (500 cycles) and running a 2x301bp run, which is not supported by Illumina. “Do it at your own risk! (although it works nicely.)” is the message on the website. The group are using the modified protocol and hacked kits for bacterial 16S rRNA gene amplicon sequencing of the V1-3 variable region. The target region is 489bp in total in E. coli, but can vary significantly depending on the target species (making the 2x301 run necessary).
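
A quick back-of-envelope check shows why the extra cycles matter for a paired-end amplicon that has to be merged. This is my own sketch, not part of their protocol; it takes the 489bp E. coli figure at face value and ignores primers, quality trimming and whatever minimum overlap a given read-merging tool needs.

```python
def read_overlap(read_length, amplicon_length):
    """Bases covered by both reads of a pair; negative means a gap in the middle."""
    return 2 * read_length - amplicon_length

# 489bp target with the standard 2x250 kit vs the hacked 2x301 run.
print(read_overlap(250, 489))
print(read_overlap(301, 489))
# A hypothetical longer amplicon from another species.
print(read_overlap(250, 540), read_overlap(301, 540))
```

With 2x250 the two reads barely meet on the E. coli amplicon, and anything much longer leaves a gap, whereas 2x301 keeps a comfortable overlap.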

How to hack your MiSeq kit: Make sure you follow the instructions on adding a little extra reagent to some of the wells.
  • 5 mL of incorporation buffer from well 1 of a left-over reagent cartridge to well 1 of the 2x301 cartridge.
  • 7 mL of scan mix from well 2 of a left-over reagent cartridge to well 2 of the 2x301 cartridge.
  • 6.8 mL of cleavage mix from well 4 of a left-over reagent cartridge to well 4 of the 2x301 cartridge.
  • 80 mL of incorporation buffer from a left-over incorporation buffer bottle.

Now simply set the MiSeq to 2x301 in the samplesheet and ignore the warning the software gives. Et voila 600bp for the price of 500. With a MiSeq v3 kit costing about $1400 that’s potentially a $400 saving.

Will we be doing this in my lab? No way, I’m far too conservative with users' samples to play around like this. But I wish I could do more stuff like this, as it’s fun. It makes me want to come up with my own genomics Instructables.

Watch out for their paper: Saunders, A.M., Albertsen, M., McIllroy, S.J. & Nielsen, P.H. (in prep) MiDAS: the field guide to the activated sludge ecosystem.

Friday 4 October 2013

It's not Open Access's fault!

Update: ...see the bottom of the post for more coverage on this "sting"

GenomeWeb has coverage of a story all of us should take a look at. A fake manuscript produced by a journalist from Science was accepted by 157 open-access publishers, a damning indictment of OA? Probably not. Damning indictment of peer review? Quite possibly.

Read Michael Eisen's blog for a pretty good discussion of the whole problem (NB: Michael Eisen is co-founder of the PLoS).

The article has obviously struck a chord with it being in Science's top 1% of articles ranked by Altmetric.

Peer review is full of problems and all of us suffer from that. But it is the system we have so we should work hard to do it right. So I'm off to re-read that paper I reviewed last week to see if the data really does stack up...

Update: GenomeWeb pointed to a live chat organised by Science on the impact of this paper. They bill this as a "chat about the dark side of open access and the future of academic publishing"! Still not making it clear what some of the problems were with their "study".

Zen Faulkes has a nice round-up of coverage on his blog, NeuroDojo (and a great header image).

PS: We still need to fix peer-review, anyone got any good ideas?


Monday 23 September 2013

Circulating RNA analysis

Analysis of circulating DNA has had a major impact in the last couple of years, with 100s of publications. I've previously posted about work at the Institute on circulating tumour DNA analysis of amplicons and exomes, but what about RNA?

Thursday 12 September 2013

Patterned flowcells: what can we expect?

There was a real buzz in the community when Illumina resurrected the idea of using patterned flowcells for SBS sequencing (Keith Robison covered lots of genomics news back in January, including patterned flowcells). One of the big problems with clustering is the need to very carefully quantify samples before loading onto a flowcell. Most labs use qPCR or BioAnalyser but even the best rarely achieve perfect density on every run if they are working with a diverse group of libraries. Jay Flatley at a JP Morgan conference said that only 36% of clusters are useable and that the new technology should double that.

Patterned flowcells have been mentioned before in Illumina roadmaps, they'll potentially allow 1Tb or more from a standard HiSeq run and are one way we might see an end to cluster density variability. But a patterned flowcell might also allow improvement of some methods and even interesting new applications to be developed.

We'll have to wait until Illumina release the new flowcells at the end of the year (more likely in January just before AGBT) but for now I thought I'd put down some thoughts I've been having about what we might get and look through one of Illumina's more recent patents.

Friday 6 September 2013

Mouse models of Human disease constantly need to be improved

I’ve worked on model-organisms for a long time, originally in Plant research but nowadays I'm more likely to run genomic experiments for groups using Mouse models of cancer. It is impossible to do some experiments in Human patients for a variety of reasons, so we use Mouse models instead. Genetically engineered mice (GEMs) are used in many research programs and offer us the ability to tailor a disease phenotype, as our understanding of the driving events in cancer increases we can build GEMs that carry these same driver mutations; we can even turn the specific mutations on at specific time points to try and recapitulate Human disease.

Tuesday 3 September 2013

Finding your way around NGS sample prep

I'm often asked which sample prep method a user should consider for their experiments. In my lab we use a lot of Illumina TruSeq kits; we've tried other methods, and do use Rubicon's Thruplex, but Illumina's end-to-end support is useful in a medium-sized core facility. And the kits work!

I wanted to illustrate the fact that Illumina sample preps share many steps in their protocols to demonstrate that once you've mastered one protocol, you can easily move onto another. The map of sample prep below is my first try at that illustration. You can see a higher resolution image here.

Explaining how the kits work always takes time and I have always thought a strength of the Illumina technology is the flexibility given by the core of sample prep: end repair, ligate adapters & PCR. Users can think creatively about what they do to their DNA (or cDNA) before starting or during the prep to come up with novel techniques. Stranded RNA-seq, bisulfite sequencing, RRBS and exome capture are just a few of the methods developed by users of Illumina technology. Hopefully the image above shows how similar things really are, with those key steps clearly highlighted as large "interchanges", along with PCR and qPCR & Bioanalyser for QT/QC.

The above image borrows heavily from the London underground maps developed by Harry Beck in 1931. Most of the Illumina protocols are listed and I've included Thruplex on its own network for comparison. I'm aiming to add more detail for some of the different RNA-seq protocols at some point as well as bisulfite sequencing. I'm also thinking about how this might be extended to include details on suggested sequencing depth; lines for Human genome vs CNV-seq or DGE vs splicing.

Let me know what you think.

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Thursday 29 August 2013

Targeted RNA-seq methods are here

Illumina and Life Technologies are both launching targeted RNA-seq applications which are likely to become standard tools for many labs; if the price is right.

The ability to target a portion of the genome has revolutionised next-generation sequencing experiments. The analysis of exomes has exploded, custom panels for exome-style pull-down are being used to great effect in 1000’s of samples and amplicon analysis is making it possible to run 10,000’s of samples in a single experiment (we’ve run a HiSeq flowcell with 12288 samples on it, 1536 per lane using Fluidigm – currently unpublished).
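To put those multiplexing numbers in context, here is a minimal sketch of the arithmetic. The 1536 samples per lane comes from the run described above and 8 is the lane count of a standard HiSeq flowcell; the reads-per-lane figure is purely a hypothetical placeholder, so swap in the yield you actually see on your own instrument.

```python
# Back-of-envelope multiplexing arithmetic for a heavily indexed amplicon run.
LANES_PER_FLOWCELL = 8    # standard HiSeq flowcell
SAMPLES_PER_LANE = 1536   # figure quoted in the post

# Hypothetical yield, for illustration only; not a number from the post.
ASSUMED_READS_PER_LANE = 150_000_000


def samples_per_flowcell(lanes: int, samples_per_lane: int) -> int:
    """Total number of indexed samples across every lane of the flowcell."""
    return lanes * samples_per_lane


def mean_reads_per_sample(reads_per_lane: int, samples_per_lane: int) -> int:
    """Average reads per sample, assuming perfectly even pooling."""
    return reads_per_lane // samples_per_lane


total = samples_per_flowcell(LANES_PER_FLOWCELL, SAMPLES_PER_LANE)
print(total)  # 12288, matching the run described in the post
print(mean_reads_per_sample(ASSUMED_READS_PER_LANE, SAMPLES_PER_LANE))
```

Real pools are never perfectly even, so treat the per-sample figure as an upper-bound average rather than a guarantee for every sample.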

The next frontier looks like it could be using the same techniques to target a portion of the transcriptome, again allowing many 1000’s or 10,000’s of samples to be analysed in a single experiment. These technologies are likely to replace real-time PCR for mid- to high-plex studies. Anyone who has tried to run a few hundred TaqMan or SYBR assays on their 96-well qPCR machine will see the potential. And users with BioMark, TLDA, Wafergen, and other high-throughput qPCR systems will see the potential of using just one analysis method (NGS) as their primary data collection tool.

Tuesday 27 August 2013

Back from my holidays

Two weeks away from the lab, from papers, from email and from Core Genomics - I’ve just got back from Finland’s wilderness: wood-fired sauna, lakes to swim in, fish to catch and no-one for miles; and all with fantastic 4G connection! Holidays are great and I think cutting yourself off from work is important if you’re truly going to relax and unwind. On my return to work there was the usual mass of email to work through. I thought I’d summarise the things that happened in the world of Genomics that I thought were interesting while I was away.

Tuesday 23 July 2013

Illumina's next-gen automation solution

Illumina just bought Advanced Liquid Logic. Never heard of them? Neither had I until now, but I suspect we'll see some pretty cool devices coming soon.

On the ALL website they have the video below. Watching at 9 seconds I could not help but see PacMan in action, even PacMan conjoining with his twin! ALL makes disposable digital microfluidic devices allowing cost-effective robotic automation; without the robots. Their technology is based around electrowetting and does not use the pumps, valves, pipettes or tubes seen on other liquid handling systems.

Monday 15 July 2013

Managing your researcher profile in the modern age

I have between 16 and 24 publications in various databases and keeping track of these can be more difficult than I think it should be. I've always used PubMed as my primary search tool and have a link to publications by James Hadfield on my blog. However, like most of you, I have a name that is not so unique, and other James Hadfields also pop up in my search results (see the end of this post, and feel free to comment if you're another James Hadfield).

I posted previously about the best way to link to a paper and I'm still suggesting the DOI is the thing to use. It can be found by search engines and aggregators, making the collection of commentary a little easier. In the same post I also suggested that a unique identifier for an individual would be a big step forward. Well, that was also made available recently and in several different forms, so my new question is: which ID should you be using?

Who's looking at my papers (or yours): Before I get onto unique researcher IDs I wanted to come back to the issue of how DOIs and other tools allow aggregators to capture content. The newest "killer-app" for me is Altmetric.

They track how papers are viewed and mentioned: in the news, on blogs, on Twitter, etc. The thing you'll probably be adding to your bookmarks immediately after reading this post is their free bookmarklet, which will give you a report on any paper you happen to be looking at online. Below is an image of the report for one of the recent papers I was involved with. I'd like to see citations tracked and I'm sure we'll see lots more development from the team!


Which profile managing system to use: There are 5 systems you might use and it is hard to pick just one. However, in writing this post I tried to get all 5 up-to-date. Some talk to each other, which helps - but you can't get away from the fact that no-one has time to waste. I'll probably stick with keeping ORCID and GoogleScholar up-to-date and then link my ORCID ID to my Scopus and ResearcherID accounts.

ORCID: My ORCID ID: 0000-0001-9868-4989. The open access one! Started in 2010, ORCID is "an open, non-profit, community-driven effort to create and maintain a registry of unique researcher identifiers and a transparent method of linking research activities and outputs to these identifiers". The registry gives unique IDs to any registered scientists, with data being open access. Organisations can also sign up to allow management of staff research outputs. ORCID stands for Open Researcher and Contributor ID.
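If you are pasting ORCID IDs into spreadsheets or lab databases, it helps to know that the final character is a check digit (ISO 7064 MOD 11-2, as described in ORCID's own documentation), so typos can be caught before they link your work to someone else. Below is a minimal sketch of that check; the function names are my own for illustration and are not part of any ORCID library.

```python
def orcid_check_character(first_15_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in first_15_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the ID has 16 characters and a matching check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0001-9868-4989"))  # True: the ID quoted above
print(is_valid_orcid("0000-0001-9868-4988"))  # False: last digit mistyped
```

This only tells you the ID is well formed; it does not confirm that the ID is registered or that it belongs to the person you think it does.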

Google Scholar: Me. Good listing of citations. Good presentation of citation metrics. Want to know how Google Scholar works, then read this.

Scopus: My Scopus Author ID: 26662876800. Display of citation numbers and link to citations page. Links to web pages mentioning your work and patents referencing it.

ResearcherID: My ResearcherID: A-1874-2013. ResearcherID is a product from Thomson Reuters and provides a solution to the author ambiguity by assigning a unique identifier to researchers. Only articles from Web of Science with citation data are included in the citation calculations. Good presentation of citation metrics.

ResearchGate: Me. Good presentation of citation metrics. Founded by 2 doctors and a computer scientist, ResearchGate has been billed as "social networking for scientists".

Why bother? If all this information is correctly assigned to the correct authors, institutions and funders, then we should be able to get into some very interesting meta-analysis.

Who's collaborative and who's not? Do some institutions or grant funders punch above their weight? If so why, is there something cultural that can be translated to others? Does the journal you publish in impact the coverage your work gets over time? Many other questions can also be asked, although some may not want to see the answers!

My papers in a little bit more detail: Each has a link to its DOI, PubMed and Altmetric stats.

Mohammad Murtaza et al, Non-invasive analysis of acquired resistance to cancer therapy by sequencing of plasma DNA, in Nature 2013 May 2;497(7447):108-12. PMID:23563269 with 8 citations* at the time of writing. Altmetric stats.

Saad Idris et al, The role of high-throughput technologies in clinical cancer genomics, in Expert Rev Mol Diagn. 2013 Mar;13(2):167-81. PMID:23477557 with 2 citations* at the time of writing. Altmetric stats.

Tim Forshew et al, Noninvasive identification and monitoring of cancer mutations by targeted deep sequencing of plasma DNA, in Sci Transl Med. 2012 May 30;4(136):136ra68. PMID:22649089 with 31 citations* at the time of writing. Altmetric stats.

Christina Curtis et al: The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups, in Nature 486 (7403), 346-352. PMID with 229 citations* at the time of writing. Altmetric stats.

Kelly Holmes et al, Transducin-like enhancer protein 1 mediates estrogen receptor binding and transcriptional activity in breast cancer cells, in Proc Natl Acad Sci U S A. 2012 Feb 21;109(8):2748-53. PMID:21536917 with 15 citations* at the time of writing. Altmetric stats.

Sarah Aldridge & James Hadfield, Introduction to miRNA profiling technologies and cross-platform comparison, in Methods Mol Biol. 2012;822:19-31. PMID:22144189 with 5 citations* at the time of writing. Altmetric stats.

Charlie Massie et al: The androgen receptor fuels prostate cancer by regulating central metabolism and biosynthesis, in EMBO J. 2011 May 20;30(13):2719-33. PMID:21602788 with 56 citations* at the time of writing. Altmetric stats.

Christina Curtis et al: The pitfalls of platform comparison: DNA copy number array technologies assessed, in BMC Genomics. 2009 Dec 8;10:588. PMID:19995423 with 57 citations* at the time of writing. Altmetric stats.

Dominic Schmidt et al: ChIP-seq: using high-throughput sequencing to discover protein-DNA interactions, in Methods. 2009 Jul;48(3):240-8. PMID:19275939 with citations* at the time of writing. Altmetric stats.

Steve Marquadt et al: Additional targets of the Arabidopsis autonomous pathway members, FCA and FY, in J Exp Bot. 2006;57(13):3379-86. PMID:16940039 with 34 citations* at the time of writing. Altmetric stats.

Raka Mitra et al: A Ca2+/calmodulin-dependent protein kinase required for symbiotic nodule development: Gene identification by transcript-based cloning, in PNAS 2004 Mar 30;101(13):4701-5. PMID:15070781 with 289 citations* at the time of writing. Altmetric stats.

Robert Koebner and James Hadfield: Large-scale mutagenesis directed at specific chromosomes in wheat, in Genome. 2001 Feb;44(1):45-9. PMID:11269355 with 2 citations* at the time of writing. Altmetric stats.

Barbara Jennings et al, A differential PCR assay for the detection of c-erbB 2 amplification used in a prospective study of breast cancer, in Mol Pathol. 1997 Oct;50(5):254-6. PMID:9497915 with 14 citations* at the time of writing. Altmetric stats.