CoreGenomics: July 2013

Tuesday, 23 July 2013

Illumina's next-gen automation solution

Illumina has just bought Advanced Liquid Logic (ALL). Never heard of them? Neither had I until now, but I suspect we'll see some pretty cool devices coming soon.

The ALL website has a video, and at 9 seconds I could not help but see PacMan in action, even PacMan conjoining with his twin! ALL makes disposable digital microfluidic devices that allow cost-effective robotic automation, without the robots. Their technology is based on electrowetting and does not use the pumps, valves, pipettes or tubes seen on other liquid-handling systems.

Managing your researcher profile in the modern age

I have between 16 and 24 publications, depending on which database you look in, and keeping track of them is more difficult than I think it should be. I've always used PubMed as my primary search tool and have a link to publications by James Hadfield on my blog. However, like most of you, I have a name that is not so unique, and other James Hadfields also pop up in my search results (see the end of this post, and feel free to comment if you're another James Hadfield).

I posted previously about the best way to link to a paper, and I'm still suggesting the DOI is the thing to use. It can be found by search engines and aggregators, making the collection of commentary a little easier. In the same post I also suggested that a unique identifier for individual researchers would be a big step forward. That has also become available recently, in several different forms, so my new question is: which ID should you be using?

Who's looking at my papers (or yours): Before I get onto unique researcher IDs I wanted to come back to how DOIs and other tools allow aggregators to capture content. The newest "killer app" for me is Altmetric.

They track how papers are viewed and mentioned: in the news, on blogs, on Twitter, etc. The thing you'll probably be adding to your bookmarks immediately after reading this post is their free bookmarklet, which gives you a report on any paper you happen to be looking at online. Below is an image of the report for one of the recent papers I was involved with. I'd like to see citations tracked too, and I'm sure we'll see lots more development from the team!

Altmetric

Which profile managing system to use: There are 5 systems you might use, and it is hard to choose just one. In writing this post I tried to get all 5 up-to-date. Some talk to each other, which helps, but you can't get away from the fact that no-one has time to waste. I'll probably stick with keeping ORCID and Google Scholar up-to-date and then link my ORCID ID to my Scopus and ResearcherID accounts.

ORCID: My ORCID ID is 0000-0001-9868-4989. The open access one! Started in 2010, ORCID is "an open, non-profit, community-driven effort to create and maintain a registry of unique researcher identifiers and a transparent method of linking research activities and outputs to these identifiers". The registry gives unique IDs to registered scientists, and the data are open access. Organisations can also sign up to manage their staff's research outputs. ORCID stands for Open Researcher and Contributor ID.

Google Scholar: Me. Good listing of citations and good presentation of citation metrics. Want to know how Google Scholar works? Then read this.
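Part of what makes an ORCID ID usable as a unique identifier is that its last character is a check digit, calculated with the ISO 7064 MOD 11-2 algorithm, so most typos in a copied ID can be caught. A minimal sketch of that check follows; the function names are my own (hypothetical), not part of any ORCID library or API.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID ID."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    remainder = total % 11
    result = (12 - remainder) % 11
    # A result of 10 is written as "X".
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Strip hyphens, then check the length, the 15 base digits and the check character."""
    digits = orcid.replace("-", "")
    if len(digits) != 16 or not all(c in "0123456789" for c in digits[:15]):
        return False
    return orcid_check_digit(digits[:15]) == digits[15].upper()


print(is_valid_orcid("0000-0001-9868-4989"))  # True
print(is_valid_orcid("0000-0001-9868-4988"))  # False (last digit mistyped)
```

The checksum only catches typing errors; it says nothing about whether the ID is actually registered to anyone.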

Scopus: My Scopus Author ID: 26662876800. Display of citation numbers and link to citations page. Links to web pages mentioning your work and patents referencing it.

ResearcherID: My ResearcherID: A-1874-2013. ResearcherID is a Thomson Reuters product that addresses author ambiguity by assigning a unique identifier to each researcher. Only articles from Web of Science with citation data are included in the citation calculations. Good presentation of citation metrics.

ResearchGate: Me. Good presentation of citation metrics. Founded by two doctors and a computer scientist, ResearchGate has been billed as "social networking for scientists".

Why bother? If all this information is correctly assigned to the correct authors, at the correct institutions and with the correct funders, then we should be able to do some very interesting meta-analysis.

Who's collaborative and who's not? Do some institutions or grant funders punch above their weight? If so, why? Is there something cultural that can be translated to others? Does the journal you publish in affect the coverage your work gets over time? Many other questions can be asked, although some may not want to see the answers!

My papers in a little bit more detail: each has a link to its DOI, PubMed and Altmetric stats.

Mohammad Murtaza et al, Non-invasive analysis of acquired resistance to cancer therapy by sequencing of plasma DNA, in Nature 2013 May 2;497(7447):108-12. PMID:23563269 with 8 citations* at the time of writing. Altmetric stats.

Saad Idris et al, The role of high-throughput technologies in clinical cancer genomics, in Expert Rev Mol Diagn. 2013 Mar;13(2):167-81. PMID:23477557 with 2 citations* at the time of writing. Altmetric stats.

Tim Forshew et al, Noninvasive identification and monitoring of cancer mutations by targeted deep sequencing of plasma DNA, in Sci Transl Med. 2012 May 30;4(136):136ra68. PMID:22649089 with 31 citations* at the time of writing. Altmetric stats.

Christina Curtis et al: The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups, in Nature 486 (7403), 346-352. PMID with 229 citations* at the time of writing. Altmetric stats.

Kelly Holmes et al, Transducin-like enhancer protein 1 mediates estrogen receptor binding and transcriptional activity in breast cancer cells, in Proc Natl Acad Sci U S A. 2012 Feb 21;109(8):2748-53. PMID:21536917 with 15 citations* at the time of writing. Altmetric stats.

Sarah Aldridge & James Hadfield, Introduction to miRNA profiling technologies and cross-platform comparison, in Methods Mol Biol. 2012;822:19-31. PMID:22144189 with 5 citations* at the time of writing. Altmetric stats.

Charlie Massie et al: The androgen receptor fuels prostate cancer by regulating central metabolism and biosynthesis, in EMBO J. 2011 May 20;30(13):2719-33. PMID:21602788 with 56 citations* at the time of writing. Altmetric stats.

Andy Lynch et al: The cost of reducing starting RNA quantity for Illumina BeadArrays: a bead-level dilution experiment, in BMC Genomics. 2010 Oct 6;11:540. PMID:20925945 with 2 citations* at the time of writing. Altmetric stats.

Anna Git et al: Systematic comparison of microarray profiling, real-time PCR, and next-generation sequencing technologies for measuring differential microRNA expression, in RNA. 2010 May;16(5):991-1006. PMID:20360395 with 132 citations* at the time of writing. Altmetric stats.

Christina Curtis et al: The pitfalls of platform comparison: DNA copy number array technologies assessed, in BMC Genomics. 2009 Dec 8;10:588. PMID:19995423 with 57 citations* at the time of writing. Altmetric stats.

Dominic Schmidt et al: ChIP-seq: using high-throughput sequencing to discover protein-DNA interactions, in Methods. 2009 Jul;48(3):240-8. PMID:19275939 with citations* at the time of writing. Altmetric stats.

Partha Das et al: Piwi and piRNAs act upstream of an endogenous siRNA pathway to suppress Tc3 transposon mobility in the Caenorhabditis elegans germline, in Molecular Cell 31 (1), 79-90. PMID:18571451 with 131 citations* at the time of writing. Altmetric stats.

Phil Smith et al: STS markers for the wheat yellow rust resistance gene Yr5 suggest a NBS-LRR-type resistance gene cluster, in Genome 2007 Mar;50(3):259-65. PMID:17502899 with 7 citations* at the time of writing. Altmetric stats.

Steve Marquadt et al: Additional targets of the Arabidopsis autonomous pathway members, FCA and FY, in J Exp Bot. 2006;57(13):3379-86. PMID:16940039 with 34 citations* at the time of writing. Altmetric stats.

Raka Mitra et al: A Ca2+/calmodulin-dependent protein kinase required for symbiotic nodule development: Gene identification by transcript-based cloning, in PNAS 2004 Mar 30;101(13):4701-5. PMID:15070781 with 289 citations* at the time of writing. Altmetric stats.

Robert Koebner and James Hadfield: Large-scale mutagenesis directed at specific chromosomes in wheat, in Genome. 2001 Feb;44(1):45-9. PMID:11269355 with 2 citations* at the time of writing. Altmetric stats.

Barbara Jennings et al, A differential PCR assay for the detection of c-erbB 2 amplification used in a prospective study of breast cancer, in Mol Pathol. 1997 Oct;50(5):254-6. PMID:9497915 with 14 citations* at the time of writing. Altmetric stats.

*****************************
There is a J Hadfield working on plants in Australia (here, here, here, here, here & here), at least one other J Hadfield working in clinical sciences (here & here), and possibly one more in basic research (here).

(*according to Google Scholar)

Wednesday, 10 July 2013

Ethanomics: a great blogger who really seems to know about ChIP-seq

I was pointed to Ethan Ford's blog by a colleague and thought I'd recommend you take a look. Ethan is a post-doc in Ryan Lister's lab at The University of Western Australia.

The post that got me interested was one about homemade AMPure beads. The protocol was not written by Ethan; it was modified by Brant Faircloth & Travis Glenn (from the methods section in: Cost-effective, high-throughput DNA sequencing libraries for multiplexed target capture) for a SeqCap workshop, referencing.

They also wrote a protocol to adapt SureSelect to work with Illumina TruSeq and Nextera libraries.

Nadin Rohland from Harvard Medical School, the lead author on the above paper, was also one of the researchers who discovered that Africa has two species of elephant, not one. And if anyone is interested in a genetics/philosophy story for kids, this one about African elephants is great: it raises all sorts of questions for young kids (and adults) about prejudice, discrimination and violence. The genetics is not so clear.

Ethan's blog has some great resources for ChIP-seq aficionados on his protocols page:

Native ChIP protocol

NGS qPCR quantitation protocol

MeDIP-seq protocol with TruSeq adapters

ChIP-Seq library construction using the Illumina TruSeq adapters

ChIP (MNase fragmentation) protocol

ChIP (fragmentation with Covaris) protocol

ChIP (fragmentation in 0.1% SDS – tip sonicator) protocol

ChIP (fragmentation in 1% SDS – tip sonicator) protocol

Quick ChIP (MNase fragmentation) protocol

ChIP Primer Design

Antigen production protocol

Antibody purification protocol

Immunoprecipitation of chromatin associated protein complexes

Saturday, 6 July 2013

The GenomeWeb effect

Dan Koboldt wrote a pair of articles suggesting you start blogging. In the first he talks about why you should, and perhaps why you would not, start a blog. I'd certainly encourage people to start: it's fun, free, and the feedback can be great. Many people leave comments on this blog and I get emails from readers about posts they'd like to see written. I'm also seeing posts get +1s now, although I'm not really up to speed with that particular social network.

The traffic I get to my blog is a real inspiration to keep writing. I regularly meet people at conferences and other meetings who read my blog, although no-one's bought me a beer because they liked it so much! I do keep an eye on my stats and occasionally get a massive spike in readers. Usually this is because GenomeWeb has covered one of my posts, and the number of readers can reach over 1000 a day. I'm sure other bloggers see the same effect on their sites.

I call this spike "the GenomeWeb effect".

The GenomeWeb effect in action on Core Genomics

Thanks GenomeWeb, I know you can't make them stay but at least you're sending them my way occasionally.

Friday, 5 July 2013

Genomics England: 100,000 genomes here we come!

We've waited since Christmas to find out more, and on the NHS's 65th birthday we finally get to hear how the NHS is going to roll out clinical sequencing for patients in England. In December last year Prime Minister David Cameron announced the 100,000 genome project while visiting the CRUK Cambridge Research Institute. While here he visited my lab and started a MiSeq run, probably making him the only world leader to have sequenced a genome so far.

Late last year Prime Minister David Cameron revealed that the personal DNA code (genome) of up to 100,000 patients or infections in patients will be sequenced over the next five years. You can find out more about the new "Genomics England" by emailing enquiries@genomicsengland.co.uk or by reading the Science Working Group report.

Is there a future for semi-conductor sequencing?

The reason I wrote this post is a video from Panasonic I saw on Gizmodo, my new favourite website. However, I'm going to leave the Panasonic bit to the end of this piece.

Ion Torrent set AGBT on fire in 2010 with the release of the PGM, the world's first semi-conductor sequencer. They made such a big splash that Life Technologies bought them on the promise of delivering "greater than Moore's law" improvements. Moore's genome, published in the summer of 2011, showed how far they had come.

The promise of semi-conductor sequencers has always been that they will scale in the same way as the processors in our Macs and PCs. For now Illumina are pushing the envelope of what is possible in sequence space, with the HiSeq 2500 leading the charge, but what does the future look like?

Are we waiting for Nanopores to conquer all, or are semi-conductors about to take over the world?

What is happening to the price of sequencing?

I read the coverage of a CNN piece on GenomeWeb with interest; the article talks about how much costs are dropping, but alternative views were only recently aired on GenomeWeb by Neil Hall and Mick Watson, so who's right?

Back in May a commentary article by Neil Hall and a blog post by Mick Watson both discussed the very recent halt in the precipitous fall of sequencing costs. We've become so used to the continued fall that I suggested, very much tongue-in-cheek, that grant funding agencies should only pay for half the sequencing requested. The cost stopped falling earlier this year, and actually went up when Illumina increased their pricing.

If you take the HiSeq 2500 into the mix as well, then per-base sequencing costs look like they have jumped by almost 15%. Personally I see the 2500 as another step on the road to $1000 genomes and think the real price continues to fall, but I'll say no more until the end of this post; my comment probably deserves a proper post of its own some other time.

Genome Biology genomicist smack-down: Neil Hall's article "After the gold rush" in Genome Biology really nails the past five to ten years of genomics research. He specifically notes that he's being provocative and a little hard on us (he's a confirmed genomicist himself) before stating "We... have been spoiled. We have been real-estate agents working in a housing boom; bankers trading in debt. We have not been made to work; worse still, there has been very little incentive to think." Ouch! But he has a point. I've seen colleagues move on from the Institute where I work to good universities and face up to the realities of having to 'think' very hard when resources are more limited.

Almost anyone can discover something given unlimited resources. It would be interesting to attach a £ or $ sign to every research article and then measure output in more clearly economic terms. Which high-impact papers were the best value for money?

How cost-effective is "Collaborazilla" and his/her like?

Mick Watson on his blog goes over, under and reinterprets the NHGRI graph. He plots a new graph showing how price has changed when comparing time-points on the NHGRI graph. On it he notes that apart from the introduction of the GA, at almost every other time-point there has been only a modest drop in costs, and that over time the graph shows an upward trend.
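One way to make that kind of comparison, assuming it means comparing each time-point with the one before it, is to divide each cost by the previous cost: a ratio well below 1 is a big drop, a ratio close to 1 is a modest one, and ratios creeping towards (or above) 1 over time would look like an upward trend on such a plot. The sketch below is my own illustration of that idea, not Mick's code, and the numbers are made up rather than read off the NHGRI graph.

```python
def successive_ratios(costs):
    """Return cost[i] / cost[i-1] for each consecutive pair of time-points.

    A ratio below 1 means the cost fell since the previous time-point,
    a ratio of 1 means no change, and above 1 means it rose.
    """
    return [later / earlier for earlier, later in zip(costs, costs[1:])]


# Hypothetical cost-per-Mb values at successive time-points.
# These are made-up illustration numbers, NOT figures from the NHGRI graph.
example_costs = [100.0, 10.0, 8.0, 7.0, 7.5]

for ratio in successive_ratios(example_costs):
    change = "drop" if ratio < 1 else ("rise" if ratio > 1 else "flat")
    print(f"ratio {ratio:.2f} ({change})")
```

In this toy series the first step is a tenfold drop (think of the GA arriving), and every later step is a much smaller change, which is the pattern Mick describes.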

The history behind the numbers: The graph everyone points to, and which Mick reinterprets, comes from the NHGRI and shows a pretty steep fall from mid-2007 until early 2010. These were the "Solexa years", from their 1G, through the Illumina acquisition and GAI, GAII and GAIIx (bananas and iPAR anyone?) to HiSeq. The drop in costs was brought about by real technological improvements, and as a user all the way through this drop it is easy to remember the upgrades, for all the good as well as the pain some of them inflicted on my lab!

But the drop slowed dramatically in 2010 when HiSeq came out. Genome centres traded in their entire GAIIx stock and suddenly had an over-supply of capacity. The new instruments spat out billions of bases and initially users had problems filling all the available lanes. Then the "big-science" projects really got started and the data has flooded out ever since. If it weren't for $1000 genome noises by Life Technologies I doubt Illumina would have given us the 600G upgrades quite as quickly; we'd more likely have seen an Apple-esque dribbling out of technology gains over the past two years.

So what does CNN say: This is a news article for a news organisation and is aimed at the general public. The author, Eilene Zimmerman at CNN Money, is right to point out how much things have changed since James Watson's genome was sequenced in 2007 for an estimated $1 million; today you can sequence (but not analyse or interpret) a genome (30x) for about $3000 or $4000. But her commentary and interviewees give an all too rosy picture, and judging by the comment "since 2007, the cost of genome sequencing has been in free-fall" I'm not sure anyone actually looked at the NHGRI graph. Free-fall to me means a sustained downward trajectory; the NHGRI graph is perhaps showing the moment we reach terminal velocity.

The article also covers the impact of the Ion Torrent technology and quotes Jonathan Rothberg saying "In three months, we'll be able to do one entire human genome for $1,000". I don't know of a single PGM or Proton customer who thinks this is close to reality. The PGM has effectively stopped developing past the 318 chip, and the Proton is a long way from $1000 genomes. And of course ONT gets a look in as well, although Eric Topol notes they have "significant problems with accuracy". Have we found an early-access customer perhaps?

But I agree with the article's hopes for genetic testing by sequencing, and that the costs of these tests should drop to the level where health-care providers can roll them out nationwide. The costs of NGS are still coming down and are likely to keep dropping for some time. I guess we need to spend more of our time and money on making sure we can deliver NGS tests in the clinic.

So what does the future hold? I'll be honest and start with "I don't know". I am one of those bloggers that Neil refers to, excited by the prospect of new sequencing technologies and disappointed that ONT have run into difficulties. But the future is here in all its black and white glory. I'd say it's HiSeq 2500.

Disclaimer: I run a lab that uses a lot of Illumina technology, but I don't have a 2500, and I'm not expecting Illumina will give me one however much I say it is the future!

But I do think HiSeq 2500 is the way forward, or at least part of the equation, because the costs of sequencing on this instrument could be lower in real terms than anything we've seen so far. With the new patterned flowcells only six months away, data volumes are going to jump again. And because HiSeq 2500 rapid run mode generates more data than standard mode in a given time-frame, the amortised capital purchase and service contract costs per base drop, and these are a huge chunk of our real costs. Oh, and let's not forget the possibility of 1 billion 1000bp reads!

PS: Neil acknowledged comments and tweets as one of the reasons for his commentary article. I'd do the same and say this has been a topic of discussion whenever genomicists get together since the middle of last year. I'm guessing we'll still be talking about it for another year or two.
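The amortisation point is simple arithmetic: the fixed costs (the instrument price written off over its working life, plus the annual service contract) are spread over however much data the instrument produces in a year, so more output per year means a smaller fixed-cost share per gigabase. A minimal sketch follows; every figure in it is a hypothetical placeholder, not real Illumina pricing, throughput or service-contract cost.

```python
def cost_per_gb(capital, lifetime_years, service_per_year, gb_per_year, reagent_per_gb):
    """Total cost per Gb = amortised fixed costs per Gb + reagent cost per Gb."""
    fixed_per_year = capital / lifetime_years + service_per_year
    return fixed_per_year / gb_per_year + reagent_per_gb


# Hypothetical placeholder figures (NOT real list prices or instrument specs).
capital = 700_000.0   # instrument purchase price
lifetime = 4          # years over which the instrument is written off
service = 80_000.0    # annual service contract
reagents = 40.0       # reagent cost per Gb

low_output = cost_per_gb(capital, lifetime, service, gb_per_year=20_000, reagent_per_gb=reagents)
high_output = cost_per_gb(capital, lifetime, service, gb_per_year=40_000, reagent_per_gb=reagents)
print(f"{low_output:.2f} vs {high_output:.2f} per Gb")
```

Doubling the yearly output halves only the fixed-cost share; the reagent cost per Gb is unchanged, which is why the saving is largest in labs where capital and service costs dominate.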
