CoreGenomics: July 2015

Saturday, 25 July 2015

PhD done...time for a holiday

Core Genomics is now on holiday for two weeks after I finally graduated from my PhD yesterday, 20 years to the day since I finished my BSc. Better late than never! Back in mid-August, folks.

Thursday, 23 July 2015

Cell-free DNA trisomy 21 tests kick ass

NIPT for Down's syndrome and other chromosomal abnormalities is taking off. A colleague of mine recently had an Ariosa test, paid for privately, and reported real satisfaction with the process. Lyn Chitty at UCL Institute of Child Health and Great Ormond Street Hospital recently reported on a model-based analysis of NHS costs and outcomes of NIPT for Down's syndrome. This suggested that NIPT was cost effective if offered at around £50 per test (compare this to the £500-£1000 charged privately). NIPT is not my area of expertise, but I've been watching it because its technological developments have often been a little ahead of cell-free DNA work in cancer.

Possibly millions of tests have now been performed. NIPT is being rolled out to patients across the globe at an amazing rate compared to the introduction of other diagnostic tests, and the NHS is getting in on the game. The number of companies offering tests is growing and so is the litigation. Most recently Illumina filed a new patent infringement suit against Ariosa, claiming that Ariosa's Harmony NIPT test infringes a patent for “Multiplex Nucleic Acid Reactions” (one of the patent holders is ex-Illumina, and was an author on the Ariosa paper discussed below). NIPT commonly tests for trisomy 21 (Down’s Syndrome), trisomy 18 (Edwards’ Syndrome) and trisomy 13 (Patau’s Syndrome). Most tests are NGS based, but Ariosa's test is array based. You can get an NGS-based NIPT test from ThisIsMy for just £320; tests in North America are as low as £200. Tests are available from: Ariosa Harmony, BGI NIFTY, Genesis Genetics Serenity (Verifi), Illumina Verifi, Natera Panorama, Premaitha IONA, Sequenom MaterniT21.

What do you want to be when you grow up?

The MRC have a nice career mapping tool, the Interactive career framework, which allows biomedical researchers to navigate through different options to see how they might get where they want to go.

I'd like to think of myself as a technology Specialist Director: "an individual with technical expertise / specialist skills useful beyond their own specific group" - what are you?

Wednesday, 15 July 2015

How should I store my NGS data: disc, tape or tube

Genomics has recently been singled out as one of the largest data headaches we face. As we move to sequencing people multiple times, start newborn genome sequencing programs and increase our use of consumer genomics, the amount of data goes up and up. Our GA1 generated 1Gb of data in about 11 days. Today our HiSeq 2500 puts out 1TB in 6 days.
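To put rough numbers on that jump, here is a back-of-the-envelope sketch in Python. It takes the two figures above at face value and makes two assumptions of mine: that the HiSeq's "6" means six days, and that 1TB can be treated as 1000 of the same units as the GA1's 1Gb. The constant and function names are purely illustrative.

```python
# Back-of-the-envelope throughput comparison using the figures quoted above.
# Assumptions (not stated in the post): the HiSeq run takes 6 days, and
# 1 TB is treated as 1000 of the same units as the GA1's 1 Gb.

GA1_OUTPUT_GB = 1.0
GA1_RUN_DAYS = 11
HISEQ_OUTPUT_GB = 1000.0
HISEQ_RUN_DAYS = 6


def output_per_day(output_gb, run_days):
    """Average output per day of run time."""
    return output_gb / run_days


ga1_rate = output_per_day(GA1_OUTPUT_GB, GA1_RUN_DAYS)
hiseq_rate = output_per_day(HISEQ_OUTPUT_GB, HISEQ_RUN_DAYS)
fold_increase = hiseq_rate / ga1_rate

print(f"GA1:        {ga1_rate:.2f} Gb/day")
print(f"HiSeq 2500: {hiseq_rate:.1f} Gb/day")
print(f"Increase:   about {fold_increase:.0f}-fold")
```

Under those assumptions the daily output has gone up by a little over 1800-fold, which is why a storage policy that made sense for the GA1 needs rethinking.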

We're currently storing our data on disc for up to six months. After this we either delete it or archive it onto tape (although I've no idea if we ever try to get it back off the tapes). A while back people used to talk about the storage being more expensive than a rerun, and I wonder if we are getting even closer to that point, especially if you try to grab the data off a tape in a secure storage facility.

I've always liked the idea of storing libraries and we have all 10,000 that we've run safely stored at -80C. These tubes take minimum space and most could be rerun today at a fraction of the cost from a few years ago. I am now wondering if we should go for an even greener solution and start the long term storage on Whatman cards (available from CamLab and others). A small number of cards could store almost everything we've ever run!

Is anyone doing this?

Tuesday, 14 July 2015

An example of how fast NGS develops

Illumina have discontinued version 1 of the NextSeq chemistry. Launched in January of last year, the NextSeq was a revolutionary new sequencer, although not everyone was an immediate fan. The V2 chemistry was launched just before AGBT and the data certainly looked a lot closer to the quality we expected from the longer-lived 4-colour SBS chemistry. The V1 discontinuation notice arrived in my inbox today, just 18 months after the NextSeq launch.

That's not much longer than the shelf-life of a kit!

Monday, 13 July 2015

Your genome for under £2000

Illumina have a new offer on their Understand Your Genome (UYG) program that means you can get your genome sequenced, analysed and clinically interpreted for under £2000.

Interested? Then there are a few requirements, mainly that you give informed consent and get a doctor's prescription for the test. Your DNA is sent to Illumina's own Clinical Services Laboratory, CLIA-certified since 2009. The results will be reported to you on the first day of the ASHG meeting in Baltimore. Samples need to be with Illumina by July 31st, giving them 67 days for sequencing and analysis.

You'll get back results on 12 genes important in pharmacogenomics, and hundreds of genes implicated in human disease. However, you'll need to discuss any "medically significant results" with your GP, and you can ask not to receive some data back.

Sounds like a pretty good bargain given you'd need to sequence 50+ genomes to get close to the $1000 genome from an X Ten provider. I'm not sure if you'll find out how much Neanderthal you're carrying around, though.
PS: If anyone fancies crowd-sourcing a Hadfieldome drop me a line, or my PayPal account is...

Thursday, 9 July 2015

Exciting developments in Pancreatic Cancer

A paper just published in Nature Communications describes a molecular analysis of Pancreatic Cancer by tumour exome and ctDNA targeted sequencing. The results showed enrichment of mutations in known PaCa-associated genes, and identified clinically actionable mutations in over a third of patients.

MinION for 16S rRNA-seq

Researchers in the group of Yolanda Sanz in Spain deposited a preliminary MinION study on the bioRxiv describing 16S rDNA amplicon sequencing of a mock microbial community composed of genomic DNA from 20 different bacterial species (BEI Resources).

Experimental workflow: 1.5kb amplicons were generated from 16S rRNA gene sequences for the 20 different species present in the mock community using a universal PCR. Amplicon library prep was performed using the NEBNext End Repair Module and NEBNext dA-Tailing Module to prepare the amplicons for adapter ligation. Sequencing was on an R7.3 flowcell with a 48 hour run and an additional library loading at 12 hours. Read QC and conversion to FASTA was performed with poretools and poRe. They also discussed the "What's In My Pot" Metrichor workflow demonstrated at ASHG, showing that MinION can deliver real-time analysis, making the "run until" mode an attractive one for some applications.

Results: Most reads were around the size of the amplicons (median 1.1kb) but they did see some very long reads (max 50kb). They speculated that these were amplicon:amplicon ligation products, but they were not able to align them. In the reported analysis they filtered the MinION reads to keep only those that were 100bp-2kb, retaining 97% of data (3,297 reads). The 2D reads were discarded because of "a detrimental effect of 2d reads in the quality of assembled sequences"; given that others are reporting that 2D reads are what we should be aiming for, I'd have liked some explanation as to why these were so bad.
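The 100bp-2kb length filter is easy to picture in code. The following is a minimal sketch in plain Python, not the authors' pipeline (they used poretools and poRe). The function names, the toy reads and the choice to treat both bounds as inclusive are my own assumptions.

```python
# Minimal sketch of a read-length filter like the 100bp-2kb one described
# above. This is NOT the authors' pipeline; the function names and the
# inclusive bounds are illustrative assumptions.

def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len=100, max_len=2000):
    """Keep only reads whose length falls within [min_len, max_len]."""
    return [(h, s) for h, s in records if min_len <= len(s) <= max_len]


# Three made-up reads: too short, amplicon-sized, and a long chimera-like read.
example = (
    ">short\n" + "A" * 50 + "\n"
    ">amplicon\n" + "ACGT" * 275 + "\n"
    ">chimera\n" + "G" * 5000 + "\n"
)

kept = filter_by_length(parse_fasta(example))
print([header for header, _ in kept])
```

Only the 1.1kb amplicon-sized read survives here. A filter like this throws away the very long reads without saying anything about what they actually are.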

They were able to reconstruct more than 90% of 16S rRNA gene sequences for all organisms included in the mock community. However, quantitation data was less convincing and the data in figure 1 do make me wonder how applicable this method/tech might be to quantitative or longitudinal analysis. The mock community contained equimolar ratios of 20 rRNAs yet results varied by up to 100 fold, although the authors only considered a coverage bias to be present if coverage was more than 10 fold above or below the expected value. I have no idea what allele frequency is usually tolerated in this kind of experiment.
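To make the 10-fold rule concrete, here is a small Python sketch of how it could be applied. The read counts and species names are invented, and the expected value is simply an equal share of the reads per species, which is my assumption for an equimolar community and ignores things like 16S copy number. It is not taken from the paper.

```python
# Sketch of a 10-fold coverage-bias rule applied to a mock community.
# The read counts are made up, and "expected" is assumed to be an equal
# share of all reads per species; neither comes from the paper.

def coverage_bias(read_counts, fold_threshold=10.0):
    """Return {species: (fold_vs_expected, is_biased)}."""
    total = sum(read_counts.values())
    expected = total / len(read_counts)
    report = {}
    for species, observed in read_counts.items():
        fold = observed / expected
        # Biased if more than fold_threshold above OR below expectation.
        biased = fold > fold_threshold or fold < 1.0 / fold_threshold
        report[species] = (fold, biased)
    return report


counts = {"species_A": 1000, "species_B": 950, "species_C": 40, "species_D": 10}
report = coverage_bias(counts)

for species, (fold, biased) in report.items():
    print(f"{species}: {fold:.2f}x expected, biased={biased}")

# Spread between the best and worst covered species.
spread = max(counts.values()) / min(counts.values())
print(f"max/min spread: {spread:.0f}-fold")
```

In this toy example the best and worst covered species differ by 100 fold, yet only two of the four are flagged, because each is compared with the expected value and not with the others.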

MinION currently generates a relatively low per-base sequencing accuracy but the additional read length here helps in resolving 16S rRNA to a species level. A nice dynamic figure, such as an evolutionary tree of species showing how resolution changes with accuracy and length, would have been great!

MinION methods are developing rapidly and I wonder how long it will be before full length 16S rRNA is reported.

Saturday, 4 July 2015

The DarXer Side of publishing on the arXiv

The use of pre-print servers like the original arXiv and bioRxiv appears to be growing among some of the groups I follow. You've only got to read Jeff Leek's post about this and their Ballgown paper (published at NatBiotech) or Nick Loman's or Casey Bergman's 2012 blog posts to see why. Ease of reporting new results, a good way to share preliminary data, a marker for 1st to publish, etc. are all good points; but posting on the arXiv is not the same as publishing in a peer-reviewed journal (this post is not about the pros and cons of peer-review) and I hope everyone would accept that. And in Nature, Jan Conrad at Stockholm University writes a commentary on arXiv's darker side. His focus is very Physics heavy, but this is unsurprising given the birth of the arXiv in the Physical sciences.

arXiv (biological science submissions expanded)

What is the arXiv: arXiv was born in 1991 as a central repository for scientific manuscripts in the TeX format (LaTeX etc) with a strong focus on physics and maths. Listings are in order of posting. There is no peer review, although according to Wikipedia "a majority of the e-prints are also submitted to journals for publication" (although they don't say how many of these are rejected). arXiv is pronounced "archive", as if the "X" were the Greek letter Chi (χ).

Who publishes on the arXiv: Lots of people, but mainly physicists (see "Where do biologists go" below)! The 1 millionth post happened in 2014 and there are currently over 8,000 new posts per month. The figure above on the right shows how small the number of biological submissions is, though: about 1.6% of the monthly total (yellow is Quantitative Biology). On the left you can see a breakdown of submissions by biological sub-category (dark blue is genomics).

Where do biologists go: The bioRxiv was set up for preprints in the life sciences in late 2013 and is intended to complement the original arXiv (it has been covered here, here, here, here). It is grouped into multiple subdisciplines, including genomics, cancer biology and bioinformatics. Papers get digital object identifiers (DOIs) so you can cite them, and papers are submitted as New Results, Confirmatory Results, or Contradictory Results.

I could not find bioRxiv usage stats such as those in the image above. Almost 1600 papers have been submitted since the start. 30% of the papers are in genomics or bioinformatics. Pathologists are conservative folks which might explain why there are only 4 papers in this category - although I'd not have read this HER2 paper if I'd not written this post!

The darker side of the arXiv: Prof Conrad's commentary is driven by a slew of major 'discoveries' in his field, many of which are turning out to be false alarms. The worrying part of his article is that some of the authors of these pieces appear to have been aware of other data that disproved their theories but chose to 'publish' regardless, and they also followed up with big press releases raising their profile. This could have a negative impact on science funding and on public perception of science, especially if the big news stories get shot down in flames.

He suggests that "online publishing of draft papers without proper refereeing have eroded traditional standards for making extraordinary claims". To make his case he references a recent arXiv paper reporting discovery of dark-matter but using data that were preliminary and suggestive, rather than final and conclusive. The same day saw a second paper that refuted this claim using the same data but a more sensitive analysis with upgraded software. The crazy thing was that the authors of the first paper acknowledged this upgrade was coming but did not wait before 'publishing' on the arXiv and making their mark. This story was widely reported, but with coverage focusing on the first claim, not the later refutation.

I wrote this post in response to a Tweet with this quote "Journals should discourage the referencing of arXiv papers." I think the article is a balanced one and contains important messages beyond the quote picked up on Twitter.

It is interesting to speculate about who will scrutinise the bioRxiv. The great Retraction Watch blog is unlikely to be able to keep up if the bioRxiv grows as quickly as its big brother. But bioRxiv papers need to be watched and it'll be interesting to see if the community moderation is effective.

Thursday, 2 July 2015

Does survival of the fittest apply to bioinformatics tools?

What do 48 replicates tell you about RNA-seq DGE analysis methods? That two of the most widely used tools, DESeq and edgeR, are probably the best tools for the job*. These two tools also top the rankings of RNA-seq methods as assessed by citations, with 1204 and 822 respectively. These are conclusions in probably the most highly replicated RNA-seq study to date**. The authors aimed to identify the correct number of replicates to use and concluded that we should be using ~6 replicates for standard RNA-seq, and we should consider increasing this to ~12 when identifying DGE irrespective of fold-change.
