CoreGenomics: #GenSci15 day 2

Day 2
Single cell was popular, but we were all crammed over in the Arts building somewhere close to Solihull. Nick forgot to organise a coach so we had to walk; I guess we're building up an appetite for the street food tonight! Bill Hanage asked if Nick could sequence some of the street food just in case - perhaps next year's conference pack can come with a stool collection kit for some crowd-sourced science: "the differential impact of conference attendance on gut microbiota of PhD students, post-docs and PIs"?

Single cell genomics

Sarah Teichmann: Sanger/EBI "Understanding cellular heterogeneity"
Sarah spoke about some of the technical and bioinformatics work her group have been pioneering in single cell transcriptome analysis. She focused her talk on T-cell biology. Previously her group has used bulk RNA-seq of T cells to determine the average transcriptome signature of the GATA3 master regulator, which showed a very wide distribution of transcription levels, but they had to calibrate RPKM from bulk to understand what might be going on in single cells (I think the work in Hebenstreit et al 2011 was sequenced by my lab). The lab has pioneered some of the single cell approaches and aims to understand population heterogeneity with single cell RNA-seq (scRNA-seq). Sarah described the sensitivity and specificity of protocols in their hands. They started with the Fluidigm C1 instrument and are using spike-ins to try and remove technical noise (Brennecke et al Nat Meth 2013). Sarah described this as a "reasonably good approach to understand technical noise".

Sarah described the work of Valentine Svensson (PhD student) and their comparison of CEL-seq, SMART-seq, STRT-seq and MARS-seq using analysis of the publicly available data, specifically the spike-ins, which showed lower detection limits spanning several orders of magnitude: 100, 10, 1 and 1 transcripts per cell respectively. Accuracy of data is very high, between 0.8 and 1, across all platforms. As you get to lower concentration genes accuracy goes down and there is an inverse correlation of concentration to sensitivity (no surprise). Sequencing depth shows benefits of going up to 1M reads per sample (£2 per M reads on HiSeq 4000 SE50).
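A rough sketch of how spike-in accuracy and a detection limit could be computed from one cell's data. This assumes "accuracy" means the Pearson correlation between log expected spike-in input and log observed counts, which is my reading and not necessarily the exact definition used in the talk. The function names and the numbers are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def spike_in_accuracy(expected, observed):
    """Correlation of log10 expected molecules vs log10 observed counts.

    Only spike-ins that were actually detected (count > 0) are used.
    """
    pairs = [(math.log10(e), math.log10(o))
             for e, o in zip(expected, observed) if o > 0]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)

def detection_limit(expected, observed):
    """Lowest expected input (molecules per cell) that gave any reads."""
    detected = [e for e, o in zip(expected, observed) if o > 0]
    return min(detected) if detected else None

# Hypothetical spike-in dilution series for a single cell (illustrative only).
expected_molecules = [1, 10, 100, 1000, 10000]
observed_counts = [0, 3, 40, 350, 4200]

accuracy = spike_in_accuracy(expected_molecules, observed_counts)
limit = detection_limit(expected_molecules, observed_counts)
print(f"accuracy={accuracy:.3f}, detection limit={limit} molecules")
```

With real data you would run this per cell and compare the distributions between protocols.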

She moved on to talk about the biology they are investigating using these approaches to understand T-cell biology and the recombination of VDJ regions that produce a cellular barcode – if you can sequence it. They are developing a method (TRACE-seq) to find TCR loci reads in RNA-seq data: make a synthetic genome of VDJ regions, assemble TCR reads and align to identify what was present. The project has been using a model of mouse infection to see the changes between steady state (naive cells, memory cells) and infection, when an expansion of effector cells occurs. The data were presented as lovely figures of large linked populations of single cells; there is clear population structure, with up to ten clonal populations.
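The "cellular barcode" idea can be sketched in a few lines: once each cell has a reconstructed TCR, cells sharing the same sequence are grouped into a clone. This is only an illustration and not the group's actual method. The cell names, the placeholder TCR strings and the function name are all hypothetical.

```python
from collections import defaultdict

def group_clones(cell_tcrs):
    """Group cells by reconstructed TCR; return only expanded clones.

    cell_tcrs maps a cell ID to an (alpha, beta) tuple of reconstructed
    sequences, or None if no TCR could be reconstructed for that cell.
    """
    clones = defaultdict(list)
    for cell, tcr in cell_tcrs.items():
        if tcr is None:
            continue  # no barcode recovered, cannot assign to a clone
        clones[tcr].append(cell)
    # A clone is "expanded" if more than one cell carries the same TCR.
    return {tcr: sorted(cells) for tcr, cells in clones.items() if len(cells) > 1}

# Placeholder data, not real TCR sequences.
cells = {
    "cell_01": ("alphaA", "betaA"),
    "cell_02": ("alphaA", "betaA"),
    "cell_03": ("alphaB", "betaB"),
    "cell_04": None,
    "cell_05": ("alphaA", "betaA"),
}

expanded = group_clones(cells)
print(expanded)
```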

They have been using paired-end reads to help reconstruct TCR loci, but for most differential gene expression applications this is likely to be severe overkill and I'd bet that more shorter reads, and/or more cells, would be a better use of your budget. It is still not clear what the right experimental design is for single-cell.

David Starns: spoke about his work using single cell genomics to understand the termite gut.

Henry Noyes: described a direct haplotype sequencing approach from single cells. Many of the current methods require customised equipment and he has been aiming to devise a simple method.

Tim Bonnert, Associate Director, Applied Advanced Genomics at QIAGEN: described QIAGEN's informatics ecosystem including Ingenuity IPA for causal analytics, and CLC Workbench.

You can lock down analysis and create SOPs for bioinformaticians!
He also described their Microbial genomics tools and the Genome finishing module: showed PacBio assembly vs HGAP on a laptop: E. coli (1 SMRT cell) in 13 mins, S. cerevisiae 2 hr, C. elegans (11 SMRT cells) 5 hours.

Matt Loose, University of Nottingham: "Running and Reading in Real Time: Looking at Squiggles on the Oxford Nanopore MinION"
The minoTour tool for MinION run QC is a collection of scripts and a website that can be installed locally to manage, analyze and manipulate reads generated by the Oxford Nanopore MinION technology. MinION and minoTour are both real-time – he demonstrated MinION control tools to connect to MinKNOW, change voltages, rename a run, start a new run and choose from run scripts.

Why does minoTour help you? Set coverage depth and auto finish the run (might be good if you're paying for run until), set for individual barcodes, notifications via Twitter, detect specific variants! MinION to Metrichor and back again for analysis. If you have no access to the net then there are no basecalls, but you can still analyse squiggles in minoTour: look at raw template and raw complement, raw 2D gets generated from raw alignments. Not as clean as basecalled data (will we find a use for raw reads? This is almost the opposite to Illumina, where we've moved further and further away from raw data).

He also showed a neat use of the read-until workflow and normalisation of reads across Ebola amplicons so they only send back useful reads from a run! Who needs to normalise amplicons anymore!
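The logic behind read-until normalisation can be sketched as a simple accept/reject rule: keep a read only while its amplicon is still below the target depth. This is a toy simulation and not the real read-until interface. The amplicon labels, read IDs and function names are hypothetical, and mapping a read to an amplicon is assumed to have happened already.

```python
from collections import Counter

def normalise_reads(read_stream, target_depth):
    """Keep reads only while their amplicon is below the target depth.

    read_stream yields (read_id, amplicon) pairs. In a real run a rejected
    read would be ejected from the pore rather than simply skipped.
    """
    depth = Counter()
    kept = []
    for read_id, amplicon in read_stream:
        if depth[amplicon] < target_depth:
            depth[amplicon] += 1
            kept.append(read_id)
    return kept, depth

def run_complete(depth, amplicons, target_depth):
    """True once every amplicon has reached the target depth."""
    return all(depth[a] >= target_depth for a in amplicons)

# Toy stream where amplicon A1 is over-represented (illustrative only).
stream = [("r1", "A1"), ("r2", "A1"), ("r3", "A1"),
          ("r4", "A2"), ("r5", "A1"), ("r6", "A2")]

kept, depth = normalise_reads(stream, target_depth=2)
print(kept, dict(depth))
print(run_complete(depth, ["A1", "A2"], target_depth=2))
```

The same depth counter gives you the "auto finish the run" check for free: stop when `run_complete` returns true.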

Tom Connor, University of Cardiff: Described the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) network: a one stop shop for microbial bioinformatics. Provide a set of cloud images to implement key pipelines. Storage of data and sharing. Provide a place for orphan databases.

Richard Leggett, TGAC: NanoOK is flexible, multi-reference software for pre- and post-alignment analysis of Nanopore sequencing data, quality and error profiles.
NanoOK software allows you to go from Metrichor to comprehensive PDF analysis report.

Simply: nanook extract, nanook align, nanook analyse.

It uses multiple alignment tools: Blast, BWA-MEM, BLASR, MarginAlign. Performs k-mer analysis for over- and under-represented sequences, as well as error motif analysis.

Richard introduced MARC and described examples of using NanoOK for this: 5 labs looking at MinION consistency, publication coming soon.
Wide variation in yield, 10000-30000 reads; comparison of chemistries and flowcells. Variability of aligned data, some contamination (where from in a pre-prepared sample?)
Read identity %age looks good
Best perfect k-mer pretty stable at 100bp.
NanoOK available on Docker

Jukka Corander, Professor, Faculty of Mathematics and Natural Sciences, University of Helsinki "Being the clairvoyant: Sequence Element Enrichment (SEER) analysis for deciphering genetic basis of bacterial phenotypes"

How to tell silver from gold Staphylococcus? Causal mechanism is known, but if you don't know it how can you use genomics to investigate? Even in 10,000s of genomes? Identify sequence elements, estimate pop structure, find enriched signals, map elements to pan genome, understand biology, be amazed! Applied approach to 15 datasets, some are truly amazing; some of Jukka's presentation will be unpublished work.

Scan assembled contigs for k-mers from 10-100bp. Are using DSM algorithm (Seth et al 2014). Look to see if phenotype is associated with genotype.
SEER pipeline published Weiner et al Nat Comm 2015 (a more efficient C implementation is on the way). This is scalable to tens or even hundreds of thousands of genomes.
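To make the k-mer association idea concrete, here is a deliberately naive sketch: scan each genome for fixed-length k-mers, then test each k-mer for enrichment in phenotype-positive genomes with a one-sided Fisher's exact test. This is not the SEER or DSM algorithm. It uses a single k, ignores population structure entirely (which the real approach estimates), and the toy sequences and function names are invented for illustration.

```python
from math import comb

def kmers(seq, k):
    """Set of all k-mers present in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def enrichment_p(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment in cases.

    a = cases with the k-mer, b = cases without,
    c = controls with the k-mer, d = controls without.
    """
    n_cases, n_with, total = a + b, a + c, a + b + c + d
    favourable = sum(comb(n_with, x) * comb(total - n_with, n_cases - x)
                     for x in range(a, min(n_cases, n_with) + 1))
    return favourable / comb(total, n_cases)

def scan(genomes, phenotypes, k):
    """Return {k-mer: p-value} over every k-mer seen in any genome."""
    presence = {name: kmers(seq, k) for name, seq in genomes.items()}
    results = {}
    for kmer in set().union(*presence.values()):
        a = sum(1 for g in genomes if phenotypes[g] and kmer in presence[g])
        b = sum(1 for g in genomes if phenotypes[g] and kmer not in presence[g])
        c = sum(1 for g in genomes if not phenotypes[g] and kmer in presence[g])
        d = sum(1 for g in genomes if not phenotypes[g] and kmer not in presence[g])
        results[kmer] = enrichment_p(a, b, c, d)
    return results

# Toy "genomes": the three phenotype-positive ones share GATTACA.
genomes = {
    "g1": "AAGATTACAAA", "g2": "CCGATTACACC", "g3": "TTGATTACATT",
    "g4": "AACCCCCCAAA", "g5": "CCTTTTTTTCC", "g6": "TTAAAAAAATT",
}
phenotypes = {"g1": True, "g2": True, "g3": True,
              "g4": False, "g5": False, "g6": False}

results = scan(genomes, phenotypes, k=7)
best = min(results, key=results.get)
print(best, results[best])
```

With six genomes the smallest possible p-value is 1/20, which is a reminder of why thousands of genomes (and a correction for multiple testing and population structure) are needed in practice.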

Introduced by Nick as "A double act" - Josh Quick and Lauren Cowley talk about the setup of a lab for sequencing of Ebola. Matt Loose refused to come as he's seen the talk too many times!

Josh Quick, Bioinformatician, School of Biosciences, University of Birmingham: Portable Nanopore sequencing for pathogen sequencing
The lab and getting going! MinION is real-time, portable and gives long reads; can we use this to investigate the Ebola outbreak in West Africa? They were trying two approaches, RNA-seq and amplicons, but the talk focused on amplicons from RT-PCR. Working with Miles Carroll on European Mobile Labs. Tested amplicon methods at Porton Down with archived Ebola blood extracts and settled on a set of primers generating 11 PCR amplicons averaging 2Kb across the genome.

A big list of stuff to take: room temp, cold and frozen. All equipment needed went with them in a hardcase, incl PCs and MinIONs. Used USB temperature data loggers to verify if reagents were likely to be OK. Josh's lab fits on an empty table in the corner – up and running pretty quickly! Within 48 hours data was being sent back to Nick in Birmingham. UPS was required for PCR machine, laptop battery works as UPS for sequencer. They had to set up a local 3G hotspot using different tech as Satellite link is $1Mb! Yikes.

They saw variation in amplicon read depths – Matt Loose's minoTour sounds like the perfect fix (see above)! But they got 40-70% aligned 2D passing rates, about 10% error rate, and have now performed around 125 Ebola genome runs. Significant variation in reads per minute over a large number of flowcells (batch of adapters had a significant effect), also in reads passing filter and numbers of reads. Slightly arbitrary as runs were stopped once "enough" data was generated.

Final protocol is RT-PCR, QT and pool, library prep, MinION, all in about 1 day; then onto analysis (MarginAlign, Nanopolish/eventalign) in about 1 hour.

Lauren Cowley, PhD student, Public Health England: Sequencing Ebola in the field: a tale of nanopores, mosquitoes and WhatsApp
The sequencing! Moved sequencing facility about 2-4 hours into the field. Deployed June 2nd to July 10th to provide real time sequencing of new cases. Lab in a small portakabin. Working with epidemiologists to try and put patients into known clusters or new ones. Data coming back so quickly was useful but the big impact is where no known patient contacts are present – sequencing can help identify if this is an already known infection or not. See the data at ebola.nextflu.org.

Logistics is the tough bit; just receiving RNA from other labs was difficult – no M6 in Guinea - but the traffic can't be worse than the A38 at 8:15!

Commercial presentations: Thanks to the commercial sponsors, I'm sure their cash really does help keep the conference alive. I spoke to several vendors and they were pretty happy with turnout - but all agreed we could spend a bit more time talking to them! Sign up for the competitions and grab some freebies!

KAPA Biosystems: Dr Herbert van den Berg "Evolved enzymes for NGS library preparation"
An overview of their directed evolution strategies to produce better enzymes. It certainly seems to be working, KAPA kits have become more and more widely used. Mike Quail at Sanger showed that KAPA HiFi has less bias in varying GC content genomes, see Oyola et al paper BMC Genomics 2012.
KAPA HyperPlus includes low-bias fragmentation, which removes the need for Covaris.
KAPA RNA-seq: stranded RNA-seq kit, 2nd strand uracil incorporation

New England Biolabs Lynne Apone "Novel Solutions for Challenging NGS Samples"
Described the NEBNext Ultra II kits and some of the recent work in significantly improving library prep chemistry. Also described FFPE-repair mix and the impact on increasing DNA NGS quality without affecting bias; "do no harm".

Sorry to the other commercial presenters - but my laptop died and I did not get a chance to take any notes... Beckman Coulter, Hamilton Robotics, Thermo Fisher, Roche, Cambridge Biosciences

Tuesday, 8 September 2015
