CoreGenomics: AGBT 2012 day 2

Saturday, 18 February 2012

AGBT 2012 day 2

Rick Myers, HudsonAlpha Institute. Genetics and Epigenetics of Human Gene Regulation: Eukaryotic genome architecture is complex, cis and trans components looks like 1800 DNA binding proteins (what do they all do, can we find out before 2020). Talked about the promoter expansion in EPM1 gene as an example of where transcription is important in disease and regulatory elements play a part.

A user guide to the encyleopedia of DNA elements (ENCODE) (Paper), Rick said this is a complex experiment, now I don’t feel so bad about not understanding it as well as I think I should.

His lab is using Interactome, Transcriptome and Methylome analysis, suggested we need to integrate all three with allelic variation (genomics meets genetics)

Interactome: Protein:DNA interactions by ChIP-seq (the hardest part is getting a good antibody for the ChIP). Talked about NSRF binding as a repressor discovery through ChIP-seq impossible with other methods. His group has so far looked at 241 interactions, 80 TFs, 18 cell lines (is this more then Duncan’s group).

Tim Reddy Jay Gertz and Flo Pauli: Allele-specific occupancy to show differential occupancy at a locus (email to Kerstin). (Paper) Genome Research Feb 2012.

Transcriptome: gave a simple intro to mRNA-seq, alignment and calling geens is still complex and an evolving technology. Can be very sensitive and far more accurate. His group performed the transposase RNA-seq library preps (paper) Gertz et al: Genome Research 2012. Down to 1ng of RNA.

Methylome: WGS for methylation is expensive. Using RRBS and Methyl450 chips in the lab. Measuring differential methylation at a locus genome-wide. Can you use WGS bisulfite low coverage sequencing of ctDNA to detect cancer early on by looking for higher than expected methylation of DNA?

Breast cancer: Combining all three in a clinical trial of anti-DR5 antibody and BrCa. Methylation at 114 CpGs is associated with response to antibody therapy. Possible companion diagnostic for treatment response likelihood.

Tom Gingeras, Cold Spring Harbor Laboratory. Important Lessons from Complex Genomes: How to distil down data from very large projects, ENCODE and modENCODE. Recent stats from GENCODE are around 50,000 genes (20000 protein coding), transcripts 160,000 (76000 are protein coding). This fives about 8 transcripts per gene locus and more than half are non-coding. Geneic regions have multiple control regions and complex transcription. Transcripts are being identified at a faster rate then genes.

RNA analysis: Presenteed a slide showing the flowchart for RNA analysis from a whole cell to RNA-seq, RNA-PET and CAGE analysi of Cytoplasmic or Nuclear RNA, at > or < 200bp and Poly+ or PolyA- (for RNAs >200bp). Long RNAs - 200M reads per replicate, 15 cell lines. Short RNAs - 35M reads per replicate. IDR analysis: Irreproducibility Detection Rate Li et al 2011 (Paper). Statistics for two replicates, I know a lot of biologists that will be arguing “two is enough”! Transcription: Estimated that 80% of the genome is being transcribed (across 15 cell lines). There is pervasive and promiscuous transcription with lots of novel multi-exon transcripts and antisense transcripts. More than half of 245k unannotated single exon transcripts are intergenic and antisense (predominantly polyA-). Cells appear to transcribe all isoforms, there may be a dominant isoform present but the others can be compartmentalised within the cell. Transcript copy number can be different by orders of magnitude in different cells, showed a beautiful slide of expression level for coding and non-coding transcripts. (Paper) Landscape of transcription in Human cells submitted.

*Geoffrey Smith, Illumina: The Miseq DNA Sequencing Platform and Application in Clinical Microbiology. Geoff reintroduced some of the MiSeq upgrades, 7Gb, longer read lengths (2x250), etc. Unfortunately most of his talk was about three very interesting cases of how clinical microbiology can use NGS in the form of MiSeq. As this work is out for reveiw we were asked not to blog or tweet.

Illumna did present a poster showing a 678 bp read, the longest read generated so far on MiSeq. This was done from a PE400bp run. Human genome sequeucing is going to get a whole lot easier.

Jo Boland, NCI. Exome Sequencing on the PGM: Exome in a Day: First 318 chips in October, now running 6 PGMs, they also have 1 HiSeq2000. Single exomes on the PGM adds flexibility to the lab. If a sample drops out of a panel it needs to be repeated as quickly as possible to not delay the rest of a project. Flexibility is great.
Exomes in a day: Nimblegen Capture in production on HiSeq. LifeTech TargetSeq on PGM in “Exome in a day” service. Exome capture still takes three days, it is only the sequencing that can be done in one day. POC on CEPH/UTAH 1463 trio. 3, 5, 7 & 10GB for Mother, Father and Son at 5Gb only. Data looked very god but 3.5Gb was just under 20x coverage. Tried exome in a day pipeline on a Melanoma family. Five PGM runs and 1 Ampli-seq panel. Generated 4.8Gb. Lots of variants reported.
318 improved loading onto chips will make a difference. OneTouch 200bp chemistry. Rapid exomes on PGM foreshadows what will be possible very shortly.

*Richard Roberts, New England Biolabs, “Characterization of DNA Methyltransferase Specificities Using Single-Molecule, Real-Time DNA Sequencing”: He started by saying "can we please spend a small fraction of the money going into NGS on finding out what the genes actually do!" This got a round of applause form the audience. Everyone want to understand function.
His talk focused on methylation analysis. NEB have a strain of E coli with no methylases, they can introduce a plasmid with a specific methylase and determine the impact on the genome in isolation. They did all their sequencing with Pacific Biosciences.
Bacterial genome sequencing should be backed up with methylation sequence, possibly on PacBio.

*Clive Brown, Oxford Nanopore Technologies, “Single Molecule ‘Strand’ Sequencing Using Protein Nanopores and Scalable Electronic Devices”

See my earlier post for a clearer description of what Clive presented.

Nanopores are small holes, measure current changes and dwell time as DNA passes through a pore. ONT do not use anything already published intheir chemistry. 160 enzymes tested 20-400bps,. Srand sequencing, DNA can be fed 3-5 or 5-3’. DSamples prep puts a hairpin at the end to allow forwad and reverse sequen ing. dsDNA sample prep standard, duo mono, 15 minute prep, Capture prep modes. Fragment end repair a tail ligate adapter.

ASCI core is the heart of the system. These sensors can do 1000bps sequencing. On top f ASIC is a sensor chip, array of microwells layered with a membrane in which are nanopores. Very stable biological system. Not currently using dwell time for base calling. Actually reading 3 kmer blocks and need to determine the sequence from the series of read blocks to generate an actual sequence.

PhiX Genome: 5 and 10kb reads. 4% error rate. They have seen a full length genome read from PhiX! Moved to Lambda genome in a similar single pass read all 42KB!!! The 10k base is as good as the 40k base, data quality does not deteriorate. Errors come form “wobbly” Kmers. Error rate will be improved with pore eolution.

Sens-antisense reads combine to improve base quality, base analogues and SNPs.

Homopolymers cause the enzyme to tick?

RNA-seq with no cDNA conversion.

Epigenetic modification: MeC and hMeC have been looked at but many could be analysed.

Spiked known sample into blood and put blood directly onto chip. Can detect rabbit DNA in blood with no samples prep.

New product in 2012: MinION USB sized disposable DNA sequencer. About 150Mb per hour scalable to 1GB for use in the field. Plug into a laptop and sequence! App does the analysis on the fly. 5-25Gbp per day but limited to 6 hours run time.

GridIOn 2000 and 4000 pore versions. Throughput a fcunction of number of pores and speed of sequencing. 25-125 Gb per day.

Pricing: $500-1000 per use. $10 per Gbp Human genome in 15 minutes.