Friday 31 January 2014

A new tool for NGS run QC

We have just published a paper describing MGA, the tool we use in my lab as the primary QC for every lane of sequencing. We've used it on about 5,000-6,000 samples over the past 3-4 years.

In designing the tool we aimed to produce something that gave us a fast, computationally inexpensive visual presentation of quality and yield. The tool is alignment-based and works on a sample of 100,000 reads (sequences and base quality scores) extracted from a FASTQ file. Whilst we run MGA at the lane level, it could be run on any number of FASTQ files, e.g. on every library in a multiplexed pool to determine the quality and contamination of each.

I'm hoping the next step will be to release this as an App on BaseSpace.
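How MGA itself draws its sample isn't covered here, so treat the following as a minimal sketch under that caveat: one way to take a fixed-size random sample (such as 100,000 reads) from a FASTQ file of unknown length in a single pass is reservoir sampling. The function names `read_fastq` and `sample_reads` are hypothetical, and this is not necessarily the approach MGA uses.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: (header, sequence, quality string)
Record = Tuple[str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from 4-line FASTQ records.

    Accepts an open text file or any iterable of lines. Zipping the same
    iterator four times groups consecutive lines; a trailing incomplete
    record is silently dropped.
    """
    it = iter(lines)
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")


def sample_reads(records: Iterable[Record], n: int = 100_000, seed: int = 0) -> List[Record]:
    """Draw a uniform random sample of up to n records in a single pass
    (reservoir sampling, Algorithm R), without knowing the file length."""
    rng = random.Random(seed)
    reservoir: List[Record] = []
    for i, rec in enumerate(records):
        if i < n:
            reservoir.append(rec)   # fill the reservoir first
        else:
            j = rng.randint(0, i)   # uniform over 0..i inclusive
            if j < n:
                reservoir[j] = rec  # keeps record i with probability n/(i+1)
    return reservoir
```

For example, `with open("lane1.fastq") as fh: sample = sample_reads(read_fastq(fh))` would return up to 100,000 records from a hypothetical file `lane1.fastq`, using memory proportional to the sample size rather than the file size.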

How does MGA work: The MGA tool primarily displays yield (as counts of reads, i.e. clusters) and quality (as error rate), with additional information to help identify lanes or libraries that may need further troubleshooting. In Figure B from the paper (below) we present a HiSeq 2000 flowcell with four good lanes and four not-so-good ones. Before I describe the results, can you guess which lanes might have problems?

Figure B from Hadfield & Eldridge 2014

The figure represents a flowcell with eight lanes; the same plot would show a single lane for a MiSeq run, or four lanes for a NextSeq 500!

Lanes 1-4 generated between 165M and 200M reads each. The green bars represent reads from the genome of interest, along with their error rate; these lanes are almost entirely human, as expected, with very little "contamination" from other genomes (represented by yellow) or unmapped sequence (represented by white).

Lanes 5-8 generated only 90-140M reads each. There's lots of unmapped sequence (white) and also lots of adapter contamination (represented by purple). Additionally, lane 8 is almost entirely unmapped, with almost as many reads coming from contaminants as from the genome of interest.
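To make the read-out a little more concrete, here is a hypothetical Python sketch of the kind of summary behind each bar: given a per-read classification of the sampled reads, it reports the fraction in each class and an error rate for reads aligned to the genome of interest. The `ReadResult` type, the category names, treating the categories as mutually exclusive, and defining error rate as mismatches per aligned base are all assumptions for illustration, not MGA's actual code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical read categories, mirroring the bar colours in the figure:
# target (green), contaminant (yellow), adapter (purple), unmapped (white)
CATEGORIES = ("target", "contaminant", "adapter", "unmapped")


@dataclass(frozen=True)
class ReadResult:
    category: str           # one of CATEGORIES (assumed mutually exclusive)
    aligned_bases: int = 0  # bases aligned to the reference genome
    mismatches: int = 0     # mismatching bases within that alignment


def summarise_lane(results: Iterable[ReadResult]) -> Dict[str, float]:
    """Return the fraction of sampled reads in each category, plus a
    mismatch-based error rate for reads aligned to the genome of interest."""
    counts: Counter = Counter()
    aligned = mismatched = 0
    for r in results:
        if r.category not in CATEGORIES:
            raise ValueError(f"unknown category: {r.category!r}")
        counts[r.category] += 1
        if r.category == "target":
            aligned += r.aligned_bases
            mismatched += r.mismatches
    total = sum(counts.values())
    summary = {c: (counts[c] / total if total else 0.0) for c in CATEGORIES}
    summary["target_error_rate"] = mismatched / aligned if aligned else 0.0
    return summary
```

Multiplying these fractions by a lane's total read count gives stacked read-count bars; a lane like lane 8 would show a small target fraction alongside large unmapped and contaminant fractions.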

How do you get MGA: The paper was only published a couple of days ago, but you can read it in provisional form at Frontiers in Genetics. You can also grab the software from https://github.com/crukci-bioinformatics/MGA.

Friday 24 January 2014

Who won the Core Genomics Christmas Competition?

Congratulations to Tatiana Borodina, who has won the Core Genomics Christmas competition and a free MiSeq run courtesy of Illumina. Special mentions go to runners-up Tim Forshew (Cambridge Institute) and Charles Warden (City of Hope), who came very close to Tatiana, and also to Thomas Rio Frio (Institut Curie), who was the only person to get all 6 baubles right.

Thursday 23 January 2014

MinION is almost here...

Breaking news: at AGBT on 14 February, David Jaffe from the Broad Institute is presenting a talk titled “Assembly of Bacterial Genomes Using Long Nanopore Reads”. My assumption is that this uses Oxford Nanopore (ONT) data; it may not, but see you there anyway.

Monday 20 January 2014

NextSeq 500

On to the next new instrument from Illumina, the NextSeq 500. I've given a brief overview here; the new chemistry is covered in an earlier post.

Illumina helpfully provide the NextSeq 500 User Guide on their website. There is a lot of interesting reading in it, and I’ve picked out a few of my favourite bits. The datasheet also describes “streamlined Illumina sample preparation kits”; I’ve not yet looked to see what changes have been made, but Keith Robison over at Omics Omics has a post describing the new NeoPrep system from Illumina. This is Illumina's incarnation of the Advanced Liquid Logic technology they purchased last year; expect 16 DNA or RNA library preps per run using TruSeq and/or Nextera assays.

Friday 17 January 2014

NextSeq 500's new chemistry described

NextSeq 500 uses a two-colour chemistry rather than the original four-colour chemistry. This makes a massive difference to the complexity of producing reagents, the instrumentation and the computation; all are effectively reduced by a factor of two. So how does it work?
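As a minimal sketch of the principle, assuming the commonly described NextSeq encoding (A labelled in both channels, C red only, T green only, G unlabelled and therefore dark), four bases can be decoded from just two intensity channels. The function names and the fixed threshold below are hypothetical simplifications; Illumina's real-time analysis software is far more sophisticated.

```python
from typing import Dict, List, Tuple

# Assumed NextSeq-style two-channel encoding:
#   A = signal in both channels, C = red only, T = green only, G = dark
TWO_COLOUR_CODE: Dict[Tuple[bool, bool], str] = {
    (True, True): "A",
    (True, False): "C",
    (False, True): "T",
    (False, False): "G",
}


def call_base(red: float, green: float, threshold: float = 0.5) -> str:
    """Call one base from normalised red and green intensities.

    `threshold` is a hypothetical fixed cut-off; real base callers model
    cluster intensities, cross-talk and phasing far more carefully."""
    return TWO_COLOUR_CODE[(red >= threshold, green >= threshold)]


def call_read(cycles: List[Tuple[float, float]]) -> str:
    """Call a read from one (red, green) intensity pair per cycle."""
    return "".join(call_base(red, green) for red, green in cycles)
```

For example, `call_read([(0.9, 0.8), (0.9, 0.1), (0.1, 0.9), (0.05, 0.02)])` returns `"ACTG"`. Only two images are needed per cycle instead of four; the trade-off is that G is called from the absence of signal, so a cluster that has stopped emitting light looks the same as a run of Gs.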

Thursday 16 January 2014

HiSeq X Ten: only Human Genomes?

Today I thought I'd give my first impressions of what HiSeq X Ten might mean. The dust has certainly not settled, and yesterday there was lots of buzz about the new instruments from Illumina (see the bottom of this post for a round-up of news). I’ve been reading the HiSeq X Ten datasheet, and two things jump out as significant changes: run speed and output per flowcell.

Update: I'd also point readers to this post by Shawn Baker (CSO at AllSeq), where he points out that Illumina are using a four-year instrument lifespan to arrive at the $1000 genome. We managed to keep our GAIIs for about four years, but the HiSeq 2000 was only two years old when the HiSeq 2500 was announced. Hopefully X Ten can go the distance.
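To see why the assumed lifespan matters, here is a minimal straight-line amortisation sketch in Python. The $10M capital cost and 18,000 genomes per year are illustrative assumptions, not Illumina's published cost breakdown, and reagent, sample-prep and labour costs are ignored entirely.

```python
def instrument_cost_per_genome(capital_cost: float,
                               genomes_per_year: float,
                               lifespan_years: float) -> float:
    """Straight-line amortisation: spread the capital cost evenly over
    every genome sequenced during the instrument's working life."""
    if genomes_per_year <= 0 or lifespan_years <= 0:
        raise ValueError("throughput and lifespan must be positive")
    return capital_cost / (genomes_per_year * lifespan_years)


# Illustrative inputs only (assumptions, not vendor figures)
CAPITAL = 10_000_000       # hypothetical price of a ten-instrument system, USD
GENOMES_PER_YEAR = 18_000  # hypothetical sustained throughput

for years in (2, 4):
    per_genome = instrument_cost_per_genome(CAPITAL, GENOMES_PER_YEAR, years)
    print(f"{years}-year lifespan: ${per_genome:,.2f} of instrument cost per genome")
```

With these made-up inputs the instrument share is roughly $139 per genome over four years but roughly $278 over two: halving the working life doubles the capital cost carried by each genome.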

Wednesday 15 January 2014

Illumina's Christmas presents

Jay Flatley did not disappoint at the J.P. Morgan Healthcare Conference (JPM), announcing both the HiSeq X Ten and the NextSeq 500: two more instruments to add to HiSeq and MiSeq. HiSeq X Ten comes as ten “ultra-high throughput” sequencing instruments that together can generate 18Tb in three days, with 6B clusters per run! NextSeq 500 could be billed as “the HiSeq in a MiSeq”; it will complete a 300-cycle run in 30 hours and is available now for $250,000. Both systems, which make use of flowcell innovations and major changes in SBS chemistry discussed below, were covered in quite some detail by GenomeWeb.
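The launch figures can be cross-checked with a little arithmetic. The Python below assumes that "18Tb in three days" is the combined output of all ten instruments and that "6B clusters per run" refers to a single instrument run; under those assumptions each cluster yields about 300 bases (consistent with a 2×150 bp paired-end run), and the NextSeq 500 figures work out at six minutes per cycle.

```python
# Back-of-envelope checks on the launch figures quoted above.
# Assumption: 18 Tb is the combined output of all ten HiSeq X instruments,
# and 6B clusters is the yield of one instrument run.
TOTAL_BASES = 18e12
INSTRUMENTS = 10
CLUSTERS_PER_RUN = 6e9

bases_per_instrument_run = TOTAL_BASES / INSTRUMENTS              # 1.8e12 bases
bases_per_cluster = bases_per_instrument_run / CLUSTERS_PER_RUN   # 300 bases

# NextSeq 500: a 300-cycle run in 30 hours
NEXTSEQ_CYCLES = 300
NEXTSEQ_HOURS = 30
minutes_per_cycle = NEXTSEQ_HOURS * 60 / NEXTSEQ_CYCLES           # 6.0 minutes

print(f"HiSeq X: {bases_per_instrument_run / 1e12:.1f} Tb per instrument run, "
      f"~{bases_per_cluster:.0f} bases per cluster")
print(f"NextSeq 500: {minutes_per_cycle:.0f} minutes per cycle")
```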