Next-Generation DNA Sequencing Informatics (2nd edition), edited by Stuart M. Brown (blogger), is a book to get you well and truly started in NGS bioinformatics. The twelve chapters cover QC, sequence alignment, assembly, transcriptome and ChIP-seq analysis, visualisation, and much else besides. For this review I read the chapters on QC, RNA-seq, and emerging technologies and applications.
Introduction: The introductory chapter has a very good history of Sanger sequencing. It discusses the NGS revolution, although I would not personally describe this as "currently under way"; it is more as though a pumpkin-coloured tidal wave has washed over the world, and we're all waiting to see what happens next! The main technologies are explained in some detail, and the reviews are pretty much bang up to date. However, 454 appears to be alive and kicking in this introduction. Perhaps my bias as an Illumina user clouds my vision of the NGS universe, but most of the people I know with 454 sequencers turned them off a long time ago; and surely the inclusion of SOLiD is a historical piece? Stuart argues that bioinformaticians cannot rely on everything being open-source or peer-reviewed, and he points to some of the technical artefacts that informaticians need to understand before they simply dive in and start making differential gene expression calls or genome alignments. Only a small number of NGS methods are listed in this introduction, and some of these are covered in the following chapters; but with so many methods on offer, no book can hope to cover them all.
QC: The importance of quality control is emphasised. High-quality basecalls are not the only thing to consider: issues such as demultiplexing and tools such as FastQC are discussed, and the chapter makes clear that QC is often context dependent. The wonderful MGA tool makes it into this chapter... but not into the useful QC SOP at the end :-(
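For readers who have never looked inside a FASTQ file, here is a minimal sketch of the idea behind a per-base quality plot like FastQC's. This is not FastQC's code and not taken from the book; it assumes Phred+33 encoding (Sanger / Illumina 1.8+) and unwrapped four-line records, and the function names are my own hypothetical ones.

```python
from io import StringIO
from typing import Iterable, Iterator, List

# Assumption: Phred+33 quality encoding (Sanger / Illumina 1.8+).
PHRED_OFFSET = 33


def quality_strings(handle: Iterable[str]) -> Iterator[str]:
    """Yield the quality line (4th line) of each FASTQ record.

    Assumes unwrapped four-line records: header, sequence, '+', quality.
    """
    for line_number, line in enumerate(handle):
        if line_number % 4 == 3:
            yield line.rstrip("\n")


def per_position_mean_quality(handle: Iterable[str]) -> List[float]:
    """Mean Phred score at each read position, across all reads."""
    totals: List[int] = []
    counts: List[int] = []
    for qual in quality_strings(handle):
        for pos, char in enumerate(qual):
            if pos == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - PHRED_OFFSET  # decode ASCII to Phred score
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


if __name__ == "__main__":
    toy_fastq = "@r1\nACGT\n+\nIIII\n@r2\nACGA\n+\nII5#\n"
    print(per_position_mean_quality(StringIO(toy_fastq)))
```

On the two toy reads this prints [40.0, 40.0, 30.0, 21.0]: quality drops towards the 3' end, which is exactly the pattern a per-base plot is designed to make obvious.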
RNA-seq: The main methods for RNA-seq are briefly described, but they are a little out of date from a wet-lab perspective; however, this book is not a wet-lab manual.
The discussion of sequencing depth and replication is worth reading, although I feel it errs on the side of too many reads, and there is no discussion of single-end versus paired-end reads. The chapter appears to recommend not pushing much past 20M reads for DGE, but I think this could be stated more clearly. It is easy to request an extra 10 or 20M reads when you don't have to generate them. The discussion of replication presents several papers that have looked at statistical power in RNA-seq studies. This could make for alarming reading for some users, but three replicates really is not enough, and RNA-seq users should be, and are, encouraged to run more replicates rather than choosing higher read depth.
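To make the replicates-versus-depth trade-off concrete, here is a back-of-the-envelope sketch. It assumes a fixed total read budget shared equally across all samples; the 240M budget and the two-condition design are hypothetical numbers of my own, and only the roughly 20M reads per sample figure comes from the chapter's recommendation.

```python
from typing import List, Tuple


def design_options(
    total_reads_millions: float,
    conditions: int,
    max_replicates: int,
    min_depth_millions: float = 20.0,  # roughly the DGE depth the chapter suggests
) -> List[Tuple[int, float, bool]]:
    """For each replicate count, return (replicates per condition,
    reads per sample in millions, whether that depth meets the minimum).

    Assumes the whole budget is split equally across every sample.
    """
    options = []
    for reps in range(2, max_replicates + 1):
        samples = conditions * reps
        depth = total_reads_millions / samples
        options.append((reps, depth, depth >= min_depth_millions))
    return options


if __name__ == "__main__":
    # Hypothetical budget: 240M reads for a two-condition experiment.
    for reps, depth, ok in design_options(240, conditions=2, max_replicates=6):
        print(f"{reps} replicates/condition -> {depth:.1f}M reads/sample, meets 20M: {ok}")
```

With this made-up budget, six replicates per condition at 20M reads each is affordable instead of three at 40M. That is the choice the chapter's power discussion points towards.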
For bioinformaticians considering RNA-seq analysis, the chapter covers pre- and post-alignment QC of RNA-seq data (including Picard CollectRnaSeqMetrics and RNA-SeQC), as well as differences between alignment methods. Two quantification tools, Cufflinks and HTSeq, are discussed; and eight DGE methods (Cuffdiff, edgeR, DESeq, baySeq, PoissonSeq, NOIseq, SAMseq, limma) are described, although there is minimal discussion of how to choose between them. Methods for alternative splicing analysis are also discussed. Lastly, tools like DAVID and GSEA are introduced for downstream analysis of RNA-seq experiments. Readers who want to learn more are directed to the Galaxy tutorials for RNA-seq.
If the other method-focussed chapters are as good as this one then the book is well worth having on the shelf for an easy review of what to consider when a new project comes your way.
Emerging technologies and applications: Of the chapters I read, this was the one I was most excited to get to, and unfortunately the one I was most disappointed by. The discussion of nanopore technologies cannot have been written in the last 12 months, although the firewalled MinION user forum does not help with dissemination of knowledge outside the MAP (MinION Access Programme). Illumina's HiSeq X Ten is only included in the concluding paragraphs, and nothing is mentioned of population genomics initiatives like the Genomics England 100,000 Genomes Project. The chapter does discuss the possibilities around direct sequencing of 5mC and 5hmC, the combination of long-read and short-read technologies to produce better de novo genome assemblies, and the use of synthetic long reads.
Summary: As a wet-lab scientist with very little bioinformatics experience, I found that the chapters flow well internally and cover the main aspects of the methods presented. Certainly the QC and RNA-seq chapters will point students or new bioinformaticians in the right direction. As someone whose job it is to stay up to date with genomics technologies, I was a little disappointed with the final chapter; but a textbook is out of date well before it is published, and the buzz around emerging genomics technologies is increasingly felt outside of, or ahead of, the traditional route of academic publication.
Available June 30th and costing $60 on Amazon, this book is likely to be a good one to have on your shelf.