Friday, 9 December 2016

10X Genomics updates

We had a seminar from 10X Genomics today presenting some of the most recent updates to their systems and chemistry. The new chemistry for single-cell gene expression and the release of a dedicated single-cell controller show how much effort 10X have placed on single-cell analysis as a driver for the company. Phasing is looking very much the poor cousin right now, but it still represents an important method for understanding genome organisation, regulation and epigenetics.



Single cell 3'mRNA-seq V2: the most important update from my perspective was that 10X libraries can now be run on the HiSeq 4000, rather than just the 2500 and NextSeq. This means we can run these alongside our standard sequencing (albeit with a slightly weird run-type).

The new chemistry offers improved sensitivity to detect more genes and more transcripts per cell, an updated Cell Ranger 1.2 analysis pipeline, and compatibility with all Illumina sequencers. Sequencing is still paired-end: read 1 (26bp) carries the 10X cell barcode and UMI, index 1 carries the sample barcode, and read 2 is the cDNA, reading back towards the polyA tail.
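For anyone poking at the raw Fastq themselves, read 1 can be split into its two components. A minimal sketch in Python, assuming the v2 convention of a 16bp cell barcode followed by a 10bp UMI (treat that split as an assumption and check your chemistry's documentation):

```python
# Split a 10X read 1 sequence into cell barcode and UMI.
# The 16bp barcode + 10bp UMI layout (26bp total) is assumed here,
# not taken from any official 10X tool.

def split_read1(seq, bc_len=16, umi_len=10):
    """Return (cell_barcode, umi) from a read 1 sequence."""
    assert len(seq) >= bc_len + umi_len, "read 1 too short"
    return seq[:bc_len], seq[bc_len:bc_len + umi_len]

bc, umi = split_read1("ACGTACGTACGTACGTTTTTTTTTTT")
print(bc, umi)  # ACGTACGTACGTACGT TTTTTTTTTT
```

In practice Cell Ranger does all of this for you; a sketch like this is only useful for sanity-checking raw reads.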

It is really important in all the single-cell systems to carefully prepare and count cells before starting. You MUST have a single-cell suspension, loaded at 100-2,000 cells per microlitre in a volume of 33.8ul. This makes counting cells very important, as the concentration loaded affects the number of cells ultimately sequenced, and also the doublet rate. Counting can be highly variable; 10X recommend using a haemocytometer or a Life Tech Countess. Adherent cells need to be trypsinised and filtered using a Flowmi cell strainer or similar. Dead and/or lysed cells can confuse analysis by leaching RNA into the cell suspension - it may be possible to detect this by monitoring the level of background transcription across cell barcodes. The interpretation of the QC plots provided by 10X is likely to be very important, but there are not many examples of these plots out there yet, so users need to talk to each other.
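The loading arithmetic is simple enough to sanity-check before a run. A quick sketch using the figures above (counted concentration multiplied by the 33.8ul loading volume; note this says nothing about recovery efficiency on the chip):

```python
# Cells loaded onto the chip = counted concentration x loading volume.
# 33.8ul is the loading volume quoted above; recovery is not modelled.

def cells_loaded(conc_per_ul, volume_ul=33.8):
    return conc_per_ul * volume_ul

print(cells_loaded(500))  # 16900.0 cells loaded from a 500 cells/ul suspension
```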

There is a reported doublet rate of 0.8% per 1,000 cells, which keeps 10X at the low end of doublet rates on single-cell systems. However it is still not clear exactly what impact this has on the different types of experiment we're being asked to help with. I suspect we'll see more publications on the impact of doublet rate, and analysis tools to detect and fix these problems.
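If the 0.8% per 1,000 cells figure scales roughly linearly, the expected doublet rate for a given target cell number is a one-liner (the linear extrapolation is my assumption, not a 10X guarantee):

```python
# Expected doublet rate, assuming linear scaling of 0.8% per 1,000 cells.

def doublet_rate_pct(cells, rate_per_1000=0.8):
    return rate_per_1000 * cells / 1000.0

print(doublet_rate_pct(5000))  # 4.0 (% of droplets expected to be doublets)
```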

The sequencing required per cell is very much dependent on what your question is. 10X recommend 50,000 reads per cell, which should detect around 1,200 transcripts in PBMCs, or 6,000 in HEK293 cells. It is not completely clear how much additional depth will increase the number of genes detected before you reach saturation, but it is not worth going much past 150,000 reads per cell.
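Those numbers make the sequencing budget easy to estimate. A back-of-envelope sketch, where the ~300M reads per HiSeq 4000 lane is my assumed lane yield rather than a guaranteed figure:

```python
import math

# Total reads and whole lanes needed for a target cell number at a
# given depth. The 300M reads/lane figure is an assumed lane yield.

def sequencing_budget(n_cells, reads_per_cell=50_000, lane_yield=300_000_000):
    total = n_cells * reads_per_cell
    return total, math.ceil(total / lane_yield)

total, lanes = sequencing_budget(5000)
print(total, lanes)  # 250000000 reads -> 1 lane
```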

1 million single-cells: 10X also presented a 3D tSNE plot of the recently released 1 million cell experiment. This was an analysis of E18 mouse cortex, hippocampus, and ventricular zone. The 1 million single-cells were processed as 136 libraries across 17 Chromium chips, and 4 HiSeq 4000 flowcells. This work was completed by one person in one week - it is amazing to think how quickly single-cell experiments have grown from 100s to 1000s of cells, and become so simple to do.

Additional sequencing is underway to reach ~20,000 reads per cell. All raw and processed data will be released without restrictions.

The number of cells required to detect a population is still something that people are working on. The 1 million cell dataset is probably going to help the community by delivering a rich dataset that users can analyse and test new computational methods on.

What's next from 10X: A new assay coming in Spring 2017 is for Single Cell V(D)J sequencing, enabling high-definition immune cell profiling.


The seminar was well attended showing how much interest there is in single-cell methods. Questions during and after the seminar included the costs of running single-cell experiments, the use of spike-ins (e.g. ERCC, SIRV, Sequins), working with nuclei, etc.

In answering the question about working with nuclei, 10X said "we tried and it is quite difficult"...the main difficulty was the lysis of single nuclei in the gel droplets. Whilst we might not get it at single-cell resolution, this difficulty in lysing the nucleus rather than the cell might possibly be a way to measure and compare nuclear versus cytoplasmic transcripts.

Thursday, 17 November 2016

MinION: 500kb reads and counting

A couple of Tweets today point to the amazing read lengths Oxford Nanopore's MinION sequencer is capable of generating - over 400kb!

Dominik Handler Tweeted a plot showing the read distribution from a run. In replies to the Tweet he describes the DNA handling as involving "no tricks, just very careful DNA isolation and no, really no pipetting (ok 2x pipetting required)".


And Martin Smith Tweeted an even longer read, almost 500kb in length...


Exactly how easily we'll all see similar read lengths is unclear, but it is going to be hugely dependent on the sample, and probably on having "green fingers" as well.

Here's Dominik's gel...


Wednesday, 9 November 2016

Unintended consequences of NGS-based NIPT?

The UK recently approved an NIPT test to screen high risk pregnancies for foetal trisomy 21, 13, or 18 after the current primary screening test, and in place of amniocentesis (following on from the results of the RAPID study). I am 100% in favour of this kind of testing and 100% in favour of individuals, or couples, making the choice of what to do with the results. But what are the consequences of this kind of testing and where do we go in a world where cfDNA foetal genomes are possible?


I decided to write this post after watching "A World Without Down's", a documentary on BBC2 presented by Sally Phillips (of Bridget Jones fame), mother to Olly who has Down's syndrome. She presented a programme where the case for the test was made (just), but one that was very clearly pro-Down's, although not quite to the point of being anti-choice.

Friday, 21 October 2016

Does the world have too many HiSeq X Tens?

Illumina stock dropped 25% after a hammering by the stock market, following their recent announcement that Q3 revenues would be 3.4% lower than expected at just $607 million. This makes Illumina a much more attractive acquisition target (although I doubt this summer's rumours of a Thermo bid had any substance), and also makes a lot of people ask the question "why?"

The reasons given for the shortfall were "a larger than anticipated year-over-year decline in high-throughput sequencing instruments" i.e. Illumina sold fewer sequencers than it expected to. It is difficult to turn these revenue figures and statements into the number of HiSeq 2500's, 4000's or X's that Illumina missed its internal forecasts by, but according to Francis de Souza Illumina "closed one less X deal than anticipated" - although he did not say if this was an X5, X10 or X30! Perhaps more telling was that de Souza was quoted saying that "[Illumina was not counting on a continuing increase in new sequencer sales]"...so is the market full to bursting?



Controlling for bisulfite conversion efficiency with a 1% Lambda spike-in

The use of DNA methylation analysis by NGS has become a standard tool in many labs. In a project design discussion we had today somebody mentioned a control for bisulfite conversion efficiency that I'd missed; as it's such a simple one I thought I'd briefly mention it here. In their PLoS Genet 2013 paper, Shirane et al from Kyushu University spiked in unmethylated lambda phage DNA (Promega) to control for, and check that, the C-to-T conversion rate was greater than 99%.
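The check itself reduces to counting converted versus retained cytosines in reads mapped to the lambda genome (which is fully unmethylated, so every retained C is a conversion failure). A minimal sketch with made-up counts - bisulfite pipelines such as Bismark report this figure for you:

```python
# Bisulfite conversion rate from lambda spike-in counts:
# converted (C read as T) vs unconverted (C still read as C).
# The counts below are illustrative only.

def conversion_rate(converted_c, unconverted_c):
    return converted_c / (converted_c + unconverted_c)

rate = conversion_rate(99_500, 300)
print(f"{rate:.4f}")  # 0.9970 -> passes a >99% threshold
```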






The bisulfite conversion of cytosine to uracil, by deamination of unmethylated cytosines (as shown above), is the gold standard for methylation analysis.

Monday, 17 October 2016

SIRVs: RNA-seq controls from @Lexogen

This article was commissioned by Lexogen GmbH.

My lab has been performing RNA-seq for many years, and is currently building new services around single-cell RNA-seq. Fluidigm's C1, academic efforts such as Drop-seq and inDrop, and commercial platforms from 10X Genomics, Dolomite Bio, Wafergen, Illumina/BioRad, RainDance and others make establishing the technology in your lab relatively simple. However the data being generated can be difficult to analyse, and so we've been looking carefully at the controls we use, or should be using, for single-cell and standard RNA-seq experiments. The three platforms I'm considering are Lexogen's SIRVs (Spike-In RNA Variants), SEQUINs, and the ERCC 2.0 (External RNA Controls Consortium) controls. All are based on synthetically produced RNAs that aim to mimic the complexities of the transcriptome: Lexogen's SIRVs are the only controls currently available commercially; ERCC 2.0 is a developing standard (Lexogen is one of the groups contributing to the discussion); and SEQUINs for RNA and DNA were only recently published in Nature Methods.

You can win a free lane of HiSeq 2500 sequencing of your own RNA-seq libraries (with SIRVs of course) by applying for the Lexogen Research Award.


Lexogen’s SIRVs are probably the most complex controls available on the market today as they are designed to assess alternative splicing, alternative transcription start and end sites, overlapping genes, and antisense transcription. They consist of seven artificial genes in-vitro transcribed as multiple (6-18) isoforms to generate a total of 69 transcripts. Each has a 5’triphosphate and a 30nt poly(A)-tail, enabling both mRNA-Seq and TotalRNA-seq methods. Transcripts vary from 191 to 2528nt long and have variable (30-50%) GC-content.



Want to know more? Lexogen are hosting a webinar to describe SIRVs in more detail on October 19th: Controlling RNA-seq experiments using spike-in RNA variants. They have also uploaded a manuscript to BioRxiv that describes the evaluation of SIRVs and provides links to the underlying RNA-Seq data. As a bioinformatician you might want to download this data set and evaluate the SIRV reads yourself. Or read about how SIRVs are being used in single-cell RNA-seq in the latest paper from Sarah Teichmann's group at EBI/Sanger.



Before diving into a more in-depth description of the Lexogen SIRVs, and how we might be using them in our standard and/or single-cell RNA-seq studies, I thought I’d start with a bit of a historical overview of how RNA controls came about...and that means going back to the days when microarrays were the tool of choice and NGS had yet to be invented!

Friday, 14 October 2016

Batch effects in scRNA-seq: to E or not to E(RCC spike-in)

At the recent Wellcome Trust conference on Single Cell Genomics (Twitter #scgen16) there was a great talk (her slides are online) from Stephanie Hicks in the @irrizarry group (Department of Biostatistics and Computational Biology at the Dana-Farber Cancer Institute). Stephanie was talking about her recent work on batch effects in single-cell data, all of which you can read about in her paper on the BioRxiv: On the widespread and critical impact of systematic bias and batch effects in single-cell RNA-Seq data. You can also read about this paper over at NextGenSeek.

Adapted from Figure 1 in Hicks et al.

Tuesday, 11 October 2016

Clinical trials using ctDNA

DeciBio have a great interactive Tableau dashboard which you can use to browse and filter their analysis of 97 "laboratory biomarker analysis" Immuno-Oncology clinical trials; see: Diagnostic Biomarkers for Cancer Immunotherapy – Moving Beyond PD-L1. The raw data comes from ClinicalTrials.gov, where you can run a "ctDNA" search and get back 50 trials, 40 of which are open.

Two of these trials are happening in the UK. Investigators at The Royal Marsden are looking to measure the presence or absence of ctDNA post-CRT in EMVI-positive rectal cancer. And AstraZeneca are looking at ctDNA as a secondary outcome, to obtain a preliminary assessment of the safety and efficacy of AZD0156 and its activity in tumours by evaluating the total amount of ctDNA.

You can also specify your own search terms and get back lists of trials from OpenTrials, which went live very recently. The Marsden's ctDNA trial above is currently listed.

You can use the DeciBio dashboard on their site. In the example below I filtered for trials using ctDNA analysis and came up with 7 results:

  1. Dabrafenib and Trametinib Followed by Ipilimumab and Nivolumab or Ipilimumab and Nivolumab Followed by Dabrafenib and Trametinib in Treating Patients With Stage III-IV BRAFV600 Melanoma
  2. Nivolumab in Eliminating Minimal Residual Disease and Preventing Relapse in Patients With Acute Myeloid Leukemia in Remission After Chemotherapy
  3. Nivolumab and Ipilimumab in Treating Patients With Advanced HIV Associated Solid Tumors
  4. Entinostat, Nivolumab, and Ipilimumab in Treating Patients With Solid Tumors That Are Metastatic or Cannot Be Removed by Surgery or Locally Advanced or Metastatic HER2-Negative Breast Cancer
  5. Nivolumab in Treating Patients With HTLV-Associated T-Cell Leukemia/Lymphoma
  6. Tremelimumab and Durvalumab With or Without Radiation Therapy in Patients With Relapsed Small Cell Lung Cancer
  7. Pembrolizumab, Letrozole, and Palbociclib in Treating Patients With Stage IV Estrogen Receptor Positive Breast Cancer With Stable Disease That Has Not Responded to Letrozole and Palbociclib


Thanks to DeciBio's Andrew Aijian for the analysis, dashboard and commentary. And to OpenTrials for making this kind of data open and accessible.

Friday, 7 October 2016

Index mis-assignment to Illumina's PhiX control

Multiplexing is the default option for most of the work being carried out in my lab, and it is one of the reasons Illumina has been so successful. Rather than the one-sample-per-lane we used to run when a GA1 generated only a few million reads per lane, we can now run a 24 sample RNA-seq experiment in one HiSeq 4000 lane and expect to get back 10-20M reads per sample. For almost anything other than genomes multiplexed sequencing is the norm.

But index sequencing can go wrong, and this can and does happen even before anything gets on the sequencer. We noticed that PhiX has been turning up in demultiplexed sample Fastq files. PhiX does not carry a sample index, so something is going wrong! What's happening? Is this a problem for indexing and multiplexing in general on NGS platforms? These are the questions I have recently been digging into after our move from HiSeq 2500 to HiSeq 4000. In this post I'll describe what we've seen with mis-assignment of sample indexes to PhiX, and I'll review some of the literature that clearly pointed out the issue - in particular Jeff Hussmann's PhD thesis from 2015.

The problem of index mis-assignment to PhiX can be safely ignored, or easily fixed (so you could stop reading now). But understanding it has made me realise that index mis-assignment between samples is an issue we do not know enough about - and that the tools we're using may not be quite up to the job (though I'll not cover this in depth in this post).
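For anyone wanting to quantify the problem in their own data, a crude screen is to look for PhiX sequence in each demultiplexed Fastq. The sketch below does exact k-mer matching against a PhiX reference; the k-mer length and seed stride are arbitrary choices of mine, and aligning against the PhiX genome with your usual aligner is the more rigorous route:

```python
# Estimate the fraction of reads in a demultiplexed sample that look
# like PhiX, by exact k-mer matching against the PhiX reference.
# k=25 and the seed-every-k stride are arbitrary; this is a rough
# screen, not a substitute for alignment.

def build_seeds(phix_genome, k=25):
    """All k-mers of the PhiX reference sequence."""
    return {phix_genome[i:i + k] for i in range(len(phix_genome) - k + 1)}

def phix_fraction(reads, seeds, k=25):
    """Fraction of reads containing at least one PhiX k-mer."""
    hits = sum(
        any(read[i:i + k] in seeds for i in range(0, len(read) - k + 1, k))
        for read in reads
    )
    return hits / len(reads) if reads else 0.0
```

Reverse-complement matching is deliberately omitted to keep the sketch short; a real screen would check both strands.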


Tuesday, 20 September 2016

The future of Illumina according to @chrissyfarr

In yesterday's Fast Company piece, Christina Farr (on Twitter) gives a very nice write-up of Illumina's history and where they are going with respect to bringing DNA sequencing into the clinic. I really liked the piece and wanted to share my thoughts after reading it with Core-Genomics readers.


Friday, 16 September 2016

Reporting on Fluidigm's single-cell user meeting at the Sanger Institute

The genomics community is pushing ahead fast on single-cell analysis methods as these are revolutionising how we approach biological questions. Unfortunately my registration went in too late for the meeting running at the Sanger Institute this week (follow #SCG16 on Twitter), but the Fluidigm pre-meeting was a great opportunity to hear what people are doing with their tech. And it should be a great opportunity to pick other users' brains about their challenges with all single-cell methods.



Imaging mass-cytometry: the most exciting thing to happen in 'omics?

Mark Unger (Fluidigm VP of R&D) started the meeting off by asking the audience to consider the two axes of single-cell analysis: 1) Number of cells being analysed, 2) what questions can you ask of those cells (mRNA-seq is only one assay) - proteomics, epigenetics, SNPs, CNVs, etc.

Right now Fluidigm has the highest number of applications that can be run on single cells, with multiple Fluidigm and/or user-developed protocols on the Fluidigm Open App website; 10X Genomics only have single-cell 3' mRNA-seq right now, as do BioRad/Illumina and Drop-seq. But I am confident other providers will expand into non-3'mRNA assays...I'd go further and say that if they don't they'll find it hard to get traction, as users are likely to require a platform that can do more than one thing.

Wednesday, 14 September 2016

10X Genomics publications

Anyone who's been reading Core-Genomics will have seen my interest in the technology from 10X Genomics. I've been watching and waiting for publications to come out to get a better understanding of how people are using the technology, and thought you might like my current list of articles: many of these are on the BioRxiv and should be available in a reputable journal if you're reading this in 2017 or later!

The number of 10X Genomics publications is going to grow rapidly; and this list will only be updated sporadically!



Friday, 9 September 2016

10X Genomics phasing explained

This post follows on from my previous one explaining the 10X Genomics single-cell mRNA-seq assay. This time round I'm really reviewing the method as described in a paper recently put up on the BioRxiv by 10X's Deanna Church and David Jaffe: Direct determination of diploid genome sequences. This follows on from the earlier Nat Methods paper, which was the first 10X de novo assembly of NA12878, but on the GemCode system. While we are starting some phasing projects on our 10X Chromium box, the more significant interest has been in the single-cell applications. But if we can combine the two methods (or something else) to get single-cell CNV then 10X are onto a winner!



Monday, 5 September 2016

Nuclear sharks live for 400 years

A wonderful paper in a recent edition of Science uses radiocarbon dating to show that the Greenland shark can live for up to 400 years - making it the longest lived vertebrate known. See: Eye lens radiocarbon reveals centuries of longevity in the Greenland shark (Somniosus microcephalus).


Friday, 2 September 2016

Sequencing base modifications: going beyond mC and 5hmC

A great new resource was recently brought to my attention on Twitter and there is a paper describing it on the BioRxiv: DNAmod: the DNA modification database. Nearly all of the modified nucleotide sequencing we hear and read about concerns modifications to cytosine, mostly methylcytosine and hydroxymethylcytosine; you may also have heard about 8-oxoG if you are interested in FFPE analysis. All sorts of modified nucleotides occur in nature and may be important in biological processes, where they can vary across the tissues of an organism, or may just be chemical noise. The modifications are most important when they change the properties of the DNA strand, how it is read, and what might or might not bind to it, e.g. mC.


Thursday, 1 September 2016

Celebrating 10 years at the CRUK-Cambridge Institute today

Today I have been working for Cancer Research UK for ten years! September 1st 2006 seems like such a short time ago, but a huge amount has changed in that time in the world of genomics. NGS has changed the way we do biology, and is changing the way we do medicine. The original Solexa SBS has been pushed hard by Illumina to give us the $1000 genome, and perhaps just as exciting are the results coming out of Oxford Nanopore's MAP community - this may be the technology to displace Illumina? What the next ten years will hold is difficult to predict, but today I wanted to focus on the highlights of the last ten years at CRUK for me.

CRUK-Cambridge Institute circa early 2006

Thursday, 25 August 2016

Optalysys eco-friendly genomics analysis

The amount of power used in a genome analysis is not something I'd ever thought of until I heard about Optalysys, a company developing optical computing that has the potential to be 90% more energy-efficient and 20X faster than standard (electronic) compute infrastructure. Read on if you are interested in finding out more, and watch the video below - featuring Prof Heinz Wolff!



Optalysys was originally spun out from the University of Cambridge and the technology needs a lot more explanation than I'll give. Briefly: they split laser light across liquid crystal grids where each "pixel" can be modulated to encode analogue numerical data in the beam; the light diffracts, forming an interference pattern, and a mathematical calculation is performed - all at the speed of light. The beam can be split across many liquid crystals to increase the multiplicity and complexity of the mathematical operations performed.

Optalysys and the Earlham Institute in Norwich are collaborating on a project to build hardware/software that will be used for metagenomic analysis. This is a long way from comparing 500 matched tumour and normal genomes in an ICGC project; but if Optalysys can build systems to handle this scale then the huge compute processing tasks might be carried out at a fraction of the current costs and whilst running from a standard mains power supply.

PS: do you remember the Great Egg race as fondly as I do?

Wednesday, 24 August 2016

Upcoming Genomics conferences in the UK

It is almost time for kick-off at Genome Science, probably the best organised academic conference in the UK. It runs from August 30th to September 1st next week, and sadly I can't be there (just returned from holidays and too much going on). You can hear from a wide range of speakers in a jam-packed agenda. This year it is hosted by the University of Liverpool, and the evening entertainment comes from Beatles tribute band "The Cheatles"!

What other conferences are available for Genomics in the UK, and which one should you attend if you too can't make it over to Liverpool? The Wellcome Trust Genome Campus is holding their first Single Cell Genomics conference from September 9th (sold-out I'm afraid). Personally I thought that the London Festival of Genomics was excellent and I've high hopes for the January 2017 meeting. 

Often it is word of mouth that brings a conference to my attention, but there are a couple of resources out there to help.
  • AllSeq maintain a list of conferences.
  • GenomeWeb has a similar list, but it seems less focused than AllSeq.
  • NextGenSeek has a list for 2016, but nothing on the cards for 2017 yet.
  • Nature has an events page (searchable) that lists 50 upcoming NGS conferences.

PS: please do let me know if you've particular recommendations on conferences to attend. And do get in touch with the groups above to list your conference on their sites.

PPS: If you can justify it then the HVP/HUGO Variant Detection Training Course - "Variant Effect Prediction" running from 31st October 2016 is in Heraklion, Crete - a beautiful place to learn!

Thursday, 28 July 2016

10X Genomics single-cell 3'mRNA-seq explained

10X Genomics have been very successful in developing their gel-bead droplet technology for phased genome sequencing and more recently, single-cell 3'mRNA-seq. I've posted about their technology before (at AGBT2016, and March and November 2015) and based most of what I've written on discussion with 10X or from presentations by early access users. Now 10X have a paper up on the BioRxiv: Massively parallel digital transcriptional profiling of single cells. This describes their approach to single-cell 3'mRNA-seq in some detail and describes how you might use their technology in trying to better understand biology and complex tissues.

Monday, 25 July 2016

RNA-seq advice from Illumina

This article was commissioned by Illumina Inc.

The most common NGS method we discuss in our weekly experimental design meeting is RNA-seq. Nearly all projects will use it at some point to delve deeply into hypothesis-driven questions, or simply as a tool to go fishing for new biological insights. It is amazing how far a project can progress in just 30 minutes of discussion: methodology, replication, controls, analysis, and all sorts of bias get covered as we try to come up with an optimal design. However many users don't have the luxury of in-house bioinformatics and/or genomics core facilities, so they have to work out the right sort of experiment for themselves. Fortunately people have been hard at work creating resources that can really help, and most recently Illumina released an RNA-seq "Buyer's Guide" with lots of helpful information....including how to keep costs down.



Thursday, 21 July 2016

Core Genomics is going cor-porate (sort of)

I've just had my five year anniversary of starting the Core Genomics blog! Those five years have whizzed by and NGS technologies have surpassed almost anything I dreamed would have been possible when I started using them in 2007. My blog has also grown beyond anything I dreamed possible and the feedback I've had has been a real motivating factor in keeping up with the writing. It also stimulated my move onto Twitter and I now have multiple accounts: @CIGenomics (me), @CRUKgenomecore (my lab) and @RNA_seq, @Exome_seq (PubMed Twitter bots).

The blog is still running on the Google Blogger site I set up back in 2011 and I feel ready for a change. This will allow me to do a few things I've wanted to do for a while and over the next few months I'll be migrating core-genomics to a new WordPress site: Enseqlopedia.com



Saturday, 16 July 2016

Whole genome amplification improved

A new genome amplification technology from Expedeon/Sygnis, TruePrime, looks like it might work well for single-cell and low-input analyses - particularly copy number. TruePrime is a primer-free multiple displacement amplification technology. It uses the well-established phi29 DNA polymerase and a new primase, TthPrimPol, which eliminates the need for random primers and therefore avoids their inherent amplification bias. The senior author on the TthPrimPol primase paper, Prof Luis Blanco, is leading the TruePrime research team.


Thursday, 14 July 2016

How much time is lost formatting references?

I just completed a grant application and one of the steps required me to list my recent papers in a specific format. This was an electronic submission and I’m sure it could be made much simpler, possibly by working off the DOI or PubMed ID? But this got me thinking about the pain of reformatting references and the reasons we have so many formats in the first place. It took me ten minutes to get references in the required format, and I've spent much longer in the past - all wasted time in a day that is already too full!

Saturday, 2 July 2016

Comparison of DNA library prep kits by the Sanger Institute

A recent paper from Mike Quail's group at the Sanger Institute compares 9 different library prep kits for WGS. In Quantitation of next generation sequencing library preparation protocol efficiencies using droplet digital PCR assays - a systematic comparison of DNA library preparation kits for Illumina sequencing, the authors used a digital PCR (ddPCR) assay to look at the efficiency of ligation and post-ligation steps. They show that even though final library yield can be high, this can mask poor adapter ligation efficiency - ultimately leading to lower diversity libraries.

In the paper they state that PCR-free protocols offer obvious benefits in not introducing amplification biases or PCR errors that are impossible to distinguish from true SNVs. They also discuss how the emergence of greatly simplified protocols that merge library prep steps can significantly improve the workflow as well as the chemical efficiency of those merged steps. As a satisfied user of the Rubicon Genomics library prep technology (e.g. for ctDNA exomes) I'd like to have seen this included in the comparison*. In a 2014 post I listed almost 30 different providers.



Hidden ligation inefficiency: The analysis of ligation efficiency by the authors sheds light on an issue that has been discussed by many NGS users - that of whether library yield is an important QC or not? Essentially yield is a measure of how much library a kit can generate from a particular sample, but it is not a measure of how "good" that library is. Only analysis of final library diversity can really act as a sensible QC.

The authors saw that kits with high adapter ligation efficiency gave similar yields to kits with low adapter ligation efficiency (fig 4, reproduced above). They determined that the most likely cause was that the relatively high amount of adapter-ligated DNA going into PCR inhibits the amplification reaction, leading to lower than expected yields. For libraries with low adapter ligation efficiency a much lower amount of adapter-ligated DNA makes it into PCR, but because there is no inhibition the amplification reaction leads to higher than expected yields. The best performing kits were Illumina TruSeq Nano and PCR-free, and the KAPA Hyper kit, with ligation yields above 30%; the KAPA HyperPlus was fully efficient.

Control amplicon bias: three separate PCR amplicons from the PhiX control were amplified to assess bias. The kits with the lowest bias, at less than 25% for each fragment size, were KAPA HyperPlus and NEBNext. The Illumina TruSeq Nano kit showed different biases when using the "Sanger adaptors" rather than the "Illumina adaptors", which the authors suggest highlights that both adapter and fragment sequence play a role in the cause of this bias.

Which kit to choose: The authors took the same decision as most kit comparison papers and shied away from making overt claims about which kit was "best". They did discuss fragmentation and PCR-free as important points to consider.
  • If you have lots of DNA then aim for PCR-free to remove any amplification errors and/or bias.
  • If you don't have a Covaris then the newest enzymatic shearing methods, e.g. KAPA Fragmentase, have significantly less bias than previous chemical fragmentation methods.
Ultimately practicability - the overall time and number of steps required to complete a protocol - will be uppermost in many users' minds. The fastest protocols were the NEBNext Ultra kit, KAPA HyperPlus, and Illumina TruSeq DNA PCR-free.

*Disclosure: I am a paid member of Rubicon Genomics' SAB.

Wednesday, 29 June 2016

Measuring translation by ribosomal footprinting on MinION?

Oxford Nanopore should have kits for direct RNA sequencing available later this year, and have several examples of how these might be used on their website. The method presented at London Calling (see OmicsOmics coverage) is primarily for mRNA, but it is likely to be adapted for other RNA species in due course.

One of the ideas I've briefly thought about is using the MinION to perform ribosome profiling - a basic method would involve ligating adapters to RNA after cell lysis, with ribosomes fixed to mRNAs by cycloheximide treatment. Fast-mode sequencing would identify the 3' end and the transcript, then sequencing speed would be massively ramped up to zip the mRNA through the pore; the bound ribosomes should cause sequencing to stall, allowing counting of stall events and therefore the number of ribosomes attached to an mRNA.

Friday, 24 June 2016

I don't want to leave Europe

Brexit sucks...probably. The issue is we don't really know what the vote really means, or even if we'll actually leave the European Union in the next couple of years at all. However one thing cannot be ignored and that is the two-fingered salute to our European colleagues from 52% of the UK voting population that got out of bed on a rainy Thursday.


I am privileged to work in one of the UK's top cancer institutes at the top UK university: the Cancer Research UK Cambridge Institute, a department of the University of Cambridge. The institute, and the University, are international, with people from all across the globe; many of the staff in my lab have come from outside the UK and they are all great people to work with. I dislike the idea that these people feel insecure about their future because our politicians have done such a crap job of governing the country.

I'd like to keep the international feel, so if you're still thinking that working in the UK would be good for you (and you're a genomics whiz) then why not check out the job ad for a new Genomics Core Deputy Manager? We're expanding the lab and putting lots of effort into our single-cell genomics service (10X Genomics and Fluidigm C1 right now). I'm looking for a senior scientist of any nationality to help lead the team, with NGS experience and ideas about single-cell genomics.



You can get more information about the lab on our website. You can get more information about the role, and apply, on the University of Cambridge website.

Friday, 17 June 2016

Come and work in my lab...

I've just readvertised for someone to join my lab as the new Genomics Core Deputy Manager. We're expanding the Genomics core and building single-cell genomics capabilities (we currently have both 10X Genomics and Fluidigm C1). I'm looking for a senior scientist who can help lead the team, who has significant experience of NGS methods and applications, and who ideally has an understanding of the challenges single-cell genomics presents. You'll be hands-on, helping to define the single-cell genomics services we offer and building these over the next 12-18 months.



You'll have a real opportunity to make a contribution to the science in our institute and drive single-cell genomics research. The Cancer Research UK Cambridge Institute is a great place to work. It is a department of the University of Cambridge and one of Europe's top cancer research institutes. We are situated on the Addenbrooke's Biomedical Campus, and are part of both the University of Cambridge School of Clinical Medicine and the Cambridge Cancer Centre. The Institute's focus is high-quality basic and translational cancer research, and we have an excellent track record in cancer genomics. The majority of data generated by the Genomics Core facility is next-generation sequencing, and we support researchers at the Cambridge Institute, as well as nine other University institutes and departments within our NGS collaboration.

You can get more information about the lab on our website. You can get more information about the role, and apply, on the University of Cambridge website.

Tuesday, 14 June 2016

SPRI alternatives for NGS: survey results

Everyone loves bead cleanups, and it appears that almost everyone (85%) who read my recent post about SPRI alternatives loves Agencourt AMPureXP. I'd asked readers to take a survey asking if they used AMPure XP, a commercial alternative, or a home-brew version - the results are below.

Take the survey: https://www.surveymonkey.co.uk/r/V6LN5VX

I was surprised to see more home-brew responses than commercial alternatives, but this could simply reflect the attitudes of people reading CoreGenomics.

Friday, 10 June 2016

CNV, RNA, ChIP and cfDNA sequencing for £10 per sample

Copy-number analysis is a useful tool for many researchers and we use it a lot for analysis of tumour samples. In the past this was done using SNP arrays e.g. Affymetrix SNP6.0 in METABRIC, but today we're generally using low-coverage whole genome sequencing and tools like qDNAseq. I've posted before about our use of low-coverage WGS in our exome pipeline. Most recently we've got groups doing low-coverage WGS on large numbers of samples purely for copy-number analysis.
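For readers unfamiliar with the approach, here is a minimal sketch of the binned read-count idea behind tools like QDNAseq (this is my own simplified illustration, not QDNAseq's actual algorithm; real pipelines also correct for GC content, mappability and blacklist regions, and segment the ratios):

```python
# Minimal sketch of low-coverage WGS copy-number analysis: divide the
# genome into fixed-width bins, count reads per bin, normalise to the
# sample median, and report log2 copy-number ratios per bin.
import math

def log2_cnv_ratios(bin_counts):
    counts = sorted(bin_counts)
    n = len(counts)
    median = counts[n // 2] if n % 2 else (counts[n // 2 - 1] + counts[n // 2]) / 2
    return [math.log2(c / median) if c > 0 else float("-inf") for c in bin_counts]

# Simulated diploid baseline (~100 reads/bin) with a single-copy gain
# (3 copies -> 1.5x reads) and a single-copy loss (1 copy -> 0.5x reads).
bins = [100] * 10 + [150] * 5 + [50] * 5
ratios = log2_cnv_ratios(bins)
print(round(ratios[12], 2))  # gain bin: log2(1.5) ~ 0.58
print(round(ratios[17], 2))  # loss bin: log2(0.5) = -1.0
```

The appeal for copy-number-only projects is that this works at very low coverage: the signal is read depth per bin, not individual variants.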

Low-coverage WGS makes CNV-seq fast and cheap, but a recent Genome Research paper suggests some great methodological improvements to push costs down to very low levels: SMASH, a fragmentation and sequencing method for genomic copy number analysis.

WGS and SMASH generate highly concordant CNV calls

Thursday, 9 June 2016

Proteomics is starting to rock too...

I don’t usually read proteomics papers, but I have been thinking about how we might combine single-cell genome and transcriptome sequencing with Fluidigm’s Helios (CyTOF), and have been trying to get more acquainted with proteomics methods. In doing so I found this excellent paper: Proteomic Biomarker Discovery in 1000 Human Plasma Samples with Mass Spectrometry. The paper is probably a tour-de-force of proteomics, even if the published results were not stunning, though not being a proteomicist I’m not sure I’m qualified to say that. It is obvious that the group working on this put a large amount of effort into experimental design ahead of completing the mass-spec work.



Figure from Cominetti et al 2016


Any large project needs to consider design very carefully, from deciding what factors might need to be controlled for to choosing what controls to use. The experimental design for the human plasma proteome paper is illustrated above; they used a controlled-randomised plate layout to remove plate confounding effects for sample origin, gender, age, ethnicity, BMI, blood pressure, glycemic indices, and clinical biochemistry.
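The core of a controlled-randomised layout is simple: shuffle within each covariate group, then deal samples round-robin across plates so no group is confounded with a plate. A minimal sketch (my own illustration of the general idea, not the paper's actual procedure; the function and balancing on a single factor are assumptions):

```python
# Sketch of a controlled-randomised plate layout: within each covariate
# group, shuffle the samples, then deal them round-robin across plates
# so every plate carries a balanced mix of groups.
import random

def balanced_plate_layout(samples, groups, n_plates, seed=1):
    """samples: list of ids; groups: parallel list of covariate labels."""
    rng = random.Random(seed)
    plates = [[] for _ in range(n_plates)]
    by_group = {}
    for s, g in zip(samples, groups):
        by_group.setdefault(g, []).append(s)
    for members in by_group.values():
        rng.shuffle(members)                  # randomise within group
        for i, s in enumerate(members):
            plates[i % n_plates].append(s)    # deal across plates
    return plates

samples = [f"S{i}" for i in range(12)]
groups = ["case"] * 6 + ["control"] * 6
plates = balanced_plate_layout(samples, groups, n_plates=3)
for p in plates:
    print(len(p))  # 4 samples per plate, 2 cases and 2 controls each
```

In a real study you would balance across several factors at once (age, gender, BMI, etc.), which is harder, but the round-robin dealing above is the basic trick that stops a group from piling up on one plate or run.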

Reducing mass-spec variability with tandem mass tags: The key to making the data comparable across more than 300 mass-spec runs was the use of tandem mass tags (TMTs) purchased from Thermo Scientific (Rockford, IL, USA). These add specific masses to all proteins in a sample, allowing multiplexing of up to 24 samples per run. With a carefully designed experiment it is possible to reduce the impact of run-to-run variability: much in the same way as we designed projects using multi-sample microarrays, the experimental groups are balanced across mass-spec runs. The group also used a pair of control samples in each run, further reducing the impact of technical variability.

In total, 304 TMT 6-plex mass-spec runs were performed, with each 6-plex containing two standards and four samples. The 1000 patient plasma samples were processed in 19 96-well plates over a period of just 15 weeks. All sample handling was tracked, although they did not describe their tracking or whether they used a LIMS.

I’ve learnt a lot more about tandem mass tagging in proteomics over the last 18 months, after hearing about the tech in an internal seminar. It seems this approach is going to allow proteomics researchers to take advantage of the statistical tools developed for gene expression array analysis. The paper is a great example of careful experimental design, and I thought it well worth sharing outside the proteomics community.

Between 150 and 200 proteins were identified, and the authors argue strongly that this was only possible because of the use of TMTs. Label-free mass-spec approaches would have introduced more variability and taken significantly longer (38 weeks by their estimation). However, after crunching the numbers, only two proteins in human plasma had a significant correlation with BMI. Both had previously been shown to be associated with obesity.

NGS experimental design: We're lucky to have such a large number of sample barcodes available for NGS experiments. We can usually fit a whole experiment into one library prep plate and a single sequencing pool, removing almost all of the confounding technical issues. However, this does not mean we should skip careful design of NGS experiments. Taking a little time to discuss the major question(s) being asked, the samples available, and the methods we'll use in both wet and dry labs is time very well spent.