CoreGenomics: Some comments and analysis from the exciting and fast moving world of Genomics. This blog focuses on next-generation sequencing and microarray technologies, although it is likely to go off on tangents from time-to-time. (James@cancer)
CoreGenomics has moved (2017-01-23)<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif; font-size: large;"><b><a href="http://enseqlopedia.com/coregenomics">Follow this link to Enseqlopedia/coregenomics...</a></b></span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<span style="font-family: "georgia" , "times new roman" , serif;">"CoreGenomics is dead...long live <a href="http://enseqlopedia.com/coregenomics">CoreGenomics</a>"...t</span><span style="font-family: "georgia" , "times new roman" , serif;">he CoreGenomics blog has moved to its new home: http://enseqlopedia.com/coregenomics. </span><span style="font-family: "georgia" , "times new roman" , serif;">Please update your bookmarks and <a href="http://enseqlopedia.com/#_loginarea">register to follow the new blog</a>, for updates on the NGS map</span><span style="font-family: "georgia" , "times new roman" , serif;"> </span><span style="font-family: "georgia" , "times new roman" , serif;">(coming soon)</span><span style="font-family: "georgia" , "times new roman" , serif;">, and to access the new Enseqlopedia (coming soon)!</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>Enseqlopedia: </b>Last year I started building the new <a href="http://enseqlopedia.com/">Enseqlopedia</a> site, after five years of blogging here on Blogger. Whilst Enseqlopedia is still being developed, the CoreGenomics blog has moved over, and you can find all the old content there too. Commenting should be much easier for me to manage, so please do give me your feedback directly on the site.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>NGS mapped: </b>Currently I'm working on the newest implementation of the Google Maps-based </span><a href="http://omicsmaps.com/" style="font-family: Georgia, "Times New Roman", serif;">sequencer map</a><span style="font-family: "georgia" , "times new roman" , serif;"> that Nick Loman and I put together many years ago. The screenshot of the demo gives you an idea of what's changed. The main differences are a search bar that lets you select technology providers and/or instruments, and graphics that now give a pie-chart breakdown of the instruments at each location...you can clearly see the dominance of Illumina!</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<span style="font-family: "georgia" , "times new roman" , serif;">Other technologies that will appear soon include single-cell systems from the likes of <a href="https://www.10xgenomics.com/">10X Genomics</a>, <a href="https://www.fluidigm.com/products/c1-system">Fluidigm</a>, <a href="http://www.wafergen.com/products/icell8-single-cell-system">Wafergen</a>, <a href="https://www.genomeweb.com/pcr/bio-rad-illumina-launch-codeveloped-single-cell-sequencing-system">BioRad/Illumina</a>, <a href="http://www.dolomite-bio.com/product/rna-seq-system/">Dolomite</a>, etc., so users can find people nearby to discuss their experiences <span style="color: #666666;">(we're also restarting our beer & pizza nights as a <a href="https://twitter.com/singlecellclub">single cell club</a> here in Cambridge, so keep an eye out for that on Twitter)</span>.</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<span style="font-family: "georgia" , "times new roman" , serif;">Lastly, another change that should happen in 2017 is the addition of users to the map. I'm hoping to give anyone who uses NGS technologies a way to list their lab and highlight the techniques they are using. Again, the aim is to make it easier for us to find each other and get talking.</span></div>
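<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">The search bar and per-location pie charts described above boil down to a filter followed by a group-and-count. The sketch below is only an illustration of that idea: the record layout (location, provider, instrument) and the function names are hypothetical, not the map's actual data model.</span></div>

```python
from collections import Counter, defaultdict

# Hypothetical record layout: (location, provider, instrument).
# The real map's data model is not described in the post.


def filter_records(records, provider=None, instrument=None):
    """Keep records matching the selected provider and/or instrument, like a search bar."""
    return [
        (loc, prov, inst)
        for loc, prov, inst in records
        if (provider is None or prov == provider)
        and (instrument is None or inst == instrument)
    ]


def provider_breakdown(records):
    """Return {location: {provider: fraction}}, i.e. the slices of one pie chart per site."""
    counts = defaultdict(Counter)
    for location, provider, _instrument in records:
        counts[location][provider] += 1
    breakdown = {}
    for location, by_provider in counts.items():
        total = sum(by_provider.values())
        breakdown[location] = {prov: n / total for prov, n in by_provider.items()}
    return breakdown


if __name__ == "__main__":
    # Made-up example entries, not real map data.
    sample = [
        ("Site A", "Illumina", "HiSeq 4000"),
        ("Site A", "Illumina", "NextSeq 500"),
        ("Site A", "Illumina", "MiSeq"),
        ("Site A", "Oxford Nanopore", "MinION"),
        ("Site B", "PacBio", "Sequel"),
    ]
    print(provider_breakdown(sample))
    print(filter_records(sample, provider="Illumina", instrument="MiSeq"))
```

<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Filtering before grouping means the pie charts redraw to reflect whatever the user has selected in the search bar.</span></div>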
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvo3BGpPLhS-RbsE9gJVN_CFaBFUGM2QeRBcRYjCUyMvrz0tm2vBrZEBBjkITvshXhTwKCYvY0-P3zCiUW89LWa1-h4wmK8PkpeDNVBMHqBIXkrPyVCtxGjyKD2cu94bjXQqtyT37mYUMQ/s1600/map3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvo3BGpPLhS-RbsE9gJVN_CFaBFUGM2QeRBcRYjCUyMvrz0tm2vBrZEBBjkITvshXhTwKCYvY0-P3zCiUW89LWa1-h4wmK8PkpeDNVBMHqBIXkrPyVCtxGjyKD2cu94bjXQqtyT37mYUMQ/s400/map3.png" width="550" /></a></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Enseqlopedia.com is a big step for me, and I hope that in a year or so you'll think it was worthwhile. There's one feature I've not mentioned until now, which I hope you'll hear more about in the very near future: the Enseqlopedia itself. Watch out for it to appear in press.</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<span style="font-family: "georgia" , "times new roman" , serif;">Thanks so much for following this blog. I'm sad to leave Blogger. I hope you'll come with me to </span><a href="http://enseqlopedia.com/coregenomics" style="font-family: georgia, "times new roman", serif;">Enseqlopedia/coregenomics</a>.</div>
</div>
10X Genomics updates (2016-12-09)<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">We had a seminar from 10X Genomics today presenting some of the most recent updates to their systems and chemistry. The new chemistry for single-cell gene expression and the release of a dedicated single-cell controller show how much effort 10X have placed on single-cell analysis as a driver for the company. Phasing is looking very much the poor cousin right now, but it still represents an important method for understanding genome organisation, regulation and epigenetics.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwyZvgcinW0C-LdZ0bdjIqDDwngZAqEMfy-Lo62B8FodyGr3xvGEEd_JgCubvDMKj6AroePQamiAyGxnjFqLVANyf-P8u8xuDPegKY5KiSwBDyCErYhsmjzqdZRoRf6s3oDQInntwCl765/s1600/10X+update+seminar+-+single+cell+3%2527mRNA-seq+chemistry.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="363" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwyZvgcinW0C-LdZ0bdjIqDDwngZAqEMfy-Lo62B8FodyGr3xvGEEd_JgCubvDMKj6AroePQamiAyGxnjFqLVANyf-P8u8xuDPegKY5KiSwBDyCErYhsmjzqdZRoRf6s3oDQInntwCl765/s400/10X+update+seminar+-+single+cell+3%2527mRNA-seq+chemistry.png" width="400" /></a></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><b>Single cell 3'mRNA-seq V2: </b> the most important update from my perspective was that 10X libraries can now be run on HiSeq 4000, rather than just 2500 and NextSeq. This means we can run these alongside our standard sequencing (albeit with a slightly weird run-type).</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">The new chemistry offers improved sensitivity to detect more genes and more transcripts per cell, an updated Cell Ranger 1.2 analysis pipeline, and compatibility with all Illumina sequencers. Sequencing is still paired-end: Read 1 (26bp) covers the 10X barcode and UMI, Index 1 is the sample barcode, and Read 2 is the cDNA, reading back towards the polyA tail.</span></div>
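<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Because Read 1 carries only the cell barcode and UMI, it can be split by position. The sketch below assumes a V2 layout of a 16bp cell barcode followed by a 10bp UMI (26bp total); those lengths are an assumption to check against the 10X documentation, and the function names are illustrative. Counting distinct UMIs per barcode is only a rough proxy: Cell Ranger also corrects barcodes and collapses UMIs per gene.</span></div>

```python
from collections import defaultdict

# Assumed V2 Read 1 layout: 16 bp cell barcode, then 10 bp UMI (26 bp total).
BARCODE_LEN = 16
UMI_LEN = 10


def split_read1(seq, barcode_len=BARCODE_LEN, umi_len=UMI_LEN):
    """Split a Read 1 sequence into (cell_barcode, umi) by position."""
    needed = barcode_len + umi_len
    if len(seq) < needed:
        raise ValueError(f"Read 1 is {len(seq)} bp; expected at least {needed} bp")
    return seq[:barcode_len], seq[barcode_len:needed]


def umis_per_barcode(read1_seqs):
    """Count distinct UMIs per cell barcode (no barcode correction, no gene assignment)."""
    seen = defaultdict(set)
    for seq in read1_seqs:
        barcode, umi = split_read1(seq)
        seen[barcode].add(umi)
    return {barcode: len(umis) for barcode, umis in seen.items()}


if __name__ == "__main__":
    read1 = "ACGTACGTACGTACGT" + "TTGCAAGGCC"  # made-up 26 bp read
    print(split_read1(read1))
```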
<div style="text-align: justify;">
<span style="font-family: Georgia, "Times New Roman", serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In all the single-cell systems it is really important to prepare and count cells carefully before starting. You MUST have a single-cell suspension, and load 100-2000 cells per microlitre in a volume of 33.8ul. This makes counting cells very important, as the concentration loaded affects both the number of cells ultimately sequenced and the doublet rate. Counting cells can be highly variable; 10X recommend using a haemocytometer or a </span><a href="https://www.thermofisher.com/uk/en/home/life-science/cell-analysis/cell-analysis-instruments/automated-cell-counters/countess-ii-fl-automated-cell-counter.html" style="font-family: Georgia, "Times New Roman", serif;">Life Tech Countess</a><span style="font-family: Georgia, Times New Roman, serif;">. Adherent cells need to be trypsinised and filtered using a </span><a href="http://www.belart.com/flowmi" style="font-family: Georgia, "Times New Roman", serif;">Flowmi cell strainer </a><span style="font-family: Georgia, Times New Roman, serif;">or similar. Dead and/or lysed cells can confuse analysis by leaching RNA into the cell suspension; it may be possible to detect this by monitoring the level of background transcription across cell barcodes. Interpreting the QC plots provided by 10X is likely to be very important, but there are not many examples of these plots out there yet, so users need to talk to each other.</span></div>
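<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">The counting and loading arithmetic above can be sketched as follows. This is one reading of the post's numbers, treating 100-2000 cells/ul as the concentration of the suspension you pipette from and 33.8ul as the total volume made up with diluent; check the 10X protocol for the exact definitions. The haemocytometer formula assumes a standard chamber whose large squares each hold 0.1ul, and the function names are illustrative.</span></div>

```python
LOAD_VOLUME_UL = 33.8            # loading volume quoted in the post
MIN_CONC, MAX_CONC = 100, 2000   # cells per microlitre, range quoted in the post


def haemocytometer_conc(counts_per_square, dilution_factor=1.0):
    """Cells per microlitre from counts in large haemocytometer squares.

    Assumes each large square is 1 mm x 1 mm x 0.1 mm (0.1 ul), so
    cells/ul = mean count per square * 10 * dilution factor.
    """
    if not counts_per_square:
        raise ValueError("no squares counted")
    mean = sum(counts_per_square) / len(counts_per_square)
    return mean * 10 * dilution_factor


def loading_volumes(stock_conc, cells_to_load, total_ul=LOAD_VOLUME_UL):
    """Return (cell_suspension_ul, diluent_ul) that deliver cells_to_load in total_ul."""
    if not MIN_CONC <= stock_conc <= MAX_CONC:
        raise ValueError(f"stock {stock_conc:g} cells/ul is outside {MIN_CONC}-{MAX_CONC}")
    cell_ul = cells_to_load / stock_conc
    if cell_ul > total_ul:
        raise ValueError("stock too dilute to deliver that many cells in the loading volume")
    return cell_ul, total_ul - cell_ul


if __name__ == "__main__":
    conc = haemocytometer_conc([52, 48, 50, 50])  # made-up counts -> 500 cells/ul
    cells, diluent = loading_volumes(conc, cells_to_load=5_000)
    print(f"{conc:.0f} cells/ul: {cells:.1f} ul cells + {diluent:.1f} ul diluent")
```

<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">At the ends of the quoted range, 33.8ul holds at most 3,380 cells (100 cells/ul) or 67,600 cells (2000 cells/ul); not every loaded cell is recovered, which is why an accurate count matters.</span></div>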
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">10X report a doublet rate of 0.8% per 1,000 cells, which keeps them at the low end of doublet rates on single-cell systems. However, it is still not clear exactly what impact this has on the different types of experiment we're being asked to help with. I suspect we'll see more publications on the impact of doublet rate, and analysis tools to detect and fix these problems.</span></div>
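<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">A rate quoted "per 1,000 cells" implies the doublet fraction climbs as more cells are loaded. The sketch below assumes simple linear scaling from the 0.8% figure, a common rule of thumb at modest loadings rather than anything 10X guarantee; the function names are illustrative.</span></div>

```python
DOUBLET_RATE_PER_1000 = 0.008  # 0.8% per 1,000 cells, as reported in the post


def expected_doublet_rate(n_cells, rate_per_1000=DOUBLET_RATE_PER_1000):
    """Approximate doublet fraction, assuming it rises linearly with cell number."""
    return rate_per_1000 * n_cells / 1000


def expected_doublets(n_cells, rate_per_1000=DOUBLET_RATE_PER_1000):
    """Approximate number of doublets among n_cells."""
    return n_cells * expected_doublet_rate(n_cells, rate_per_1000)


if __name__ == "__main__":
    for n in (1_000, 5_000, 10_000):
        print(f"{n:>6} cells: ~{expected_doublet_rate(n):.1%} doublets (~{expected_doublets(n):.0f})")
```

<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Under this assumption the doublet fraction grows linearly but the absolute number of doublets grows with the square of the cell number, so loading twice as many cells gives roughly four times as many doublets.</span></div>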
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">The sequencing depth needed per cell depends very much on your question. 10X recommend 50,000 reads per cell, which should detect 1200 transcripts in BMCs, or 6000 in HEK293 cells. It is not completely clear how much additional depth will increase the number of genes detected before you reach saturation, but it is not worth going much past 150,000 reads per cell.</span></div>
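<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Planning depth is simple multiplication, but it helps to write it down. This sketch uses the post's 50,000 reads per cell default and its 150,000 reads per cell ceiling; the read budget is something you supply from your own instrument's output, and the function names are illustrative.</span></div>

```python
RECOMMENDED_READS_PER_CELL = 50_000  # 10X recommendation quoted in the post
PRACTICAL_CEILING = 150_000          # post: "not worth going much past" this


def effective_depth(reads_per_cell):
    """Cap the requested depth at the practical ceiling."""
    return min(reads_per_cell, PRACTICAL_CEILING)


def reads_needed(n_cells, reads_per_cell=RECOMMENDED_READS_PER_CELL):
    """Total reads required to sequence n_cells at the chosen depth."""
    return n_cells * effective_depth(reads_per_cell)


def cells_supported(reads_available, reads_per_cell=RECOMMENDED_READS_PER_CELL):
    """How many cells a read budget covers at the chosen depth (rounded down)."""
    return reads_available // effective_depth(reads_per_cell)


if __name__ == "__main__":
    print(reads_needed(5_000))             # 250000000
    print(cells_supported(400_000_000))    # 8000
```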
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><b>1 million single-cells: </b>10X also presented a 3D t-SNE plot of the recently released <a href="http://support.10xgenomics.com/single-cell/datasets">1 million cell experiment</a>. This was an analysis of E18 mouse cortex, hippocampus, and ventricular zone. The 1 million single cells were processed as 136 libraries across 17 Chromium chips and 4 HiSeq 4000 flowcells. This work was completed by one person in one week; it is amazing to think how quickly single-cell experiments have grown from 100s to 1000s of cells, and become so simple to do.</span></div>
<div>
<br /></div>
<div style="font-family: Georgia, "Times New Roman", serif; text-align: justify;">
Additional sequencing is underway to reach ~20,000 reads per cell. All raw and processed data will be released without restrictions.</div>
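<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">The scale of the 1 million cell experiment is easier to picture per library and per chip. This sketch simply divides the figures quoted above, treating "1 million" as an exact count although it is a round number; the helper name is illustrative.</span></div>

```python
TOTAL_CELLS = 1_000_000         # round figure from the post
LIBRARIES = 136
CHIPS = 17
TARGET_READS_PER_CELL = 20_000  # depth the additional sequencing is aiming for


def dataset_summary():
    """Derived per-library and per-chip figures for the 1 million cell dataset."""
    return {
        "cells_per_library": TOTAL_CELLS / LIBRARIES,
        "libraries_per_chip": LIBRARIES / CHIPS,
        "cells_per_chip": TOTAL_CELLS / CHIPS,
        "reads_at_target_depth": TOTAL_CELLS * TARGET_READS_PER_CELL,
    }


if __name__ == "__main__":
    for name, value in dataset_summary().items():
        print(f"{name}: {value:,.0f}")
```

<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">That works out at roughly 7,350 cells per library, 8 libraries per chip, and around 20 billion reads once the target depth is reached.</span></div>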
<div style="font-family: Georgia, "Times New Roman", serif; text-align: justify;">
<br /></div>
<div style="font-family: Georgia, "Times New Roman", serif; text-align: justify;">
How many cells are needed to detect a given population is still something people are working on. The 1 million cell dataset should help the community by providing a rich dataset that users can analyse and use to test new computational methods.</div>
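<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">One simple way to frame "how many cells" is to ask how many cells you must sequence to see at least k cells from a population present at frequency p with a given confidence. The sketch below models the cells captured from that population as Binomial(n, p) and searches for the smallest n; it is a simplified illustration that ignores capture bias, QC losses and clustering, not the method any particular group uses.</span></div>

```python
from math import exp, lgamma, log, log1p


def binom_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_pmf = (
        lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        + i * log(p) + (n - i) * log1p(-p)
    )
    return exp(log_pmf)


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    if n < k:
        return 0.0
    below = sum(binom_pmf(i, n, p) for i in range(k))
    return max(0.0, 1.0 - below)


def cells_needed(p, k, confidence=0.95):
    """Smallest n with P(at least k cells from a population at frequency p) >= confidence."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")
    hi = max(k, 1)
    while prob_at_least(k, hi, p) < confidence:  # grow an upper bound
        hi *= 2
    lo = max(k, 0)
    while lo < hi:  # binary search: P(X >= k) never decreases as n grows
        mid = (lo + hi) // 2
        if prob_at_least(k, mid, p) >= confidence:
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    print(cells_needed(0.01, 1))   # 299: at least one cell from a 1% population
    print(cells_needed(0.01, 10))
```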
<div style="font-family: Georgia, "Times New Roman", serif; text-align: justify;">
<br /></div>
<div style="font-family: Georgia, "Times New Roman", serif; text-align: justify;">
<b>What's next from 10X: </b>A new assay coming in Spring 2017 is for Single Cell V(D)J sequencing, enabling high-definition immune cell profiling.</div>
<div style="font-family: Georgia, "Times New Roman", serif; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgoqYXAgA0XBn5kx8ddXQuMAlgsQZhlK4q1CYoeQGWoHKbfpfCAjO_4HiCmTe8zziK1kjxGkhIURBfBHwoNWW4p6IL1ZK90UPNVTGCnbkHXVRQ5mIw7Rpv9TlgswAaXVwXAmieGz3zl_JuM/s1600/10X+update+seminar+-+single+cell+VDJ+sequencing.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="145" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgoqYXAgA0XBn5kx8ddXQuMAlgsQZhlK4q1CYoeQGWoHKbfpfCAjO_4HiCmTe8zziK1kjxGkhIURBfBHwoNWW4p6IL1ZK90UPNVTGCnbkHXVRQ5mIw7Rpv9TlgswAaXVwXAmieGz3zl_JuM/s400/10X+update+seminar+-+single+cell+VDJ+sequencing.png" width="400" /></a></div>
<div style="font-family: Georgia, "Times New Roman", serif; text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">The seminar was well attended, showing how much interest there is in single-cell methods. Questions during and after the seminar included the costs of running single-cell experiments, the use of spike-ins (e.g. ERCC, SIRV, Sequins), working with nuclei, etc.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, "Times New Roman", serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">When asked about working with nuclei, 10X said <i>"we tried and it is quite difficult"</i>...the main difficulty was lysing single nuclei in the gel droplets. Whilst we might not be able to get it at single-cell resolution, this difficulty in lysing the nucleus, as opposed to the cell, might offer a way to measure and compare nuclear versus cytoplasmic transcripts.</span></div>
</div>
MinION: 500kb reads and counting (2016-11-17)<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Georgia, Times New Roman, serif;">A couple of Tweets today point to the amazing read lengths <a href="https://nanoporetech.com/">Oxford Nanopore</a>'s MinION sequencer is capable of generating - over 400kb!</span><br />
<div>
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<span style="font-family: Georgia, Times New Roman, serif;"><a href="https://twitter.com/86Dominik">Dominik Handler</a> Tweeted a plot showing the read-length distribution from a run. In replies to the Tweet he describes the DNA handling as involving "no tricks, just very careful DNA isolation and no, really no pipetting (ok 2x pipetting required)".</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghucOp_kLpKqmz2vVpks-ksstgynmdBiIPpEBaT6kf32saZH3Nv0fBr6RrmZgSM7gspcuB8y0p7Euby3Sc4PrSdDUkWgK3l-6pocQGQMun6l_ODYYZa0z7sRybn7iuWmvcdhzDcj4pXZ-V/s1600/Screen+Shot+2016-11-17+at+15.31.07.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-family: Georgia, Times New Roman, serif;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghucOp_kLpKqmz2vVpks-ksstgynmdBiIPpEBaT6kf32saZH3Nv0fBr6RrmZgSM7gspcuB8y0p7Euby3Sc4PrSdDUkWgK3l-6pocQGQMun6l_ODYYZa0z7sRybn7iuWmvcdhzDcj4pXZ-V/s400/Screen+Shot+2016-11-17+at+15.31.07.png" width="331" /></span></a></div>
<br />
<span style="font-family: Georgia, Times New Roman, serif;">and <a href="https://twitter.com/martinalexsmith">Martin Smith</a> Tweeted an even longer read, almost 500kb in length...</span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhahBTfFco8EsX6i1ROWbsXtNrot3JK0j-m3dI7ahOFeP8h_bm8aWt9Vv9aybapjoMMLWyoaQac6OIMPtrReYQGwqiNxOq-LyMKOBED8xs_aRGOXWmf4GTgKV6zH6lhiSp3gUrbuXMpRC3Z/s1600/Screen+Shot+2016-11-17+at+15.39.57.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-family: Georgia, Times New Roman, serif;"><img border="0" height="203" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhahBTfFco8EsX6i1ROWbsXtNrot3JK0j-m3dI7ahOFeP8h_bm8aWt9Vv9aybapjoMMLWyoaQac6OIMPtrReYQGwqiNxOq-LyMKOBED8xs_aRGOXWmf4GTgKV6zH6lhiSp3gUrbuXMpRC3Z/s320/Screen+Shot+2016-11-17+at+15.39.57.png" width="320" /></span></a></div>
<br />
<span style="font-family: Georgia, Times New Roman, serif;">Exactly how easily we'll all see similar read lengths is unclear, but it is going to be hugely dependent on the sample, and probably on having "green fingers" as well.</span><br />
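<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Read-length distributions like Dominik's are usually summarised by the longest read and the N50 (the length L such that reads of length L or longer contain at least half of all sequenced bases). The sketch below is a minimal, generic implementation; the 100kb "long read" threshold and the example lengths are made up for illustration.</span></div>

```python
def n50(read_lengths):
    """Length L such that reads of length >= L together hold at least half of all bases."""
    lengths = sorted(read_lengths, reverse=True)
    if not lengths:
        raise ValueError("no reads supplied")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length


def summarise(read_lengths, long_threshold=100_000):
    """Count, longest read, N50, and number of reads at or above a length threshold."""
    return {
        "reads": len(read_lengths),
        "longest": max(read_lengths),
        "n50": n50(read_lengths),
        "long_reads": sum(1 for x in read_lengths if x >= long_threshold),
    }


if __name__ == "__main__":
    print(summarise([8_000, 25_000, 60_000, 450_000]))  # made-up read lengths
```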
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<span style="font-family: Georgia, Times New Roman, serif;">Here's Dominik's gel...</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhI5KBadysfNdSl6B0q1elq3e6jE7o-2IHjSDDXmreiitpkdawzHMSmJ05_uVmFsipXv-Jlev1AD4vCwUlcBecagA1OmtqPO6z-UwsUmS6vu4yu3t6VC2d-HABffGqx7rcP2rHNjl0wBiMA/s1600/Screen+Shot+2016-11-17+at+15.48.12.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhI5KBadysfNdSl6B0q1elq3e6jE7o-2IHjSDDXmreiitpkdawzHMSmJ05_uVmFsipXv-Jlev1AD4vCwUlcBecagA1OmtqPO6z-UwsUmS6vu4yu3t6VC2d-HABffGqx7rcP2rHNjl0wBiMA/s320/Screen+Shot+2016-11-17+at+15.48.12.png" width="230" /></a></div>
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
Unintended consequences of NGS-based NIPT? (2016-11-09)<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Georgia, "Times New Roman", serif;">The UK recently approved an </span><a href="https://www.genomeweb.com/molecular-diagnostics/uk-approves-use-nipt-part-national-screening-program" style="font-family: Georgia, "Times New Roman", serif;">NIPT test</a><span style="font-family: Georgia, "Times New Roman", serif;"> to screen high risk pregnancies for foetal trisomy 21, 13, or 18 after the current primary screening test, and in place of amniocentesis (following on from the results of the </span><a href="http://www.rapid.nhs.uk/library/rapid-publications" style="font-family: Georgia, "Times New Roman", serif;">RAPID</a><span style="font-family: Georgia, "Times New Roman", serif;"> study). I am 100% in favour of this kind of testing and 100% in favour of individuals, or couples, making the choice of what to do with the results. But what are the consequences of this kind of testing and where do we go in a world where cfDNA foetal genomes are possible?</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, "Times New Roman", serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj1kzVShHWmc_CIm8BEjR8SVW4cVHfiKPvUQiN-EYY-bnZkXp7JQPT8Q-7X8yauwA54EYZhzO5E59kWY16zl2P0yUMrzRLYbUjEAkKJ22S1zUTu5GLbIOSbuxOtk18y-QgXUoRInXH_W-Fp/s1600/Screen+Shot+2016-11-09+at+11.04.05.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="184" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj1kzVShHWmc_CIm8BEjR8SVW4cVHfiKPvUQiN-EYY-bnZkXp7JQPT8Q-7X8yauwA54EYZhzO5E59kWY16zl2P0yUMrzRLYbUjEAkKJ22S1zUTu5GLbIOSbuxOtk18y-QgXUoRInXH_W-Fp/s320/Screen+Shot+2016-11-09+at+11.04.05.png" width="320" /></a></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, "Times New Roman", serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, "Times New Roman", serif;">I decided to write this post after watching <a href="http://www.bbc.co.uk/programmes/b07ycbj5">"A world Without Downs"</a>, a documentary on BBC2 presented by <a href="https://en.wikipedia.org/wiki/Sally_Phillips">Sally Phillips</a> (of Bridget Jones fame), mother to Olly, who has Down's syndrome. The programme made the case for the test (just), but it was very clearly pro-Down's, although not quite to the point of being anti-choice.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, "Times New Roman", serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, "Times New Roman", serif;">My own personal experience of Down's is limited, and I'd watched the documentary more out of excitement to see how NGS is being rolled out across the NHS, particularly because the same technology is being applied in cancer, where it is likely to transform patient treatment. My view before watching was that this new NIPT test could only be a good thing. The programme made me see that there are likely to be unintended consequences of this kind of testing, and that there may be darker sides to the use of the technology. It made me think more carefully about the issue, but in the end I'm still 100% in favour of the test.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, "Times New Roman", serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Unintended consequences of cell-free DNA testing in pregnancy have been reported previously, with the discovery of cancer in an expectant mum <a href="https://www.ncbi.nlm.nih.gov/pubmed/23559449">first reported in 2013</a>. How we deal with these issues is a matter of ongoing debate. For Down's, the programme highlighted the negative way expectant mothers and fathers are given the news that they may have a Down's child, and argued that better information can only lead to a more informed choice - not difficult to agree with. Unfortunately the programme can't escape its Herodotian title. This test won't lead to "A world Without Downs", but how people use the information might.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, "Times New Roman", serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">I'd highlighted the programme on Twitter after watching it, and posted again after reading an article in the Guardian, <a href="https://www.theguardian.com/society/2016/nov/04/new-prenatal-test-for-downs-syndrome-will-not-lead-to-more-terminations-nipt"><b>"Fears over new Down's syndrome test may have been exaggerated, warns expert"</b></a>, where <a href="http://www.statslab.cam.ac.uk/Dept/People/Spiegelhalter/davids.html">Prof Sir David Spiegelhalter</a> was quoted as saying that terminations would not go up, based on the current models being used. I did not disagree with his stats (I'd be crazy to do that), but models can be wrong, and that was the basis of my Tweet.</span></div>
<span style="font-family: Georgia, Times New Roman, serif;"><div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The main argument Phillips makes in the programme is that this test will result in more terminations, and so fewer people being born with Down’s syndrome. She visited Iceland, which she stated has not had a Down's syndrome child born in the last 5 years. This surprised me, as I'd expect a country like Iceland to have a testing regime with as many false-negatives as anyone else, so a few Down's children should have been born...and data from the <a href="http://gateway.euro.who.int/en/visualizations/line-charts/hfa_603-births-with-downs-syndrome-per-100-000-live-births/">WHO</a> seem to suggest this is indeed the case.</div>
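<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">The false-negative argument can be made concrete with a toy model: affected pregnancies are missed either because the test misses them or because they were never screened. Every input below is an assumption you supply, and the example values are made up, not Icelandic or UK figures; the model also ignores miscarriage and parental decisions after a positive result.</span></div>

```python
def expected_undetected(births, prevalence, uptake, sensitivity):
    """Expected affected pregnancies that screening does not flag.

    births      -- pregnancies per year (assumed input)
    prevalence  -- fraction of pregnancies affected at the time of screening
    uptake      -- fraction of pregnancies that are screened
    sensitivity -- fraction of affected, screened pregnancies the test flags
    """
    affected = births * prevalence
    false_negatives = affected * uptake * (1 - sensitivity)  # screened but missed
    unscreened = affected * (1 - uptake)                      # never tested
    return false_negatives + unscreened


if __name__ == "__main__":
    # Made-up inputs for illustration only.
    print(expected_undetected(births=10_000, prevalence=0.002, uptake=0.9, sensitivity=0.95))
```

<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Unless uptake and sensitivity are both exactly 100%, the expected number of undetected cases stays above zero, which is why a run of years with no births at all is surprising.</span></div>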
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhs0VncOH9kpsb2IXXmPe3vVpa6S6vKMElLXf1ahrbcMj7DxvaXOAkqhre-3QyBuBQLwxKhesfldoq5froQee0CaZq9IgExSRWrwM8WHn77aleCZOeBgU5-2Gat-8eHNd42bApub38PcsDW/s1600/Screen+Shot+2016-11-09+at+11.27.30.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="183" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhs0VncOH9kpsb2IXXmPe3vVpa6S6vKMElLXf1ahrbcMj7DxvaXOAkqhre-3QyBuBQLwxKhesfldoq5froQee0CaZq9IgExSRWrwM8WHn77aleCZOeBgU5-2Gat-8eHNd42bApub38PcsDW/s400/Screen+Shot+2016-11-09+at+11.27.30.png" width="400" /></a></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<div>
Ultimately, even if 100% of parents did choose to abort after receiving test results, as long as they were well informed before making their decision, we've done the right thing. Haven't we?</div>
<div>
<br /></div>
</div>
<div style="text-align: justify;">
Trisomies 13, 18 and 21 are the only conditions tested for right now. But the underlying technology could ultimately use whole-genome sequencing to find the full spectrum of genetic variation, including variants that increase the risk of psoriasis, glaucoma, or Alzheimer's. If my mum had decided these were not traits she wanted her baby to have, I'd not be writing this blog.</div>
</span></div>
Does the world have too many HiSeq X Tens? (2016-10-21)<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Illumina stock dropped 25% after a hammering by the stock market with their recent announcements that Q3 revenues would be 3.4% lower than expected at just $607 million. This makes Illumina a much more attractive acquisition (although I doubt this summers rumours of a </span><a href="http://www.investors.com/news/technology/is-thermo-fisher-buying-illumina-wall-street-doubts-it/" style="font-family: Georgia, "Times New Roman", serif;">Thermo bid</a><span style="font-family: "georgia" , "times new roman" , serif;"> had any substance), and also makes a lot of people ask the question "why?"</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span> <span style="font-family: "georgia" , "times new roman" , serif;">The reasons given for the shortfall were </span><i style="font-family: georgia, "times new roman", serif;">"a larger than anticipated year-over-year decline in high-throughput sequencing instruments"</i><span style="font-family: "georgia" , "times new roman" , serif;"> </span><span style="font-family: "georgia" , "times new roman" , serif;">i.e. Illumina sold fewer sequencers than it expected to. It is difficult to turn these revenue figures and statements into the number of HiSeq 2500's, 4000's or X's that Illumina missed it's internal forecasts by, but according to Francis de Souza Illumina</span><span style="font-family: "georgia" , "times new roman" , serif;"> </span><i style="font-family: georgia, "times new roman", serif;">"closed one less X deal than anticipated"</i><span style="font-family: "georgia" , "times new roman" , serif;"> </span><span style="font-family: "georgia" , "times new roman" , serif;">- although he did not say if this was an X5, X10 or X30! Perhaps more telling was that de Souza was quoted saying that</span><span style="font-family: "georgia" , "times new roman" , serif;"> </span><i style="font-family: georgia, "times new roman", serif;"><a href="https://www.genomeweb.com/sequencing/illuminas-q3-revenues-fall-short-guidance-shares-tumble">"[Illumina was not counting on a continuing increase in new sequencer sales]"</a></i><span style="font-family: "georgia" , "times new roman" , serif;">...so is the market full to bursting?</span></div>
<div style="text-align: justify;">
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEidx5B6JFRjFtlHSNXVcqxpfQcDwiry697ueqVHEqer3kxxskaPuKC9eEh59c02Rg-5Ygiqrm2_cpFil93mprJBpSYcRORarUjnLiSMZ8itNxUcfStJUw9uJq1yZZouWShG6CSg-wrDddnD/s1600/CoreGenomics+Too+Many+X+Tens.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="232" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEidx5B6JFRjFtlHSNXVcqxpfQcDwiry697ueqVHEqer3kxxskaPuKC9eEh59c02Rg-5Ygiqrm2_cpFil93mprJBpSYcRORarUjnLiSMZ8itNxUcfStJUw9uJq1yZZouWShG6CSg-wrDddnD/s400/CoreGenomics+Too+Many+X+Tens.jpg" width="460" /></a></div>
<br /></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<a name='more'></a><span style="font-family: "georgia" , "times new roman" , serif;">Before diving into my own analysis (you might also like to read </span><a href="https://www.genomeweb.com/sequencing/illuminas-q3-revenues-fall-short-guidance-shares-tumble" style="font-family: georgia, "times new roman", serif;">GenomeWeb's coverage</a>), <span style="font-family: "georgia" , "times new roman" , serif;">I would like to put these numbers in perspective. A 3rd quarter revenue of $607 million equates to nearly $2.5 billion over a full year, versus $2.3B in 2016 and $2.1B in 2015 (numbers from </span><a href="http://www.illumina.com/company/investor-information/financial-information.html" style="font-family: georgia, "times new roman", serif;">Illumina data here</a><span style="font-family: "georgia" , "times new roman" , serif;">). And revenues grew by 10% year on year. This does not seem like bad news from an academic user's perspective!</span><br />
<div>
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
</div>
<div style="text-align: justify;">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>Is there such a thing as too many sequencers: </b>Illumina have talked about how they were surprised by the interest in X Ten, and have sold far more units than they initially forecast. </span><span style="font-family: "georgia" , "times new roman" , serif;">The word on the street seems to be that only a few X Ten labs are working at capacity <a href="https://www.broadinstitute.org/node/8517">Broad</a>, <a href="http://www.nygenome.org/">NYGC</a>, <a href="http://www.humanlongevity.com/">Human Longevity</a>. Illumina have said the reagent pull-though on X Ten has been about $650K/X/year, which is only half of the theoretical $1.2 million</span><span style="font-family: "georgia" , "times new roman" , serif;">/X/year</span><span style="font-family: "georgia" , "times new roman" , serif;">.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Sales of HiSeq 4000 appear not to have been as strong as the 2500 platform was on its launch. NextSeq seems to be popular with almost 1000 units out in the field, especially for NIPT use, but also in medium sized labs wanting their own sequencer. I suspect a fair number of MiniSeq's are rolling off the production line <span style="color: #666666;">(although whether they offer good value for money is debatable)</span>.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">But Illumina's main reasons for slightly lower than expected performance were clearly lower sales of instruments; and this was particularly so in Europe last quarter. Todd Campbell at <a href="http://www.fool.com/investing/general/2016/04/26/does-illumina-incs-slowing-growth-mean-its-time-to.aspx">The Motley Fool</a> asked an important question about what's happening in Europe </span><span style="font-family: "georgia" , "times new roman" , serif;"><i>"Europe was [growing] slower than the rest of the world." </i>but he also poseed the questions <i>"Why? What's so special about Europe? What are the things that could be going into the reasoning behind Europe growing more slowly than the other parts of the world?" </i>He went on to discuss competition (from Oxford Nanopore) as being a factor, but most telling was something he picked up on from the Illumina conference call when their results were announced at the end of Q2<i> "Europe is slowing is because of sharing of devices"</i>!</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">I'd wholly subscribe to the </span><i style="font-family: Georgia, "Times New Roman", serif;">"glut of capacity and increased use of outsourcing" </i><span style="font-family: "georgia" , "times new roman" , serif;">hypothesis. If the glut does not go away, and if labs continue to move to outsourcing then Illumina will sell decreasing numbers of instruments and service contracts, but consumables pull-through should be higher from each box. Ultimately I think this is a win for Illumina as more science will be published using their technology - and that's really what we all want.</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">I run a core lab, and I know lots of other people who do in Europe, Africa, and across the world. Sharing Illumina (and other) instruments in core labs has been a part of science for a very long time. It makes good sense scientifically and economically <span style="color: #666666;">(I know I'm biased)</span>. And from where I sit I can see many Illumina sequencers gathering dust <span style="color: #666666;">(metaphorically speaking)</span>, or being run at 25% or lower utilisation. People got the funding to buy these amazing devices; but not the money to staff the lab, to service them, and to fund the projects to run on them. Perhaps worse is the <a href="https://en.wikipedia.org/wiki/Opportunity_cost">opportunity cost</a> of the lost science; science that could not be done because someone spent money on a sequencer rather than sequences.</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;">Maybe instrument sales have slowed down in Europe because we've got wise to this problem, maybe scientists in Europe have seen how great great core labs can be, and that shared devices with high utilisation is a good thing for science in general. </span>But what happens, if I'm right, when the rest of the world realises it has too many sequencers but not as many results as they'd expected, and focuses on buying sequences rather than sequencers?</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b><br /></b></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>Will users continue to purchase consumables at an ever increasing rate: </b></span><span style="font-family: "georgia" , "times new roman" , serif;">Illumina's business model has been described as </span><i style="font-family: georgia, "times new roman", serif;">"a simple razor and blade model: Illumina makes one-time sales of large machines at lower margins, then provides consumables needed for use in their operation on an ongoing basis."</i></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">A Tweet earlier in the year from </span><a href="https://twitter.com/kinghorngenomes" style="font-family: Georgia, "Times New Roman", serif;">Kinghorn Genomics</a><span style="font-family: "georgia" , "times new roman" , serif;"> is one of the few public figures I've seen for actual sequencing throughput on an X Ten. 1100 genomes in one month is astounding, but still 20-30% short of the 1500 per month figure in </span><a href="http://www.illumina.com/systems/hiseq-x-sequencing-system/system.html" style="font-family: Georgia, "Times New Roman", serif;">Illumina's specs</a><span style="font-family: "georgia" , "times new roman" , serif;">. Very few owners openly discuss the numbers of samples going through their instruments, and Illumina are very cagey about reagent pull-through in individual labs. It seems pretty clear if X Ten labs simply can't pull in the required numbers of samples to match Illumina's specs. But Kinghorn Genomics is at the high end of reagent pull through at < 70% utilisation.</span></div>
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgW3oZ80sqip-yG3Tk9wFN5b60gXyGlzUi95SCsEBRwumLDN-zrV5bJ1vxvHcTy_0810XGWj88kc3Y09qjcMJSzZvkdv4btRQdOc_1UNuHj8V_AwvE6O4XIH7ZlHBvwUFGjnAl4aCIWbSRn/s1600/Screen+Shot+2016-07-18+at+11.43.00.png"><span style="font-family: "georgia" , "times new roman" , serif;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgW3oZ80sqip-yG3Tk9wFN5b60gXyGlzUi95SCsEBRwumLDN-zrV5bJ1vxvHcTy_0810XGWj88kc3Y09qjcMJSzZvkdv4btRQdOc_1UNuHj8V_AwvE6O4XIH7ZlHBvwUFGjnAl4aCIWbSRn/s320/Screen+Shot+2016-07-18+at+11.43.00.png" /></span></a></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Illumina's consumables are a highly profitable business with gross profits around 70%, and these margins have been at that level for as long as I can remember. I don't want to skip over the fact that Illumina has also invested heavily in R&D, and is investing heavily in the clinical adoption of it's core technology in the clinical space via Helix and Grail. So some of that 70% margin is going somewhere that is likely to be useful to me in th<b>e future. </b></span><b><span style="font-family: "georgia" , "times new roman" , serif;">But lllumina have cited weakness in the HiSeq franchise outside of the X – both instrument shipments and consumables. Regent pull-through on HiSeq was below their estimates of ~$350K per year. Again pointing to a glut of sequencers, rather than sequencing projects. So reagents are perhaps the most important thing for Illumina to focus on.</span></b></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b><br /></b></span></div>
<div style="text-align: justify;">
<ul style="font-family: Times; text-align: left;"><span style="font-family: "georgia" , "times new roman" , serif;"> </span>
<li style="font-family: times;"><span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;">Total revenue increased by $93.9 million to $1,171.9 million in the first half of 2016; up by 9% over 2015.</span></span></span></li>
<span style="font-family: "georgia" , "times new roman" , serif;">
<li><span style="font-family: "georgia" , "times new roman" , serif;">Consumables revenue (63% of total) <u>increased</u> by $128.4 million to $740.1 million in the first half of 2016; up by 21% over 2015 <b><i>"driven by growth in the sequencing instrument installed base"</i></b>.</span></li>
<li><span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;">Instrument revenue (20% of total) <u>decreased</u> by $58.5 million to $243.2 million in the first half of 2016; down</span><span style="font-family: "georgia" , "times new roman" , serif;"> by 19% over 2015</span><span style="font-family: "georgia" , "times new roman" , serif;"> <b><i>"primarily due to lower shipments of our high-throughput platforms"</i></b>.</span></span></li>
<li><span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;">Service and other revenue (15% of total) increased by $23.2 million to $179.2 million in the first half of 2016; up</span><span style="font-family: "georgia" , "times new roman" , serif;"> by 15% over 2015</span><span style="font-family: "georgia" , "times new roman" , serif;"> <b><i>"driven by revenue from genotyping services and extended instrument service contracts associated with a larger sequencing installed base"</i></b>.</span></span></li>
</span></ul>
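<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">The growth rates and shares in the list above can be sanity-checked by back-calculating the implied first-half 2015 figures. A minimal Python sketch, using only the H1 2016 revenues and year-on-year changes listed above (in US$ million); the dictionary layout and function name are illustrative, not from any Illumina data source.</span></div>

```python
# H1 2016 revenue and year-on-year change (US$ million), as listed above.
H1_2016 = {
    "Total": (1171.9, 93.9),
    "Consumables": (740.1, 128.4),
    "Instruments": (243.2, -58.5),
    "Service and other": (179.2, 23.2),
}


def summarise(figures):
    """Return {line: (implied H1 2015 revenue, growth fraction, share of total)}."""
    total_2016 = figures["Total"][0]
    summary = {}
    for line, (revenue, change) in figures.items():
        prior = revenue - change        # implied H1 2015 revenue
        growth = change / prior         # year-on-year growth as a fraction
        share = revenue / total_2016    # fraction of H1 2016 total revenue
        summary[line] = (prior, growth, share)
    return summary


for line, (prior, growth, share) in summarise(H1_2016).items():
    print(f"{line:<18} H1 2015: ${prior:7.1f}M  growth: {growth:+.0%}  share: {share:.0%}")
```

<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">The three listed lines do not quite add up to the total; the small remainder is presumably other revenue not broken out in the list.</span></div>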
<div>
<span style="font-family: "georgia" , "times new roman" , serif;"><b><br /></b></span>
<span style="font-family: "georgia" , "times new roman" , serif;"><b>What locks us into Illumina:</b> Capital costs are very high in replacing an Illumina fleet, my own lab has around £2 million invested (2x HiSeq 4000, 1x HiSeq 2500, 1x NextSeq, 2x MiSeq) - we couldn't simply go out and buy machines from another vendor, even if there were one. The real tie in is the infrastructure we've built up around the use of Illumina sequencing. Users are unlikely to switch until there is a really good competitor out there...and Life Tech's SOLiD and Ion Torrent technologies just were not good enough.<br />
<br /><b>Predicting the future: </b>For the future I'm as confident as everyone else that NGS usage is going up, bigger projects, more samples, more sequencing, more data - that's a great scenario for Illumina. They might be a bit stuck with the next big leap in instrument yields, as this would need to jump significantly to make labs like mine purchase new boxes, and that could land them back in the same position as they were in 2011. If the economic case for a new machine can't be made then labs will find it hard to get funding for incremental changes. And if Illumina do make a big leap then many labs may prefer to share the infrastructure costs, and aim bring down experimental costs. Where do Illumina go in the research space next if they can't bring us cheaper sequencing?</span><br />
<i><br /></i>
<br />
<div style="text-align: center;">
<i><i><span style="font-family: "times" , "times new roman" , serif; font-size: large;">Q: What will Illumina announce at J.P.Morgan? A new sequencer? The $500 genome? Nanopores?</span></i></i></div>
<i>
</i>
<br />
<div>
<i><i><br /></i></i></div>
<i>
</i><span style="font-family: "georgia" , "times new roman" , serif;">The use of NGS in oncology might take ten years to become profitable given the pace at which healthcare systems can adopt new technologies. I know from my experiences of the NHS that a few labs can be leading lights, but the majority need to be dragged into accepting change of almost any kind. Oncology is tough, but is a huge, and highly profitable, market so the effort from Illumina is likely to be worth it. Illumina certainly think so; </span><span style="font-family: "georgia" , "times new roman" , serif;"><a href="http://seekingalpha.com/article/3982698-illuminas-20-200-billion-quest-miss-opportunity">SeekingAlpha</a></span><span style="font-family: "georgia" , "times new roman" , serif;"> quoted Francis deSouza (Illumina CEO) as saying </span><i style="font-family: georgia, "times new roman", serif;">"We spent a decade selling instruments to researchers who are experts and understand genomics. Now we're seeing applications take off, which is a much bigger market for us."</i><span style="font-family: "georgia" , "times new roman" , serif;"> Whether the recent stock fall was partly because the markets see the realisation of this </span><i style="font-family: georgia, "times new roman", serif;">"bigger market"</i><span style="font-family: "georgia" , "times new roman" , serif;"> as being too much of a future gamble is unclear to me. 
</span><a href="http://www.illumina.com/clinical/reproductive-genetic-health.html" style="font-family: georgia, "times new roman", serif;">Verinata</a><span style="font-family: "georgia" , "times new roman" , serif;">, </span><a href="http://www.grailbio.com/" style="font-family: georgia, "times new roman", serif;">Grail</a><span style="font-family: "georgia" , "times new roman" , serif;"> and </span><a href="https://www.helix.com/" style="font-family: georgia, "times new roman", serif;">Helix</a><span style="font-family: "georgia" , "times new roman" , serif;"> are really exciting ventures, but how quickly can they add to Illumina's revenues and profits?</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<span style="font-family: "georgia" , "times new roman" , serif;">The rapid adoption of NGS in NIPT might shed some light on the future. Verinata is now contributing high single digit percentages to Illumina's revenues, and this could reach 10% as soon as 2018. I'd highly reccomend anyone who can get access to BBC Player to watch the <a href="http://www.bbc.co.uk/news/magazine-37500189">"A world without Downs"</a> documentary!</span></div>
<div style="font-family: Times; text-align: start;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="font-family: georgia, "times new roman", serif; text-align: start;">
<div style="text-align: justify;">
<div style="font-family: georgia, "times new roman", serif;">
<div style="font-family: georgia, "times new roman", serif;">
<span style="font-family: "georgia" , "times new roman" , serif;">I thought I'd finish up with a look to the future; particularly to the other NGS technology that we might be using alongside Illumina routinely by 2020 - <a href="https://nanoporetech.com/">Oxford Nanopore</a>. The technology, soon to be "a genome centre in a box", and possibly <a href="https://twitter.com/erlichya/status/788838980238344192">iPhone compatible</a>, is starting to gain traction outside of the hardcore fanboys and fangirls like Nick Loman and Josh Quick. Right now it is certainly an unproven, in the commercial sense; closed-community, the MinION is available commercially, but users are generally talking in the Nanopore forum; and niche tool. But R9 makes Nanopore sequencing easy, and the most recent updates from Clive Brown point to a future where we might use Nanopores alongside SBS. If the ONT tech is truly disruptive then there is a future that may be decidedly less longer orange! </span></div>
<div style="font-family: georgia, "times new roman", serif;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="font-family: georgia, "times new roman", serif;">
<span style="font-family: "georgia" , "times new roman" , serif;">I'd not </span><span style="font-family: "georgia" , "times new roman" , serif;">want to forget to mention <a href="http://www.pacb.com/products-and-services/pacbio-systems/sequel/">Pacific Bioscience</a> now that Sequel appears to be getting some traction (over 100 instruments sold since the launch compared to 100-150 RSIIs). And the 50x drop in DNA required is going to make this a tool people with limited sample availability can now consider using.</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<br />
<div style="font-family: georgia, "times new roman", serif;">
<span style="font-family: "georgia" , "times new roman" , serif;">But w</span><span style="font-family: "georgia" , "times new roman" , serif;">e should not forget that Illumina is a company that can deliver on innovation. Whilst Illumina did not invent SBS - Solexa, a small UK company, did; Illumina turned Solexa's $2.5 million revenues in 2006, into a $100 million business, in one year! Many readers will remember the release of the HiSeq, MiSeq, NextSeq, X Ten - all significant leaps for genomics; and I'm betting they've got some pretty cool tech up their sleeves yet.</span></div>
</div>
<div style="font-family: georgia, "times new roman", serif;">
<br /></div>
<div style="font-family: georgia, "times new roman", serif;">
<b>Finally: </b>Do you think there are too many sequencers out there? Should we focus on buying <i>sequences</i> rather than <i>sequencers</i>? If the majority of users answer yes to these questions, then sequencer sales may well continue to decline in the short term. But reagent pull-through on each box should increase, and Illumina's focus for research sequencing might shift to "blades rather than razors": driving up utilisation of their installed base.</div>
</div>
</div>
</div>
</div>
James@cancerhttp://www.blogger.com/profile/02825715598810395734noreply@blogger.com358tag:blogger.com,1999:blog-6334453475526523597.post-5104820759308029672016-10-21T08:00:00.000+01:002016-10-21T08:00:40.620+01:00Controlling for bisulfite conversion efficiency with a 1% Lambda spike-in<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">The use of DNA methylation analysis by NGS has become a standard tool in many labs. In a project design discussion we had today somebody mentioned the use of a control for bisulfite conversion efficiency that I'd missed, as its such a simple one I thought I'd briefly mention it here. In their </span><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3630097/" style="font-family: Georgia, "Times New Roman", serif;">PLoS Genet 2013</a><span style="font-family: "georgia" , "times new roman" , serif;"> paper, Shirane <i>et al</i> from <a href="https://twitter.com/kyushuuniv_jp">Kyushu University</a> spiked-in </span><a href="https://www.promega.co.uk/products/biochemicals-and-labware/nucleic-acids/unmethylated-lambda-dna" style="font-family: Georgia, "Times New Roman", serif;">unmethylated lambda phage DNA (Promega)</a><span style="font-family: "georgia" , "times new roman" , serif;"> to control for, and check, the C/T conversion rate was greater than 99%.</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span>
<br />
<div style="text-align: center;">
<span style="font-family: "georgia" , "times new roman" , serif;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpFk6j47_d6NvydYkBwz0fiJGlO6VwUAgwyhaghAUwmiztlWqKTrkisSP9ZtGttBcSLPQwUA-YBub2zPDVpv19CfI62gGJ2FtBb1-UDcUHLTeQMOSCKB2GBmjVM_urLskW1ZVQ-BSiUUR4/s1600/Wikipedia+Bisulfite-reaction.png"><img border="0" height="133" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpFk6j47_d6NvydYkBwz0fiJGlO6VwUAgwyhaghAUwmiztlWqKTrkisSP9ZtGttBcSLPQwUA-YBub2zPDVpv19CfI62gGJ2FtBb1-UDcUHLTeQMOSCKB2GBmjVM_urLskW1ZVQ-BSiUUR4/s400/Wikipedia+Bisulfite-reaction.png" width="400" /></a></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">The bisulfite conversion of cytosine bases to uracils, by deamination of unmethylated cytosine (as shown above) is the gold standard for methylation analysis. </span><br />
<a name='more'></a><br />
<span style="font-family: "georgia" , "times new roman" , serif;">Users identify the C/T transitions in a comparison of bisufite treated/untreated samples, or by comparing to a known reference. However bisulfite treatment is a harsh biochemical reaction, and can cause large losses in template DNA. As such controlling for and measuring conversion efficiency is important in making conclusions about the methylation data from NGS experiments. As a reminder - Bisulfite does not convert methylated or hydroxy-mehtylated cytosine allowing users to discriminate between non-methylcytosine (C) and <a href="https://en.wikipedia.org/wiki/5-methylcytosine">methylcytosine</a> (mC) or <a href="https://en.wikipedia.org/wiki/5-Hydroxymethylcytosine">hydroxymethylated</a> (hmC) cytosine.</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span><span style="font-family: "georgia" , "times new roman" , serif;"></span>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
<div style="text-align: justify;">
We're likely to start using this control if it works well in the project we have just kicked off. In the paper they added 1 ng of lambda DNA to 1000 oocytes before performing a <a href="https://www.ncbi.nlm.nih.gov/pubmed/22649061">PBAT</a> analysis. We'll aim for a 1% spike-in, but need to consider how much to add to each sample, and whether lambda is the right spike-in, as we're using an RRBS method for this project: RRBS libraries are built from MspI-digested, size-selected fragments, so the spike-in needs enough MspI sites to be represented. To check its suitability I grabbed the <a href="https://www.ncbi.nlm.nih.gov/nuccore/215104?report=fasta">lambda sequence from GenBank</a> and did an <i>in silico</i> <a href="https://www.neb.com/products/r0106-mspi">MspI</a> digest using <a href="http://rna.lundberg.gu.se/cgi-bin/cutter2/cutup">WebCutter2.0</a>. I found <a href="http://rna.lundberg.gu.se/cgi-bin/cutter2/cutter#TableByEnzyme">330 cut sites</a>, which should be plenty for checking efficiency.</div>
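<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">The same <i>in silico</i> MspI digest can be done locally. A minimal Python sketch, assuming a single-record FASTA file (e.g. the lambda sequence downloaded from GenBank) passed on the command line; MspI cuts C^CGG, and because CCGG is palindromic, scanning one strand finds every site. The 40-220 bp window is only an assumed RRBS size selection, so adjust it to your own protocol, and the site count may differ slightly from WebCutter's depending on how sites are reported.</span></div>

```python
import sys


def read_fasta(path: str) -> str:
    """Concatenate the sequence lines of a single-record FASTA file."""
    with open(path) as handle:
        return "".join(line.strip() for line in handle if not line.startswith(">")).upper()


def mspi_cut_positions(seq: str) -> list[int]:
    """0-based cut positions for MspI (C^CGG) on a linear sequence."""
    seq = seq.upper()
    cuts, start = [], seq.find("CCGG")
    while start != -1:
        cuts.append(start + 1)            # MspI cuts after the first C
        start = seq.find("CCGG", start + 1)
    return cuts


def digest(seq: str, cuts: list[int]) -> list[str]:
    """Split a linear sequence at the given cut positions."""
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def rrbs_fragments(seq: str, min_len: int = 40, max_len: int = 220) -> list[str]:
    """MspI fragments inside an (assumed) RRBS size-selection window."""
    return [f for f in digest(seq, mspi_cut_positions(seq)) if min_len <= len(f) <= max_len]


if __name__ == "__main__" and len(sys.argv) > 1:
    lambda_seq = read_fasta(sys.argv[1])
    print("MspI sites:", len(mspi_cut_positions(lambda_seq)))
    print("fragments in size window:", len(rrbs_fragments(lambda_seq)))
```

<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Counting fragments inside the size window, rather than raw cut sites, gives a better idea of how much of the spike-in will actually end up in an RRBS library.</span></div>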
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Want to learn more about bisulfite conversion in general? Take a look at <a href="http://www.zymoresearch.com/bisulfite-beginner-guide">Zymo</a>'s website; it's an excellent resource.</div>
</span></div>
James@cancerhttp://www.blogger.com/profile/02825715598810395734noreply@blogger.com248tag:blogger.com,1999:blog-6334453475526523597.post-26634370724890103732016-10-17T10:14:00.000+01:002016-11-02T11:54:12.957+00:00SIRVs: RNA-seq controls from @Lexogen<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="background-color: white; color: #999999; font-size: 13.2px;">This article was </span><span style="background-color: white; color: #999999; font-size: 13.2px;"><a href="http://core-genomics.blogspot.co.uk/2016/07/core-genomics-is-going-cor-porate-sort.html">commissioned</a> by</span><span style="background-color: white; color: #999999; font-size: 13.2px;"> Lexogen </span></span><span style="color: #999999; font-family: "georgia" , "times new roman" , serif;"><span style="font-size: 13.2px;">GmbH</span></span><span style="background-color: white; color: #999999; font-family: "georgia" , "times new roman" , serif; font-size: 13.2px;">.</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span> <span style="font-family: "georgia" , "times new roman" , serif;">My lab has been performing RNA-seq for many years, and is currently building new services around single-cell RNA-seq. </span><a href="https://www.fluidigm.com/products/c1-system" style="font-family: Georgia, "Times New Roman", serif;">Fluidigm’s C1</a><span style="font-family: "georgia" , "times new roman" , serif;">, academic efforts such as </span><a href="http://www.cell.com/abstract/S0092-8674(15)00549-8" style="font-family: Georgia, "Times New Roman", serif;">Drop-seq</a><span style="font-family: "georgia" , "times new roman" , serif;"> and </span><a href="http://www.cell.com/cell/abstract/S0092-8674(15)00500-0" style="font-family: Georgia, "Times New Roman", serif;">inDrop</a><span style="font-family: "georgia" , "times new roman" , serif;">, and commercial platforms from </span><a href="http://www.10xgenomics.com/single-cell/" style="font-family: Georgia, "Times New Roman", serif;">10X Genomics</a><span style="font-family: "georgia" , "times new roman" , serif;">, </span><a href="http://www.dolomite-bio.com/applications/single-cell-rna-seq/" style="font-family: Georgia, "Times New Roman", serif;">Dolomite Bio</a><span style="font-family: "georgia" , "times new roman" , serif;">, </span><a href="http://www.wafergen.com/products/icell8-single-cell-system" style="font-family: Georgia, "Times New Roman", serif;">Wafergen</a><span style="font-family: "georgia" , "times new roman" , serif;">, </span><a href="http://www.illumina.com/company/news-center/press-releases/press-release-details.html?newsid=2128278" style="font-family: Georgia, "Times New Roman", serif;">Illumina/BioRad</a><span style="font-family: "georgia" , "times new roman" , serif;">, </span><a href="http://raindancetech.com/droplet-microfluidic-technology-for-single-cell-high-throughput-screening" style="font-family: Georgia, "Times New Roman", serif;">RainDance</a><span style="font-family: "georgia" , "times new roman" , 
 and others makes">
serif;"> and others make establishing the technology in your lab relatively simple. However, the data being generated can be difficult to analyse, so we’ve been looking carefully at the controls we use, or should be using, for both single-cell and standard RNA-seq experiments. The three sets of controls I’m considering are <a href="https://www.lexogen.com/sirvs">Lexogen’s SIRVs</a> (Spike-In RNA Variants), </span><a href="http://www.sequin.xyz/" style="font-family: Georgia, "Times New Roman", serif;">SEQUINs</a><span style="font-family: "georgia" , "times new roman" , serif;">, and <a href="https://www.nist.gov/programs-projects/ercc-20-developing-new-suite-rna-controls">ERCC 2.0</a> (External RNA Controls Consortium) controls. </span><span style="font-family: "georgia" , "times new roman" , serif;">All are based on synthetically produced RNAs that aim to mimic the complexity of the transcriptome. Lexogen’s SIRVs are currently the only ones available commercially. ERCC 2.0 is a developing standard (Lexogen is one of the groups contributing to the discussion). SEQUINs for <a href="http://www.nature.com/nmeth/journal/v13/n9/pdf/nmeth.3958.pdf">RNA</a> </span><span style="font-family: "georgia" , "times new roman" , serif;">and </span><a href="http://www.nature.com/nmeth/journal/v13/n9/pdf/nmeth.3957.pdf" style="font-family: Georgia, "Times New Roman", serif;">DNA</a><span style="font-family: "georgia" , "times new roman" , serif;"> were only recently published in Nature Methods.</span></div>
<div>
<div class="msocomtxt" id="_com_3" language="JavaScript">
<div class="MsoCommentText">
<o:p><br />
</o:p></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://www.lexogen.com/award-2016-2/" style="margin-left: auto; margin-right: auto;"><img border="0" height="134" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxhQzhsKvyfLwYVK9gr-i8zSRyrRck8nGbg5VP8SjOU8u3XxpTIYkSyo9mY66t4Tb8iIYrpX5OXnb71zyCW3HAmQ6wrJzkqfcxI1nCbsclChY2Ein_XzYFzYdeLMj4dFBvcFV5OrFCnII4/s640/Lexogen+award+2016.png" width="500" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><div class="p1">
<span class="s1">You can win a free lane of HiSeq 2500 sequencing of your own RNA-seq libraries (with SIRVs of course) by applying for the <a href="https://www.lexogen.com/award-2016-2/"><span class="s2">Lexogen Research Award</span></a></span></div>
</td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Lexogen’s SIRVs are probably the most complex controls available on the market today as they are designed to assess alternative splicing, alternative transcription start and end sites, overlapping genes, and antisense transcription. They consist of seven artificial genes in-vitro transcribed as multiple (6-18) isoforms to generate a total of 69 transcripts. Each has a 5’triphosphate and a 30nt poly(A)-tail, enabling both mRNA-Seq and TotalRNA-seq methods. Transcripts vary from 191 to 2528nt long and have variable (30-50%) GC-content.</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"> </span> <br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"> </span> <br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>Want to know more: </b>Lexogen are hosting a webinar to describe SIRVs in more detail on October 19th: <a href="https://www.labroots.com/ms/webinar/controlling-rna-seq-experiments-using-spike-in-rna-variants">Controlling RNA-seq experiments using spike-in RNA variants</a>. They have also uploaded a manuscript to <b><a href="http://biorxiv.org/content/early/2016/10/13/080747">BioRxiv</a></b> that describes the evaluation of SIRVs and provides links to the underlying RNA-Seq data. As a Bioinformatician you might want to download this data set and evaluate the SIRV reads yourself. Or read about <a href="http://biorxiv.org/content/early/2016/09/08/073692">how SIRVs are being used in single-cell RNA seq</a> in the latest paper from <a href="http://www.teichlab.org/">Sarah Teichmann’s group</a> at <a href="http://www.ebi.ac.uk/research/teichmann">EBI</a>/<a href="http://www.sanger.ac.uk/science/groups/teichmann-group">Sanger</a>.</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"> </span> <br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"> </span><br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Before diving into a more in-depth description of the Lexogen SIRVs, and how we might be using them in our standard and/or single-cell RNA-seq studies, I thought I’d start with a bit of a historical overview of how RNA controls came about...and that means going back to the days when microarrays were the tool of choice and NGS had yet to be invented!</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<div style="text-align: justify;">
<a name='more'></a></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b><br />
</b> <b>RNA quality control – MAQC: </b>The use of controls is recommended in any experiment, and their absence is one of the oft-cited reasons for the current <a href="http://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970">reproducibility crisis</a>. Nearly everyone who’s worked on differential gene expression in the last fifteen years has heard of the <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3272078">MAQC</a> (MicroArray Quality Control) study. Four sources of RNA were evaluated. Of these, <a href="http://www.genomics.agilent.com/article.jsp?pageId=1452">Stratagene’s Universal Human Reference RNA</a> and <a href="https://www.thermofisher.com/order/catalog/product/AM7962">Ambion’s Human Brain RNA</a> samples were chosen because of the number of genes expressed at a detectable level and the size of the fold changes between the two samples. These two control samples were used to evaluate five microarray platforms in an international project involving 137 participants from 51 organisations (<a href="http://www.nature.com/nbt/journal/v24/n9/full/nbt1239.html">see Nat Biotech 2006</a>). Labs like mine adopted, and continue to use, the MAQC controls in our differential gene expression pipelines, which today are almost all based on RNA-seq. We used them in my lab to show how <a href="http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-11-540">detection sensitivity drops as RNA inputs are reduced below 100ng</a> <span style="color: #999999;">(something I keep meaning to repeat with RNA-seq)</span>.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">The move to RNA-seq has had a dramatic impact on our ability to perform complex experiments. We are no longer limited to asking questions about the differential expression of genes where we have sequence information available to make an array. RNA-seq allows us to analyse the whole transcriptome; to assess differential gene expression (oligo-dT enriched mRNA-seq is the most widely used method), as well as differential splicing, allele specific expression, polyA tail length, transcription initiation and termination, microRNA, lincRNA, etc, etc, etc <span style="color: #666666;">(see my "wish list" for controls at the bottom of this post)</span>.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">The MAQC controls we used are simply not up to the more complex job that RNA-seq presents. Both the <a href="http://www.nature.com/nbt/journal/v32/n9/full/nbt.2972.html">ABRF</a> and <a href="http://www.nature.com/nbt/journal/v32/n9/full/nbt.2957.html">SEQC</a> papers used MAQC samples, which are admixtures of multiple individuals (<a href="http://core-genomics.blogspot.co.uk/2014/08/seqc-kills-microarrays-not-quite.html">I discussed these limitations in a 2014 post</a>), but both included the ERCC controls as well.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Newer, more carefully designed and manufactured controls are available that can better serve the needs to biologists; and this is where SIRVs come in.</span></div>
<br />
<div style="mso-element: comment-list;">
<div style="mso-element: comment;">
<div class="msocomtxt" id="_com_8" language="JavaScript">
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1qZLtX7XhI5B-lO6jwW1S89iMA1WyCM9USI63P6qll6ZLYdiKs7LJbMo_foQu1BJKfkS6k0eTjXk3ApOBeEU43xtlZgTVFBd3ZWWOo9gx7s-xzfqBIK2jfYMWLqVS8yVqHbZn8W141UtI/s1600/Fig.03+SIRV+workflow.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="134" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1qZLtX7XhI5B-lO6jwW1S89iMA1WyCM9USI63P6qll6ZLYdiKs7LJbMo_foQu1BJKfkS6k0eTjXk3ApOBeEU43xtlZgTVFBd3ZWWOo9gx7s-xzfqBIK2jfYMWLqVS8yVqHbZn8W141UtI/s640/Fig.03+SIRV+workflow.png" width="500" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">The SIRV workflow: from sample to answer</td></tr>
</tbody></table>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>RNA quality control – Lexogen and beyond: </b>SIRVs are designed to represent much of, but not all of, the complexity of Eukaryotic transcriptomes e.g. differential gene expression, differential splicing, polyA tail length variation, GC content, etc. </span><span style="font-family: "georgia" , "times new roman" , serif;">SIRVs are designed to be added to samples before RNA extraction, or starting the RNA-seq library prep. They should allow an objective assessment of the technical biases in library preparation, sequencing and analysis; and ultimately should improve our ability to make biological insights from comparison of experimental conditions. They are a huge leap forward from the MAQC controls, and a significant step ahead of the ERCC1.0 controls, which are restricted to single-exon transcripts.</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><b><br />
</b></span> <span style="font-family: "georgia" , "times new roman" , serif;"><b>How are SIRVs made: </b>SIRVs were designed to resemble human gene structures. They have overlapping multi-exon genes that are transcribed in both sense and antisense, with alternative splicing and alternative transcription start and end sites. Genes are in vitro transcribed from linearized plasmids to produce full-length transcripts. These are subject to very careful quality control and quantitation, including spectrophotometric, molecular weight, and Agilent Bioanalyzer analyses. After QC and quantitation, SIRV transcripts are mixed at equimolar concentrations (E0), or with up to 8-fold (E1) or 128-fold (E2) differences in concentration.</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span> <br />
<div class="msocomtxt" id="_com_8" language="JavaScript" style="-webkit-text-stroke-width: 0px; color: black; font-family: Times; font-size: medium; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div style="text-align: justify;">
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4QMKft2GDxfzb62NnB0KSkb9sDVL0mK-fuSiTbRBrf4yO3KdbhiGyM4LObJm5CcWwaNzDQUDdXKA9-Yh181DfPfeN1tizjhaHEdQrXRtUUf6IHIO6UOViuta3UZYCtlBRjqGUMnaeUFZV/s1600/Fig.01+SIRV-1+and+KLK5+complete.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="165" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4QMKft2GDxfzb62NnB0KSkb9sDVL0mK-fuSiTbRBrf4yO3KdbhiGyM4LObJm5CcWwaNzDQUDdXKA9-Yh181DfPfeN1tizjhaHEdQrXRtUUf6IHIO6UOViuta3UZYCtlBRjqGUMnaeUFZV/s320/Fig.01+SIRV-1+and+KLK5+complete.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Designing SIRVs: A comparison of SIRV1 and <i>KLK5</i></td></tr>
</tbody></table>
</div>
</div>
</div>
</div>
<div class="msocomtxt" id="_com_8" language="JavaScript" style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b><br />
</b></span> <span style="font-family: "georgia" , "times new roman" , serif;"><b>How are SIRVs used: </b>Spiking SIRVs into your samples requires careful consideration of how you’ll use the data they provide in downstream assessment. Today the most important control in my lab is simply whether the library prep has worked. More importantly, when it has not worked, we want to know whether the lab or the sample caused the failure. Our use of MAQC controls on a plate of samples is great, but extending this to an internal control in every sample is going to be better. However, I don’t want controls to dominate the experiment, or they’ll add too much to the costs of library preparation and sequencing.</span></div>
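<div style="text-align: justify;"><span style="font-family: georgia, 'times new roman', serif;">The minimal Python sketch below makes the "did the lab or the sample fail?" question concrete. It shows how a per-sample SIRV read fraction could be used to triage failed libraries. This is a hypothetical rule and not a Lexogen tool. The function name <code>triage_library</code> and every threshold are assumptions for illustration: an expected SIRV fraction of 1%, a five-fold tolerance either side, and a minimum of one million reads. All of them would need calibrating against your own data.</span></div>

```python
def triage_library(sirv_reads: int, total_reads: int,
                   expected_fraction: float = 0.01,
                   tolerance: float = 5.0,
                   min_total_reads: int = 1_000_000) -> str:
    """Hypothetical triage of one library using its SIRV read fraction.

    All thresholds are illustrative assumptions, not published cut-offs.
    """
    if total_reads < min_total_reads:
        # Too few reads overall: the spike-in did not make a usable library
        # either, which suggests a library prep (lab) problem.
        return "library prep failure"
    fraction = sirv_reads / total_reads
    if fraction > expected_fraction * tolerance:
        # The spike-in worked but endogenous RNA is under-represented,
        # which suggests low-input or degraded sample RNA.
        return "sample failure"
    if fraction < expected_fraction / tolerance:
        # Too few SIRV reads: check that the spike-in was actually added.
        return "check spike-in addition"
    return "pass"


for sirv, total in [(10_000, 1_000_000), (600_000, 1_000_000),
                    (100, 50_000), (100, 2_000_000)]:
    print(sirv, total, triage_library(sirv, total))
```

The order of the checks is a design choice. Library yield is tested first because a SIRV fraction from a near-empty library is not meaningful.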
<div class="msocomtxt" id="_com_8" language="JavaScript" style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<div class="msocomtxt" id="_com_8" language="JavaScript" style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">SIRVs themselves don’t need much data to generate useful results and around 1% of your sequencing reads should be sufficient for most labs. However determining how much SIRV mix to add to your samples before extraction, or your RNA before library prep can require some empirical testing as the amount of RNA in a sample or a cell differs so much. As a rule of thumb 95% of RNA is ribosomal RNA’s, and the other 5% is mRNA (and non-coding RNAs). For an experiment starting with 100ng of TotalRNA in an mRNA-seq workflow approximately 50pg would represent 1% of the 5ng of mRNA present.</span></div>
<div class="msocomtxt" id="_com_8" language="JavaScript" style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<div class="msocomtxt" id="_com_8" language="JavaScript" style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">SIRVs are available in three configurations E0, E1 & E2 that mix the in vitro transcribed RNAs at equimolar (mix E0), up to 8-fold (mix E1), or up to 128-fold (mix E2), variation in concentration. </span><span style="font-family: "georgia" , "times new roman" , serif;">Importantly SIRVs are built in a modular format and should be compatible to other spike in controls like the ERCC. Additional modules should address transcript lengths, polyA tail length variation, etc.</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span> <span style="font-family: "georgia" , "times new roman" , serif;">Coinciding with the <a href="https://www.labroots.com/ms/webinar/controlling-rna-seq-experiments-using-spike-in-rna-variants">webinar on October 19th</a>, Lexogen will release the <a href="https://www.lexogen.com/sirvs">“SIRVs suite”</a> <span style="color: #666666;">(see "<b>How are SIRVs analysed</b>" below)</span> for analysis of spike-in data. The suite will also include an "Experiment Designer" tool. It calculates recommended spike-in ratios based on the known or expected RNA input, the mRNA fraction, and the type and efficiency of the workflow.</span></div>
<div class="msocomtxt" id="_com_8" language="JavaScript" style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<div class="msocomtxt" id="_com_8" language="JavaScript" style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>SIRVs in bulk RNA-seq: </b>Bulk RNA-seq experiments can use SIRVs as process controls in place of the MAQC Brain and UHRR samples allowing a full 96 samples to be run on each plate. Assuming the 100ng TotalRNA input then just 50pg of SIRVs are needed per sample, with 5ng added to the oligo-dT master-mix used in the enrichment step. The use of SIRV E0 is recommended for process QC, but E1 and E2 may be useful when evaluating new methods for accuracy and precision of differential transcript detection and quantitation.</span></div>
<div class="msocomtxt" id="_com_8" language="JavaScript" style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<div class="msocomtxt" id="_com_8" language="JavaScript" style="text-align: justify;">
<b style="font-family: Georgia, "Times New Roman", serif;">SIRVs in scRNA-seq: </b><span style="font-family: "georgia" , "times new roman" , serif;">Single-cell RNA-seq has quickly adopted spike-in controls with Hashimshony et al presenting their use of ERCC spikes in the <a href="http://www.sciencedirect.com/science/article/pii/S2211124712002288">CELSeq protocol</a>. Both <a href="http://www.nature.com/nmeth/journal/v11/n1/full/nmeth.2694.html">Wu et al 2013</a> and <a href="http://www.nature.com/nature/journal/v509/n7500/full/nature13173.html">Truetlein et al 2014</a></span><span style="font-family: "georgia" , "times new roman" , serif;"> used the ERCC mixes at a 1:40,000 dilution spiked into the cell lysis mix of the Fluidigm C1 protocol. And <a href="http://biorxiv.org/content/early/2016/09/08/073692">Svensson <i>et al</i></a> use the</span><span style="font-family: "georgia" , "times new roman" , serif;"> ERCC and SIRV spikein's</span><span style="font-family: "georgia" , "times new roman" , serif;"> to </span><span style="font-family: "georgia" , "times new roman" , serif;">assess sensitivity and accuracy of various protocols across a standard analysis pipeline.</span><span style="font-family: "georgia" , "times new roman" , serif;"> This demonstrates the utility of using RNA control spike-ins, but also the requirement for careful dilution to avoid swamping single-cell RNA-seq experiments with control data, or not having enough to QC data before interpreting results. Assuming each single cell has around 20pg of TotalRNA then just 200fg of SIRVs are needed per sample, the amount of SIRV added, and exactly where to add it the protocol is highly dependent on the single-cell RNA-seq protocol being used.</span></div>
<div class="msocomtxt" id="_com_8" language="JavaScript">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span> <br />
<div style="text-align: center;">
<span style="font-family: "georgia" , "times new roman" , serif;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNWyyan0joZhFlGX0FN6llY7_HgEoyVCu7wVnewAWisK7_rYDo0a3C0r_RY5CA6DMLNKTzyhiumJTJQ5PV3r37HZWxlUcmsi3mMoH2L3GG8Mgx-DrHO60Uta40oiUALw1DxQFiljkFcGpJ/s1600/Screen+Shot+2016-09-28+at+10.51.49.png"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNWyyan0joZhFlGX0FN6llY7_HgEoyVCu7wVnewAWisK7_rYDo0a3C0r_RY5CA6DMLNKTzyhiumJTJQ5PV3r37HZWxlUcmsi3mMoH2L3GG8Mgx-DrHO60Uta40oiUALw1DxQFiljkFcGpJ/s400/Screen+Shot+2016-09-28+at+10.51.49.png" /></a></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"> </span> <br />
<div style="text-align: justify;">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"><b>How are SIRVs analysed: </b></span></span><span style="font-family: "georgia" , "times new roman" , serif;">Lexogen will release the <a href="https://www.lexogen.com/sirvs">Galaxy-based “SIRVs suite”</a></span><span style="font-family: "georgia" , "times new roman" , serif;"> for uploading, evaluating and comparing spike-in data. This will allow SIRV users to compare results from their experiments to anonymised data, and should help determine if your own experiment is any good. Back in </span><span style="font-family: "georgia" , "times new roman" , serif;">2003/4</span><span style="font-family: "georgia" , "times new roman" , serif;"> </span><span style="font-family: "georgia" , "times new roman" , serif;">I developed </span><a href="http://www.eposters.net/pdfs/rptdb-a-prototype-affymetrix-rpt-qc-tool.pdf" style="font-family: georgia, "times new roman", serif;">rptDB</a><span style="font-family: "georgia" , "times new roman" , serif;">: a tool </span><span style="font-family: "georgia" , "times new roman" , serif;">to compare QC data between Affymetrix arrays. This had over 3500 samples submitted to it, and allowed a quick easy call on whether your data was "good" or "bad" - highly context dependant of courrse!</span><span style="font-family: "georgia" , "times new roman" , serif;"> As a user if I had received data from a core lab or service provider, or were downloading RNA-seq data for meta-analysis, then being able to select only data where SIRV, or other, controls had been used, and where results were shown to be high-quality, would most likely save me considerable time in cleaning up data before starting.</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span> <span style="font-family: "georgia" , "times new roman" , serif;">SIRVs are not designed to be used as a normalisation tool. Spike-ins have been considered for normalisation, but they are not yet reliable enough for standard normalisation procedures. The development of novel normalisation algorithms appears to offer hope for the future (see </span><a href="http://www.nature.com/nbt/journal/v32/n9/full/nbt.2931.html" style="font-family: georgia, "times new roman", serif;">Risso 2014</a><span style="font-family: "georgia" , "times new roman" , serif;">), and approaches like this might be applicable to SIRVs. I suspect this will be an active area of algorithm development over the next couple of years because of the huge interest in single-cell RNA-seq.</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"> </span> <br />
<div style="text-align: justify;">
<br /></div>
</div>
</div>
<div class="msocomtxt" id="_com_8" language="JavaScript">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"> </span> <br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>The competition: alternative RNA-seq controls</b></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"> </span></div>
<div class="msocomtxt" id="_com_8" language="JavaScript">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>Sequins:</b> </span><a href="http://www.sequin.xyz/" style="font-family: Georgia, "Times New Roman", serif;">‘Sequins’</a><span style="font-family: "georgia" , "times new roman" , serif;"> (sequencing spike-ins) were developed by the Garvan Institute and recently published in Nature Methods. Sequins are conceptually similar to SIRVs. They are a set of synthetic RNA isoforms that align to an artificial <i>in silico</i> chromosome, with no homology to known genomes. They represent full-length spliced mRNA isoforms, at a range of concentrations. They can be used to assess differential gene expression and alternative splicing pipelines. The authors state that sequins can by used for normalisation, and refer to the same <a href="http://www.nature.com/nbt/journal/v32/n9/full/nbt.2931.html">Nature Biotech</a> as I did above. In their Nature Methods paper they do show some very nice results from scaling normalisation using sequins and I hope these results will ultimately be achieveable with any well-designed spike-in series.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">In the back-to-back Nature Methods publications the team at Garvan show how sequins can be used in </span><a href="http://www.nature.com/nmeth/journal/v13/n9/pdf/nmeth.3958.pdf" style="font-family: Georgia, "Times New Roman", serif;">RNA-seq</a><span style="font-family: "georgia" , "times new roman" , serif;"> and </span><a href="http://www.nature.com/nmeth/journal/v13/n9/pdf/nmeth.3957.pdf" style="font-family: Georgia, "Times New Roman", serif;">DNA-seq</a><span style="font-family: "georgia" , "times new roman" , serif;"> experiments to asses biases and determine the limits of detection, quantitation and analytical methods. Sequin genes are mixed in a two-fold serial dilution, with a minimum three genes per dilution, to span an ~106-fold range. The team also developed 24 Sequins to represent cancer fusion genes and used these to assess fusion gene detection and quantitation. They also reported that split reads significantly outperformed read-pairs in their correlation with Sequin concentration – this has a significant impact on the sequencing format as many groups today use paired-end reads where longer single-end reads may be more sensitive, and would also be around 40% cheaper.</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"> </span> <br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b><br />
</b></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"> </span> <br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>ERCC 2.0:</b> the original <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1325234">ERCC1.0 controls</a> are a mix of 92 relatively simple single-exon transcripts of varying length and GC content. They are used in a mix at known concentrations spikedinto samples before library preparation. ERCC2.0 aims to update the spikes to better represent the complexity of the transcriptome, and to provide FFPE derived controls. Again they are are conceptually similar to SIRVs and Lexogen were one of 9 groups invited to present at the 2014 NIST ERCC2.0 workshop at Stanford University.</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"> </span> <br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"> </span> <br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>Conclusions: </b>The use of controls in RNA-seq experiments is an absolute requirement if you want to get the best out of your experiments. Bulk RNA-seq can benefit from a relatively simple data QC of the controls before moving onto more complex differential gene expression and splicing analyses. And including spike-in controls may allow easier comparison of longitudinal data sets, or between labs. Single-cell RNA-seq has shown an absolute requirement to include spike-ins, although the very latest papers suggest that spiked-in transcripts may not truly mirror Human mRNAs in the protocols used, due to much shorter poly-A tails (30 vs 200+bp), and that they may underestimate detection sensitivity by up to ten-fold.</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"> </span> <br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"> </span> <br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">SIRVs, more recently SEQUINs, and soon ERCC2.0 controls can be further enhanced and manufacturers should not be consider their job complete! With protocols like <a href="http://www.nature.com/articles/ncomms12065">Pacific Bioscience’s ISO-seq</a> and the advent of <a href="http://biorxiv.org/content/early/2016/08/12/068809">Oxford Nanopores direct RNA-sequencing</a> longer and longer transcripts could be assessed and this will need to be controlled. Phased sequencing, possibly from long RNA molecules on 10X Genomics, is likely to need controls where variants are phased. Additionally PacBio and Nanopore sequencing also offer the ability to detect and quantify RNA base modifications. All of this shows how far the controls we might use still have to go.</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"> </span><br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>My RNA controls wish list:</b></span><br />
<ol>
<li><span style="font-family: "georgia" , "times new roman" , serif;">differential gene expression normalisation</span></li>
<span style="font-family: "georgia" , "times new roman" , serif;">
<li>differential splicing</li>
<li>allele-specific expression</li>
<li>transcript and poly(A) tail length variation</li>
<li>GC content</li>
<li>transcription initiation and termination</li>
<li>non-polyadenylated RNAs, e.g. microRNA, lincRNA</li>
<li>pseudogene mapping</li>
<li>limits of detection</li>
<li>RNA variant detection at different MAF</li>
<li>high-quality and degraded FFPE RNA</li>
<li>spike-ins with corresponding baits for in-solution capture</li>
<li>spike-in RNA encapsulated in synthetic cells</li>
<li>phased variants on long RNAs</li>
<li>RNA base modifications</li>
</span></ol>
</div>
<span style="font-family: "georgia" , "times new roman" , serif;"><b></b></span><br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b><b>Please let me know what you’d like to add by leaving a comment below.<a href="http://biorxiv.org/content/early/2016/10/13/080747">http://biorxiv.org/content/early/2016/10/13/080747</a></b></b></span></div>
<br />
<div style="mso-element: comment-list;">
<div style="mso-element: comment;">
<div class="msocomtxt" id="_com_20" language="JavaScript">
<!--[if !supportAnnotations]--></div>
<!--[endif]--></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
James@cancerhttp://www.blogger.com/profile/02825715598810395734noreply@blogger.com5tag:blogger.com,1999:blog-6334453475526523597.post-10288891576571841172016-10-14T12:46:00.001+01:002016-10-21T07:57:03.309+01:00Batch effects in scRNA-seq: to E or not to E(RCC spike-in)<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">At the recent Wellcome Trust conference on Single Cell Genomics (<a href="https://twitter.com/search?q=%23SCGen16">Twitter #scgen16</a>) there was a great talk <a href="https://speakerdeck.com/stephaniehicks/towards-progress-in-batch-effects-and-biases-in-single-cell-rna-seq-data">(her slides are online)</a> from Stephanie Hicks in the <b><a href="https://twitter.com/rafalab">@irrizarry</a> </b><a href="http://rafalab.github.io/">group</a> (Department of Biostatistics and Computational Biology at Dana-Farber Cancer Institute). Stephanie was talking about the recent work she's been doing looking at batch effects in single-cell data, all of which you can read about in her paper is on the BioRxiv: <a href="http://biorxiv.org/content/early/2015/12/27/025528"><b>On the widespread and critical impact of systematic bias and batch effects in single-cell RNA-Seq data</b></a>. You can also read about this paper over at <a href="http://nextgenseek.com/2016/01/paper-summary-systematic-bias-and-batch-effects-in-single-cell-rna-seq-data">NExtGenSeek</a>.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwbipi0h9B53sKlI4pFfTmpTWSiiGgkA-E80Rp4UaIMZfD_6s5m92RnnFu4xUN_7Z4KZODOi8-ac8SPybCZYNmsNSPZm9-1FsB_eFXcWmYklq21jm-c6ORY9vK81YmiCAmzRu3QitrOZxz/s1600/The+problem+of+confounding+batch+effects+in+Fluidigm+Dropseq+and+10X+Genomics+systems.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="293" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwbipi0h9B53sKlI4pFfTmpTWSiiGgkA-E80Rp4UaIMZfD_6s5m92RnnFu4xUN_7Z4KZODOi8-ac8SPybCZYNmsNSPZm9-1FsB_eFXcWmYklq21jm-c6ORY9vK81YmiCAmzRu3QitrOZxz/s400/The+problem+of+confounding+batch+effects+in+Fluidigm+Dropseq+and+10X+Genomics+systems.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: small;">Adapted from Figure 1 in <a href="http://biorxiv.org/content/early/2015/12/27/025528">Hicks et al</a>.</span></td></tr>
</tbody></table>
<div style="text-align: justify;">
</div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<a name='more'></a><span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<span style="font-family: "georgia" , "times new roman" , serif;">Almost without exception every new technology gets published with a slew of high-impact papers. And almost without exception those papers turn out to be heavily biased. This is not to say we should expect every wrinkle to be ironed out before initial publication - new technologies take a lot of effort and the faster they make it into the public domain the sooner the community can improve them and make them more robust. Often batch effect is the first problem identified: with arrays, with NGS, and now with single-cell RNA-seq.</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<span style="font-family: "georgia" , "times new roman" , serif;">Stephanie et al looked at 15 published single-cell RNA-seq papers and found that in the 8 studies investigating differences between group, and where they could assess confounding effect it ranged from 82.1% to 100% <span style="color: #666666;">(see table 1 from the paper - 82,85,93,96,98,100 & 100%)</span>. All of these studies were designed such that the samples were confounded with processing batch. They report that the number of genes detected expressed explained a significant proportion of observed variability, but that this varied across experimental batches. This confounding of biological question with experimental batch effectively cripples the project; </span><br />
<i style="font-family: georgia, "times new roman", serif;"><br /></i>
<br />
<div style="text-align: center;">
<i style="font-family: georgia, "times new roman", serif;"><span style="font-size: large;">"Batch effects lead to differences in detection rates, which lead to apparent differences between biological groups"</span></i></div>
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<span style="font-family: "georgia" , "times new roman" , serif;">However the authors do point out that relatively simple experimental design choices can be used to remove the problem.</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>What does this mean for ERCC and other spike-ins : </b>In her final slides, see "The Wild West", Stephanie clearly explains the problems we face with batch effects and in normalising single-cell RNA-seq experiments.</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
<div style="text-align: justify;">
<ul>
<li>Batch effects can be a big problem in scRNA-Seq data (but not always). </li>
<li>Batch effects and methods to correct for batch effects have been around for many years (lots of places to start). </li>
<li><b>Bad news: </b>Poor experimental design is a big limiting factor… also, more complicated because of sparsity (biology and technology), capture efficiency, etc.</li>
<li><b>Good news:</b> Increase awareness about good experimental design. New methods specific for scRNA-Seq are being developed</li>
</ul>
</div>
<div style="text-align: justify;">
It is looking increasingly feasible to use RNA spike-ins in scRNA-seq experiments, both to help normalise the data and to reduce or remove batch effects. Stephanie does note that challenges remain, and also points to unique molecular identifier (UMI) counts as a way to reduce amplification bias.</div>
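<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">The basic idea behind spike-in normalisation can be sketched in a few lines. This assumes the same amount of spike-in RNA was added to every cell, so differences in total spike-in counts reflect technical differences (capture, amplification, sequencing depth) rather than biology. The per-cell size factor is that cell's spike-in total divided by the mean across cells; function names are hypothetical.</span></div>

```python
def spike_in_size_factors(spike_totals):
    """Size factor per cell: its total spike-in count divided by the mean total."""
    if not spike_totals or any(t <= 0 for t in spike_totals):
        raise ValueError("every cell needs a positive spike-in total")
    mean_total = sum(spike_totals) / len(spike_totals)
    return [t / mean_total for t in spike_totals]


def normalise_by_spike_ins(counts_by_cell, spike_totals):
    """Divide each cell's endogenous gene counts by its spike-in size factor.

    counts_by_cell: list of {gene: count} dicts, one per cell, in the same
    order as spike_totals.
    """
    if len(counts_by_cell) != len(spike_totals):
        raise ValueError("need one spike-in total per cell")
    factors = spike_in_size_factors(spike_totals)
    return [
        {gene: count / factor for gene, count in cell.items()}
        for cell, factor in zip(counts_by_cell, factors)
    ]
```

<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Unlike scaling by total endogenous counts, this preserves genuine differences in total RNA content between cells, which is one reason spike-ins are attractive; it only works if the spike-in was added consistently.</span></div>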
<div style="text-align: justify;">
<br /></div>
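<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">The UMI point above works because PCR copies of one captured molecule share the same UMI, so counting distinct UMIs rather than reads counts each molecule once. A minimal sketch, assuming reads have already been assigned a cell barcode, a gene and a UMI (the input format is hypothetical); real pipelines also merge UMIs that differ only by sequencing errors, which this does not.</span></div>

```python
from collections import defaultdict


def umi_counts(reads):
    """Count distinct UMIs per (cell, gene).

    reads: iterable of (cell_barcode, gene, umi) tuples, one per aligned read.
    Returns {(cell_barcode, gene): number_of_distinct_umis}.
    """
    seen = defaultdict(set)
    for cell, gene, umi in reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


def read_counts(reads):
    """Naive read counts per (cell, gene), for comparison; PCR duplicates inflate these."""
    counts = defaultdict(int)
    for cell, gene, _umi in reads:
        counts[(cell, gene)] += 1
    return dict(counts)
```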
<div style="text-align: justify;">
However, not every protocol recommends spike-ins, and there is certainly no clear preference in the community yet, although I think one is beginning to emerge. Read about <a href="http://biorxiv.org/content/early/2016/09/08/073692">how ERCCs and SIRVs are being used in single-cell RNA-seq</a> in the latest paper from <a href="http://www.teichlab.org/">Sarah Teichmann’s group</a> at <a href="http://www.ebi.ac.uk/research/teichmann">EBI</a>/<a href="http://www.sanger.ac.uk/science/groups/teichmann-group">Sanger</a>.<br />
<br />
I'm putting effort into understanding spike-ins in much more detail, and am sure we'll all be using them routinely within a few months.<br />
<br />
<b>What does this mean for the choice of scRNA-seq platform: </b>My very brief survey of the three platforms we're using or evaluating in my lab is as follows. <a href="https://www.fluidigm.com/products/c1-system">Fluidigm</a> suggest using the <a href="https://www.thermofisher.com/order/catalog/product/AM1780">ArrayControl RNA Spikes</a> (Thermo Fisher Scientific AM1780). The <a href="http://www.cell.com/abstract/S0092-8674(15)00549-8">Drop-seq</a> authors suggest using the ERCC spikes (although this is not mentioned in their <a href="http://mccarrolllab.com/download/905/">online protocol</a>). <a href="http://www.10xgenomics.com/single-cell/">10X Genomics</a> don't say anything about spike-ins in their current protocols!<br />
<br />
I generated the figure at the top of this post to show where these three scRNA-seq platforms fit into Stephanie's Figure 1. Both C1 and Drop-seq are completely confounded, as only one sample is processed per batch. 10X Genomics allows up to 8 samples to be processed together, so a replicated "A vs B" study could be run in a single batch, with no confounding between group and processing batch.<br />
<br />
But in the future we're likely to need 12-, 24- or even 96-sample systems that allow us to process a whole scRNA-seq experiment in one go. It may well be possible to design Fluidigm C1 chips that process more samples, each with fewer cells, for Drop-seq to emulate 10X Genomics, or even for 10X Genomics to move to a larger-format chip; but none of this will solve the problem of collecting large numbers of single-cell samples without introducing batch effects further upstream in the experiment.</div>
</span><br />
<div>
<div style="font-family: Georgia, "Times New Roman", serif; text-align: justify;">
<div>
<br /></div>
<div>
<span style="font-family: "georgia" , "times new roman" , serif;">The take home message is to spend time on experimental design, and to replicate your study - simple enough stuff! Biological replication will allow batches to be randomised during the experiment to scRNA-seq prep runs and across sequencing flowcells if necessary. This generally allows batch effects to be removed from the experiment, even if they are significant.</span></div>
</div>
</div>
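<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">The randomisation step can be sketched as stratified assignment: shuffle the replicates within each biological group, then deal them across prep runs (or flowcell lanes) in turn, so every batch gets a near-equal share of every group. A minimal sketch, assuming each sample has a group label and you know how many batches you will run; the function name and input format are hypothetical.</span></div>

```python
import random
from collections import defaultdict


def balanced_batches(samples, n_batches, seed=None):
    """Assign samples to batches so each batch holds a near-equal share of every group.

    samples: list of (sample_id, group) pairs.
    Returns a list of n_batches lists of sample_ids.
    """
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)
    batches = [[] for _ in range(n_batches)]
    slot = 0  # keeps rotating across groups so batch sizes also stay even
    for group in sorted(by_group):
        members = by_group[group]
        rng.shuffle(members)  # random order within the group
        for sample_id in members:
            batches[slot % n_batches].append(sample_id)
            slot += 1
    return batches
```

<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">With four A and four B replicates split over two prep runs, each run receives two of each, so group and batch can be separated at the analysis stage.</span></div>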
</div>
James@cancerhttp://www.blogger.com/profile/02825715598810395734noreply@blogger.com6tag:blogger.com,1999:blog-6334453475526523597.post-50426463255283430962016-10-11T07:04:00.001+01:002016-10-11T07:05:30.884+01:00Clinical trials using ctDNA<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">DeciBio have a great interactive Tableau dashboard which you can use to browse and filter their analysis of 97 “laboratory biomarker analysis” ImmunOncolgy clinical trials; see: </span><a href="http://www.decibio.com/blog/2016/08/11/diagnostic-biomarkers-for-cancer-immunotherapy-moving-beyond-pd-l1/" style="font-family: Georgia, "Times New Roman", serif;">Diagnostic Biomarkers for Cancer Immunotherapy – Moving Beyond PD-L1</a><span style="font-family: "georgia" , "times new roman" , serif;">. The raw data comes from <a href="https://clinicaltrials.gov/">ClinicalTrials.gov</a> where you can specify a <a href="https://clinicaltrials.gov/ct2/results?term=ctDNA">"ctDNA"</a> search and get back 50 trials, <a href="https://clinicaltrials.gov/ct2/results?term=ctDNA&recr=Open">40 of which are open</a>.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Two of these trails are happening in the UK. Investigators at <a href="https://clinicaltrials.gov/ct2/show/NCT02579278?term=ctDNA&recr=Open&cntry1=EU%3AGB&rank=1">The Royal Marsden</a> are looking to measure the presence or absence of ctDNA post CRT in EMVI-positive rectal cancer. And <a href="https://clinicaltrials.gov/ct2/show/NCT02588105?term=ctDNA&recr=Open&cntry1=EU%3AGB&rank=2">Astra Zeneca</a> are looking for ctDNA as a secondary outcome to obtain a preliminary assessment of safety and efficacy of AZD0156 and its activity in tumours by evaluation of the total amount of ctDNA.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<div>
<span style="font-family: "georgia" , "times new roman" , serif;">You can also specify your own search terms and get back lists of trials from </span><span style="font-family: "georgia" , "times new roman" , serif;"> </span><a href="http://explorer.opentrials.net/search?q=ctDNA" style="font-family: Georgia, "Times New Roman", serif;">OpenTrials</a><span style="font-family: "georgia" , "times new roman" , serif;"> which went live very recently. The Marsden's ctDNA trials above is currently listed.</span></div>
</div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">You can use the <a href="http://www.decibio.com/blog/2016/08/11/diagnostic-biomarkers-for-cancer-immunotherapy-moving-beyond-pd-l1/">DeciBio dashboard</a> on their site. In the example below I filtered for trials using ctDNA analysis and came up with 7 results:</span></div>
<br />
<ol style="text-align: left;">
<li style="text-align: justify;"><a href="https://clinicaltrials.gov/show/NCT02224781">Dabrafenib and Trametinib Followed by Ipilimumab and Nivolumab or Ipilimumab and Nivolumab Followed by Dabrafenib and Trametinib in Treating Patients With Stage III-IV BRAFV600 Melanoma</a></li>
<li style="text-align: justify;"><a href="https://clinicaltrials.gov/show/NCT02275533">Nivolumab in Eliminating Minimal Residual Disease and Preventing Relapse in Patients With Acute Myeloid Leukemia in Remission After Chemotherapy</a></li>
<li style="text-align: justify;"><a href="https://clinicaltrials.gov/show/NCT02408861">Nivolumab and Ipilimumab in Treating Patients With Advanced HIV Associated Solid Tumors</a></li>
<li style="text-align: justify;"><a href="https://clinicaltrials.gov/show/NCT02453620">Entinostat, Nivolumab, and Ipilimumab in Treating Patients With Solid Tumors That Are Metastatic or Cannot Be Removed by Surgery or Locally Advanced or Metastatic HER2-Negative Breast Cancer</a></li>
<li style="text-align: justify;"><a href="https://clinicaltrials.gov/show/NCT02631746">Nivolumab in Treating Patients With HTLV-Associated T-Cell Leukemia/Lymphoma</a></li>
<li style="text-align: justify;"><a href="https://clinicaltrials.gov/show/NCT02701400">Tremelimumab and Durvalumab With or Without Radiation Therapy in Patients With Relapsed Small Cell Lung Cancer</a></li>
<li style="text-align: justify;"><a href="https://clinicaltrials.gov/show/NCT02778685">Pembrolizumab, Letrozole, and Palbociclib in Treating Patients With Stage IV Estrogen Receptor Positive Breast Cancer With Stable Disease That Has Not Responded to Letrozole and Palbociclib</a></li>
</ol>
<div>
<br /></div>
<div class="tableauPlaceholder" id="viz1476163375514" style="position: relative;">
<noscript><a href='#'><img alt="IO Clinical Trials " src="https://public.tableau.com/static/images/QR/QRM3X9HC5/1_rss.png" style="border: none"></a></noscript><span style="font-family: "georgia" , "times new roman" , serif;"><object class="tableauViz" style="display: none;"><param name='host_url' value='http%3A%2F%2Fpublic.tableau.com%2F' /><param name='path' value='shared/QRM3X9HC5' /><param name='toolbar' value='yes' /><param name='static_image' value='http://public.tableau.com/static/images/QR/QRM3X9HC5/1.png' /><param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='tabs' value='no' /></object></span></div>
<br />
<span style="font-family: "georgia" , "times new roman" , serif;">Thanks to DeciBio's Andrew Aijian for the analysis, dashboard and commentary, and to OpenTrials for making this kind of data open and accessible.</span></div>
James@cancerhttp://www.blogger.com/profile/02825715598810395734noreply@blogger.com5tag:blogger.com,1999:blog-6334453475526523597.post-77824265161439425782016-10-07T11:59:00.000+01:002016-10-13T11:10:08.413+01:00Index mis-assignment to Illumina's PhiX control<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Multiplexing is the default option for most of the work being carried out in my lab, and it is one of the reasons Illumina has been so successful. Rather than the one-sample-per-lane we used to run when a GA1 generated only a few million reads per lane, we can now run a 24-sample RNA-seq experiment in one HiSeq 4000 lane and expect to get back 10-20M reads per sample. For almost anything other than whole genomes, multiplexed sequencing is the norm.</span></div>
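<div style="text-align: justify;">As a rough check on the yield figures above, the per-sample arithmetic is simple enough to sketch in Python. This is a minimal illustration that assumes an evenly balanced pool and a 1% PhiX spike. The ~300M reads per HiSeq 4000 lane used below is an assumed round number, not a figure from our runs, so substitute your own lane yield.</div>

```python
def reads_per_sample(lane_reads: float, n_samples: int, phix_fraction: float = 0.01) -> float:
    """Expected reads per sample when a lane is pooled evenly.

    lane_reads:    passing-filter reads for the lane (supply your own figure).
    n_samples:     number of indexed libraries pooled in the lane.
    phix_fraction: share of the lane taken by the PhiX spike-in (assumed 1%).
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    if not 0.0 <= phix_fraction < 1.0:
        raise ValueError("phix_fraction must be in [0, 1)")
    # Reads left after the PhiX spike, shared equally between samples.
    return lane_reads * (1.0 - phix_fraction) / n_samples


if __name__ == "__main__":
    # Assumed ~300M passing-filter reads in one HiSeq 4000 lane, 24 RNA-seq libraries.
    per_sample = reads_per_sample(300e6, 24)
    print(f"{per_sample / 1e6:.1f}M reads per sample")
```

<div style="text-align: justify;">Real pools are rarely perfectly balanced, so treat the result as an average rather than a guarantee for every sample.</div>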
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">But index sequencing can go wrong, and this can and does happen even before anything gets on the sequencer. </span><span style="font-family: georgia, "times new roman", serif;">We noticed that PhiX has been turning up in demultiplexed sample Fastq files. PhiX does not carry a sample index, so something is going wrong! What's happening? Is this a problem for indexing and multiplexing in general on NGS platforms? These are the questions I have been digging into since our move from HiSeq 2500 to HiSeq 4000.</span><span style="font-family: georgia, "times new roman", serif;"> </span><span style="font-family: georgia, "times new roman", serif;">In this post I'll describe what we've seen with mis-assignment of sample indexes to PhiX, and review some of the literature that clearly pointed out the issue; in particular I'll refer to </span><a href="https://repositories.lib.utexas.edu/handle/2152/31375" style="font-family: georgia, "times new roman", serif;">Jeff Hussmann's PhD thesis from 2015</a><span style="font-family: georgia, "times new roman", serif;">.</span><br />
<span style="font-family: georgia, "times new roman", serif;"><br /></span>
<span style="font-family: georgia, times new roman, serif;">The problem of index mis-assignment to PhiX can be safely ignored, or easily fixed (so you could stop reading now). But understanding it has made me realise that index mis-assignment <b>between</b> samples is an issue we do not know enough about, and that the tools we're using may not be quite up to the job (I'll not cover this in depth in this post).</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
<span style="text-align: justify;"><br /></span>
</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxxl5PwXLEmpOFyc1RPK3dUe-U5epp_keRLplceagZDKl2AXImLkejWbrHuCpcGADYtSKaiYwAVaiAKlmT5GV6OqUxZTKxvUzU8h4xCQjX3-ft5hFFGzfIYvl1y4HtpMMTKxE0jtYMEkK7/s1600/Mis-assignment+of+Indexes+to+PhiX.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-family: "georgia" , "times new roman" , serif;"><img border="0" height="298" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxxl5PwXLEmpOFyc1RPK3dUe-U5epp_keRLplceagZDKl2AXImLkejWbrHuCpcGADYtSKaiYwAVaiAKlmT5GV6OqUxZTKxvUzU8h4xCQjX3-ft5hFFGzfIYvl1y4HtpMMTKxE0jtYMEkK7/s400/Mis-assignment+of+Indexes+to+PhiX.png" width="400" /></span></a></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"></span></div>
<a name='more'></a><br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Issues with index mis-assignment and quality were initially noticed when we detected Illumina's PhiX control in demultiplexed Fastq data. PhiX is supplied by Illumina as a non-indexed library and, as such, should never appear in demultiplexed Fastq files. In our default analysis pipeline it should only appear in the "lost-reads" file, at around 1% of the data from lanes 1-7 and 5% of the data from lane 8 of an Illumina flowcell (the actual percentage of PhiX can vary for several reasons, so we're not surprised to see higher or lower percentages than expected). We are still running PhiX in almost every lane of sequencing as an easy control to monitor run quality. But if PhiX is getting a barcode, what's going wrong?</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
<div style="text-align: justify;">
The main concern is that if the barcode read is failing in some manner, and attributing barcodes incorrectly, this will lead to erroneous results. Index mis-assignment has two major consequences:<br />
<br />
<ol>
<li>reads are lost because a spurious barcode was assigned; these reads would usually be discarded, the loss should be minimal, and it can probably be ignored.</li>
<li>reads are assigned to the wrong sample; this is a much more serious issue, and understanding what causes it, and how likely it is to happen, will be critical in reducing the technical factors that could limit low-frequency variant calling. </li>
</ol>
</div>
<div style="text-align: justify;">
With PhiX on every lane we should be able to monitor index mis-assignment in every run. PhiX may also allow us to estimate the rate of mis-assignment between samples, which will be vital if users need to allow for this in their analysis, particularly in low-frequency variant calling.<br />
<br /></div>
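<div style="text-align: justify;">Monitoring this per run could be as simple as counting PhiX reads in each demultiplexed Fastq file and in the undetermined ("lost-reads") file. The sketch below assumes those counts already exist (for example from aligning each file to the PhiX reference); the function names and counts are hypothetical, and this is not our production pipeline.</div>

```python
from typing import Dict


def phix_misassignment_rate(phix_in_samples: Dict[str, int], phix_undetermined: int) -> float:
    """Fraction of a lane's PhiX reads that picked up a sample index.

    PhiX carries no index, so every PhiX read found in a sample Fastq is
    mis-assigned; correctly handled PhiX ends up in the undetermined
    ("lost-reads") file.
    """
    assigned = sum(phix_in_samples.values())
    total = assigned + phix_undetermined
    if total == 0:
        raise ValueError("no PhiX reads counted for this lane")
    return assigned / total


def phix_contamination(phix_reads: int, sample_reads: int) -> float:
    """Fraction of one sample's Fastq that is mis-assigned PhiX."""
    if sample_reads <= 0:
        raise ValueError("sample_reads must be positive")
    return phix_reads / sample_reads


if __name__ == "__main__":
    # Hypothetical counts for one lane.
    phix_counts = {"S1": 120, "S2": 95, "S3": 140}
    rate = phix_misassignment_rate(phix_counts, phix_undetermined=3_000_000)
    print(f"PhiX reads given a sample index: {rate:.4%}")
    print(f"S1 PhiX contamination: {phix_contamination(120, 12_000_000):.5%}")
```

<div style="text-align: justify;">The first number tracks how leaky the index read is for a lane; the second is what matters to a user, i.e. how much of their sample is PhiX.</div>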
<div style="text-align: justify;">
<b>Previous reports about multiplexing on Illumina sequencers:</b> As was anticipated several years ago, multiplex sequencing has become a common tool in many studies; the level of multiplexing varies, but it is almost ubiquitous. One anomaly is the Genomics England sequencing program, where indexed libraries are created but the sequencing contractor, Illumina, runs them as non-indexed, single-sample-per-lane sequencing.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Several key papers describing this issue are listed below; probably the most useful are <a href="http://nar.oxfordjournals.org/content/40/1/e3.full">Kircher et al</a> from the <a href="http://www.zmbh.uni-heidelberg.de/Mayer">Meyer lab at the MPI</a>, <a href="http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-15-110">Mike Quail and Peter Ellis's SASI-seq paper from the Sanger</a>, and <a href="https://repositories.lib.utexas.edu/handle/2152/31375">Jeff Hussmann's PhD thesis</a>.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The Kircher paper presents data from three slightly different preps: no-CAP (standard library prep), SP-CAP (single-plex in-solution capture libraries), and MP-CAP (multi-plex in-solution capture libraries). They were able to determine the fraction of mis-tagging events caused by barcode contamination (during oligo synthesis, pooling or handling), by mixed clusters, or by PCR recombination. After removing contamination as a possible source of error they reported that both no-CAP and SP-CAP had low levels of index mis-assignment (0.018% and 0.034%), but that the MP-CAP libraries had more than ten times higher mis-assignment (0.390%). The low percentages in the first two libraries were due to mixed clusters that could not be eliminated by quality filtering. The high, nearly 0.4%, mis-assignment in the MP-CAP library was due to PCR recombination during multiplex PCR after in-solution capture. Importantly, they calculated that if this recombination occurs primarily in the adapter sequences then half of the chimeric reads, almost 0.25% of all exome reads, would be mis-assigned to a sample if a single index was used, and they recommended dual indexing.</div>
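<div style="text-align: justify;">The logic behind Kircher et al's recommendation can be sketched as a demultiplexing rule. With two indexes, a read whose i7 and i5 both appear on the sample sheet, but not as the same sample's pair, can be flagged and discarded; with i7 alone the same chimeric read is silently assigned to a sample. This is a minimal Python sketch assuming exact-match index calls; the sample sheet and index sequences are hypothetical.</div>

```python
from typing import Dict, Tuple

# Hypothetical sample sheet: (i7, i5) -> sample. Sequences are illustrative only.
SHEET: Dict[Tuple[str, str], str] = {
    ("ATCACG", "TATAGCCT"): "sample_A",
    ("CGATGT", "ATAGAGGC"): "sample_B",
}


def classify_dual(i7: str, i5: str, sheet: Dict[Tuple[str, str], str] = SHEET) -> str:
    """Assign a read by its (i7, i5) pair, flagging swapped combinations."""
    if (i7, i5) in sheet:
        return sheet[(i7, i5)]
    known_i7 = {a for a, _ in sheet}
    known_i5 = {b for _, b in sheet}
    if i7 in known_i7 and i5 in known_i5:
        # Both indexes are real but belong to different samples: the signature
        # of a chimera (e.g. PCR recombination) or a mixed cluster. Discard it.
        return "invalid_pair"
    return "undetermined"


def classify_single(i7: str, sheet: Dict[Tuple[str, str], str] = SHEET) -> str:
    """The same read demultiplexed on i7 alone (single indexing)."""
    for (a, _), sample in sheet.items():
        if a == i7:
            return sample
    return "undetermined"


if __name__ == "__main__":
    swapped = ("ATCACG", "ATAGAGGC")  # i7 from sample_A, i5 from sample_B
    print("dual:  ", classify_dual(*swapped))       # invalid_pair: flagged, not assigned
    print("single:", classify_single(swapped[0]))   # sample_A: silently mis-assigned
```

<div style="text-align: justify;">Note that dual indexing only catches a swap when it produces a combination that is not on the sheet; a chimera that happens to recreate another sample's valid pair would still be mis-assigned, which is why unique dual indexes are the safer design.</div>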
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Their analysis was confirmed by <a href="http://strategies%20for%20achieving%20high%20sequencing%20accuracy%20for%20low%20diversity%20samples%20and%20avoiding%20sample%20bleeding%20using%20illumina%20platform/">Mitra et al 2015</a>, who went further in showing that the template read on the HiSeq was part of the problem. On HiSeq 2500 the template read is kept to 4 cycles to reduce memory requirements, but when Mitra et al increased the template read length to 20 cycles they saw two- to five-fold lower index mis-assignment. Such a long template read would exhaust the memory of most of our HiSeq instruments, but the authors suggest upgrading the memory, which could be very economical given the impact of low-quality cluster detection and index mis-assignment.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
In <a href="https://repositories.lib.utexas.edu/handle/2152/31375">Jeff's PhD</a> he used reads from the shortest library molecules, which read through into the adapters, to determine that the PhiX control uses the older ‘PE’ primers, which have no sequence complementarity to the standard indexing read primers; as such they cannot generate a signal during the index read. He saw the same drop in index-read quality scores for PhiX, compared to the indexed samples, as we had. But he also showed that PhiX reads that appear to be indexed are physically closer to an indexed cluster than PhiX reads that received no index. This led him to propose the same model of index bleeding that I describe below.</div>
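<div style="text-align: justify;">Jeff's proximity test can be approximated from the cluster coordinates embedded in Illumina read names. The sketch below assumes CASAVA 1.8+ style names (@instrument:run:flowcell:lane:tile:x:y followed by a comment) and hypothetical inputs: read names for PhiX reads that received an index, PhiX reads left unassigned, and sample reads. If the bleeding model holds, the first median distance should be smaller than the second.</div>

```python
import math
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple

Coord = Tuple[int, float, float]  # (tile, x, y)


def parse_coords(read_name: str) -> Coord:
    """Tile and x/y position from an assumed CASAVA 1.8+ style read name."""
    fields = read_name.lstrip("@").split()[0].split(":")
    return int(fields[4]), float(fields[5]), float(fields[6])


def nearest_sample_distance(queries: Iterable[Coord],
                            sample_clusters: Iterable[Coord]) -> List[float]:
    """Distance from each query cluster to the nearest sample cluster on the same tile.

    Brute force for clarity; real tiles hold millions of clusters, so a
    per-tile spatial index would be needed in practice.
    """
    by_tile: Dict[int, List[Tuple[float, float]]] = defaultdict(list)
    for tile, x, y in sample_clusters:
        by_tile[tile].append((x, y))
    distances = []
    for tile, x, y in queries:
        points = by_tile.get(tile)
        if points:  # skip PhiX clusters on tiles with no sample clusters
            distances.append(min(math.dist((x, y), p) for p in points))
    return distances


def compare_phix(indexed_phix: List[str], unassigned_phix: List[str],
                 sample_reads: List[str]) -> Tuple[float, float]:
    """Median nearest-sample distance for index-called vs unassigned PhiX reads."""
    samples = [parse_coords(r) for r in sample_reads]
    near_indexed = nearest_sample_distance((parse_coords(r) for r in indexed_phix), samples)
    near_unassigned = nearest_sample_distance((parse_coords(r) for r in unassigned_phix), samples)
    return median(near_indexed), median(near_unassigned)
```

<div style="text-align: justify;">A real analysis would also need a null comparison (e.g. randomly chosen PhiX clusters) and would only make sense on non-patterned flowcells, where cluster positions are not fixed to a grid.</div>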
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Jeff also carefully investigated PCR-mediated recombination (as did Kircher et al) as an additional source of index mis-assignment. This was first reported at the start of the 1990s by <a href="http://nar.oxfordjournals.org/content/18/7/1687">Meyerhans et al</a>. In any PCR the polymerase can stall or fall off the template, creating short extension products; these can then hybridise in place of a primer in the next round of PCR. The issue with Illumina libraries is that such a product could create a chimeric molecule in which indexes have been swapped, leading to index mis-assignment. This is likely to be most pronounced in multiplexed amplification after indexed library prep, i.e. most exome and amplicon strategies. He also stated that his analysis "constituted overwhelming evidence that PCR-mediated recombination happens during cluster generation". His analysis was all on HiSeq 2500 "Manteia" clustering chemistry, which is likely to perform quite differently from the patterned flowcell <a href="http://core-genomics.blogspot.co.uk/2016/01/almost-everything-you-wanted-to-know.html">"Exclusion Amplification"</a> chemistry; we're looking into index mis-assignment on that right now.<br />
<br />
In the <a href="http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-15-110">SASI-seq</a> paper Quail <i>et al</i> highlighted the issue of index mis-assignment and discussed the need to confirm that contamination is not present before a data set is analysed. They presented a simple and inexpensive method to verify that results are not contaminated. They prepared a mix of three uniquely barcoded amplicons of different sizes, spanning the range of insert sizes one would normally use for Illumina sequencing, and added these to samples at a spike-in level of approximately 0.1%. They also designed a set of 384 11 bp Illumina index sequences with a high Hamming distance (5 bp apart), giving higher levels of error correction and very low levels of barcode mis-assignment due to sequencing errors.<br />
<br /></div>
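<div style="text-align: justify;">The benefit of a high Hamming distance is easy to quantify: a barcode set whose closest pair differs at d positions can unambiguously correct up to floor((d - 1) / 2) substitutions, so d = 5 allows two errors to be corrected. The sketch below checks a barcode set and assigns index reads accordingly; the function names are hypothetical and the 11-mers in the example are made up, not the SASI-seq barcodes.</div>

```python
from itertools import combinations
from typing import List, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must be the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes: List[str]) -> int:
    """Smallest Hamming distance between any two barcodes in the set (needs >= 2)."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def correctable_errors(barcodes: List[str]) -> int:
    """Substitutions that can be corrected unambiguously: floor((d - 1) / 2)."""
    return (min_pairwise_distance(barcodes) - 1) // 2


def assign(index_read: str, barcodes: List[str]) -> Optional[str]:
    """Assign to the single barcode within the correctable distance, else None."""
    k = correctable_errors(barcodes)
    hits = [b for b in barcodes if hamming(index_read, b) <= k]
    return hits[0] if len(hits) == 1 else None
```

<div style="text-align: justify;">For example, with bcs = ["AAAAAAAAAAA", "CCCCCAAAAAA", "AAAAAACCCCC"], correctable_errors(bcs) is 2 and assign("AAAAAAAAACC", bcs) returns "AAAAAAAAAAA". Correction only protects against sequencing errors in the index read; it does nothing for PCR chimeras or mixed clusters, which carry a perfectly valid but wrong index.</div>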
</span><div style="text-align: justify;">
<b style="font-family: Georgia, "Times New Roman", serif;">Our PhiX mis-assignment analysis results:</b><span style="font-family: "georgia" , "times new roman" , serif;"> We used historical data to check whether PhiX mis-assignment was happening across all flowcells, and could clearly see that it was. Panel </span><b style="font-family: Georgia, "Times New Roman", serif;">(A) </b><span style="font-family: "georgia" , "times new roman" , serif;">simply shows the percentage of PhiX we added to each lane. In </span><b style="font-family: georgia, "times new roman", serif;">(B)</b><span style="font-family: "georgia" , "times new roman" , serif;"> you can see that the majority of lanes show a reasonably low level of index mis-assignment to PhiX, at just 0.01-1% in single-indexed samples (green) and 0.0001-0.01% in dual-indexed samples (red). Dual indexing appears to help substantially. We also saw that the level of PhiX contamination was worse on HiSeq 2500 than on HiSeq 4000, and increased as the amount of PhiX used increased.
In fact, the rate of PhiX index mis-assignment was more strongly correlated with the amount of PhiX in the lane for single-indexed samples than for dual-indexed samples </span><b style="font-family: Georgia, "Times New Roman", serif;">(C)</b><span style="font-family: "georgia" , "times new roman" , serif;">. We see PhiX appearing at as much as 1% of the sample in the very worst cases; however, this is generally in single-indexed multiplexed sequencing with very high levels of PhiX, e.g. low-diversity spiking.</span></div>
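<div style="text-align: justify;">The comparison in (C) amounts to correlating PhiX loading with the PhiX mis-assignment rate separately for single- and dual-indexed lanes. A minimal sketch, assuming per-lane tuples of (index type, PhiX %, mis-assigned %); the log10 transform of the rate is our assumption (the rates span several orders of magnitude), not necessarily how the figure was made.</div>

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def correlation_by_index_type(lanes: Iterable[Tuple[str, float, float]],
                              log_rate: bool = True) -> Dict[str, float]:
    """Correlate PhiX loading with PhiX mis-assignment, per indexing strategy.

    lanes: (index_type, phix_percent, misassigned_percent) per lane, where
    index_type is e.g. "single" or "dual". With log_rate=True the rates are
    log10-transformed first, so they must all be > 0.
    """
    groups: Dict[str, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for index_type, phix_pct, mis_pct in lanes:
        xs, ys = groups[index_type]
        xs.append(phix_pct)
        ys.append(math.log10(mis_pct) if log_rate else mis_pct)
    return {k: pearson(xs, ys) for k, (xs, ys) in groups.items()}
```

<div style="text-align: justify;">A rank correlation would be a reasonable alternative if a few extreme lanes dominate; either way, the point is to compare the single- and dual-indexed coefficients rather than read too much into either one alone.</div>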
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtiAUTRrmD3i4TvcEpr2wLXBPo8bOJzpsO8AABn3cJ5k6EUiQxtBBhglIDrZeqYFV6yEU5Gu_1tuzXnr8eD69z564GWuf_Hf5y-qcgH4l4ymQdinaWrYAXub16w4johyjk2gq60S5xn2jd/s1600/PhiX+index+mis-assignment+analysis.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="202" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtiAUTRrmD3i4TvcEpr2wLXBPo8bOJzpsO8AABn3cJ5k6EUiQxtBBhglIDrZeqYFV6yEU5Gu_1tuzXnr8eD69z564GWuf_Hf5y-qcgH4l4ymQdinaWrYAXub16w4johyjk2gq60S5xn2jd/s640/PhiX+index+mis-assignment+analysis.png" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="MsoNormal" style="text-align: justify; text-justify: inter-ideograph;">
<span lang="EN-US" style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<div class="MsoNormal" style="text-align: justify; text-justify: inter-ideograph;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>Indexed versus non-indexed PhiX analysis:</b> Whilst the Illumina PhiX control is not indexed, it is possible to purchase an indexed version from <a href="http://www.seqmatic.com/products/tailormix-dual-indexed-phix/">SEQMATIC</a>. When we compared indexed and non-indexed PhiX the results were clear: non-indexed PhiX shows around 0.02% bleed-through, while SEQMATIC's indexed PhiX shows around 0.005%, a four-fold reduction.</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhkjqLYdfQi-TFhUMeEJyNYgYRe7RG-cNVAckVGABOSYR92lWomH6SRQCu78mXCuh1C0q7uKf-Q9atnH8Ra3hfoEQUTy9OlPtjHjiiBcw4Sc7sW7zl-6QJmf-ImOy0VdDHsFyaQ9cf-Or8I/s1600/Indexed+vs+non-indexed+PhiX.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="266" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhkjqLYdfQi-TFhUMeEJyNYgYRe7RG-cNVAckVGABOSYR92lWomH6SRQCu78mXCuh1C0q7uKf-Q9atnH8Ra3hfoEQUTy9OlPtjHjiiBcw4Sc7sW7zl-6QJmf-ImOy0VdDHsFyaQ9cf-Or8I/s320/Indexed+vs+non-indexed+PhiX.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Indexed versus non-indexed PhiX comparison</td></tr>
</tbody></table>
<span style="font-family: "georgia" , "times new roman" , serif;"><b><br /></b></span>
<span style="font-family: "georgia" , "times new roman" , serif;"><b>Index-read base-quality scores are worthless: </b>We saw that </span><span style="font-family: "georgia" , "times new roman" , serif;">mis-assigned PhiX reads (PhiX FQ below) generally had lower read quality scores than the correctly assigned samples <b>(D)</b>. The </span><span style="font-family: "georgia" , "times new roman" , serif;">mis-assigned PhiX index reads also generally had lower quality scores than the correctly assigned samples <b>(E & F)</b>, and it would be great to filter on base quality</span><span style="font-family: "georgia" , "times new roman" , serif;"> scores to remove mis-assigned reads. Unfortunately the quality score you get from an Illumina index read is pretty much useless, primarily because of its short length. Simply getting the index quality scores requires quite a bit of messing around with the default bcl-fastq pipeline.</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><span lang="EN-US"><br /></span></span>
<span style="font-family: "georgia" , "times new roman" , serif;">These index Q-scores are currently discarded. Just to get the data for the plots below we had to rerun the flowcell through a modified bcl-fastq pipeline. Keeping index Q-scores would require changes to our default pipelines and an increase in our data storage requirements. However, we may be able to develop methods similar to <a href="http://www.illumina.com/documents/products/whitepapers/whitepaper_datacompression.pdf">Q-score binning</a> to reduce this extra data and still allow an assessment of index quality.</span></div>
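<div style="text-align: justify;">If index qualities are retained (for example by writing the index reads out as their own Fastq), filtering on them and binning them for storage are both short functions. This sketch assumes Phred+33 encoding; the Q20 threshold and the bin edges are hypothetical choices for illustration, not Illumina's binning scheme, and given the caveat above any threshold would need testing against known mis-assigned reads such as PhiX.</div>

```python
from typing import List, Sequence, Tuple


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string to integer Q-scores."""
    return [ord(c) - 33 for c in qual]


def passes_index_filter(index_qual: str, min_q: int = 20) -> bool:
    """Keep a read only if every index base reaches min_q (Q20 is an assumed threshold)."""
    if not index_qual:
        return False
    return min(phred33(index_qual)) >= min_q


# Hypothetical bins: (highest Q in bin, value written out). Not Illumina's actual scheme.
BINS: Sequence[Tuple[int, int]] = ((9, 6), (19, 15), (29, 25), (41, 37))


def bin_quals(qual: str, bins: Sequence[Tuple[int, int]] = BINS) -> str:
    """Collapse Q-scores into a few bins so stored index qualities compress better."""
    out = []
    for q in phred33(qual):
        for upper, rep in bins:
            if q <= upper:
                out.append(chr(rep + 33))
                break
        else:
            out.append(chr(bins[-1][1] + 33))  # anything above the top bin
    return "".join(out)
```

<div style="text-align: justify;">Using the minimum rather than the mean index quality is deliberate here: a single bad cycle in a 6-8 base index is enough to move a read to the wrong sample.</div>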
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiCG5_BXbV3ONu1MBnlh5TMWMhIkqwIFp_7EKsHuF-GnrbZ_utZlYLGssS2vf2sGMxyMPYnKDnHoF7pytVnrFk3xwSY5rhipTle_KhwTiV3Go3BmcOFU6gAlySb01efYkoV6fkU4PSpGBm1/s1600/PhiX+index+read+and+index+q-scores.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="268" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiCG5_BXbV3ONu1MBnlh5TMWMhIkqwIFp_7EKsHuF-GnrbZ_utZlYLGssS2vf2sGMxyMPYnKDnHoF7pytVnrFk3xwSY5rhipTle_KhwTiV3Go3BmcOFU6gAlySb01efYkoV6fkU4PSpGBm1/s400/PhiX+index+read+and+index+q-scores.png" width="400" /></a></div>
<span lang="EN-US" style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<span lang="EN-US" style="font-family: "georgia" , "times new roman" , serif;">Going further, Illumina sequencing might benefit from running a longer template read at the beginning of every read, i.e. read 1, i5, i7 and read 2. It is difficult to predict what the computational burden would be, or exactly what impact this would have on index mis-assignment. But even small reductions in errors like this would be worthwhile for low allele frequency applications. I'd expect that companies aiming for tumour screening in the general population (e.g. <a href="http://www.grailbio.com/">Grail</a>) would benefit the most from doing these experiments.</span><br />
<br /></div>
<div style="text-align: justify;">
<b style="font-family: Georgia, "Times New Roman", serif;">PhiX mis-assignment analysis conclusions:</b><span style="font-family: "georgia" , "times new roman" , serif;"> Based on our analysis, and the results presented in Jeff's PhD, we've come to the conclusion that PhiX index mis-assignment is caused by two issues, index bleeding and/or poly-clonal clusters, and that it can be fixed or safely ignored.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqc9znHGguOPNLeXOq1XrSqThxXplW8pij6CgbX9zu450RtB5PGgHpDfZqcjWrOM_tTf3R7OzMmK3tYNd84Y2eqMVPR5tkBxgTEf2-Acja-w9bnMEZ_b1jNMOSjkDHwvjGcwcLHDuADg1k/s1600/Mis-assignment+of+Indexes+to+PhiX.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="238" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqc9znHGguOPNLeXOq1XrSqThxXplW8pij6CgbX9zu450RtB5PGgHpDfZqcjWrOM_tTf3R7OzMmK3tYNd84Y2eqMVPR5tkBxgTEf2-Acja-w9bnMEZ_b1jNMOSjkDHwvjGcwcLHDuADg1k/s320/Mis-assignment+of+Indexes+to+PhiX.png" width="320" /></a></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif; text-align: left;">In the figure above <b>(1A)</b> I've tried to present</span><span style="font-family: "georgia" , "times new roman" , serif;"> “index bleeding”: each library template cluster emits a signal according to its base-fluorophore, represented by the capitalised circles as <b>GAT</b> (green=G/T, red=A/C); however, this fluorescent signal “bleeds” outward from each cluster. A non-indexed PhiX cluster, </span><span style="font-family: georgia, "times new roman", serif;">represented by the lower-case circles,</span><span style="font-family: "georgia" , "times new roman" , serif;"> emits no signal during the index read and is base-called from the neighbouring library cluster's "bleeding" signal as </span><b style="font-family: georgia, "times new roman", serif;">gat</b><span style="font-family: "georgia" , "times new roman" , serif;">. An indexed PhiX cluster emits a signal according to its base-fluorophore and is correctly base-called as </span><b style="font-family: georgia, "times new roman", serif;">CTA</b><span style="font-family: "georgia" , "times new roman" , serif;">. In figure </span><span style="font-family: "georgia" , "times new roman" , serif;"><b>1B</b></span><span style="font-family: "georgia" , "times new roman" , serif;"> <span style="font-family: "georgia" , "times new roman" , serif; text-align: left;">I've tried to present what may be happening in</span> mixed-template, poly-clonal clusters. These arise from the random nature of clustering, where some clusters are made from two template molecules that may have seeded at different times. A cluster produced from a single library molecule (α) is correctly base-called as <b>GAT</b>. A mixed-template, non-indexed PhiX cluster (β) is base-called as <b>gat</b> in the indexing read only, from the weak signal of the contaminating library template, because PhiX itself produces no index signal.
A mixed-template, indexed PhiX cluster (γ) emits a signal according to its base-fluorophore that is stronger than the signal from the contaminating library template, and is correctly base-called as <b>CTA</b>.</span></div>
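<div style="text-align: justify;">The two models in the figure can be expressed as a toy base-caller. Every template that contributes light to a cluster position is an (index, signal) source, and non-indexed PhiX contributes nothing during the index read. The signal levels (5% bleed, a 70:30 poly-clonal mix) and the dimness cut-off are assumptions chosen for illustration, and the model ignores channel cross-talk, phasing and everything else a real Illumina base-caller handles.</div>

```python
from typing import Optional, Sequence, Tuple

BASES = "ACGT"


def call_index(sources: Sequence[Tuple[Optional[str], float]], length: int,
               dim_threshold: float = 0.5) -> str:
    """Toy base-caller for an index read.

    sources: (index_sequence, relative_signal) for every template contributing
    light to this cluster position. index_sequence is None for a template that
    cannot prime the index read (e.g. non-indexed PhiX). Bleed from a
    neighbouring cluster, or a second template in a poly-clonal cluster, is
    just another source with a smaller signal.
    Each cycle calls the brightest channel; calls dimmer than dim_threshold
    (an assumed cut-off) are written in lower case, like "gat" in the figure.
    Ties go to the first base in ACGT.
    """
    calls = []
    for cycle in range(length):
        intensity = {b: 0.0 for b in BASES}
        for index_seq, signal in sources:
            if index_seq is not None:
                intensity[index_seq[cycle]] += signal
        best = max(BASES, key=lambda b: intensity[b])
        if intensity[best] == 0.0:
            calls.append("N")  # no light at all
        elif intensity[best] < dim_threshold:
            calls.append(best.lower())
        else:
            calls.append(best)
    return "".join(calls)


if __name__ == "__main__":
    library = "GAT"       # index of the neighbouring / co-clustered library
    indexed_phix = "CTA"  # index carried by an indexed PhiX (hypothetical)
    # 1A: index bleeding, assuming 5% of a neighbour's signal reaches the PhiX cluster.
    print(call_index([(None, 1.0), (library, 0.05)], 3))          # gat
    print(call_index([(indexed_phix, 1.0), (library, 0.05)], 3))  # CTA
    # 1B: poly-clonal cluster, assuming a 70% PhiX / 30% library mix.
    print(call_index([(None, 0.7), (library, 0.3)], 3))           # gat
    print(call_index([(indexed_phix, 0.7), (library, 0.3)], 3))   # CTA
```

<div style="text-align: justify;">The toy makes the fix obvious: as soon as PhiX produces its own index signal, it outshines the stray library signal and is called correctly.</div>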
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Index bleeding should only be an issue on non-patterned flowcells (HiSeq 2500), whilst poly-clonal clusters will be a problem on both patterned (HiSeq 4000) and non-patterned flowcells.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif; text-indent: -18pt;"><br /></span></div>
<div style="text-align: justify;">
<b style="font-family: Georgia, "Times New Roman", serif;">How to fix the problem:</b><span style="font-family: "georgia" , "times new roman" , serif;"> For index mis-assignment to PhiX the fix is relatively straightforward. Either use an <a href="http://www.seqmatic.com/products/tailormix-dual-indexed-phix">indexed PhiX</a>, or spike an oligo into the indexing read primers so that PhiX generates a signal. Both strategies mean the PhiX clusters generate a signal that outcompetes the index-bleeding or poly-clonal cluster signals. PhiX will no longer appear in your demultiplexed Fastq, or will be at such low levels you'd only see it if you specifically went looking.</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<span style="font-family: "georgia" , "times new roman" , serif;">Unfortunately </span><span style="font-family: georgia, "times new roman", serif;">index mis-assignment</span><span style="font-family: georgia, times new roman, serif;"> between samples is still an unresolved issue. In a follow-up post I'm going to discuss what we've seen, and what the apparent causes are. Again, some relatively simple fixes are available, but if you are using multiplexed sequencing to detect low-frequency alleles in populations (e.g. in cancer, single-cell or population genomics), then you need to consider whether you understand how your experiments might be affected.</span></div>
<span style="font-family: Georgia, Times New Roman, serif;"><br /><b>PS: </b>I think it is pretty lax of Illumina not to provide an indexed PhiX. The V2 PhiX was indexed but V3 dropped this, probably because there were only 96 TruSeq indexes. Come on Illumina, sort this one out!</span><div>
<b style="text-align: justify;"><span lang="EN-US" style="font-family: "georgia" , "times new roman" , serif;"><br /></span></b></div>
<div>
<b style="text-align: justify;"><span lang="EN-US" style="font-family: "georgia" , "times new roman" , serif;">Useful references:</span></b><div class="MsoNormal" style="text-align: justify; text-justify: inter-ideograph;">
<ol>
<li>Kircher <i>et al</i>. 2011: <a href="http://nar.oxfordjournals.org/content/40/1/e3.full">Double indexing overcomes inaccuracies in multiplex sequencing on the Illumina platform</a>.</li>
<li>Quail <i>et al</i>. 2014: <a href="http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-15-110">SASI-Seq: sample assurance Spike-Ins, and highly differentiating 384 barcoding for Illumina sequencing</a>.</li>
<li>Hussmann J, PhD thesis 2015: <a href="https://repositories.lib.utexas.edu/handle/2152/31375" style="font-family: georgia, "times new roman", serif;">Expanding the applications of high-throughput DNA sequencing</a><span style="font-family: georgia, "times new roman", serif;">.</span></li>
<li>Phillipe <i>et al</i>. 2015: <a href="http://nar.oxfordjournals.org/content/early/2015/02/16/nar.gkv107.full">Accurate multiplexing and filtering for high-throughput amplicon-sequencing</a>.</li>
<li>Carlsen <i>et al</i>. 2012: <a href="http://www.sciencedirect.com/science/article/pii/S1754504812000918">Don’t make a mista(g)ke - is tag switching an overlooked source of error in amplicon pyrosequencing studies</a>.</li>
<li>Mukherjee <i>et al</i>. 2015: <a href="https://standardsingenomics.biomedcentral.com/articles/10.1186/1944-3277-10-18">Large-scale contamination of microbial isolate genomes by Illumina PhiX control</a>.</li>
<li>Williams <i>et al</i>. 2006: <a href="http://www.nature.com/nmeth/journal/v3/n7/full/nmeth896.html">Amplification of complex gene libraries by emulsion PCR</a>. Nature Methods, 3(7):545–550.</li>
<li>Meyerhans <i>et al</i>. 1990: <a href="http://nar.oxfordjournals.org/content/18/7/1687">DNA recombination during PCR</a>.</li>
<li>Mamanova <i>et al</i>. 2010: <a href="http://www.nature.com/nmeth/journal/v7/n2/abs/nmeth.1419.html">Target-enrichment strategies for next-generation sequencing</a>.</li>
</ol>
</div>
</div>
</div>
James@cancerhttp://www.blogger.com/profile/02825715598810395734noreply@blogger.com12tag:blogger.com,1999:blog-6334453475526523597.post-88884155804842104232016-09-20T15:34:00.001+01:002016-09-20T15:34:18.275+01:00The future of Illumina according to @chrissyfarr<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">In yesterday's <a href="https://www.fastcompany.com/3061591/body-os/illumina-owns-the-dna-sequencing-market-now-its-building-an-app-store-too">Fast Company piece</a> </span><a href="https://twitter.com/chrissyfarr" style="font-family: Georgia, "Times New Roman", serif;">Christina Farr (on Twitter)</a><span style="font-family: "georgia" , "times new roman" , serif;"> gives a very nice write-up of Illumina's history and where they are going with respect to bringing DNA sequencing into the clinic. I really liked the piece and wanted to share my thoughts on it with Core-Genomics readers.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibAgKh3wLwWiLkjqoYYbgysO4oasLYVnsJq7dBwjAYG6FQbqewqv_-bwEgvrtM837pUTWS913nV9VOglzg1qownssUumiXkIoDJ0U2CUbpL0tAx3u63NafVe0OkCUTtmtZP-6-qktTmuys/s1600/FastCompany+Illumina+story+Sept+2016.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="317" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibAgKh3wLwWiLkjqoYYbgysO4oasLYVnsJq7dBwjAYG6FQbqewqv_-bwEgvrtM837pUTWS913nV9VOglzg1qownssUumiXkIoDJ0U2CUbpL0tAx3u63NafVe0OkCUTtmtZP-6-qktTmuys/s320/FastCompany+Illumina+story+Sept+2016.png" width="320" /></a></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<a name='more'></a></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">To showcase how Illumina is impacting medicine, Christina mentions two recent Illumina spin-outs: </span><a href="http://www.helix.com/" style="font-family: Georgia, "Times New Roman", serif;">Helix</a><span style="font-family: "georgia" , "times new roman" , serif;"> (an Apple-esque app store for genome applications) and </span><a href="http://www.grailbio.com/" style="font-family: Georgia, "Times New Roman", serif;">Grail</a><span style="font-family: "georgia" , "times new roman" , serif;"> (aiming to develop early cancer detection tests from deep sequencing of ctDNA). She also highlights some wonderful examples of where Illumina themselves have applied sequencing to clinical cases, from the Jaynome (Flatley's own genome) and the discovery that he has the condition malignant hyperthermia, to the more compelling rare disease cases such as Massimo, a boy with a genetic mutation causing HBSL (<a href="http://www.sciencedirect.com/science/article/pii/S0002929713001675">Hypomyelination with Brain stem and Spinal cord involvement and Leg spasticity</a>), a new disease identified using Illumina's technology.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Next-generation sequencing is changing medicine, and the reality is that when we say NGS most of us mean Illumina sequencing, for now at least.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>New business models are emerging in genomics: </b>Illumina's Helix is subsidising exome sequencing costs in the hope that users will pay to query their data over time, and that this use will more than cover the sequencing costs. In an era of very low borrowing costs, buying in now to sequence 100 million genomes might only require users to sign up to a $10-a-month plan for the rest of their lives, with individual queries costing a few dollars. For instance, malignant hyperthermia (the condition Flatley found in his own genome) can result in sudden death under general anesthesia, so a user might query for it before deciding on surgery. Or a family might check for an <a href="http://www.ncbi.nlm.nih.gov/books/NBK285956">MT-RNR1 m.1555A>G</a> mutation before their child is treated with </span><span style="font-family: "georgia" , "times new roman" , serif;">gentamicin</span><span style="font-family: "georgia" , "times new roman" , serif;">, saving the roughly 1 in 500 children who carry this variant from going deaf while in the ICU.</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<span style="font-family: "georgia" , "times new roman" , serif;">$10 per month is pretty low compared to life-insurance policies, and if Illumina or others can do a deal with the "Man from the Pru", personalised genomes outside of the clinic really could become the norm. $10 per month over 10 years is $1,200 versus a $1,000 genome, but over 40 or even 80 years ($4,800 or $9,600) the model should be attractive, and this does not even consider reselling consumer genomics data, which 23andMe are showing is possible.</span></div>
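<div style="text-align: justify;">The subscription arithmetic above is easy to check. The Python sketch below is illustrative only: the $10 fee and $1,000 genome come from the text, while the 3% annual discount rate (standing in for "very low borrowing costs") and the function names are assumptions, and it ignores query fees, data resale and everything else a real business plan would include.</div>

```python
import math


def nominal_revenue(monthly_fee: float, years: int) -> float:
    """Total subscription income with no discounting."""
    return monthly_fee * 12 * years


def present_value(monthly_fee: float, years: int, annual_rate: float) -> float:
    """Discount each end-of-month payment back to today's money."""
    monthly_rate = annual_rate / 12
    return sum(
        monthly_fee / (1 + monthly_rate) ** month
        for month in range(1, years * 12 + 1)
    )


def break_even_months(upfront_cost: float, monthly_fee: float) -> int:
    """Months of (undiscounted) fees needed to recoup the up-front cost."""
    return math.ceil(upfront_cost / monthly_fee)


if __name__ == "__main__":
    GENOME_COST = 1000  # the "$1000 genome" from the text
    FEE = 10            # the hypothetical $10/month plan
    RATE = 0.03         # assumed 3% annual borrowing cost, illustration only
    for years in (10, 40, 80):
        print(years, "years:", nominal_revenue(FEE, years), "nominal,",
              round(present_value(FEE, years, RATE), 2), "discounted")
    print("break-even:", break_even_months(GENOME_COST, FEE), "months")
```

Before any discounting the genome is paid back after 100 months (a little over eight years); the discounted column shows how much even a low interest rate eats into the margin on longer plans.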
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>The negative impact of Illumina's lack of competition: </b>Several times during her article Christina comes back to an issue Illumina are facing more and more: the fine line they are walking to bring new products to clinical and even consumer markets without competing with their academic and clinical customers. The liquid biopsy market is predicted to be worth $1 billion by 2020 (<a href="http://core-genomics.blogspot.co.uk/2015/11/how-many-liquid-biopsies-per-year-by.html">personally I reckon it will be much higher than this</a>), and NIPT possibly $2.4 billion by 2022. The size of these markets is a temptation for the company that delivers most of the infrastructure used to service them today.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">John Stuelpnagel (Illumina's co-founder) and Jonathan Groberg (biotech analyst at UBS) both express some reservations about where Illumina are going in the comments Christina quotes in her article. Stuelpnagel immediately raises one of the worries I hear about at conferences and meetings, especially when talking to the commercial sector: <i>"people [companies] are apprehensive about Illumina and worried about if, and when, they might choose to compete against them"</i>. When asked about the fine line Illumina should walk to stay on the right side of their customers, Groberg says <i>"As Illumina moves into the clinical markets, it's making for some tough conversations"</i>, and Christina acknowledges that some of the people she spoke to were reluctant to talk openly. The issue comes up again later in the article when Christina interviews Christian Henry (Illumina EVP & COO) about the purchase of Verinata and the signal it sent to Illumina's users, some of whom may now be its competitors. Whilst Henry is clear that competition with customers is <i>"a foundational question for Illumina"</i> (i.e. Illumina does not want to compete directly), Groberg adds that Illumina might be unable NOT to compete. And the description of Illumina as an <i>"800-pound gorilla in genomics"</i> by 23andMe’s director of research Joyce Tung is not completely flattering.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">In the article Christina traces Illumina's path from its early days facing litigation from the likes of Affymetrix, when it was the underdog, to its own litigation against ONT, where it has been described as a bully trying to stifle competition. Illumina's dominance in the NGS market is so large that questions are being asked about whether it is unfairly abusing its monopoly position. As a long-term user, and having previously been described as <i>"an Illumina fan-boy"</i>, I see Illumina's dominance as down to the simple fact that they bought the best technology (an element of luck), and then put a team together that made it work really, really well (they made their own luck by investing and working hard). It is Illumina's investment in R&D that has given us the family of instruments from the MiniSeq to the HiSeq X. I'd love to see stronger competition, but it's not there yet, and some big guns have tried and failed (454, Life Tech and CGI). I hope Illumina don't become another ABI, bullying users as well as other companies trying to get into the space; 10 years ago ABI was not a nice company to work with, and users were pretty happy to drop them and move over to Illumina. I'm sure Illumina are working on not making the same mistake, but, as Christina notes, some of the people she spoke to were afraid to talk openly about this aspect of their relationships with Illumina.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">NGS is here to stay, and it is going to become more and more common to hear about it in the news and even down the pub. Jay Flatley, Shankar Balasubramanian, David Klenerman <i>et al</i>, Solexa and Illumina will be remembered for developing a technology that changed the world (has anyone written a screenplay yet?). Illumina may not be an Apple yet, but it can't be far away. However, predicting the future of NGS has proven tough: nearly everyone has underestimated what will be possible, and when. New technologies like <a href="http://biorxiv.org/content/early/2015/06/05/020420">Oxford Nanopore's sequencers</a> look like they may be ready for the clinic in as little as two or three years.</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<span style="font-family: "georgia" , "times new roman" , serif;">After almost ten years working with NGS, I am confident that the next ten will be almost as exciting.</span></div>
</div>
Reporting on Fluidigm's single-cell user meeting at the Sanger Institute (James@cancer, 16 September 2016)<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">The genomics community is pushing ahead fast on single-cell analysis methods, as these are revolutionising how we approach biological questions. Unfortunately my registration went in too late for the meeting running at the </span><a href="https://coursesandconferences.wellcomegenomecampus.org/events/item.aspx?e=596" style="font-family: Georgia, "Times New Roman", serif;">Sanger Institute</a><span style="font-family: "georgia" , "times new roman" , serif;"> this week <b><span style="color: #999999;">(</span></b></span><b style="font-family: georgia, "times new roman", serif;"><span style="color: #999999;"><a href="https://twitter.com/search?q=%23SCGen16">Follow #SCG16 on Twitter</a>)</span></b><span style="font-family: "georgia" , "times new roman" , serif;">, but the </span><a href="https://www.fluidigm.com/events/single-cell-genomics-2016" style="font-family: Georgia, "Times New Roman", serif;">Fluidigm pre-meeting</a><span style="font-family: "georgia" , "times new roman" , serif;"> was a great opportunity to hear what people are doing with Fluidigm's tech, and to pick other users' brains about the challenges they face with single-cell methods.</span><br />
<br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglKoesTdkTva5KtLt3Lv-AWub-JJpQuyRCS0OkbAWFC4my0-HWWMU75iYhtnjgalcAjP8zquBNKwKdRwodLRqSv7-1iQP68P6dQsWuxmIVyZ2O5d4D3Ae8_zPKpFJ_GCIVG3nbITqLTH9B/s1600/Fluidigm+imaging+mass-cytometry+nmeth.2869-F1.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="166" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglKoesTdkTva5KtLt3Lv-AWub-JJpQuyRCS0OkbAWFC4my0-HWWMU75iYhtnjgalcAjP8zquBNKwKdRwodLRqSv7-1iQP68P6dQsWuxmIVyZ2O5d4D3Ae8_zPKpFJ_GCIVG3nbITqLTH9B/s400/Fluidigm+imaging+mass-cytometry+nmeth.2869-F1.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Imaging mass-cytometry: the most exciting thing to happen in 'omics?</td></tr>
</tbody></table>
<br /></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Mark Unger (Fluidigm VP of R&D) started the meeting by asking the audience to consider the two axes of single-cell analysis: 1) the number of cells being analysed, and 2) the questions you can ask of those cells (mRNA-seq is only one assay), e.g. proteomics, epigenetics, SNPs and CNVs.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Right now Fluidigm has the largest number of applications that can be run on single cells, with multiple Fluidigm and/or user-developed protocols on the <a href="https://www.fluidigm.com/c1openapp/scripthub">Fluidigm Open App</a> website; <a href="http://www.10xgenomics.com/single-cell/">10X Genomics</a> only offer single-cell 3' mRNA-seq at the moment, as do <a href="https://www.genomeweb.com/pcr/illumina-bio-rad-form-alliance-single-cell-sequencing-system">BioRad/Illumina</a> and Drop-seq. But I am confident other providers will expand into non-3' mRNA assays...I'd go further and say that if they don't, they'll find it hard to get traction, as users are likely to require a platform that can do more than one thing.</span></div>
<div style="text-align: justify;">
<div style="font-family: Georgia, "Times New Roman", serif;">
<div>
<a name='more'></a></div>
</div>
</div>
<div style="text-align: justify;">
<br /></div>
<div>
<span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<div style="text-align: justify;">
<div style="text-align: left;">
<span style="font-family: "georgia" , "times new roman" , serif;">There are three sessions over the two days:</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<div style="text-align: justify;">
<ul>
<li style="text-align: left;"><span style="font-family: "trebuchet ms" , sans-serif;">Session I: Single-cell heterogeneity, classification and discovery</span></li>
<li style="text-align: left;"><span style="font-family: "trebuchet ms" , sans-serif;">Session II: Immunotherapy in oncology—new insights at single-cell resolution</span></li>
<li style="text-align: left;"><span style="font-family: "trebuchet ms" , sans-serif;">Session III: Single-cell functional biology </span></li>
</ul>
<div style="text-align: left;">
</div>
</div>
</div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span>
<div style="text-align: justify;">
<b><span style="font-family: "trebuchet ms" , sans-serif; font-size: large;">Session I: Single-cell heterogeneity, classification and discovery</span></b></div>
<div style="text-align: justify;">
<br /></div>
<span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>Achieve new insights through single-cell biology. </b>Candia Brown, Director Strategic Marketing, Fluidigm</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-size: x-small;">Candia asked the audience <i>"what are we trying to do with single-cell genomics methods?"</i> She focussed her brief introductory presentation on understanding biological mechanisms and pathways, cell differentiation and cell lineage, as well as on biomarker discovery and therapeutics...or even use in the clinic in the future. Much of the initial work has concentrated on identifying cell types within populations and understanding heterogeneity. Moving beyond this kind of classification requires more complex methods and analyses. Ultimately we'll need spatio-temporal methods, such as in-situ sequencing of carefully prepared samples, and combined analyses of RNA, DNA and protein data. We need to be able to detect these from single cells (this was a hot topic for Fluidigm at the beginning of 2016). Candia showed examples of population classification and discussed how we might move past relatively "simple" atlasing studies to more complex experiments that aim to deliver mechanistic insights. Fluidigm aim to present all the latest updates on their C1, Biomark, Helios and Polaris systems during this meeting.</span></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"><b>Dissecting cerebral organoids and fetal cortex using single-cell RNA-seq. </b>Gray Camp, a post-doc in </span><a href="http://www.eva.mpg.de/genetics/staff/paabo/home.html">Svante Pääbo's group at the Max Planck Institute for Evolutionary Anthropology</a><span style="font-family: "georgia" , "times new roman" , serif;">, Germany. </span><span style="font-family: "georgia" , "times new roman" , serif;">Gray is also collaborating closely with the </span><a href="http://www.treutleinlab.org/">Treutlein lab</a><span style="font-family: "georgia" , "times new roman" , serif;">.</span></span></div>
</div>
<div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-size: x-small;"><span style="font-family: "georgia" , "times new roman" , serif;">Cerebral organoids make biological experimentation easier in the same way that tumour organoids are better informing cancer biology. The group are </span><a href="http://www.pnas.org/cgi/pmidlookup?view=long&pmid=26644564">deconstructing cellular heterogeneity in cerebral organoids using single-cell RNA-seq</a><span style="font-family: "georgia" , "times new roman" , serif;"> rather than bulk analysis. They are now using organoids derived from patients to generate samples that recapitulate </span><a href="https://en.wikipedia.org/wiki/Gray_matter_heterotopia">periventricular neuronal heterotopia</a><span style="font-family: "georgia" , "times new roman" , serif;">. They are also following the reprogramming of fibroblasts into induced neurons (recently published in </span><a href="http://www.nature.com/nature/journal/vaop/ncurrent/full/nature18323.html">Nature</a><span style="font-family: "georgia" , "times new roman" , serif;">, with an accompanying </span><a href="http://www.nature.com/nature/journal/vaop/ncurrent/full/nature18444.html">News and Views</a><span style="font-family: "georgia" , "times new roman" , serif;">). This great </span><a href="http://thenode.biologists.com/editorial-closing-circle-organoids-back-development/news/">editorial in Development</a><span style="font-family: "georgia" , "times new roman" , serif;"> discusses the impact that organoids are having on biological research.</span></span></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b><br /></b></span></div>
<div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>Becoming a new neuron in the cerebral cortex.</b> </span><a href="http://jablab.squarespace.com/ludovic-telley/" style="font-family: Georgia, "Times New Roman", serif;">Ludovic Telley, University of Geneva</a><span style="font-family: "georgia" , "times new roman" , serif;">, Switzerland.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span></div>
<div>
<div style="text-align: justify;">
<span style="font-size: x-small;"><span style="font-family: "georgia" , "times new roman" , serif;">Ludovic is also talking about cells in the brain; single-cell methods are having a huge impact on brain biology. His talk focussed on "L4 neurons", the main recipients of sensory input into the cortex. The group use a novel technology called FlashTag to visualise and isolate neurons during their development (see their </span><a href="https://www.sciencedaily.com/releases/2016/03/160303145747.htm" style="font-family: Georgia, "Times New Roman", serif;">Science 2016</a><span style="font-family: "georgia" , "times new roman" , serif;"> paper). Isolated neurons are then profiled using Fluidigm single-cell RNA-seq to track neuronal transcriptional programs. They found waves of transcriptional activity as each neuron progresses from proliferative to migratory and finally to connectivity phases.</span></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"><b>A cost-effective 5’ selective single-cell transcriptome profiling approach.</b> </span><a href="http://unice.fr/recherche/chercheurs-a-lhonneur/chercheurs/barbry-pascal">Pascal Barbry, Institut de Pharmacologie Moléculaire et Cellulaire</a><span style="font-family: "georgia" , "times new roman" , serif;">, France.</span></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span><span style="font-family: "georgia" , "times new roman" , serif;"></span>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-size: x-small;"><span style="font-family: "georgia" , "times new roman" , serif;">Pascal's group are using Fluidigm single-cell methods to investigate mucociliary differentiation. Today he described the modified SMART-seq method they developed, which incorporates on-chip barcoding and UMIs. This is somewhat similar to <a href="http://genome.cshlp.org/content/21/7/1160.full">STRT-seq published in 2011</a>, but now on the Fluidigm IFC. Pascal spent some time describing the impact of UMIs (Unique Molecular Identifiers), showed the figure from </span><a href="http://www.pnas.org/content/108/22/9026.long">Cellular Research's PNAS paper</a><span style="font-family: "georgia" , "times new roman" , serif;">, and mentioned one of the four methods to </span><a href="http://core-genomics.blogspot.co.uk/2012/06/improving-small-and-mirna-ngs-analysis.html">reduce RNA-ligation biases</a><span style="font-family: "georgia" , "times new roman" , serif;">. After processing, the cDNA is fragmented and 5' fragments are isolated via the biotin tag before library prep is completed and the libraries sequenced. He showed data on the performance and reproducibility of the assay: reads are very biased to the 5' end of transcripts (although they have not compared directly to CAGE data), ERCC capture efficiency was about 25%, and the data suggest that more than 1 million reads per cell is unnecessary. Interestingly, they saw a correlation of 0.9 for C1 + Ion Proton versus Drop-seq + Illumina, but with a reasonable number of genes that appear to be detected by only one method! <b>The script will appear on <a href="https://www.fluidigm.com/c1openapp/scripthub/singleCellGenomics?application=geneExpression">Fluidigm's Open App</a> site after publication!</b></span></span></span></div>
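<div style="text-align: justify;">For readers new to UMIs, the counting idea can be sketched in a few lines of Python: every original molecule carries a random tag, so reads that share a cell barcode, gene and UMI are treated as PCR copies of one molecule and counted once. This is a minimal sketch, not the group's pipeline; the tuple input format, function names and toy barcodes/UMIs are hypothetical, and real pipelines also merge UMIs that differ only by a sequencing error, which this sketch does not.</div>

```python
from collections import defaultdict


def count_molecules(reads):
    """Collapse reads to molecules: one count per distinct (cell, gene, UMI).

    `reads` is an iterable of (cell_barcode, gene, umi) tuples, i.e. reads
    already demultiplexed and assigned to a gene.
    Returns {(cell_barcode, gene): molecule_count}.
    """
    umis = defaultdict(set)
    for cell, gene, umi in reads:
        umis[(cell, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


def count_reads(reads):
    """Naive read counting, for comparison: PCR duplicates inflate this."""
    counts = defaultdict(int)
    for cell, gene, _umi in reads:
        counts[(cell, gene)] += 1
    return dict(counts)


if __name__ == "__main__":
    # Toy data: barcodes, genes and UMIs are made up for illustration
    reads = [
        ("CELL01", "FOXJ1", "AACGT"),
        ("CELL01", "FOXJ1", "AACGT"),  # PCR duplicate of the read above
        ("CELL01", "FOXJ1", "AACGT"),  # another duplicate
        ("CELL01", "FOXJ1", "GGTCA"),  # a second, distinct molecule
        ("CELL02", "SCGB1A1", "TTAGC"),
    ]
    print(count_reads(reads))      # FOXJ1 in CELL01 looks like 4 reads...
    print(count_molecules(reads))  # ...but is only 2 molecules
```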
<span style="font-family: "georgia" , "times new roman" , serif;">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;">Pascal briefly mentioned their work on the 800-cell IFC, and they're pretty happy with it so far. However, they would like to sequence on the NextSeq, which needs lots of PhiX to be added because reads have to go through the low-diversity oligo-dT sequence. He suggested starting sequencing from the 5' end instead.</span></div>
</span><br />
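<div style="text-align: justify;">Why the oligo-dT causes trouble can be shown with a toy calculation: when every cluster reads the same base in the same cycle, base composition at that cycle collapses, and adding a spike-in with a more balanced composition, such as PhiX, dilutes the imbalance. The Python sketch below is illustrative only; it assumes, for simplicity, that the spike-in contributes 25% of each base (PhiX is not exactly balanced), and the 0.8 threshold, toy reads and function names are hypothetical.</div>

```python
from collections import Counter


def cycle_composition(reads):
    """Fraction of A, C, G and T observed at each cycle across all reads."""
    n_cycles = min(len(read) for read in reads)
    profile = []
    for cycle in range(n_cycles):
        counts = Counter(read[cycle] for read in reads)
        total = sum(counts.values())
        profile.append({base: counts[base] / total for base in "ACGT"})
    return profile


def low_diversity_cycles(profile, threshold=0.8):
    """0-based cycles where one base accounts for more than `threshold`."""
    return [cycle for cycle, comp in enumerate(profile)
            if max(comp.values()) > threshold]


def dominant_fraction_with_spike(dominant_fraction, spike_fraction):
    """Dominant-base fraction after mixing in an idealised balanced spike-in
    (assumed to contribute 25% of each base at every cycle)."""
    return (1 - spike_fraction) * dominant_fraction + spike_fraction * 0.25


if __name__ == "__main__":
    # Toy reads: four balanced barcode-like cycles, then a poly-T stretch
    reads = ["ACGTTTTTTT", "GTCATTTTTT", "TGACTTTTTT", "CATGTTTTTT"]
    profile = cycle_composition(reads)
    print(low_diversity_cycles(profile))  # -> [4, 5, 6, 7, 8, 9]
    for spike in (0.0, 0.2, 0.5):
        print(spike, dominant_fraction_with_spike(1.0, spike))
```

Even a 50% spike-in only brings an all-T cycle down to 62.5% T, which is why reading from the other end of the insert is an attractive alternative.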
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif; text-align: start;"></span><br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif; text-align: start;"><span style="font-family: "georgia" , "times new roman" , serif;"><b>Single-cell analysis of clonal dynamics and tumour evolution in childhood ALL.</b> <a href="https://www.ucl.ac.uk/cancer/research/department-cancer-biology/stem-cell-group">V</a></span><a href="https://www.ucl.ac.uk/cancer/research/department-cancer-biology/stem-cell-group">irginia Turati</a><span style="font-family: "georgia" , "times new roman" , serif;"><a href="https://www.ucl.ac.uk/cancer/research/department-cancer-biology/stem-cell-group">, Enver lab UCL</a>, UK.</span></span></div>
<span style="font-family: "georgia" , "times new roman" , serif; text-align: start;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif; text-align: start;"><span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></span></div>
<span style="font-family: "georgia" , "times new roman" , serif; text-align: start;">
</span><span style="font-family: "georgia" , "times new roman" , serif; text-align: start;"></span>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif; text-align: start;"><span style="font-size: x-small;">ALL is the most common childhood cancer, affecting 1 in 2000 children, with around 500 cases per year in the UK. ALL was one of the first diseases in which branching evolution was described. The group are using Fluidigm C1 single-cell methods in a "mouse clinic" built from primary patient tumour material (patient-derived xenografts, PDXs), where treatments can be monitored over time. Analysis of PDXs during chemotherapy shows no impact on intratumour heterogeneity, i.e. the PDXs recapitulate the patient tumour. Single-cell WGS was much more difficult than RNA-seq! But an average of 37 CNVs were found in each cell. They are generating around 10 million reads per cell, giving a coverage of around 0.2x, and saw multiple variants around the CDKN2A locus.</span></span></div>
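<div style="text-align: justify;">The reads-to-coverage figure follows from the Lander-Waterman relationship C = N × L / G (reads × read length / genome size). In the Python sketch below the ~3.1 Gb genome size is approximate, the read lengths and the 500 kb CNV bin size are assumptions (the talk did not give them), and the Poisson estimate of bases covered is optimistic for single-cell whole-genome amplification, which is far from uniform.</div>

```python
import math

HUMAN_GENOME_BP = 3.1e9  # approximate haploid human genome size (assumption)


def mean_coverage(n_reads, read_length, genome_size=HUMAN_GENOME_BP):
    """Lander-Waterman mean depth: C = N * L / G."""
    return n_reads * read_length / genome_size


def fraction_covered(coverage):
    """Expected fraction of bases hit by at least one read, 1 - exp(-C),
    under an idealised Poisson model of read placement."""
    return 1 - math.exp(-coverage)


def reads_per_bin(n_reads, bin_size, genome_size=HUMAN_GENOME_BP):
    """Average reads falling in each fixed-size bin used for CNV calling."""
    return n_reads * bin_size / genome_size


if __name__ == "__main__":
    N_READS = 10_000_000  # ~10 million reads per cell, from the talk
    for read_length in (50, 75):  # read lengths assumed for illustration
        c = mean_coverage(N_READS, read_length)
        print(f"{read_length} bp: {c:.2f}x mean depth, "
              f"{fraction_covered(c):.0%} of bases covered")
    print(f"{reads_per_bin(N_READS, 500_000):.0f} reads per 500 kb bin")
```

With 50 to 75 bp reads this gives roughly 0.16x to 0.24x, consistent with the ~0.2x quoted, yet each 500 kb bin still collects around 1,600 reads, which helps explain why such shallow sequencing can support CNV calls.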
<span style="font-family: "georgia" , "times new roman" , serif; text-align: start;">
<div style="text-align: justify;">
<span style="font-size: x-small;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;">Virginia presented data showing that the transcriptomes of small numbers of cells (Freddy) overlap with those of resistant cells, suggesting that these cells are evolving towards resistance. Understanding this process is key to improving outcomes for patients, and the group are aiming to identify a signature of resistant cells to use in the clinic.</span></div>
</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>See more with the C1™: explore the breadth of applications available on the C1 platform for single-cell genomics. </b>Shaun Cordes, Senior Product Manager, Fluidigm.</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span><span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;"></span>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;">Shaun gave an overview of the different methods users can run on the C1 system. He also confirmed that 10,000-cell throughput is coming soon, as is a Fluidigm automated imaging system with a cloud-based software toolkit. New applications coming include single-cell protein analysis using pairs of antibodies carrying probes that allow qPCR analysis (read more about the <a href="http://www.sciencedirect.com/science/article/pii/S2214753515000273">Proximity Ligation Assay approach in this 2015 paper</a>).</span></div>
<span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;">
</span><br />
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: "trebuchet ms" , sans-serif; font-size: large;"><b>Session II: Immunotherapy in oncology—new insights at single-cell resolution</b></span></div>
<div style="text-align: justify;">
<br /></div>
<span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>Mass Cytometry applications from Fluidigm. </b>Gary Impey, Director, Product Management - Mass Cytometry, and Robert Ellis, Director, Product Management, Fluidigm.</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span><span style="font-family: "georgia" , "times new roman" , serif;"></span>
<br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-size: x-small;">About half the audience are either using mass-cytometry already or considering using it.</span><span style="font-size: x-small;"> A search on PubMed </span><span style="font-size: x-small;">for <a href="http://www.ncbi.nlm.nih.gov/pubmed/?term=(%22mass-cytometry%22)+OR+CyToF">"mass-cytometry" or "CyTOF"</a> returns 196 papers, a pretty high number given how new this method is. </span><span style="font-size: x-small;">Gary is talking about how Fluidigm's Helios system can be used to interrogate cells for immune markers, and referenced a Wall Street Journal article: </span><a href="http://www.wsj.com/articles/cancers-super-survivors-how-immunotherapy-is-transforming-oncology-1417714379" style="font-size: small;">Immunotherapy and cancer's super survivors</a><span style="font-size: x-small;">. David Lane (formerly Chief Scientist at CRUK) was quoted as saying </span><i style="font-size: small;">“It’s the most exciting thing I’ve ever seen”.</i><span style="font-size: x-small;"> To get real insights we need high-dimensional single-cell methods, and Fluidigm's Helios CyTOF is one tool that can help.</span></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
<div style="text-align: justify;">
<span style="font-size: x-small;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;">Fluidigm currently have 50 high-purity metal isotope tags, which allow the generation of data with minimal biological background or technical noise. The metals are conjugated to antibodies, which are then used to label cell-surface or intracellular markers.</span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;"><br /></span></div>
</span><br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-size: x-small;">Robert is presenting an overview of a <b>new method called imaging mass-cytometry (see the figure at the top of this post; it may be </b></span><span style="font-size: x-small;"><b>the most exciting thing to happen in 'omics in a while</b></span><b style="font-size: small;">)</b><span style="font-size: x-small;">. This allows spatial resolution of proteomic data </span></span><span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;">from tissues in-situ. </span><span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;">The system requires a new box to be bolted onto the Helios instrument to perform imaging: a UV laser vaporises the tissue by scanning across the section one line at a time (approximately 1 µm per pixel), and the ionised material goes into the mass cytometer for semi-quantitative analysis. It works with fixed or frozen tissue on standard microscope slides. The process takes approximately 1 hour for a region 0.5 mm square: highly detailed but highly focused (spatially). Robert presented software developed in the <a href="http://www.bodenmillerlab.org/">Bodenmiller group at ETH, Zurich</a>. You can also do LCM-style selection to pick defined regions.</span></div>
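<div style="text-align: justify;">The acquisition figures Robert quoted imply a pixel throughput that is easy to work out. The Python sketch below assumes "0.5 mm square" means a 500 µm × 500 µm region at 1 µm per pixel; the implied ablation rate is derived only from the approximate numbers in the talk, not from any instrument specification, and the function names are hypothetical.</div>

```python
def pixel_count(width_um, height_um, pixel_um=1.0):
    """Number of laser-ablation spots (pixels) needed to raster a region."""
    return round(width_um / pixel_um) * round(height_um / pixel_um)


def implied_rate_hz(pixels, seconds):
    """Pixels ablated per second implied by a region size and run time."""
    return pixels / seconds


def acquisition_hours(width_um, height_um, rate_hz, pixel_um=1.0):
    """Estimated run time in hours for a region at a given ablation rate."""
    return pixel_count(width_um, height_um, pixel_um) / rate_hz / 3600


if __name__ == "__main__":
    region_pixels = pixel_count(500, 500)        # 0.5 mm x 0.5 mm at 1 um/pixel
    rate = implied_rate_hz(region_pixels, 3600)  # ~1 hour per region (talk)
    print(region_pixels, "pixels at about", round(rate), "pixels per second")
    print("1 mm x 1 mm:", round(acquisition_hours(1000, 1000, rate), 1), "hours")
```

On those assumptions a 1 mm × 1 mm region, with four times the area, would take about four hours at the same rate, which is what "highly detailed but highly focused" means in practice.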
<span style="font-family: "georgia" , "times new roman" , serif;">
</span>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-size: x-small;"><br /></span></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
<div style="text-align: justify;">
<span style="font-size: x-small;">Robert showed some wonderful images of imaging mass-cytometry compared to IHC or FISH, and also some slides from <a href="http://medbio.utoronto.ca/faculty/hedley.html">David Hedley's group at Toronto</a>. You can label your own antibodies using a <a href="https://www.fluidigm.com/reagents/mass-cytometry">kit from Fluidigm</a>, but Robert showed a slide of their Immuno-Onc panel with a broad concentration range for different antibodies; just how much empirical work is required to get the balance right is unclear!</span></div>
<div style="text-align: justify;">
<br /></div>
</span><span style="font-family: "georgia" , "times new roman" , serif;"><div style="text-align: justify;">
<b>Imaging Mass Cytometry—about proteins, tissues and biomedical research. </b><a href="https://www.researchgate.net/profile/Valerie_Dubost">Valerie Dubost</a> and <a href="https://www.researchgate.net/profile/Markus_Stoeckli">Markus Stoeckli</a> (also on the SAB of <a href="https://www.imabiotech.com/Company">Imabiotech</a> a CRO for mass-cytometry imaging), Novartis, Switzerland.</div>
</span><br />
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;">Valerie is talking about her early access results from the imaging mass-cytometry methods presented by Robert. Valeri is a histologist so her perspective will be an interesting one, and potentially give insights into how likely this technology is to make it int the clinic. Novartis haev moved quickly to build a cross-functional team to focus on mass-cytometry imaging technology application and development. Using FFZN and FFPE tissue, incubate a panel of up to 30 antibodies, slides loaded into imaging mass-cytometer for laser ablation and analysis.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;">Data presented included validation of the antibodies - this is critical and too many scientific papers are messed up by the use of poorly characterised antibodies. Comparison of IHC to IMC looked excellent. She showed beautiful images of cell segmentation by Voronoi boundaries. The need to carefully consider cellular architecture is important in interpretign results from IMC - you are still going to need a pathologist to help interpret this kind of data. Pathology:Molecular Pathology:IMC Pathology is going to increase our understanding of tissue architecture, and possibly interactions.</span></div>
<div style="text-align: justify;">
<br /></div>
<span style="font-family: "trebuchet ms" , sans-serif; font-size: large;"></span><br />
<div style="text-align: justify;">
<span style="font-family: "trebuchet ms" , sans-serif; font-size: large;"><b>Session III: Single-cell functional biology </b></span></div>
<span style="font-family: "trebuchet ms" , sans-serif; font-size: large;">
</span>
<br />
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>An introduction to single-cell functional biology.</b> Simon Margerison, Senior Manager, Application Support, Fluidigm </span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span><span style="font-family: "georgia" , "times new roman" , serif;"></span>
<br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-size: x-small;">Simon gave an overview of how the Helios and Polaris systems can be used to investigate functional single-cell biology. We heard lots about the Helios yesterday and Simon showed some Cancer data using panels where 10 markers were used for phenotpying and 30+ markers used to investigate functional biology.</span></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
<div style="text-align: justify;">
<span style="font-size: x-small;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;">However Simon spent a little more time describing the Polaris system which was not really mentioned yesterday. This is a system that allows selection of 48 single-cells, and culture them for up to 24 hours while modulating the environment - this is automated cell culture and I'm hoping Polaris is the first of many such systems that will allow highly parametric experiments to be performed where instead of a simple </span><span style="font-size: x-small;">A vs B, treated and untreated experiment, we'll do A,B,C,D,E,F & G</span><span style="font-size: x-small;">, treated at different doses and times all without being messed up in the tissue culture lab.</span></div>
</span><span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>A holistic view of the mucosal immune system: identification of tissue- and disease-specific cellular networks.</b> <a href="http://www.preventcd.com/who-we-are/f-koning">Frits Koning, Leiden University Medical Center</a>, Netherlands </span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-size: x-small;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;">Frits is presenting work published recently in <a href="http://www.cell.com/immunity/abstract/S1074-7613(16)30143-1">Immunity</a>. His lab has built a mass-cytometry panel to look at heterogeneity of the adaptive and innate immune compartment, applied to Human intestinal samples (Coeliac disease). He presented data from an initial cohort of 44 patients. 8 months to generate the data, 6 months to analyse it - a common bioinformatics challenge! He showed a merge scatterplot of all 2.5 million cells from all 44 patients, the different cell types clearly separate into the canonical immume cell populations. However the different samples (PBMC vs colon) and individual patients show very different enrichments for cell populations.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;">They were able to distinguish distinct mass-cytometry signatures that divide patients from controls, and were able to detect patients with mucosal lymphoid malignancies. His group has been working hard on developing computational methods to analyse these huge datasets quickly, all 5.2 million cells in 1 hour on a 32Gb laptop! See the <a href="https://graphics.tudelft.nl/cytosplore/">Cytosplore</a> website for more details. Frits was very bullish about the use of mass-cytometry in the clinic and finished by saying "we are moving towards an unbiased diagnostic tool".</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>The nature and nurture of cell heterogeneity: single-cell functional analysis, temporal single-cell sequencing and imaging of gene edited macrophages.</b> <a href="http://www.well.ox.ac.uk/ogc/centre-members">Esther Mellado, Wellcome Trust Centre for Human Genetics</a>, UK.</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-size: x-small;">Esther's work is the focus of a <a href="https://www.fluidigm.com/spotlights#39c848f4-d08c-4212-a5e3-a9759f212d86">spotlight article</a> on Fluidigm's website. She is running the Polaris system at the WTCHG and presented her work isolating single cells and perturbing them to understand the role of macrophages in </span><span style="font-size: x-small;">HIV </span><span style="font-size: x-small;">pathology. And in particular </span><span style="font-size: x-small;">cells with mutations in SAMHD1 gene and the effect of this mutation on HIV latency. They used multiple microenvironmental conditions in early and late activation so adjusted dosing for either 1 or 8 hours, comparing mutant and wild-type macrophages across 10 replicates. </span><span style="font-size: x-small;">They performed high-resolution imaging off the Polaris to investigate morphology</span><span style="font-size: x-small;"> and behaviour. They saw that knockout of SAMDH1 has important paracrine signalling effects.</span></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
<div style="text-align: justify;">
<span style="font-size: x-small;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;">The WTCHG team call the Polaris their <i>"10 Postdocs in a Box"</i>. It allows much mire complex experiments to be performed than an individual in the lab can realistically manage. As I said above </span><span style="font-size: x-small;">I'm hoping Polaris is the first of many</span><span style="font-size: x-small;"> automated cell culture systems - and ideally we'd see instruments that can handle bulk cells too.</span></div>
</span><span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>Understanding cellular heterogeneity.</b> Sarah Teichmann, Wellcome Trust Sanger Institute and EMBL-EBI, UK</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span>
<br />
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;">Sarah is presenting her groups work on cellualr heterogeneity, it turns out that much of this is of functional significance. She stumbled upon this when doing bulk RNA-seq could not relate the abundance of transcripts to counts of single-molecule RNA-FISH. Bulk RNA is limiting, single-cell rocks!</span><br />
<span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;"><br /></span>
<span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;">She presented data from a new publication </span><span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;">just deposited </span><span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;">on the </span><span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;">BioRxiv: <a href="http://biorxiv.org/content/early/2016/09/13/074971">Temporal mixture modelling of single-cell RNA-seq data resolves a CD4+ T cell fate bifurcation</a>. They used temporal modelling of single-cell RNA-seq to analyse development of Th1 and Tfh cell populations in mice infected with Plasmodium, and show that a single cell gives rise to both cell types. I'd really suggest reading the paper.</span></div>
</div>
</div>
James@cancerhttp://www.blogger.com/profile/02825715598810395734noreply@blogger.com2tag:blogger.com,1999:blog-6334453475526523597.post-31033187422883537482016-09-14T10:53:00.002+01:002016-09-16T14:50:01.879+01:0010X Genomics publications<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Anyone that's been reading Core-Genomics will have seen my interest in the technology from <a href="http://www.10xgenomics.com/">10X Genomics</a>. I've been watching and waiting for publications to come out to get a better understanding of how people are using the technology and thought you might like my current list of articles: many of these are on the BioRxiv and should be available in a reputable journal if you're reading this in 2017 or later!</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<span style="font-family: "georgia" , "times new roman" , serif;">The number of 10X Genomics publications is going to grow rapidly; and this list will only be updated sporadically!</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXcveBT6QHgbddqhBoZ6fZizyZsnrO5ptX-OpsqrARBNGCiZBxhuENYb925bSCg-ZlYX_-e-v5ZudSNevWdua1_nU328ozvaO1hdrWnF7GqdPJPWkfMBpcgOXSQOysOfYHRiBzeJFywFa5/s1600/Screen+Shot+2016-09-09+at+09.28.31.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><span style="font-family: "georgia" , "times new roman" , serif;"><img border="0" height="61" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXcveBT6QHgbddqhBoZ6fZizyZsnrO5ptX-OpsqrARBNGCiZBxhuENYb925bSCg-ZlYX_-e-v5ZudSNevWdua1_nU328ozvaO1hdrWnF7GqdPJPWkfMBpcgOXSQOysOfYHRiBzeJFywFa5/s400/Screen+Shot+2016-09-09+at+09.28.31.png" width="400" /></span></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><br /></td></tr>
</tbody></table>
<br />
<a name='more'></a><hr />
<div>
<div style="text-align: justify;">
</div>
</div>
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"> </span> </span><br />
<div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"><a href="http://biorxiv.org/content/early/2016/08/19/070425">Direct determination of diploid genome sequences</a>. </span></span><b style="font-family: georgia, "times new roman", serif;">BioRxiv 2016 </b><b style="font-family: georgia, "times new roman", serif;">Aug</b><b style="font-family: georgia, "times new roman", serif;">.</b><span style="font-family: "georgia" , "times new roman" , serif;"> </span></div>
</div>
<div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"><b><br /></b></span></span><span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;">This paper by Deanna Church and David Jaffe <i>et al</i> describes the <a href="http://www.10xgenomics.com/">10X Genomics Chromium</a> phasing technology. I've done a more comprehensive write up of this paper <a href="http://core-genomics.blogspot.co.uk/2016/09/10x-genomics-phasing-explained.html" target="_blank">here on Core-Genomics</a>. Essentially this is the paper to refer to if you're considering using</span><span style="font-family: "georgia" , "times new roman" , serif;"> </span><span style="font-family: "georgia" , "times new roman" , serif;">Chromium phasing</span><span style="font-family: "georgia" , "times new roman" , serif;"> in your own research and want to </span><span style="font-family: "georgia" , "times new roman" , serif;">better understand how it works and what you can do. The authors explain the basic principles of generating LinkedReads, and present data on 7 Human genomes successfully assembled from</span><span style="font-family: "georgia" , "times new roman" , serif;"> HiSeq X data using the Supernova algorithm. Assemblies are good with 100kb+ contigs and 2.5Mb phase blocks, and the HGP sample used had excellent alignment to the reference along a 162kb contig.</span></span></div>
<div style="text-align: justify;">
<div style="text-align: left;">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
</div>
<hr style="text-align: left;" />
<div style="text-align: left;">
<div style="text-align: justify;">
</div>
</div>
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"> </span> </span><br />
<div style="text-align: left;">
<div style="text-align: justify;">
<a href="http://biorxiv.org/content/early/2016/08/07/068338" style="font-family: georgia, "times new roman", serif;">ABySS 2.0: Resource-Efficient Assembly of Large Genomes using a Bloom Filter</a><span style="font-family: "georgia" , "times new roman" , serif;">. </span><b style="font-family: georgia, "times new roman", serif;">BioRxiv 2016 </b><b style="font-family: georgia, "times new roman", serif;">Aug</b><b style="font-family: georgia, "times new roman", serif;">.</b><span style="font-family: "georgia" , "times new roman" , serif;"> </span><br />
<b style="font-family: georgia, "times new roman", serif;"><br /></b><span style="font-family: "georgia" , "times new roman" , serif;">The authors present ABySS 2.0 and compare it to the previous version and 5 other assemblers, BCALM2, DISCOVAR, Minia, SGA and SOAPdenovo. They used the Genome in a Bottle data: 70X coverage Human genome using Illumina paired 250bp reads (PE250) as well as mate-pair data, 10X genomics Chromium data, and BioNano optical mapping data. ABySS 2.0 generated an N50 of 3.5 Mb using only 35 GB of RAM (still won't run on your Mac Book Pro). Whilst this is not a 10X paper </span><i style="font-family: georgia, "times new roman", serif;">per se</i><span style="font-family: "georgia" , "times new roman" , serif;"> they do discuss the limitations of current short-reads and the impact the 10X technology is likely to have on assembly including the BioNano Genomics and 10x Chromium data increased N50 from 29 to 42 Mb. In Fig. 3 from the paper (see below) the authors show all of the 90 scaffolds over 3 Mb, which add up to 90% of the genome. And state that </span><i style="font-family: georgia, "times new roman", serif;">"most chromosome arms are reconstructed by 1 to 4 large scaffolds"</i><span style="font-family: "georgia" , "times new roman" , serif;">.</span></div>
</div>
</div>
</div>
<div>
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span> </span><br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><span style="font-family: "georgia" , "times new roman" , serif;"><img border="0" height="271" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpBTfSdt3ztbn55Qn4mRSmvbyDSEnt-aSU1LdTlNf-Yj9RvgUCEX2Sxjl2k-OkuaicIhHOrfc50KW8MjjD5CIsGsIQ-R9ZjWj5Z0gGoK72HkRe5YBAEB6mS9zsC7TUQ9ogTRsFMtf9wO9W/s320/Screen+Shot+2016-09-09+at+09.55.21.png" style="margin-left: auto; margin-right: auto;" width="320" /></span></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-family: "georgia" , "times new roman" , serif;">Fig.3 from Jackman/Vandervalk <i>et al </i>2016</span></td></tr>
</tbody></table>
<span style="font-family: "georgia" , "times new roman" , serif;"> </span><br />
<div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
</div>
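<div style="text-align: justify;">ABySS 2.0 saves memory by storing the assembly graph's k-mers in a Bloom filter: a bit array plus several hash functions that answers "have I seen this k-mer?" with no false negatives but a tunable false-positive rate. The Python sketch below shows only that data structure, with SHA-256-derived bit positions and arbitrary sizes chosen for illustration. It ignores canonical (reverse-complement) k-mers and every other detail of the real ABySS implementation.</div>

```python
import hashlib


class KmerBloomFilter:
    """Minimal Bloom filter for k-mers: no false negatives, occasional false positives."""

    def __init__(self, n_bits=1 << 20, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, kmer):
        # Derive n_hashes bit positions by hashing the k-mer with different prefixes.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.n_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(kmer))


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


bf = KmerBloomFilter()
read = "ACGTACGGTTAC"
for km in kmers(read, 5):
    bf.add(km)

print("ACGTA" in bf)    # True: inserted k-mers are always found
print(len(bf.bits))     # 131072 bytes, fixed regardless of how many k-mers are added
```

<div style="text-align: justify;">Memory is set by the size of the bit array rather than by the number of k-mers stored, which is the kind of trade-off behind the modest RAM figure quoted above; the price is that a small fraction of absent k-mers are reported as present.</div>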
<hr />
<div>
<div style="text-align: justify;">
</div>
</div>
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"> </span> </span><br />
<div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><a href="http://biorxiv.org/content/early/2016/08/02/067447">High-Quality Assembly of an Individual of Yoruban Descent</a>. </span><b style="font-family: georgia, "times new roman", serif;">BioRxiv 2016 Aug.</b><span style="font-family: "georgia" , "times new roman" , serif;"> </span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><b><br /></b></span><span style="font-family: "georgia" , "times new roman" , serif;">The authors present a hybrid </span><span style="font-family: Georgia, Times New Roman, serif;">assembly of NA19240 usi</span><span style="font-family: "georgia" , "times new roman" , serif;">ng multiple technologies including PacBio, </span><span style="font-family: "georgia" , "times new roman" , serif;">BioNano genomics, Illumina sequencing, 10x Genomics LinkedReads, and BAC </span><span style="font-family: "georgia" , "times new roman" , serif;">hybridization and sequencing. They explain the need for multiple technologies given that no single method</span><span style="font-family: "georgia" , "times new roman" , serif;"> <i>"can fully resolve every genomic feature and/or region"</i>; and argue that</span><span style="font-family: "georgia" , "times new roman" , serif;"> BAC tiling is still a useful technology. I'd be interested to know how useful this might be </span><span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;">once 10X Genomics becomes standardised as the time and cost involved in BAC library construction, mapping and sequencing, let alone the huge amount of DNA required is quite outside the reach of most labs.</span><br />
<br /><span style="font-family: "georgia" , "times new roman" , serif;">
The assembly presented is the first in a set of 5 genomes which the authors are aiming to use to improve the diversity of the reference genome. They refer to "Gold" and "Platinum" genomes but I cannot tell which the final assembly was considered. The final assembly had an N50 of 7.25 Mb and a scaffold N50 of 78.6 Mb, which according to the authors <i>"represents one of the most contiguous high-quality human genomes"</i>.</span></span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
</div>
</div>
<div>
<div style="text-align: justify;">
<hr style="text-align: left;" />
<div style="text-align: left;">
<div style="text-align: justify;">
</div>
</div>
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"> </span> </span><br />
<div style="text-align: left;">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="background-color: white; color: #642a8f; font-size: 14.404px; line-height: 19.9438px;"><a href="http://www.ncbi.nlm.nih.gov/pubmed/27159086" ref="ordinalpos=1&ncbi_uid=27159086&link_uid=27159086&linksrc=docsum_title" style="background-color: white; color: #642a8f; font-size: 14.404px; line-height: 19.9438px;">A hybrid approach for de novo human genome sequence assembly and phasing.</a> </span><b>Nat Methods. 2016 Jul:</b></span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"><b><br /></b></span><span style="font-family: "georgia" , "times new roman" , serif;">This</span><span style="background-color: white; font-size: 12pt;"> </span></span><span style="font-family: "georgia" , "times new roman" , serif;">paper describes a combinatorial approach to <i>de novo</i> assembly and phasing analysis using Illumina sequencing, 10X Genomics (GemCode) LinkedReads, and BioNano Genomics mapping; again using NA12878.</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgk8zXHQULwA_YagwMmjW2P6K3eYKC0MKlZlWzzXUibTsLjW3OqvU_BxSwua73JdXs4-iZzq0G2x5k3flKYpRC5rhyAOsomHlgZz9B42l40hpNMhbjwOFcp047xllCKAkjifgppx_lP14G8/s1600/Screen+Shot+2016-09-13+at+14.10.37.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="185" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgk8zXHQULwA_YagwMmjW2P6K3eYKC0MKlZlWzzXUibTsLjW3OqvU_BxSwua73JdXs4-iZzq0G2x5k3flKYpRC5rhyAOsomHlgZz9B42l40hpNMhbjwOFcp047xllCKAkjifgppx_lP14G8/s320/Screen+Shot+2016-09-13+at+14.10.37.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><br /></td></tr>
</tbody></table>
</div>
</div>
</div>
<div>
<div style="text-align: justify;">
<hr style="text-align: left;" />
<div style="text-align: left;">
<div style="text-align: justify;">
</div>
</div>
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"> </span> </span><br />
<div style="text-align: left;">
<div style="text-align: justify;">
<a href="http://biorxiv.org/content/early/2016/07/26/065912" style="font-family: Georgia, "Times New Roman", serif;">Massively parallel digital transcriptional profiling of single cells</a>. <b style="font-family: georgia, "times new roman", serif;">BioRxiv 2016 July: </b><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span><span style="font-family: "georgia" , "times new roman" , serif;">This paper describes the 10X Genomics single-cell 3' mRNA-seq technology. I'd previously covered this paper </span><a href="http://core-genomics.blogspot.co.uk/2016/07/10x-genomics-single-cell-3mrna-seq.html" style="font-family: Georgia, "Times New Roman", serif;">here on Core-Genomics</a><span style="font-family: "georgia" , "times new roman" , serif;">. Essentially this is probably the paper to read to if you'd like to are considering 10X Genomics single-cell RNA-seq in your own research and want to better understand how it works and what you can do. The authors explain the basic principles of the methods, and present data from 250,000 cells across 29 samples. An awesome paper...when does it come out in a Jurnal?</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<span style="font-family: "georgia" , "times new roman" , serif;">Ben Hindson (10X Genomics CSO) will be presenting this work at the <a href="http://www.festivalofgenomicscalifornia.com/seminars/massively-parallel-digital-transcriptional-profiling-of-single-cells">San Diego Festival of Genomics</a> if you'd like to know more.</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span> <span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span> </div>
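<div style="text-align: justify;">The "digital" in digital transcriptional profiling refers to counting molecules rather than reads: each captured transcript carries a cell barcode and a unique molecular identifier (UMI), and PCR duplicates collapse to one count per distinct UMI. The Python sketch below shows that collapsing step on made-up reads. It assumes barcodes and UMIs are already error-corrected and reads are already assigned to genes, and it is not the Cell Ranger implementation.</div>

```python
from collections import defaultdict


def umi_count_matrix(reads):
    """reads: iterable of (cell_barcode, umi, gene) after alignment.

    Returns {cell_barcode: {gene: number of distinct UMIs}}, so PCR duplicates
    (same cell, same UMI, same gene) are counted once.
    """
    umis = defaultdict(set)
    for cell, umi, gene in reads:
        umis[(cell, gene)].add(umi)
    matrix = defaultdict(dict)
    for (cell, gene), umi_set in umis.items():
        matrix[cell][gene] = len(umi_set)
    return dict(matrix)


reads = [
    ("CELL1", "ACGTACGTAC", "CD3E"),
    ("CELL1", "ACGTACGTAC", "CD3E"),   # PCR duplicate of the read above
    ("CELL1", "TTTTGGGGCC", "CD3E"),
    ("CELL2", "ACGTACGTAC", "MS4A1"),  # same UMI sequence, different cell: counted separately
]
print(umi_count_matrix(reads))
```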
</div>
</div>
</div>
<div>
<ul class="highwire-search-results-list" style="-webkit-font-smoothing: antialiased; border: 0px; font-stretch: inherit; font-variant-numeric: inherit; margin: 0px 0px 15px; outline: 0px; padding: 0px; vertical-align: baseline;"><span style="font-family: "georgia" , "times new roman" , serif;"><div style="text-align: justify;">
</div>
<div style="text-align: center;">
<span style="font-size: large;"><a href="http://ctt.ec/51O3p">"The 1/4 of a million cell RNA-seq paper!" http://ctt.ec/51O3p+</a></span></div>
</span><span style="font-family: "georgia" , "times new roman" , serif;"><br /></span><div style="text-align: left;">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"> </span></div>
</div>
<hr style="text-align: justify;" />
<div style="text-align: left;">
<div style="text-align: justify;">
</div>
</div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"></span></span></div>
<div style="text-align: left;">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"> </span></div>
</div>
<span style="font-family: "georgia" , "times new roman" , serif;"><br /><div style="text-align: justify;">
<a href="http://www.ncbi.nlm.nih.gov/pubmed/27271295">Extensive sequencing of seven human genomes to characterize benchmark reference materials.</a> <b>Sci Data. 2016 Jun (originally on <a href="http://biorxiv.org/content/early/2015/12/23/026468">BioRxiv</a>).</b> </div>
<div style="text-align: justify;">
<br /></div>
</span><span style="font-family: Georgia, Times New Roman, serif;"><div style="text-align: justify;">
This is the massive GIAB (Genome in a Bottle Consortium) paper describing NA12878 and other reference materials sequenced across multiple technologies. These include: Illumina WGS paired-end, mate-pair, <a href="http://www.illumina.com/technology/next-generation-sequencing/long-read-sequencing-technology.html">Moleculo</a> and exomes, PacBio, BioNano Genomics, Ion Proton exome, 10X Genomics GemCode, <a href="https://nanoporetech.com/">Oxford Nanopore MinION</a>; and the now defunct SOLiD, and Complete Genomics paired-end and LFR technologies. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The coverage for the 10X Genomics data is only 25x, and it was produced using the GemCode platform, so it is not really representative of what 10X Genomics would recommend today. The data is available from <a href="http://software.10xgenomics.com/giab2015">10X Genomics</a> and from the GIAB FTP site. </div>
</span><span style="font-family: "georgia" , "times new roman" , serif;"><br /><div style="text-align: justify;">
<ul class="highwire-search-results-list" style="border: 0px; font-family: Times; font-stretch: inherit; margin: 0px 0px 15px; outline: 0px; padding: 0px; text-align: left; vertical-align: baseline;"><span style="font-family: "georgia" , "times new roman" , serif;"></span></ul>
<span style="font-family: "georgia" , "times new roman" , serif;"></span></div>
</span><div style="text-align: justify;">
</div>
</ul>
</div>
<div>
<div style="text-align: justify;">
<hr style="text-align: justify;" />
<div style="text-align: left;">
<div style="text-align: justify;">
</div>
</div>
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"> </span> </span><br />
<div style="text-align: left;">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><span class="highwire-citation-authors" style="-webkit-font-smoothing: antialiased; border: 0px; font-stretch: inherit; font-variant-numeric: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="http://biorxiv.org/content/early/2016/06/09/052225">Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning</a>. </span></span><b style="font-family: georgia, "times new roman", serif;">BioRxiv 2016 Jun.</b><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /><div style="text-align: justify;">
This paper presents a novel clustering method for single-cell RNA-seq data. One of the data sets they used was the 10X Genomics single-cell RNA-seq of PBMCs from Zheng et al 2016. They present their method, SIMLR (single-cell interpretation via multi-kernel learning), and show that it defines subpopulations from single-cell data more accurately than either <a href="http://www.cs.toronto.edu/~hinton/absps/tsne.pdf">t-SNE</a> or PCA.</div>
<div style="text-align: justify;">
<br /></div>
</span></div>
</div>
</div>
</div>
<div>
<div style="text-align: justify;">
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgcorGOPYqpUhrEITUEef-vLdtZIcvbLjY-K1ejGbJSlpaD_oeO0QjbDPjAJb3SC9PAUlvO061kbB9CND9ya9cC62reSYhhkXSUTJaFSRNO4ZWF0t0IUH0xqsg4xJvK7mAwqBtGdSKTJRnU/s1600/Screen+Shot+2016-09-12+at+09.37.14.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><span style="font-family: "georgia" , "times new roman" , serif;"><img border="0" height="237" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgcorGOPYqpUhrEITUEef-vLdtZIcvbLjY-K1ejGbJSlpaD_oeO0QjbDPjAJb3SC9PAUlvO061kbB9CND9ya9cC62reSYhhkXSUTJaFSRNO4ZWF0t0IUH0xqsg4xJvK7mAwqBtGdSKTJRnU/s400/Screen+Shot+2016-09-12+at+09.37.14.png" width="400" /></span></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;"><span style="text-align: justify;">Fig 5: 2D visualisation of data from 5 cell sub-populations by PCA (b), SIMLR (c) and t-SNE (d).</span></span></td></tr>
</tbody></table>
<div style="text-align: center;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
</div>
<div>
<div style="text-align: justify;">
<hr style="text-align: left;" />
<div style="text-align: left;">
<div style="text-align: justify;">
</div>
</div>
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"> </span> </span><br />
<div style="text-align: left;">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
</div>
</div>
</div>
<div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><a href="http://science.sciencemag.org/content/352/6284/474">Health and population effects of rare gene knockouts in adult humans with related parents</a>. </span><b style="font-family: Georgia, "Times New Roman", serif; text-align: justify;">Science Apr 2016 (originally on <a href="http://biorxiv.org/content/early/2015/11/14/031641">BioRxiv</a>).</b><br />
<b style="font-family: georgia, "times new roman", serif;"><br /></b>
<span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif; text-align: justify;">This paper presents the use of 10X Genomics phased genome sequencing as a confirmatory method in a study identifying gene knockouts created by rare homozygous predicted loss-of-function (rhLOF) variants in exome sequencing data. In one case a PRDM9 rhLOF was confirmed by 10X Genomics sequencing. PRDM9 is a gene involved in the localisation of meiotic crossovers; however, the individual was healthy and fertile. Because PRDM9 loss of function leads to infertility in mice and an inability to repair double-strand breaks, the results suggest that there are alternative mechanisms for localising human meiotic crossovers. The authors state that we need to be careful when interpreting predicted loss-of-function events.</span></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">
</span></div>
</div>
<div>
<div style="text-align: justify;">
<br /></div>
</div>
<div>
<div style="text-align: justify;">
<hr style="text-align: left;" />
<div style="text-align: left;">
<div style="text-align: justify;">
<a href="http://biorxiv.org/content/early/2016/04/13/048603" style="font-family: georgia, "times new roman", serif;"><span style="font-family: "georgia" , "times new roman" , serif;">Third-generation sequencing and the future of genomics</span><span style="font-family: "georgia" , "times new roman" , serif; line-height: 21.1467px;">.</span></a> <b style="font-family: georgia, "times new roman", serif;">BioRxiv April 2016.</b><br />
<b style="font-family: georgia, "times new roman", serif;"><br /></b>
<span style="font-family: "georgia" , "times new roman" , serif;">This review of third-generation sequencing systems describes 10X Genomics Chromium genome technology as a mapping, rather than a sequencing, application. 10X Genomics is grouped with BioNano Genomics, the <a href="https://dovetailgenomics.com/" target="_blank">Dovetail Genomics</a> <a href="http://genome.cshlp.org/content/early/2016/02/08/gr.193474.115.long" target="_blank">cHiCago (HiC) method</a>, genetic maps and mate-pair mapping. The paper includes a great table summarising the characteristics of the different third-generation platforms (reproduced below).</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjpjFmGCgofK4Bxw7oghZ1jdEtVS2Z3ELPtIsCQmLo_nn-XEdC6ZZ5583AiFNEVuBFFlMJILjjQeuVMiP8sN080xnWe51Sh7jM7ekz1iTkk2MUrSR1rKsl7jIzTGAypHd9O4aCtqLyTDYw/s1600/Screen+Shot+2016-09-12+at+14.40.10.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="343" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjpjFmGCgofK4Bxw7oghZ1jdEtVS2Z3ELPtIsCQmLo_nn-XEdC6ZZ5583AiFNEVuBFFlMJILjjQeuVMiP8sN080xnWe51Sh7jM7ekz1iTkk2MUrSR1rKsl7jIzTGAypHd9O4aCtqLyTDYw/s400/Screen+Shot+2016-09-12+at+14.40.10.png" width="400" /></a></div>
<br /></div>
</div>
</div>
</div>
<div>
<div style="text-align: justify;">
<div style="text-align: left;">
<div style="text-align: justify;">
<div style="text-align: left;">
<div style="text-align: justify;">
<hr style="text-align: left;" />
</div>
</div>
</div>
</div>
</div>
</div>
<div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><a href="http://www.nature.com/nbt/journal/v34/n3/full/nbt.3432.html">Haplotyping germline and cancer genomes with high-throughput linked-read sequencing.</a> </span><b style="font-family: georgia, "times new roman", serif;">Nat Biotechnol. 2016 Mar.</b><br />
<b style="font-family: georgia, "times new roman", serif;"><br /></b>
<span style="font-family: "georgia" , "times new roman" , serif;">This paper from <a href="http://dna-discovery.stanford.edu/">Hanlee Ji's group at Stanford</a> and 10X's Ben Hindson <i>et al</i> describes the 10X Genomics GemCode phasing technology. It is the first paper to demonstrate droplet-based methods for phasing and structural variant analysis. This is the other paper you should refer to if you are considering phasing in your own research, but </span><span style="font-family: "georgia" , "times new roman" , serif;">the </span><span style="font-family: "georgia" , "times new roman" , serif;">more up-to-date Chromium</span><span style="font-family: "georgia" , "times new roman" , serif;"> </span><a href="http://biorxiv.org/content/early/2016/08/19/070425" style="font-family: Georgia, "Times New Roman", serif;">BioRxiv Church/Jaffe paper</a><span style="font-family: "georgia" , "times new roman" , serif;"> (see above) will give you better information about technical performance today (Sept 2016).</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg7FB1f_bIYNnvDT_qXZkBjxeatht18Z8a_6RD75dmRfqTmjlMiEJlPeS6WjSo64yVGbUgFm4aj0tD_KWaj7ooX3FlbigbjYkYyu9WuRWyqfnybWJHboBkVXxg70HmeP49j_32dTp7F-a3-/s1600/Screen+Shot+2016-09-12+at+20.57.40.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="254" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg7FB1f_bIYNnvDT_qXZkBjxeatht18Z8a_6RD75dmRfqTmjlMiEJlPeS6WjSo64yVGbUgFm4aj0tD_KWaj7ooX3FlbigbjYkYyu9WuRWyqfnybWJHboBkVXxg70HmeP49j_32dTp7F-a3-/s320/Screen+Shot+2016-09-12+at+20.57.40.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fig 1: 10X technology overview</td></tr>
</tbody></table>
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<span style="font-family: "georgia" , "times new roman" , serif;">This paper demonstrates what can be done for cancer genomes, and that is what makes it such an important read for people here at the CRUK Cambridge Institute who are deciding whether the 10X technology might be useful in their research. I've previously written about </span><span style="font-family: "georgia" , "times new roman" , serif;"><a href="http://core-genomics.blogspot.co.uk/2015/03/10x-genomics-whats-fuss-over-phasing.html">why I'm excited about using phasing</a> to resolve complex </span><span style="font-family: "georgia" , "times new roman" , serif;">structural rearrangements and to determine whether multiple variants in the same gene are in cis or trans (cis: on the same allele; trans: on different alleles).</span><br />
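<div style="text-align: justify;">
<span style="font-family: georgia, serif;">Once variants are phased, the cis/trans question is easy to express in code. A minimal sketch in Python, assuming phased calls in the standard VCF convention (a GT field written with "|", such as "0|1", plus a PS phase-set identifier, because alleles can only be compared within the same phase block) and simple biallelic heterozygous variants. The function name is hypothetical and this is not part of any 10X Genomics pipeline.</span></div>

```python
def phase_relationship(gt1, gt2, ps1, ps2):
    """Classify two heterozygous variants as 'cis', 'trans' or 'unknown'.

    gt1, gt2: phased VCF-style genotypes, e.g. "0|1" or "1|0".
    ps1, ps2: phase-set (phase block) identifiers of the two variants.
    """
    if "|" not in gt1 or "|" not in gt2 or ps1 != ps2:
        return "unknown"  # unphased, or phased in different blocks
    hap1, hap2 = gt1.split("|"), gt2.split("|")
    if sorted(hap1) != ["0", "1"] or sorted(hap2) != ["0", "1"]:
        return "unknown"  # only simple biallelic heterozygotes handled
    # Which haplotype (index 0 or 1) carries the alternate allele at each site?
    alt_hap1 = hap1.index("1")
    alt_hap2 = hap2.index("1")
    return "cis" if alt_hap1 == alt_hap2 else "trans"


print(phase_relationship("0|1", "0|1", 1001, 1001))  # -> cis
print(phase_relationship("0|1", "1|0", 1001, 1001))  # -> trans
print(phase_relationship("0/1", "0|1", 1001, 1001))  # -> unknown (unphased)
```

<div style="text-align: justify;">
<span style="font-family: georgia, serif;">Returning "unknown" for unphased calls or calls in different phase blocks reflects the fact that phasing is only informative within a block; longer phase blocks are exactly what make cis/trans calls possible for variants that are far apart.</span></div>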
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<span style="font-family: "georgia" , "times new roman" , serif;">For a single colorectal cancer patient they generated 50x Illumina WGS and 30x 10X Genomics WGS </span><span style="color: #999999;"><span style="font-family: "georgia" , "times new roman" , serif;">(the name 10X Genomics needs some explanation: written as it is here, only the inclusion of the word "genomics" makes the sentence easy to interpret. And given that most 10X Genomics phasing data will be generated on Illumina's X Ten, we end up asking for "10X on X Ten" or "an X Ten 10X genome". I get them mixed up in conversation, and I know PIs and post-docs do too!)</span></span><span style="font-family: "georgia" , "times new roman" , serif;">. They found multiple deleterious cancer mutations, including in the known driver genes TP53 and NRAS, as well as five rearrangements and 26 copy-number variants. The most interesting result was a C>T mutation in TP53 causing a deleterious nonsynonymous R213Q substitution, which the linked-read data placed on one haplotype. The other haplotype was deleted in the same region, causing LOH, so the only remaining copy of TP53 carries the C>T mutation and is inactivated.
</span><span style="font-family: "georgia" , "times new roman" , serif;">This phased cancer genome was produced from 1ng of ~50kb DNA, from a sample with 70% tumour purity. This is pretty close to many samples that people are collecting, but careful reporting of this kind of information will be vital as we learn which samples can sensibly be run on the 10X Genomics technology, and which we should leave for now.</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<span style="font-family: "georgia" , "times new roman" , serif;">A previously validated </span><span style="font-family: "georgia" , "times new roman" , serif;">EML4-ALK </span><span style="font-family: "georgia" , "times new roman" , serif;">translocation was detected in the </span><span style="font-family: "georgia" , "times new roman" , serif;">lung cancer cell line NCI-H2228, and the 10X Genomics data revealed that this is not a simple inversion but a more complex rearrangement, including a deletion spanning exons 2–19 of ALK. To target the exome for phasing, 10X Genomics and Agilent have <a href="http://www.agilent.com/about/newsroom/presrel/2016/11feb-ca16004.html">partnered on a modified capture panel</a> that includes baits designed to target introns and improve pull-down of the large genomic fragments. The 200X coverage was reported after removal of duplicates, so the raw sequencing could be very deep indeed.</span>
<br />
<span style="font-family: "georgia" , "times new roman" , serif;">They discuss Illumina's Moleculo technology and similar methods (refs 6-9 in the paper), pointing out that the main reasons these methods are sub-optimal are the relatively large amount of DNA used and the relatively low number of partitions generated, both of which limit how well the technology can be applied.</span><br />
<div>
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">The authors conclude their discussion with the following statement <i>"</i><span style="font-style: italic;">phased cancer genomes will provide new insight into the genomic alterations underlying tumor development and maintenance</span><i>"</i>. I think the next few months will see other papers being published confirming how useful the technology really is. And who knows how soon we might see a phasing panel specifically for DNA repair genes being used in the clinic for instance?</span><br />
<br /></div>
</div>
<div>
<div style="text-align: justify;">
<hr style="text-align: left;" />
<div style="text-align: left;">
<div style="text-align: justify;">
</div>
</div>
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"> </span> </span><br />
<div style="text-align: left;">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
</div>
</div>
</div>
<div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><a href="http://www.ncbi.nlm.nih.gov/pubmed/26963554">Haplotypes drop by drop</a>. </span><b style="font-family: georgia, "times new roman", serif;">Nat Biotechnol 2016 Mar.</b><br />
<b style="font-family: georgia, "times new roman", serif;"><br /></b>
<span style="font-family: "georgia" , "times new roman" , serif;">In this news and views article <a href="https://medicine.umich.edu/dept/dcmb/jacob-kitzman-phd">Jacob Kitzman (University of Michigan)</a> describes the data from the <a href="http://www.nature.com/nbt/journal/v34/n3/full/nbt.3432.html">Zheng et al</a> paper (see above) in the same issue, and explores the impact it might have on the field. The article clearly describes the question clinicians want answered: are both copies of a gene affected (as in cystic fibrosis, where two mutations, one on each allele, knock out both copies of the CFTR gene), or is the same haplotype hit twice by mutations in <span style="font-style: italic;">cis</span>?</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-style: italic;"><br /></span></span>
<span style="font-family: "georgia" , "times new roman" , serif;">He suggests other applications that might benefit from 10X phasing technology, including metagenomics (we're trying this with a collaborator) and </span><span style="font-family: "georgia" , "times new roman" , serif;">phasing cDNA to analyse splice-isoform diversity in transcriptomes more deeply.</span><br />
<div>
<br /></div>
<span style="font-family: "georgia" , "times new roman" , serif;">One of the questions Kitzman poses is <i>"Whether the 10X Genomics platform will be widely adopted may depend as much on its cost above and beyond standard whole-genome shotgun sequencing as on its technical merit."</i> The papers above show just how useful the 10X Genomics technology is turning out to be...but as I said at the start of this post, this list is going to grow rapidly, and I will only update it sporadically!</span><br />
<div>
<br /></div>
</div>
</div>
<div>
<div style="text-align: justify;">
<br /></div>
</div>
</div>
</div>
James@cancerhttp://www.blogger.com/profile/02825715598810395734noreply@blogger.com1tag:blogger.com,1999:blog-6334453475526523597.post-373804136201961262016-09-09T11:57:00.001+01:002016-09-09T15:47:42.625+01:0010X Genomics phasing explained<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">This post follows on from my previous one explaining the <a href="http://core-genomics.blogspot.co.uk/2016/07/10x-genomics-single-cell-3mrna-seq.html">10X Genomics single-cell mRNA-seq assay</a>. This time I'm reviewing the method as described in a paper recently posted on BioRxiv by 10X's Deanna Church and David Jaffe</span><span style="font-family: "georgia" , "times new roman" , serif;">: </span><a href="http://biorxiv.org/content/early/2016/08/19/070425" style="font-family: georgia, "times new roman", serif;">Direct determination of diploid genome sequences</a><span style="font-family: "georgia" , "times new roman" , serif;">. It builds on the earlier <a href="http://www.nature.com/nmeth/journal/v13/n7/full/nmeth.3865.html" target="_blank">Nat Methods paper</a>, which was the first </span><span style="font-family: "georgia" , "times new roman" , serif;">10X de novo assembly of NA12878, but on the GemCode system. </span><span style="font-family: "georgia" , "times new roman" , serif;">While we are starting some phasing projects on our 10X Chromium box, the greater interest so far has been in the single-cell applications. But if we can combine the two methods (or something else) to get single-cell CNV, then 10X are onto a winner!</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://www.10xgenomics.com/products/" target="_blank"><img border="0" height="147" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEizoEnRK14aZ8PnCNavh_tZax_CLv3W-SPF4R5thDBy2gbYkSzd6NYeX6vn6AKHqDmPrR4eRnHhbXu1bdJsE5kk4HrDMU89unypJdVCHMHeGIdafIFrrbyTZ-AGcEqAxZmaFd0P3rRGadlb/s400/Screen+Shot+2016-09-09+at+11.17.54.png" width="400" /></a></div>
<span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<a name='more'></a><br />
<span style="font-family: "georgia" , "times new roman" , serif;">The paper describes the <a href="http://www.10xgenomics.com/">10X Genomics Chromium</a> phasing technology. The authors highlight the impact of their tech by first reminding us that the majority of Human genomes sequenced to date are analysed by alignment to the reference (an important point often forgotten by users). They say that only a few <i>de novo</i> Human assemblies have been created, and that most of these do not truly represent complex, diploid biological genomes. The authors consider only two published genomes to be true diploid <i>de novo</i> assemblies: Levy <i>et al</i>. PLoS Biol 2007: <a href="http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.0050254">The diploid genome sequence of an individual human</a> and Cao <i>et al</i>. Nat Biotech 2015: <a href="http://www.nature.com/nbt/journal/v33/n6/full/nbt.3200.html">De novo assembly of a haplotype-resolved human genome</a>.</span><br />
<br /></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>The method:</b> They introduce the 10X Chromium library prep. This starts with 1.25ng of >50kb DNA, from which random genomic loci are copied with 16bp barcodes <span style="color: #999999;">(by polymerase extension?)</span> inside the Chromium droplets (gel beads in emulsion). Each droplet contains around 10 molecules, equal to ~0.5 Mb of the genome. The most important feature of the technology is the ability to put just 0.01% of the diploid Human genome into a single droplet, which makes the probability of both alleles of a locus being present in the same droplet vanishingly small. With 2 lanes of X Ten you can expect about 60X Human genome coverage, and the authors calculate the number of "linked reads" per molecule as 60, which equates to around 0.4x coverage of each molecule (enough for shallow CNV sequencing to reveal clonality in tumours, perhaps).</span></div>
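<div style="text-align: justify;">
<span style="font-family: georgia, serif;">The per-molecule coverage and per-droplet genome fraction above can be checked with back-of-envelope arithmetic. A minimal Python sketch, assuming 2 x 150 bp read pairs (typical for HiSeq X), input molecules of about 50 kb and a diploid human genome of about 6.4 Gb; these values are assumptions rather than figures taken from the paper, and the helper names are hypothetical.</span></div>

```python
def per_molecule_coverage(linked_reads, read_len_bp, molecule_len_bp):
    """Fold-coverage of one input molecule by its barcoded read pairs."""
    return linked_reads * 2 * read_len_bp / molecule_len_bp


def partition_fraction(molecules, molecule_len_bp, diploid_genome_bp):
    """Fraction of the diploid genome captured in a single droplet."""
    return molecules * molecule_len_bp / diploid_genome_bp


# Assumed values: 60 read pairs of 2 x 150 bp per ~50 kb molecule,
# ~10 molecules per droplet, ~6.4 Gb diploid human genome.
cov = per_molecule_coverage(60, 150, 50_000)
frac = partition_fraction(10, 50_000, 6.4e9)
print(f"coverage per molecule: {cov:.2f}x")
print(f"diploid genome per droplet: {frac:.4%}")
```

<div style="text-align: justify;">
<span style="font-family: georgia, serif;">Under these assumptions each molecule is covered at about 0.36x, and each droplet holds roughly 0.008% of the diploid genome, the same order as the 0.01% quoted.</span></div>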
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="color: #999999; font-family: "georgia" , "times new roman" , serif;"><b>Question to the authors:</b> I do not understand the statement about smaller genomes getting lower linked-read coverage: <i>"For smaller genomes, assuming that the same DNA mass was loaded and that the library was sequenced to the same read depth, the number of LinkedReads (read pairs) per molecule would drop proportionally, which would reduce the power of the data type. For example, for a genome whose size is 1/10th the size of the human genome (320 Mb), the mean number of LinkedReads per molecule would be about 6, and the distance between LinkedReads would be about 8 kb, making it hard to anchor barcodes to short initial contigs."</i> My first assumption was that genome size would have no impact on linked-read depth, but would significantly affect the proportion of the genome present in a single droplet. As such, a smaller genome with DNA fragments of the same size should still have around 60 linked reads per DNA molecule, but for a 10 Mb genome 5% of the genome would be in each droplet, making the phasing much harder to determine. <b>Please feel free to explain this to me.</b></span></div>
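<div style="text-align: justify;">
<span style="font-family: georgia, serif;">The quoted passage can be read as simple proportional arithmetic under its own two assumptions: the same DNA mass loaded (so roughly the same number of input molecules, whatever the genome size) and the same read depth (so total read pairs proportional to genome size). A minimal Python sketch, using the ~60 LinkedReads per molecule and 3.2 Gb human genome implied by the quote, plus an assumed ~50 kb molecule length; the helper names are hypothetical.</span></div>

```python
def linked_reads_per_molecule(genome_bp, ref_genome_bp=3.2e9, ref_reads=60.0):
    """Scale LinkedReads per molecule from a reference (human) value.

    Same DNA mass -> same number of molecules, independent of genome size.
    Same depth    -> total read pairs proportional to genome size.
    So read pairs per molecule scale linearly with genome size.
    """
    return ref_reads * genome_bp / ref_genome_bp


def read_spacing_bp(molecule_len_bp, reads_per_molecule):
    """Mean distance between LinkedReads along one molecule."""
    return molecule_len_bp / reads_per_molecule


for genome_bp in (3.2e9, 320e6):
    n = linked_reads_per_molecule(genome_bp)
    print(f"{genome_bp / 1e6:.0f} Mb: {n:.0f} reads/molecule, "
          f"~{read_spacing_bp(50_000, n) / 1000:.1f} kb apart")
```

<div style="text-align: justify;">
<span style="font-family: georgia, serif;">On these assumptions a 320 Mb genome gets about 6 LinkedReads per molecule, spaced roughly 8.3 kb apart, which matches the figures in the quote.</span></div>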
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>The data:</b> In the paper they present data from seven Human genomes, sequenced on <a href="http://www.illumina.com/systems/hiseq-x-sequencing-system/system.html" target="_blank">HiSeq X Ten</a> and assembled using the <i>"pushbutton"</i> Supernova algorithm (it won't run on your MacBook Pro, as you'll need >384 GB of RAM). In just two days per genome they generated 100kb+ contigs with 2.5Mb phase blocks. The seven genomes include four with parental data to verify phasing results, as well as one sample used in the HGP. They include a figure (see below) showing the Supernova assembly of the HGP sample aligned to a 162kb clone that is part of the GRCh37 reference. It matches the reference almost perfectly: of the 8 differences, just 1 is an SNV (green), while the other 7 are repeat-length variants, 6 in homopolymers and 1 in a dinucleotide repeat (blue/cyan). The second figure shows the path a FASTA sequence takes through the "megabubbles" separating parental alleles, and the "microbubbles" caused by longer repeats and homopolymers.</span></div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEifA7IMpcmPfqvWm2xFmpT9rFqfKX5IeJCl7xoVpiCXNutVEyjNWpdOqKo7d5OywVyVo1ow-8qz9UMTZ7QkwVzipa8_D4HJtxnHsnQ7mwYtqKnrszgY0bSsebPEzxSsxG5fEr9uYQEWTD0M/s1600/Screen+Shot+2016-09-09+at+09.31.58.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-family: "georgia" , "times new roman" , serif;"><img border="0" height="182" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEifA7IMpcmPfqvWm2xFmpT9rFqfKX5IeJCl7xoVpiCXNutVEyjNWpdOqKo7d5OywVyVo1ow-8qz9UMTZ7QkwVzipa8_D4HJtxnHsnQ7mwYtqKnrszgY0bSsebPEzxSsxG5fEr9uYQEWTD0M/s320/Screen+Shot+2016-09-09+at+09.31.58.png" width="320" /></span></a></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br class="Apple-interchange-newline" /></span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"></span>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXcveBT6QHgbddqhBoZ6fZizyZsnrO5ptX-OpsqrARBNGCiZBxhuENYb925bSCg-ZlYX_-e-v5ZudSNevWdua1_nU328ozvaO1hdrWnF7GqdPJPWkfMBpcgOXSQOysOfYHRiBzeJFywFa5/s1600/Screen+Shot+2016-09-09+at+09.28.31.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><span style="font-family: "georgia" , "times new roman" , serif;"><img border="0" height="61" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXcveBT6QHgbddqhBoZ6fZizyZsnrO5ptX-OpsqrARBNGCiZBxhuENYb925bSCg-ZlYX_-e-v5ZudSNevWdua1_nU328ozvaO1hdrWnF7GqdPJPWkfMBpcgOXSQOysOfYHRiBzeJFywFa5/s400/Screen+Shot+2016-09-09+at+09.28.31.png" width="400" /></span></a></td></tr>
<tr><td class="tr-caption" style="font-size: 12.8px;"><span style="text-align: left;"><span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;">Whose careful hand at 10X Genomics drew this representation of FASTA?</span></span><br />
<div>
<span style="text-align: left;"><span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;"><br /></span></span></div>
</td></tr>
</tbody></table>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>Tuning 10X phasing to your needs: </b>Users may be able to "tune" scaffold N50 by varying DNA length or sequencing coverage. A single X Ten lane generating 30x coverage looks like it would push scaffold N50 down from 17 to 12 Mb. DNA quality is probably most important, and I suspect many people will accept lower-cost experiments if they still deliver a significant improvement in phasing.</span><br />
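<div style="text-align: justify;">
<span style="font-family: georgia, serif;">Scaffold N50 is the summary statistic being tuned here. For readers unfamiliar with it, a minimal Python sketch of the standard definition: sort scaffolds from longest to shortest and report the length at which the running total first reaches half of the assembly size. The scaffold lengths in the example are made up for illustration.</span></div>

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    lengths = sorted(lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Made-up scaffold lengths in Mb, for illustration only.
print(n50([17, 12, 9, 5, 3, 2, 1, 1]))  # -> 12
```

<div style="text-align: justify;">
<span style="font-family: georgia, serif;">Because N50 depends on how the largest scaffolds are joined, anything that shortens the long-range linkage (shorter input DNA, or fewer linked reads per molecule at lower coverage) tends to pull it down.</span></div>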
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<span style="font-family: "georgia" , "times new roman" , serif;">Many groups will also want to run differently sized genomes and will need to estimate how much DNA to use and how much sequencing they'll require. For small genomes this gets really interesting, and 10X could be an awesome metagenomics tool allowing strain-level analysis of complex samples. For larger non-Human genomes people will need to use a much smaller amount of DNA in a single run, which may reduce the number of genome copies to an unreasonably low level.</span><br />
<ul>
<li><span style="font-family: "georgia" , "times new roman" , serif;">Human 3Gb = 1ng = 300 genome copies</span></li>
<li><span style="font-family: "georgia" , "times new roman" , serif;">Wheat 5Gb = 0.67ng</span><span style="font-family: "georgia" , "times new roman" , serif;"> = 135 genome copies</span></li>
<li><span style="font-family: "georgia" , "times new roman" , serif;">Maize 20Gb = 0.17ng</span><span style="font-family: "georgia" , "times new roman" , serif;"> = 8 genome copies</span></li>
<li><span style="font-family: "georgia" , "times new roman" , serif;">Salamander 50Gb = 0.07ng</span><span style="font-family: "georgia" , "times new roman" , serif;"> = 1.3 genome copies</span></li>
<li><span style="font-family: "georgia" , "times new roman" , serif;">Paris japonica 150Gb = 0.02ng</span><span style="font-family: "georgia" , "times new roman" , serif;"> = 0.15 genome copies</span></li>
</ul>
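<div style="text-align: justify;">
<span style="font-family: georgia, serif;">The copy numbers in this list are roughly what a standard mass-to-length conversion gives (about 978 Mb of double-stranded DNA per picogram). A minimal Python sketch, taking the genome sizes and DNA masses in the list as given; it only converts a DNA mass into haploid genome copies and says nothing about how much DNA a Chromium run actually needs.</span></div>

```python
MB_PER_PG = 978.0  # ~978 Mb of double-stranded DNA per picogram


def genome_copies(mass_ng, genome_gb):
    """Haploid genome copies contained in mass_ng of DNA."""
    genome_mass_pg = genome_gb * 1000.0 / MB_PER_PG
    return mass_ng * 1000.0 / genome_mass_pg


# Genome sizes (Gb) and DNA masses (ng) as given in the list.
examples = [
    ("Human", 3, 1.0),
    ("Wheat", 5, 0.67),
    ("Maize", 20, 0.17),
    ("Salamander", 50, 0.07),
    ("Paris japonica", 150, 0.02),
]
for name, size_gb, mass_ng in examples:
    print(f"{name}: {genome_copies(mass_ng, size_gb):.2f} copies")
```

<div style="text-align: justify;">
<span style="font-family: georgia, serif;">This gives about 326, 131, 8.3, 1.4 and 0.13 copies respectively, broadly in line with the rounded figures in the list.</span></div>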
<br />
<b style="font-family: georgia, "times new roman", serif;">Who's going to use Chromium phasing:</b><span style="font-family: "georgia" , "times new roman" , serif;"> Is this kind of data going to be relevant enough for people to adopt 10X Chromium as their default genome library prep? I suspect many teams are working on 100s or even 1000s of 10X Genomics genomes right now, and we'll see many more publications very soon. If the $500 Chromium prep can add real value (biologically or clinically), then 10X have a real chance of becoming a new standard for library prep. If that's the case, I guess we'll see how strong their IP is as competitors build their own variants of the technology.</span></div>
</div>
James@cancerhttp://www.blogger.com/profile/02825715598810395734noreply@blogger.com2tag:blogger.com,1999:blog-6334453475526523597.post-79658240881882829652016-09-05T13:21:00.001+01:002016-09-05T13:21:29.053+01:00Nuclear sharks live for 400 years<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">A wonderful paper in a recent edition of <a href="http://science.sciencemag.org/content/353/6300/702.full">Science</a> uses radiocarbon dating to show that the Greenland shark can live for up to 400 years, making it the longest-lived vertebrate known. See: <b><a href="http://science.sciencemag.org/content/353/6300/702.full">Eye lens radiocarbon reveals centuries of longevity in the Greenland shark (Somniosus microcephalus)</a></b>.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5eC0pFWsVfVjsor8V8IojJHAZjFFTgbK7J0oiWUeAjfY9oKDrOsNXK1J1H65zvR1rHShQhBlENsupCDix2DlQyzCZJzepchWlpOiZCupR5Sgbkvxq8oy-5AuuC-_XpslIW8k3R-iBd3zR/s1600/Screen+Shot+2016-09-02+at+19.39.21.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="237" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5eC0pFWsVfVjsor8V8IojJHAZjFFTgbK7J0oiWUeAjfY9oKDrOsNXK1J1H65zvR1rHShQhBlENsupCDix2DlQyzCZJzepchWlpOiZCupR5Sgbkvxq8oy-5AuuC-_XpslIW8k3R-iBd3zR/s400/Screen+Shot+2016-09-02+at+19.39.21.png" width="400" /></a></div>
<br />
<a name='more'></a><div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><i>“Who would have expected that nuclear bombs [one day] could help to determine the life span of marine sharks?”</i> The authors used measurements of radiocarbon (14C) in eye lens nuclei to estimate a life span of around 300 years, with the oldest animal approximately 392 years old. A complication in their analysis was the “bomb pulse”: the spike in carbon-14 levels produced by nuclear tests in the 1950s. However, only the two smallest, and presumably youngest, of the 28 animals analysed had the high 14C levels associated with the bomb tests.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>Why eye lens nuclei? </b>It turns out that the lens is made from metabolically inert crystalline proteins, and the nucleus, which is formed during prenatal development, retains proteins synthesised at age 0.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>No sex until you're 150: </b>But this longevity comes at a price: for the Greenland shark, sexual maturity is not reached for a very long time, around a female shark's 156th </span><span style="font-family: georgia, "times new roman", serif;">birthday!</span><br />
<span style="font-family: georgia, "times new roman", serif;"><br /></span>
<span style="font-family: georgia, "times new roman", serif;">Animals that live this long are rare, and horribly susceptible to human activities: primarily fishing, shipping and pollution in the case of marine vertebrates. Most of the animals used in this study came from several years of collecting dead sharks, many of them accidentally caught when trawling for commercial catches.</span></div>
</div>
James@cancerhttp://www.blogger.com/profile/02825715598810395734noreply@blogger.com1tag:blogger.com,1999:blog-6334453475526523597.post-38265905503518429912016-09-02T14:21:00.000+01:002016-09-05T13:39:42.683+01:00Sequencing base modifications: going beyond mC and 5hmC<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">A great new resource was recently brought to my attention on Twitter, and there is a paper describing it on bioRxiv: <a href="http://biorxiv.org/content/early/2016/08/26/071712">DNAmod: the DNA modification database</a>. Nearly all of the modified-nucleotide sequencing we hear and read about concerns modifications to cytosine, mostly 5-methylcytosine (mC) and 5-hydroxymethylcytosine (5hmC); you may also have heard about 8-oxoG if you are interested in FFPE analysis. All sorts of modified nucleotides occur in nature: some may be important in biological processes and vary across the tissues of an organism, while others may just be chemical noise. Modifications matter most when they change the properties of the DNA strand: how it is read, and what might or might not bind to it (e.g. mC).</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgxDwYesFI4ttgU-MKLZeS4DJxqOHs3NqmV-hvrz4hq-FKcxz5OqiANcguJYog6CjO5v4SHoVDqnVXCnaZC2etkYwtTnQkrSGo9LznyixluXdWmdxdx-1OuRdk7BJovMx_0Cg4j_b2xioPK/s1600/Screen+Shot+2016-09-02+at+13.40.03.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-family: "georgia" , "times new roman" , serif;"><img border="0" height="116" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgxDwYesFI4ttgU-MKLZeS4DJxqOHs3NqmV-hvrz4hq-FKcxz5OqiANcguJYog6CjO5v4SHoVDqnVXCnaZC2etkYwtTnQkrSGo9LznyixluXdWmdxdx-1OuRdk7BJovMx_0Cg4j_b2xioPK/s400/Screen+Shot+2016-09-02+at+13.40.03.png" width="400" /></span></a></div>
<span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<a name='more'></a><span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">The biology of base modification is very complex: DNA methyltransferases mark cytosine with a 5-methyl group, TET family enzymes oxidise 5-methylcytosine to 5-hydroxymethylcytosine (and onwards to 5-formyl- and 5-carboxylcytosine), and thymine DNA glycosylase-mediated base excision repair returns the base to unmodified cytosine. Many groups have worked on methods to sequence modified bases, and Shankar Balasubramanian's research group here in Cambridge is most closely associated with 5hmC-seq through his <a href="https://www.cambridge-epigenetix.com/">CEGX spinout</a>.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7SccXr3cOmhLnpAB2b69pd0WgFbDQfAmVYD4BqZ84WTi2aM5jRADFHLns9esf_kVkUVfk-mCim-mP4OAu1OmSJL3owX40VdFFI3oFaN9REOS-NXtoG21MNblxhLb6sSW9RLMX8z6hdvcn/s1600/Screen+Shot+2016-09-02+at+13.42.52.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="184" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7SccXr3cOmhLnpAB2b69pd0WgFbDQfAmVYD4BqZ84WTi2aM5jRADFHLns9esf_kVkUVfk-mCim-mP4OAu1OmSJL3owX40VdFFI3oFaN9REOS-NXtoG21MNblxhLb6sSW9RLMX8z6hdvcn/s200/Screen+Shot+2016-09-02+at+13.42.52.png" width="200" /></a><span style="font-family: "georgia" , "times new roman" , serif;"><b>DNAmod DB: </b><a href="https://www.pmgenomics.ca/hoffmanlab/proj/dnamod">The DNA modification database</a> lists 38 modified bases, only 7 of which have been observed solely in synthetic DNA. It gives a brief description of each modified base, including its likely biological function, and, most importantly for readers of CoreGenomics, it lists the methods that can be used to map each modification in the genome.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Unfortunately it appears to miss the <a href="http://www.ncbi.nlm.nih.gov/pubmed/22539555">OxBS-seq</a> method published by Booth <i>et al</i> in 2012, but does have the competing <a href="http://www.ncbi.nlm.nih.gov/pubmed/22608086">TAB-seq</a> method published by Yu <i>et al</i> in the same year.</span></div>
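<div style="text-align: justify;">Neither DNAmod nor this post spells out how OxBS-seq separates the two marks, so here is a minimal sketch of the usual subtraction logic. In standard bisulfite sequencing (BS-seq) both 5mC and 5hmC read as C; in OxBS-seq, 5hmC is first oxidised so that it reads as T after bisulfite treatment, leaving only 5mC reading as C. The 5hmC level at a site can therefore be estimated as the BS-seq C fraction minus the OxBS-seq C fraction (TAB-seq, by contrast, reads 5hmC directly). The function names and the per-site (C reads, T reads) input format are assumptions for illustration, not part of DNAmod or of any published pipeline.</div>

```python
def c_fraction(c_reads, t_reads):
    """Fraction of reads that report C (i.e. a protected cytosine) at one site."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no coverage at this site")
    return c_reads / total


def estimate_5hmc(bs_counts, oxbs_counts):
    """Estimate (5mC, 5hmC) fractions at one CpG from paired BS-seq and OxBS-seq.

    Both arguments are hypothetical (C reads, T reads) tuples for the same site.
    BS-seq:   5mC and 5hmC both read as C  -> C fraction = 5mC + 5hmC
    OxBS-seq: only 5mC reads as C          -> C fraction = 5mC
    """
    bs = c_fraction(*bs_counts)
    oxbs = c_fraction(*oxbs_counts)
    five_mc = oxbs
    # Sampling noise can push OxBS above BS at low coverage; clamp at zero.
    five_hmc = max(bs - oxbs, 0.0)
    return five_mc, five_hmc


print(estimate_5hmc((80, 20), (60, 40)))
```

<div style="text-align: justify;">With 80 C / 20 T reads in BS-seq and 60 C / 40 T in OxBS-seq, this gives roughly 0.6 for 5mC and 0.2 for 5hmC. Real analyses also model sampling error at low coverage, which this sketch ignores beyond clamping negative estimates to zero.</div>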
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>Not all bases are modified to the same extent: </b>There are a total of 128 modified nucleotides reported in the unverified list on DNAmod. I'd assumed the number of modifications would be about the same for each of the biological building blocks, but it varies quite significantly: Uracil has 45 mods <span style="text-align: left;">(I'm guessing modifications in ribonucleotides need less careful control?), </span><span style="text-align: left;">Adenine (39) </span><span style="text-align: left;">has roughly twice as many modifications as Guanine (19), and </span></span><span style="font-family: "georgia" , "times new roman" , serif;">Cytosine (13) and </span><span style="font-family: "georgia" , "times new roman" , serif;">Thymine (12) have the fewest.</span></div>
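<div style="text-align: justify;">As a quick check on the arithmetic, the per-base counts quoted above (transcribed from this post, not pulled from DNAmod itself) can be tallied to confirm the total of 128 and the Adenine:Guanine ratio:</div>

```python
# Counts of modified nucleotides per parent base, as quoted in this post
# from DNAmod's unverified list (transcribed by hand, not queried live).
mods_per_base = {
    "Uracil": 45,
    "Adenine": 39,
    "Guanine": 19,
    "Cytosine": 13,
    "Thymine": 12,
}

total = sum(mods_per_base.values())
print(f"Total: {total}")  # should equal the 128 quoted above

# Print each base with its share of all modifications, most-modified first.
for base, n in sorted(mods_per_base.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{base:<9} {n:>3}  {100 * n / total:5.1f}%")

ratio = mods_per_base["Adenine"] / mods_per_base["Guanine"]
print(f"Adenine:Guanine ratio = {ratio:.2f}")
```
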
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="background-color: #f0f7f4; color: #333333; font-family: "helvetica neue" , "helvetica" , "arial" , sans-serif; font-size: 15px; line-height: 21.4286px; text-indent: -22.5px;"><b>Citation: </b>Sood AJ, Viner C, Hoffman MM. 2016. </span><a href="http://doi.org/10.1101/071712" style="background-color: #f0f7f4; box-sizing: border-box; color: mediumslateblue; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 15px; line-height: 21.4286px; text-decoration: none; text-indent: -22.5px;">DNAmod: the DNA modification database</a><span style="background-color: #f0f7f4; color: #333333; font-family: "helvetica neue" , "helvetica" , "arial" , sans-serif; font-size: 15px; line-height: 21.4286px; text-indent: -22.5px;">. bioRxiv 071712.</span></div>
</div>
James@cancerhttp://www.blogger.com/profile/02825715598810395734noreply@blogger.com6tag:blogger.com,1999:blog-6334453475526523597.post-530536790644358422016-09-01T09:54:00.005+01:002016-09-05T13:39:59.046+01:00Celebrating 10 years at the CRUK-Cambridge Institute today<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Today I have been working for <a href="http://www.cancerresearchuk.org/">Cancer Research UK</a> for ten years! September 1st 2006 seems like such a short time ago, but a huge amount has changed in the world of Genomics since then. NGS has changed the way we do biology, and is changing the way we do medicine. The original Solexa SBS has been pushed hard by Illumina to give us the $1000 genome, and perhaps just as exciting are the <a href="http://biorxiv.org/content/early/2016/08/12/068809">results coming out of Oxford Nanopore's MAP community</a> - could this be the technology to displace Illumina? What the next ten years will hold is difficult to predict, but today I wanted to focus on my personal highlights of the last ten years at CRUK.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgYERDdOInDyA0D4VKIKxdG-wbVJhP1n1lQV3intOdHHpulvzcp8b_ExO1zBa3Xi7Oj5FDaFBxwMCjY8PEJ90wGMz2r3PNL0PHLwlzqr75deNO-OIejA_BdVYPuiGRWY6WkEw_Og682C_LM/s1600/Screen+Shot+2016-09-01+at+08.20.35.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><span style="font-family: "georgia" , "times new roman" , serif;"><img border="0" height="152" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgYERDdOInDyA0D4VKIKxdG-wbVJhP1n1lQV3intOdHHpulvzcp8b_ExO1zBa3Xi7Oj5FDaFBxwMCjY8PEJ90wGMz2r3PNL0PHLwlzqr75deNO-OIejA_BdVYPuiGRWY6WkEw_Og682C_LM/s320/Screen+Shot+2016-09-01+at+08.20.35.png" width="320" /></span></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-family: "georgia" , "times new roman" , serif;">CRUK-Cambridge Institute circa early 2006</span></td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"></span></div>
<a name='more'></a><span style="font-family: "georgia" , "times new roman" , serif;">I was employed to build a brand new genomics facility and was hired for my expertise in gene expression microarrays; previously I'd set up an Affymetrix facility at the </span><a href="http://www.jic.ac.uk/" style="font-family: Georgia, "Times New Roman", serif;">John Innes Centre</a><span style="font-family: "georgia" , "times new roman" , serif;"> in what is now the </span><a href="http://www.earlham.ac.uk/" style="font-family: Georgia, "Times New Roman", serif;">Earlham Institute</a><span style="font-family: "georgia" , "times new roman" , serif;">. </span><span style="font-family: "georgia" , "times new roman" , serif;"><span style="text-align: left;">Perhaps the one thing I remember from my interview is the answer to a question I'd posed at the end: <i>"Will the CRUK institute be using the new next-generation sequencing technologies?"</i> NGS was still in its infancy then: in late 2005 the </span><a href="http://www.nature.com/nature/journal/v437/n7057/abs/nature03959.html" style="text-align: left;">first 454</a><span style="text-align: left;"> paper had made a big splash, a Solexa sequencer had been installed at the Sainsbury lab in Norwich, and I'd heard interesting things about the technology.</span></span><br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">The answer was something like <i>"we want this facility to focus on microarrays, we'll see if NGS comes to anything useful"</i>. Well, everyone reading CoreGenomics knows how disruptive NGS has been: microarrays are dead (for gene expression anyway), and virtually all the data we generate in my lab comes off an <a href="http://www.illumina.com/systems/hiseq-3000-4000.html">Illumina HiSeq</a> sequencer.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">When I arrived the site had only just been handed over by the builders. In January 2007 we had the first instruments installed, and we were processing Sanger sequencing and Illumina arrays by the spring. We had also decided to get our first sequencer, and our initial discussions with the Solexa rep ended with the purchase of an Illumina GAI (Illumina having acquired Solexa in early 2007). The rest, as they say, is history.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b><br /></b></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>Highlights from the last ten years:</b> The Institute celebrates its 10th anniversary in February 2017, so I'll not go into too much detail about the top ten projects the Genomics core has been involved with. But I did want to pick out three projects that I was personally involved with and that I think were major advances.</span></div>
<ol>
<li><div style="text-align: justify;">
<b style="font-family: Georgia, "Times New Roman", serif;">Understanding gene regulation: </b><span style="font-family: "georgia" , "times new roman" , serif;">In a wonderful paper, </span><a href="http://www.ncbi.nlm.nih.gov/pubmed/18787134" style="font-family: Georgia, "Times New Roman", serif;">Species-specific transcription in mice carrying human chromosome 21</a>,<span style="font-family: "georgia" , "times new roman" , serif;"> </span><a href="http://www.wilsonlab.org/" style="font-family: Georgia, "Times New Roman", serif;">Mike Wilson</a><span style="font-family: "georgia" , "times new roman" , serif;">, in </span><a href="http://www.cruk.cam.ac.uk/research-groups/odom-group" style="font-family: Georgia, "Times New Roman", serif;">Duncan Odom's</a><span style="font-family: "georgia" , "times new roman" , serif;"> group,</span><span style="font-family: "georgia" , "times new roman" , serif;"> demonstrated that sequence differences in regulatory regions are the dominant force in governing when and where genes are expressed. Mike designed an incredibly elegant experiment using a mouse model of Down's syndrome: the TC1 mouse carries an extra copy of chromosome 21, but it is a human copy. Because that human chromosome sits in a mouse nuclear environment, the authors could show that mouse transcription factors bound the human DNA in a human-specific pattern, i.e. the DNA sequence was the dominant force driving gene expression. Mike and Duncan were instrumental in the development of NGS at the Institute. Mike was great to work with, and hosted probably the best "crash pad" in Cambridge; and Duncan has kept up an amazing pace of research over the whole of the last ten years.</span></div>
</li>
<li><div style="text-align: justify;">
<b style="font-family: Georgia, "Times New Roman", serif;">Molecular subtyping of Breast cancer:</b><span style="font-family: "georgia" , "times new roman" , serif;"> The </span><a href="http://www.ncbi.nlm.nih.gov/pubmed/22522925" style="font-family: Georgia, "Times New Roman", serif;">METABRIC project</a><span style="font-family: "georgia" , "times new roman" , serif;"> was a major reason I took the job at CRUK. It was the largest array project I have ever worked on and had a huge impact on our understanding of breast cancer, revealing novel subtypes with distinct clinical outcomes and subtype-specific driver genes. It was truly a landmark study. The Genomics core processed all of the UK-based samples: extracting DNA and RNA, then quality controlling and normalising them for analysis. I managed the Affymetrix genotyping on SNP6.0 arrays, carried out as a service by <a href="http://arosab.com/">Aros in Denmark</a>, and my lab processed all 2,500 Illumina HT12 arrays used in the study in just 6-8 weeks. </span><a href="http://med.stanford.edu/curtislab.html" style="font-family: Georgia, "Times New Roman", serif;">Christina Curtis</a><span style="font-family: "georgia" , "times new roman" , serif;"> now runs her own lab at Stanford, and the </span><a href="http://www.cruk.cam.ac.uk/research-groups/caldas-group" style="font-family: Georgia, "Times New Roman", serif;">Caldas group</a><span style="font-family: "georgia" , "times new roman" , serif;"> continues to lead on breast cancer genomics; most recently we've been working with them on a PDX project where we introduced low-coverage WGS of pre-capture exome libraries to significantly improve CNV calling.</span></div>
</li>
<li style="text-align: justify;"><span style="font-family: "georgia" , "times new roman" , serif;"><b>Liquid biopsy:</b> probably the biggest advance I've been involved with, NGS analysis of ctDNA as a liquid biopsy, is changing the way we do cancer medicine. Tim Forshew in Nitzan Rosenfeld's group was the first person to use NGS to non-invasively identify mutations by sequencing tumour DNA circulating in a patient's blood. In a hugely impactful <a href="http://www.ncbi.nlm.nih.gov/pubmed/22649089">Science Translational Medicine paper</a>, Tim and colleagues showed that this could be used to detect and quantify mutations seen in the tumour, that <i>de novo</i> mutations could be identified, and that a liquid biopsy could be used to monitor tumour progression in patients. <a href="https://www.tgen.org/home/research/research-faculty/muhammed-murtaza.aspx">Muhammed Murtaza</a> (now an Assistant Professor at TGen) pushed the technology even further by showing that it was possible to perform whole exome analysis of ctDNA, and that this could be used to monitor tumour evolution. This was a groundbreaking study <a href="http://www.ncbi.nlm.nih.gov/pubmed/23563269">published in Nature</a>, but when I presented it at AGBT the following year the audience was still highly sceptical of how widely ctDNA might be used. That has changed, and now there are dozens of companies pursuing liquid biopsy, including Nitzan and Tim's <a href="http://inivata.com/">Inivata</a>.</span></li>
</ol>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">I've worked with some amazing people over the last decade, many of whom have gone on to start their own labs. My team has been great; people have come and gone, marriages have happened and babies have been born. The CRUK Cambridge Institute continues to be an excellent place to work and is still a world leader in Genomics, and I've played my part in helping that to happen. Here's to the next ten years.</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
</div>
James@cancerhttp://www.blogger.com/profile/02825715598810395734noreply@blogger.com2tag:blogger.com,1999:blog-6334453475526523597.post-55879362783316094622016-08-25T15:00:00.002+01:002016-09-05T13:41:21.477+01:00Optalysys eco-friendly genomics analysis<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="p1">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">The amount of power used in a genome analysis is not something I'd ever thought about until I heard about <a href="http://optalysys.com/">Optalysys</a>, a company developing optical computing that has the potential to be 90% more energy-efficient and 20X faster than standard (electronic) compute infrastructure. Read on if you are interested in finding out more, and watch the video below, featuring <a href="https://en.wikipedia.org/wiki/Heinz_Wolff">Prof Heinz Wolff</a>!</span></div>
<br />
<div style="text-align: center;">
<iframe allowfullscreen="" frameborder="0" height="200" src="https://www.youtube.com/embed/T2yQ9xFshuc" width="400"></iframe><br /></div>
<br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Optalysys was originally spun out from the University of Cambridge, and the technology needs a lot more explanation than I'll give here. Briefly, laser light is split across liquid crystal grids in which each "pixel" can be modulated to encode analogue numerical data in the beam; the light diffracts to form an interference pattern, and in doing so performs a mathematical calculation, all at the speed of light. The beam can be split across many liquid crystal grids to increase the number and complexity of the mathematical operations performed.</span></div>
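<div style="text-align: justify;">The post doesn't say exactly which calculation the optics perform. Lens-based optical processors are generally described as computing Fourier transforms, and a Fourier transform makes correlation (sliding one pattern along another and scoring the overlap) cheap, via the correlation theorem. Assuming that is the operation being exploited, the sketch below does in software what an optical correlator would do in light: it scores a short DNA query against every offset of a reference by one-hot encoding the bases and multiplying their spectra. It illustrates the general technique only, is not Optalysys's algorithm, and the function names are hypothetical.</div>

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(x):
    """Inverse FFT using the conjugation trick: ifft(x) = conj(fft(conj(x))) / n."""
    n = len(x)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in x])]


def match_counts(reference, query):
    """Count identical bases between query and reference at every offset.

    Each base is one-hot encoded into four 0/1 channels. For each channel the
    cross-correlation is computed as ifft(fft(ref) * conj(fft(query))); summing
    the four channels gives the number of matching bases at each offset.
    """
    size = 1
    while size < len(reference) + len(query):  # zero-pad to avoid wrap-around
        size *= 2
    totals = [0.0] * size
    for base in "ACGT":
        ref = [1.0 if b == base else 0.0 for b in reference]
        qry = [1.0 if b == base else 0.0 for b in query]
        ref += [0.0] * (size - len(ref))
        qry += [0.0] * (size - len(qry))
        spectrum = [r * q.conjugate() for r, q in zip(fft(ref), fft(qry))]
        for k, value in enumerate(ifft(spectrum)):
            totals[k] += value.real
    # Keep only offsets where the whole query lies inside the reference.
    return [round(totals[k]) for k in range(len(reference) - len(query) + 1)]


print(match_counts("ACGTACGGACGT", "ACGG"))
```

<div style="text-align: justify;">Here the score peaks at offset 4, where all four bases of the query match. A practical implementation would use an optimised FFT library rather than this recursive version; the point of the optical approach is that the transform itself costs almost nothing in time or power.</div>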
<br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Optalysys and the <a href="http://www.earlham.ac.uk/">Earlham Institute</a> in Norwich are collaborating on a project to build hardware/software that will be used for <a href="http://optalysys.com/projects/genesys/">metagenomic analysis</a>. This is a long way from comparing 500 matched tumour and normal genomes in an ICGC project, but if Optalysys can build systems to handle that scale, then such huge compute tasks might be carried out at a fraction of the current cost, and from a standard mains power supply.</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<span style="font-family: "georgia" , "times new roman" , serif;">PS: </span><span style="font-family: "georgia" , "times new roman" , serif;">do you remember The Great Egg Race as fondly as I do?</span></div>
</div>
</div>
James@cancerhttp://www.blogger.com/profile/02825715598810395734noreply@blogger.com1tag:blogger.com,1999:blog-6334453475526523597.post-72278133869643750622016-08-24T15:28:00.001+01:002016-09-05T13:41:21.480+01:00Upcoming Genomics conferences in the UK<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">It is almost time for the kick-off at <a href="http://genomescience.org.uk/agenda">Genome Science</a>, probably the best-organised academic conference in the UK. It runs from August 30th to September 1st next week, and sadly I can't be there (I've just returned from holiday and there is too much going on). You can hear from a wide range of speakers in a <a href="http://genomescience.org.uk/conference-agenda">jam-packed agenda</a>. This year it is hosted by the University of Liverpool, and the evening entertainment comes from Beatles tribute band <a href="http://www.thecheatles.com/">“The Cheatles”</a>!</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">What other conferences are available for Genomics in the UK, and which one should you attend if you too can't make it over to Liverpool? </span><span style="font-family: Georgia, "Times New Roman", serif;">The Wellcome Trust Genome Campus is holding its first </span><a href="https://registration.hinxton.wellcome.ac.uk/events/item.aspx?e=596" style="font-family: Georgia, "Times New Roman", serif;">Single Cell Genomics</a><span style="font-family: Georgia, "Times New Roman", serif;"> conference </span><span style="font-family: Georgia, "Times New Roman", serif;">from September 9th (sold out, I'm afraid). </span><span style="font-family: Georgia, Times New Roman, serif;">Personally I thought that the <a href="http://www.festivalofgenomicslondon.com/">London Festival of Genomics</a> was excellent, and I have high hopes for the January 2017 meeting. </span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Often it is word of mouth that brings a conference to my attention, but there are a couple of resources out there to help.</span></div>
<div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><a href="http://allseq.com/conferences">AllSeq</a> maintain a list of conferences.</span></li>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><a href="https://www.genomeweb.com/resources/conferences-events">GenomeWeb</a> has a similar list, but it seems less focused than AllSeq.</span></li>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><a href="http://nextgenseek.com/ngs-conferences">NextGenSeek</a> has a list for 2016, but nothing on the cards for 2017 yet.</span></li>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><a href="http://www.nature.com/natureevents/science/events">Nature</a> has an events page (searchable) that lists 50 upcoming NGS conferences.</span></li>
</ul>
</div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><b>PS: </b>please do let me know if you've particular recommendations on conferences to attend. And do get in touch with the groups above to list your conference on their sites.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, "Times New Roman", serif;"><br /></span></div>
<span style="font-family: Georgia, Times New Roman, serif;"><div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><b>PPS: </b>If you can justify it, then the </span><span style="font-family: Georgia, Times New Roman, serif;">HVP/HUGO Variant Detection Training Course, "<a href="http://vep.variome.org/">Variant Effect Prediction</a>", running from 31st October 2016, is being held in Heraklion, Crete: a beautiful place to learn!</span></div>
</span></div>
</div>
</div>
James@cancerhttp://www.blogger.com/profile/02825715598810395734noreply@blogger.com3tag:blogger.com,1999:blog-6334453475526523597.post-21462776980201180162016-07-28T16:37:00.004+01:002016-09-09T11:58:13.827+01:0010X Genomics single-cell 3'mRNA-seq explained<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="p1" style="text-align: justify;">
<span class="s1"><span style="font-family: Georgia, Times New Roman, serif;"><a href="http://www.10xgenomics.com/">10X Genomics</a> have been very successful in developing their gel-bead droplet technology for phased <a href="http://www.10xgenomics.com/applications/">genome sequencing and, more recently, single-cell 3'mRNA-seq</a>. I've posted about their technology before (at <a href="http://core-genomics.blogspot.co.uk/2016/02/agbt16-10x-genomics-workshop.html">AGBT2016</a>, and in <a href="http://core-genomics.blogspot.co.uk/2015/03/10x-genomics-whats-fuss-over-phasing.html">March</a> and <a href="http://core-genomics.blogspot.co.uk/2015/11/ten-x-updatedrare-disease-anlaysis-made.html">November 2015</a>), basing most of what I've written on discussions with 10X or on presentations by early-access users. Now 10X have a paper up on bioRxiv: <a href="http://www.biorxiv.org/content/early/2016/07/26/065912"><b>Massively parallel digital transcriptional profiling of single cells</b></a>. This describes their approach to single-cell 3'mRNA-seq in some detail and shows how you might use their technology to better understand biology and complex tissues.</span></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfbLDkXZ5iHdliU4zFqdkPgUl0hKWUq8nsrsXq_KUVsWICBsJPuDoN1bavIkojl_s_b6VjO0ehMDqf4v_sUl-hKCT6WeV10aiVYGIPY_Uan8L2jxcmccuHkiJ2UdpOGdGNEaTJLiBZz1xy/s1600/Screen+Shot+2016-07-28+at+15.16.35.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="151" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfbLDkXZ5iHdliU4zFqdkPgUl0hKWUq8nsrsXq_KUVsWICBsJPuDoN1bavIkojl_s_b6VjO0ehMDqf4v_sUl-hKCT6WeV10aiVYGIPY_Uan8L2jxcmccuHkiJ2UdpOGdGNEaTJLiBZz1xy/s400/Screen+Shot+2016-07-28+at+15.16.35.png" width="400" /></a></div>
<div class="p1">
<span class="s1"></span></div>
<a name='more'></a><span style="font-family: Georgia, Times New Roman, serif;"><div style="text-align: justify;">
<b><br /></b></div>
<div style="text-align: justify;">
<b>Technical performance of the GEMcode system:</b> The paper is unfortunately based on the earlier GEMcode system rather than the latest <a href="http://www.10xgenomics.com/instrument/">Chromium</a>, but the results are likely (though not certain) to be representative of what Chromium can deliver.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Technical performance was assessed using a mixture of 1,200 human 293T and mouse 3T3 cells, with 100,000 reads per cell. 71% of reads aligned to the human or mouse genomes (38% and 33% respectively). Analysis of the UMIs allowed the authors to estimate the total number of cell-containing GEMs at just over 1,000 (482 human and 538 mouse). Only 8 GEMs appeared to contain co-located human and mouse cells, as assessed by GEM-barcoded reads aligning to both genomes. It is not easy <span style="color: #999999;">(is it even possible?)</span> to detect human:human or mouse:mouse doublets, so the overall doublet rate has to be inferred from the cross-species events; for this experiment the inferred rate was 1.6% (see figure 2a in the paper, with multiplet GEMs as grey dots).</div>
</span><br />
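<div style="text-align: justify;">The step from 8 visible doublets to a 1.6% rate can be reconstructed under a simple assumption: if cells pair up at random, only a fraction 2p(1 - p) of doublets mix species, where p is the fraction of human cells, so dividing the observed cross-species rate by that fraction corrects for the invisible same-species doublets. This is a common way to infer total doublets from species-mixing experiments, not necessarily the exact calculation used in the paper, and the function name is hypothetical.</div>

```python
def inferred_multiplet_rate(n_human, n_mouse, n_mixed):
    """Estimate the total doublet rate from a two-species mixing experiment.

    Assumes cells pair at random, so a fraction 2*p*(1-p) of doublets are
    cross-species (p = fraction of human cells). Dividing the observed
    cross-species rate by that fraction corrects for same-species doublets
    that cannot be seen.
    """
    total = n_human + n_mouse + n_mixed
    # Count each mixed GEM as half human when estimating the human share.
    p_human = (n_human + n_mixed / 2) / total
    observed = n_mixed / total
    visible_fraction = 2 * p_human * (1 - p_human)
    return observed / visible_fraction


rate = inferred_multiplet_rate(482, 538, 8)
print(f"inferred multiplet rate: {100 * rate:.2f}%")
```

<div style="text-align: justify;">With the figures above (482 human, 538 mouse, 8 mixed) this gives about 1.56%, consistent with the 1.6% reported. In a roughly 50:50 mix the correction amounts to doubling the observed cross-species rate.</div>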
<div class="p2" style="text-align: justify;">
<span style="font-family: Georgia, "Times New Roman", serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjMcAdSDuhAZXJwCoB4ZooLubcHs-wN6srqZuqr33Cx01ZwQrTud6j3_wQ0H3C5tEVlFnHax_LPPyMybR1nXz0HNN-kM-SVNHrDhwt8M-VpQpGZteRyU05L5KNtx-kl1qV71hFRBTT1XY_d/s1600/Screen+Shot+2016-07-28+at+15.22.38.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjMcAdSDuhAZXJwCoB4ZooLubcHs-wN6srqZuqr33Cx01ZwQrTud6j3_wQ0H3C5tEVlFnHax_LPPyMybR1nXz0HNN-kM-SVNHrDhwt8M-VpQpGZteRyU05L5KNtx-kl1qV71hFRBTT1XY_d/s400/Screen+Shot+2016-07-28+at+15.22.38.png" width="400" /></a></div>
<div class="p2" style="text-align: justify;">
<span style="font-family: Georgia, "Times New Roman", serif;"><br /></span></div>
<div class="p2" style="text-align: justify;">
<span style="font-family: Georgia, "Times New Roman", serif;">The 1.6% multiplet (doublet, triplet, or higher) rate appears low, but as cell numbers increase so does the </span><span style="font-family: Georgia, "Times New Roman", serif;">multiplet </span><span style="font-family: Georgia, Times New Roman, serif;">rate. The authors describe a linear relationship between multiplet rate and cell loading from 1,000 to 10,000 cells (Supplementary Fig. 1a); however, it is not clear how this rate changes at 20k, 30k, 40k, or 50k cells (the maximum loading recommended). I do not know what impact this has on experiments, but this is an area several labs are focusing on. The multiplet rate <i>"approximately followed a Poisson distribution"</i>, as assessed by imaging experiments (Supplementary Fig. 1b) in which a Nikon microscope equipped with a high-speed camera, capable of capturing 4,000 frames per second, imaged GEMs as they were created. 28,000 frames were analysed for single-cell encapsulation (7 seconds of video, which represents only about 1.5% of the time your Chromium is actually making GEMs), but the multiplet rate was 16% higher than expected, and I don't think the authors delve deeply enough into the reasons for this. Multiplets are likely to add significant noise to the analysis of single-cell experiments, and every single-cell technology has to account for them. Cells also like to stick together, so users probably can't rely on having a true single-cell suspension in the first place.</span></div>
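<div style="text-align: justify;">Why does Poisson loading give a roughly linear relationship between multiplet rate and cell loading? If the mean number of cells per GEM is lambda, the fraction of cell-containing GEMs holding two or more cells is (1 - e^-lambda - lambda*e^-lambda) / (1 - e^-lambda), which is approximately lambda/2 when lambda is small; since lambda scales with the number of cells loaded, so does the multiplet rate. A minimal sketch follows; the lambda values are hypothetical, not the loadings used in the paper.</div>

```python
import math


def multiplet_fraction(mean_cells_per_gem):
    """Fraction of cell-containing GEMs holding two or more cells,
    assuming cells are distributed among GEMs as Poisson(lambda)."""
    lam = mean_cells_per_gem
    p0 = math.exp(-lam)        # P(GEM is empty)
    p1 = lam * math.exp(-lam)  # P(GEM holds exactly one cell)
    return (1 - p0 - p1) / (1 - p0)


# Hypothetical loadings: lambda rises in proportion to the number of cells loaded.
for lam in (0.01, 0.02, 0.05, 0.1, 0.2, 0.5):
    print(f"lambda={lam:<5} multiplets={100 * multiplet_fraction(lam):.2f}%")
```

<div style="text-align: justify;">Doubling lambda roughly doubles the multiplet fraction at low loadings, but the relationship bends upwards at higher loadings, which is why it matters how the rate behaves beyond 10,000 cells. The same formula shows the capture trade-off: loading more cells per GEM fills more GEMs but inevitably raises the multiplet rate.</div>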
<div class="p2" style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><span class="s1"></span><br /></span></div>
<div class="p1" style="text-align: justify;">
<span class="s1"><span style="font-family: Georgia, Times New Roman, serif;">To investigate this further, the authors also carried out mixing experiments with human 293T (female, expressing XIST) and Jurkat cells (male, expressing CD3D). Figure 2e in the paper (see above) shows the PCA for these mixes at 100% 293T, 100% Jurkat, 50:50 and 10:90. The 50:50 mix shows a lot of cells in the space between the two clusters; I'd suggest this indicates a higher multiplet rate in this experiment than the 1.6% reported earlier. But I could not see the cell loading density used, which may explain the higher number of apparent multiplets.</span></span><span style="font-family: Georgia, "Times New Roman", serif;"> </span></div>
<div class="p2" style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><span class="s1"></span><br /></span></div>
<div class="p1" style="text-align: justify;">
<span style="font-family: Georgia, "Times New Roman", serif;"><b>Cell capture efficiency: </b>The rate of cell capture is important, especially where rare cell populations are being studied. 10X captures about 50% of the cells loaded into GEMs (Supplementary Tables 1 and 3), and whilst this could be increased, it would come at the cost of a higher doublet/triplet rate. This might be a parameter users are willing to tweak depending on their needs, and it would be interesting to know how many users would accept more doublets in return for 80-90% cell capture. What we really need in a single-cell system is the ability to image cells in droplets so we can exclude empty drops, doublets and triplets; I'd be interested to know if anyone is working on something like this.</span></div>
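<div class="p1" style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">A back-of-the-envelope sketch of why capture efficiency matters for rare populations: with capture rate c, N cells loaded and a population at frequency f, the expected number recovered is N·c·f, and a binomial model gives the chance of recovering at least k of them. The 50% capture rate comes from the paper; the loading, frequency and the 85% alternative are hypothetical.</span></div>

```python
import math


def expected_recovered(cells_loaded: int, capture_rate: float, freq: float) -> float:
    """Expected number of cells of a population recovered after capture."""
    return cells_loaded * capture_rate * freq


def prob_at_least(k: int, cells_loaded: int, capture_rate: float, freq: float) -> float:
    """P(recovering >= k cells of the population).

    Treats each loaded cell as an independent trial that succeeds (is captured
    AND belongs to the population) with probability capture_rate * freq.
    """
    p = capture_rate * freq
    below_k = sum(
        math.comb(cells_loaded, i) * p ** i * (1 - p) ** (cells_loaded - i)
        for i in range(k)
    )
    return 1.0 - below_k


# HYPOTHETICAL experiment: 4,000 cells loaded, a population at 0.5% frequency.
for capture in (0.5, 0.85):
    print(capture,
          expected_recovered(4_000, capture, 0.005),
          round(prob_at_least(10, 4_000, capture, 0.005), 3))
```

<div class="p1" style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Under this simple model, raising capture from 50% to 85% takes the expected number of rare cells from 10 to 17, which is the gain users would be weighing against a higher multiplet rate.</span></div>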
<div class="p1" style="text-align: justify;">
<span style="font-family: Georgia, "Times New Roman", serif;"><br /></span></div>
<div class="p1" style="text-align: justify;">
<span class="s1"><span style="font-family: Georgia, Times New Roman, serif;">The level of cross-talk between cell barcodes was about 1% (see Online Methods), but the manuscript does not make clear where this cross-talk comes from. If it comes from errors in reading the cell barcodes, it could be reduced by using longer, more error-tolerant barcodes, and a longer barcode read (more than 25bp) would allow a proper error estimation of the index read. But if it comes from molecular cross-over during the downstream library prep (which is going to happen to some degree), then fixing it will be much more difficult (see these papers to learn more about PCR chimeras and their effect on NGS: <a href="http://www.ncbi.nlm.nih.gov/pubmed/22021376">NAR 2012</a>, <a href="http://www.ncbi.nlm.nih.gov/pubmed/2186361">NAR 1990</a>, <a href="http://www.ncbi.nlm.nih.gov/pubmed/2307682">JBioChem 1990</a>, <a href="http://www.ncbi.nlm.nih.gov/pubmed/7596836">NAR 1995</a>).</span></span></div>
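<div class="p1" style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">As an illustration of how barcode read errors can be absorbed, here is a minimal sketch of whitelist-based correction: an observed barcode is accepted if it matches a known barcode exactly, or rescued if exactly one whitelisted barcode lies one mismatch away. This is a generic technique, not necessarily what 10X's software does, and the 8bp whitelist is made up. Note that it cannot help with cross-talk from PCR chimeras, because a chimeric molecule carries a perfectly valid, but wrong, barcode.</span></div>

```python
from typing import Dict, Iterable, List, Optional, Set

BASES = "ACGT"


def one_mismatch_neighbours(barcode: str) -> Iterable[str]:
    """Yield every sequence at Hamming distance exactly 1 from `barcode`."""
    for i, original in enumerate(barcode):
        for base in BASES:
            if base != original:
                yield barcode[:i] + base + barcode[i + 1:]


def build_rescue_map(whitelist: Set[str]) -> Dict[str, Optional[str]]:
    """Map each 1-mismatch variant to its whitelisted parent, or None if ambiguous."""
    rescue: Dict[str, Optional[str]] = {}
    for bc in whitelist:
        for variant in one_mismatch_neighbours(bc):
            if variant in whitelist:
                continue  # a valid barcode is never "corrected" into another one
            # A second parent for the same variant makes it ambiguous.
            rescue[variant] = None if variant in rescue else bc
    return rescue


def correct_barcode(observed: str, whitelist: Set[str],
                    rescue: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the corrected barcode, or None if it cannot be assigned unambiguously."""
    if observed in whitelist:
        return observed
    return rescue.get(observed)


# Made-up whitelist, for illustration only (real cell barcodes are longer).
WHITELIST = {"AAAACCCC", "GGGGTTTT", "ACGTACGT"}
RESCUE = build_rescue_map(WHITELIST)

reads: List[str] = ["AAAACCCC", "AAAACCCA", "GGGGTTTA", "TTTTTTTT"]
print([correct_barcode(r, WHITELIST, RESCUE) for r in reads])
# ['AAAACCCC', 'AAAACCCC', 'GGGGTTTT', None]
```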
<div>
<br /></div>
<div class="p1" style="text-align: justify;">
<span class="s1"><span style="font-family: Georgia, Times New Roman, serif;">In the paper, 83% of UMIs were associated with cell barcodes, suggesting that cell-free RNA does not significantly affect the results. This is an issue scSeq users will have to consider carefully, as the amount of cell-free RNA or DNA in a sample is likely to be highly variable; experiments with artificially high levels might reveal the failure mode for these sample types.</span></span></div>
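<div class="p1" style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">The 83% figure is the share of all UMIs that land in barcodes called as cells. A minimal sketch of that metric follows; the per-barcode counts and the fixed UMI threshold used to call cells are hypothetical simplifications (real pipelines use more careful cell calling), so treat it as an illustration of the metric rather than of 10X's method.</span></div>

```python
from typing import Dict


def fraction_umis_in_cells(umis_per_barcode: Dict[str, int], min_umis: int) -> float:
    """Fraction of all UMIs found in barcodes with at least `min_umis` UMIs.

    Barcodes below the (hypothetical) threshold are treated as empty GEMs
    whose UMIs come from cell-free ("ambient") RNA.
    """
    total = sum(umis_per_barcode.values())
    if total == 0:
        return 0.0
    in_cells = sum(n for n in umis_per_barcode.values() if n >= min_umis)
    return in_cells / total


# Made-up counts: three cell-containing barcodes and 100 low-count empty ones.
counts = {"cell_1": 5_000, "cell_2": 4_200, "cell_3": 3_800}
counts.update({f"empty_{i}": 20 for i in range(100)})
print(round(fraction_umis_in_cells(counts, min_umis=500), 3))  # 0.867
```

<div class="p1" style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Spiking a sample with extra cell-free RNA would raise the counts in empty barcodes and push this fraction down, which is exactly what a deliberate stress test would measure.</span></div>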
<div class="p2" style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><span class="s1"></span><br /></span></div>
<div class="p1" style="text-align: justify;">
<span class="s1"><span style="font-family: Georgia, Times New Roman, serif;"><b>Transcript counting: </b>With 100,000 reads per cell the authors report a median of 4,500 genes and 27,000 transcripts detected per cell, with little bias for GC content or gene length. However, as this is a 3' assay I'd not expect much variation here; these biases will become much more important as 10X, and others, move to whole-transcript assays. </span></span><span style="font-family: Georgia, "Times New Roman", serif;">Clustering analysis was performed with Seurat (<a href="http://www.ncbi.nlm.nih.gov/pubmed/25867923">Satija et al., 2015</a>).</span></div>
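<div class="p1" style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Transcript counts in a UMI-based 3' assay come from collapsing reads that share a cell barcode, UMI and gene, so PCR duplicates are counted once. A minimal sketch, assuming reads have already been assigned to a cell barcode and gene (the tuples below are made up) and ignoring UMI sequencing errors, which real pipelines also correct:</span></div>

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Read = Tuple[str, str, str]  # (cell_barcode, umi, gene)


def count_transcripts(reads: Iterable[Read]) -> Dict[str, Dict[str, int]]:
    """Return {cell_barcode: {gene: number_of_distinct_UMIs}}."""
    umis: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for cell, umi, gene in reads:
        umis[(cell, gene)].add(umi)

    counts: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (cell, gene), umi_set in umis.items():
        counts[cell][gene] = len(umi_set)
    return counts


reads = [
    ("CELL1", "AACG", "XIST"),
    ("CELL1", "AACG", "XIST"),  # PCR duplicate: same UMI, counted once
    ("CELL1", "TTGA", "XIST"),
    ("CELL2", "GGCA", "CD3D"),
]
counts = count_transcripts(reads)
print(dict(counts["CELL1"]), dict(counts["CELL2"]))  # {'XIST': 2} {'CD3D': 1}
```

<div class="p1" style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Per cell, the number of genes detected is the number of keys and the number of transcripts is the sum of the values, the two quantities whose medians are quoted in the paper.</span></div>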
<div class="p2" style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><span class="s1"></span><br /></span></div>
<div class="p1" style="text-align: justify;">
<span class="s1"><span style="font-family: Georgia, Times New Roman, serif;"><b>SNV detection from scRNA-seq data:</b> While deciphering population structure and discovering rare cells is great, many people will want to look for SNPs/SNVs in their scRNA-seq data. The authors reported an analysis of a curated set of high-quality SNVs observed in either 293T or Jurkat cells, but not both (see Online Methods). They showed that they could detect SNVs reliably, and that multiplet rates predicted from SNVs were highly correlated with those from gene expression analysis. The paper is confusing in suggesting that each cDNA generates 250bp of sequence for SNV detection, when the sequencing run generates only 98bp of cDNA sequence in Read 1 (I'd like to understand this better, or see it corrected in the final version if it is a typo).</span></span></div>
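<div class="p1" style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">SNV-based multiplet detection works because each cell line carries alleles the other lacks. A minimal sketch of the idea, assuming you already have, for each cell barcode, the number of reads supporting 293T-only and Jurkat-only alternate alleles; the read and allele-fraction thresholds are hypothetical, and this is not the authors' statistical model.</span></div>

```python
from typing import Dict, Tuple


def classify_cell(n_293t_alt: int, n_jurkat_alt: int,
                  min_reads: int = 5, min_minor_fraction: float = 0.2) -> str:
    """Assign a barcode to '293T', 'Jurkat', 'multiplet' or 'unassigned'.

    n_293t_alt / n_jurkat_alt: reads carrying alleles seen only in that cell line.
    min_reads and min_minor_fraction are HYPOTHETICAL thresholds.
    """
    total = n_293t_alt + n_jurkat_alt
    if total < min_reads:
        return "unassigned"  # too little evidence either way
    minor = min(n_293t_alt, n_jurkat_alt) / total
    if minor >= min_minor_fraction:
        return "multiplet"   # substantial evidence for both genotypes
    return "293T" if n_293t_alt > n_jurkat_alt else "Jurkat"


# Made-up per-barcode allele counts: (293T-specific reads, Jurkat-specific reads).
cells: Dict[str, Tuple[int, int]] = {
    "bc1": (40, 1),
    "bc2": (2, 35),
    "bc3": (18, 22),
    "bc4": (1, 2),
}
calls = {bc: classify_cell(*c) for bc, c in cells.items()}
assigned = [c for c in calls.values() if c != "unassigned"]
print(calls, assigned.count("multiplet") / len(assigned))
```

<div class="p1" style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Like the expression-based mixing analysis, this only sees mixed-genotype multiplets, so the same 2p(1-p) correction for same-type doublets would apply.</span></div>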
<div class="p2" style="text-align: justify;">
<br /></div>
<div class="p1" style="text-align: justify;">
<span class="s1"><span style="font-family: Georgia, Times New Roman, serif;"><b>scRNA-seq from frozen cells: </b>In the discussion the authors make a strong statement about the ability to analyse frozen cells: <i>"the ability of GemCode to generate faithful scRNA-seq profiles from cryopreserved samples enables its application to clinical samples"</i>. The frozen cells in question were fresh cells recovered from whole blood, cryopreserved and <i>"gently thawed"</i> one week later (see Online Methods). Only a small number of genes (57) showed greater than 2-fold upregulation (no downregulated genes were reported), suggesting that freezing cells is possible. However, I suspect that the short freezing time and "gentle" protocols will put many users off relying on cell storage until a more comprehensive evaluation is undertaken. The fact that they got such good results is encouraging: we're working on a project with patient material that needs to be processed immediately for best results. Right now we're bringing cells over from the hospital about one hour after collection and processing them straight away, but this is not an efficient use of the technology when the plastic chip holds 8 samples and costs $150 each time.</span></span></div>
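<div class="p1" style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Counting genes with more than a two-fold change between fresh and frozen cells boils down to a log2 fold-change filter. The sketch below shows only that arithmetic, on made-up mean expression values, with a pseudocount (an assumption) to avoid dividing by zero; a real comparison would also need normalisation and a statistical test, which this omits.</span></div>

```python
import math
from typing import Dict, List


def upregulated_genes(fresh: Dict[str, float], frozen: Dict[str, float],
                      min_fold: float = 2.0, pseudocount: float = 1.0) -> List[str]:
    """Genes whose frozen/fresh expression ratio exceeds `min_fold`.

    `fresh` and `frozen` map gene -> mean normalised expression (made-up units).
    Only genes present in both dictionaries are compared.
    """
    threshold = math.log2(min_fold)
    hits = []
    for gene in fresh.keys() & frozen.keys():
        lfc = math.log2((frozen[gene] + pseudocount) / (fresh[gene] + pseudocount))
        if lfc > threshold:
            hits.append(gene)
    return sorted(hits)


fresh = {"GENE_A": 10.0, "GENE_B": 50.0, "GENE_C": 3.0}
frozen = {"GENE_A": 31.0, "GENE_B": 55.0, "GENE_C": 5.0}
print(upregulated_genes(fresh, frozen))  # ['GENE_A']
```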
<div class="p2" style="text-align: justify;">
<br /></div>
<div class="p1" style="text-align: justify;">
<span class="s1"><span style="font-family: Georgia, Times New Roman, serif;"><b>A few words about sequencing 10X scRNA-seq libraries: </b>In the paper the authors say that after GemCode prep <i>"libraries then undergo standard Illumina short-read sequencing"</i>, but there is nothing standard about the run type you need for 10X. It is a 98.14.8.10 format run: 98bp Read 1 (mRNA), 14bp Index 1 (cell barcode), 8bp Index 2 (sample index) and 10bp Read 2 (UMI).</span></span></div>
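<div class="p1" style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Because the four reads have unusual lengths, it can help to write the layout down explicitly when setting up a run or sanity-checking FASTQ files. A minimal sketch using the 98.14.8.10 layout; the segment names are generic and the dictionary record format is a hypothetical simplification, so adapt it to your own demultiplexing output.</span></div>

```python
from typing import Dict

# Cycle layout for the 98.14.8.10 run (see the text for the role of each read).
LAYOUT: Dict[str, int] = {
    "read1": 98,   # cDNA insert
    "index1": 14,
    "index2": 8,   # sample index
    "read2": 10,
}


def total_cycles(layout: Dict[str, int]) -> int:
    """Total sequencing cycles the run needs (reads plus index reads)."""
    return sum(layout.values())


def check_record(record: Dict[str, str], layout: Dict[str, int]) -> None:
    """Raise ValueError if any segment is not the expected length."""
    for segment, length in layout.items():
        seq = record.get(segment, "")
        if len(seq) != length:
            raise ValueError(f"{segment}: expected {length} bp, got {len(seq)} bp")


print(total_cycles(LAYOUT))  # 130

example = {"read1": "A" * 98, "index1": "C" * 14, "index2": "G" * 8, "read2": "T" * 10}
check_record(example, LAYOUT)  # passes silently
```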
<div class="p1" style="text-align: justify;">
<span class="s1"><span style="font-family: Georgia, Times New Roman, serif;"><br /></span></span></div>
<div class="p1" style="text-align: justify;">
<span class="s1"><span style="font-family: Georgia, Times New Roman, serif;">10X sequencing does not fit easily into a core lab running HiSeq instruments because of its unusual run configuration: every lane on a flowcell is sequenced with the same run type, so we need eight lanes' worth of samples that can share it. I suspect this is going to get much easier as we do more and more 10X sequencing, but for now we're either running longer reads than necessary or using NextSeq or HiSeq 2500 Rapid Runs. Chromium Genome libraries can now be run on X Ten as PE150 with no modification. Hopefully single-cell RNA-seq will move to a more standard single-end run for differential gene expression; this would make life easier for my team, and reduce costs by around 40%.</span></span></div>
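<div class="p1" style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Part of the planning problem is simply how many reads a batch of single-cell libraries needs. A minimal sketch of that arithmetic, using the 100,000 reads per cell quoted in the paper; the cell numbers and the reads-per-lane figure are hypothetical parameters, so substitute the yield you actually see on your own instrument.</span></div>

```python
import math


def reads_needed(n_cells: int, reads_per_cell: int) -> int:
    """Total reads required for a set of cells at a target depth."""
    return n_cells * reads_per_cell


def lanes_needed(total_reads: int, reads_per_lane: int) -> int:
    """Whole lanes required (rounded up)."""
    return math.ceil(total_reads / reads_per_lane)


# HYPOTHETICAL batch: 8 samples x 3,000 cells at 100,000 reads/cell, 300M reads/lane.
total = reads_needed(8 * 3_000, 100_000)
print(total, lanes_needed(total, 300_000_000))  # 2400000000 8
```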
<div class="p1" style="text-align: justify;">
<span class="s1"><span style="font-family: Georgia, Times New Roman, serif;"><br /></span></span></div>
<div class="p1" style="text-align: justify;">
<span class="s1"><span style="font-family: Georgia, Times New Roman, serif;"></span></span></div>
<div class="p1" style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><b>Summary: </b>All in all, this paper explains many of the things potential users of 10X single-cell are looking to understand. I'm expecting papers to come thick and fast over the next six months, now that people have the instruments in their hands.</span></div>
<div class="p1" style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div class="p1" style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">It is going to be interesting to see how 10X develop their chemistry, particularly for whole transcriptome single-cell, for copy-number and for applications like <a href="http://www.ncbi.nlm.nih.gov/pubmed/25915121">G&T-seq</a> or <a href="http://www.ncbi.nlm.nih.gov/pubmed/26752769">scM&T-seq</a>, or even <a href="http://www.ncbi.nlm.nih.gov/pubmed/24097267">ATAC-seq</a>.</span></div>
<div class="p1" style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div class="p1" style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">How will RainDance fight back with their own single-cell methods? And how does this 3' mRNA-seq assay compare to Fluidigm's C1? Both of these are questions I look forward to seeing answered. Ultimately, the more single-cell technologies we have the better, as each is likely to have its own strengths and weaknesses. But I'd not be surprised if the one with the most open chemistry becomes dominant; this was part of Illumina/Solexa's success, as it meant users could develop methods from a core technology.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div class="p2" style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><span class="s1"><b>PS:</b> </span>Supplementary Figure legends are available on bioRxiv, but not the figures (go figure!). The Online Methods are also missing, probably because bioRxiv does not check whether these have been submitted.</span></div>
</div>
James@cancerhttp://www.blogger.com/profile/02825715598810395734noreply@blogger.com3tag:blogger.com,1999:blog-6334453475526523597.post-52365838752002022482016-07-25T17:36:00.000+01:002016-09-05T13:40:45.463+01:00RNA-seq advice from Illumina<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="color: #999999;"><span style="font-family: "georgia" , "times new roman" , serif;">This article was </span><span style="font-family: "georgia" , "times new roman" , serif;">commissioned by</span><span style="font-family: "georgia" , "times new roman" , serif;"> Illumina Inc.</span></span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<span style="font-family: "georgia" , "times new roman" , serif;">The most common NGS method we discuss in our weekly experimental design meeting is RNA-seq. Nearly all projects will use it at some point, either to delve deeply into hypothesis-driven questions or simply as a tool to go fishing for new biological insights. It is amazing how far a project can progress in just 30 minutes of discussion: methodology, replication, controls, analysis, and all sorts of bias get covered as we try to come up with an optimal design. However, many users don't have the luxury of in-house Bioinformatics and/or Genomics core facilities, so they have to work out the right sort of experiment to do for themselves. Fortunately, people have been hard at work creating resources that can really help, and most recently Illumina released an <a href="http://www.illumina.com/content/dam/illumina-marketing/documents/products/other/rna-sequencing-workflow-buyers-guide-476-2015-003.pdf">RNA-seq "Buyer’s Guide"</a> with lots of helpful information, including how to keep costs down.</span></div>
<div style="text-align: justify;">
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://directeffectmedia.iljmp.com/262/uvjbk"><img border="0" height="137" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgAtVwKnVbUmx1fZiNr6zigFpLuiUlMrSsnvWDCPbOLppfu-1gy8H_4N99Y-iquP2Ec1x61RbKLiX5cIODLn1AbxoVL70QFI3nmJI2P1dU1yQpS84szgbuI1wmFfLpz0sgQTO_s3JrPsdkv/s400/Screen+Shot+2016-05-01+at+22.53.13.png" width="400" /></a></div>
</div>
<div style="text-align: justify;">
<br />
<span style="font-family: "georgia" , "times new roman" , serif;"><b></b></span><br />
<a name='more'></a><span style="font-family: "georgia" , "times new roman" , serif;"><b>Illumina's <a href="https://directeffectmedia.iljmp.com/262/uvjbk">"Buyer’s Guide"</a></b>: the guide offers advice on common RNA-sequencing methods and should help new users evaluate the many options available for next-generation sequencing of RNA. Anyone considering a differential gene expression experiment should have RNA-seq as their platform of choice, and the guide presents three simple steps to help users think through different aspects of their experiments.</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span> <br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">1) First of all, make sure you understand what your scientific question is! This sounds simple, but all too often people want to get too much out of one experiment and end up in a bit of a mess. Better to answer one question well than two questions badly. Once you've thought about this, it should be clear whether you want to analyse mRNAs for a simple differential gene expression experiment or are after something else (e.g. splicing), and whether you'll need to look at more than just poly-adenylated mRNAs. If possible, also try to determine ahead of time whether the genes you're interested in studying are highly expressed or very rare.<br />
</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">2) Next, consider what sort of samples you have: are they low quality and/or low quantity? You should also consider who is going to do the work in the lab and who is going to analyse the sequence data.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<span style="font-family: "georgia" , "times new roman" , serif;">3) Now you can really think about the final experimental design: what type of library preparation kit to use, replicate numbers, proper controls, depth of sequencing, etc. Illumina's RNA-seq buyer's guide describes some of the things you'll need to consider in choosing the read depth and run type, and also includes some tips for keeping the costs of your experiment down.</span> </div>
<div class="separator" style="clear: both; text-align: center;">
<b><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9KP16_wwqWAprvcclMqgrFv94P9vHMHLocFcMFu5m8qaQbYdD5_uEoHDmY6ktlQ5LDavLJ0Vdln2nL3soEPnhTHERmhuflIgJ-RZL22PUGeaDn7HziRkHfBW-etTgNw04ZbIJEV-Jc6YV/s1600/Screen+Shot+2016-05-01+at+13.42.31.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9KP16_wwqWAprvcclMqgrFv94P9vHMHLocFcMFu5m8qaQbYdD5_uEoHDmY6ktlQ5LDavLJ0Vdln2nL3soEPnhTHERmhuflIgJ-RZL22PUGeaDn7HziRkHfBW-etTgNw04ZbIJEV-Jc6YV/s320/Screen+Shot+2016-05-01+at+13.42.31.png" width="233" /></a></b></div>
<br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>What do people mean when they say "RNA-seq": </b></span><span style="font-family: "georgia" , "times new roman" , serif;">When people say "RNA-seq" most of them are talking about <i>differential gene expression (DGE) by sequence analysis of reverse transcribed poly-adenylated mRNAs</i>, but by changing the sequencing depth or type of sequencing, and/or choosing a different library prep kit, you can investigate so much more. The guide includes three different scenarios for RNA-seq experiments: basic differential gene expression; DGE and allele-specific expression plus isoforms, SNVs and fusions; and finally whole transcriptome analysis. These show the breadth of experiments you can consider once you've mastered this method.</span><br />
<br />
<span style="font-family: "georgia" , "times new roman" , serif;">The first two scenarios showcase the power of RNA-seq and demonstrate how using a single library prep method but varying the sequencing allows very different questions to be asked of your samples. The guide recommends Illumina's <a href="http://www.illumina.com/products/truseq_stranded_mrna_library_prep_kit.html">TruSeq Stranded mRNA-seq kits</a> (these are the ones we use most in my lab, and we have done so ever since beta-testing the original RNA-seq kit many years ago). Scenario #1 is a simple DGE experiment, and Illumina recommends you generate ≥ 10 million reads per sample using single-end 50bp reads (SE50). Scenario #2 allows a full mRNA analysis simply by increasing read depth to ≥ 25 million reads per sample and using paired-end 75bp reads (PE75).</span><br />
<br />
<span style="font-family: "georgia" , "times new roman" , serif;">If you are interested in more than poly-adenylated mRNAs, then changing the library prep kit to Illumina's <a href="http://www.illumina.com/products/truseq_stranded_total_rna_library_prep_kit.html">TruSeq Stranded Total RNA</a> gets rid of ribosomal RNAs, letting you analyse both coding and non-coding RNA. Much greater read depth is needed, and Illumina recommend ≥ 50 million PE75 reads per sample. Completing the RNA-seq line-up are the TruSeq Small RNA kits, which allow you to analyse microRNAs and other small transcripts; these usually require only ≥ 1-2 million SE50 reads per sample.</span><br />
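<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">The guide's recommendations can be captured as a small lookup table, which makes it easy to total up the reads a project needs. This is a sketch: the depths and run types are the minimums quoted above (taking the upper 2M figure for small RNA), the scenario names are hypothetical labels, and the 24-sample project is an invented example.</span></div>

```python
from typing import Dict, Tuple

# Minimum reads per sample and run type, as quoted from Illumina's guide.
SCENARIOS: Dict[str, Tuple[int, str]] = {
    "mrna_dge": (10_000_000, "SE50"),           # scenario 1: simple DGE
    "mrna_dge_ase_isoforms": (25_000_000, "PE75"),  # scenario 2: full mRNA analysis
    "total_rna": (50_000_000, "PE75"),
    "small_rna": (2_000_000, "SE50"),
}


def project_reads(scenario: str, n_samples: int) -> Tuple[int, str]:
    """Minimum total reads for a project (counted as in the guide) and its run type."""
    per_sample, run_type = SCENARIOS[scenario]
    return per_sample * n_samples, run_type


print(project_reads("mrna_dge", 24))   # (240000000, 'SE50')
print(project_reads("total_rna", 24))  # (1200000000, 'PE75')
```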
<br />
<span style="font-family: "georgia" , "times new roman" , serif;"><b>How do Illumina's recommendations stack up:</b> The guide is pretty good in the suggestions it makes for common RNA-seq methods. I'd aim a bit higher for DGE and suggest ≥ 20 million reads per sample to allow profiling of high, medium and lowly expressed genes.
I'm really not keen on the suggestion that MiSeq or NextSeq mid-output runs are good tools for RNA-seq as, in my experience, most experiments with sufficient replication will be too large to fit into a single sequencing run. I'd argue that the cheapest way to get your RNA-seq data is going to be on HiSeq 4000, until of course we can run RNA-seq on X Ten. Of course, not everyone should buy a HiSeq, and a MiniSeq, MiSeq or NextSeq may be a good fit for your own laboratory, but I'd encourage you to consider the benefits of using your local core lab first, especially if you are planning experiments bigger than 12-24 samples. I'm not sure I'd argue quite as strongly for paired-end data, and would prefer splicing, ASE and fusion detection to come from higher-depth sequencing instead (50M SE50 reads cost about the same as 25M PE75 reads).</span><br />
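<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">The depth versus paired-end trade-off is easier to judge by comparing clusters (molecules sampled) and bases. A minimal sketch with HYPOTHETICAL lane prices and yields, assuming a PE75 lane costs twice an SE50 lane; substitute your own core's numbers, since which option is cheaper depends entirely on how each run type is priced.</span></div>

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RunType:
    name: str
    read_length: int        # bases per read
    reads_per_cluster: int  # 1 for single-end, 2 for paired-end
    lane_price: float       # HYPOTHETICAL price per lane
    clusters_per_lane: int  # HYPOTHETICAL clusters (fragments) per lane


def cost_and_bases(run: RunType, clusters: int) -> Tuple[float, int]:
    """(cost, total bases) for sequencing `clusters` fragments on this run type."""
    cost = clusters / run.clusters_per_lane * run.lane_price
    bases = clusters * run.read_length * run.reads_per_cluster
    return cost, bases


# HYPOTHETICAL prices: a PE75 lane assumed to cost twice an SE50 lane.
se50 = RunType("SE50", 50, 1, lane_price=1_000.0, clusters_per_lane=300_000_000)
pe75 = RunType("PE75", 75, 2, lane_price=2_000.0, clusters_per_lane=300_000_000)

print("50M SE50:", cost_and_bases(se50, 50_000_000))
print("25M PE75:", cost_and_bases(pe75, 25_000_000))
```

<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Under that assumption the two options cost the same, but SE50 samples twice as many molecules while PE75 returns 50% more bases; for counting-based DGE it is the number of molecules sampled that adds counts per gene.</span></div>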
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span> <span style="font-family: "georgia" , "times new roman" , serif;"><b>Why does my lab focus on mRNA-seq DGE: </b>My own choices for RNA-seq are primarily informed by the questions people say they want to answer in experimental design discussions, and nearly all of these are differential gene expression questions. As such, my lab runs lots and lots of Illumina's stranded mRNA-seq kits. We only run some form of ribosomal RNA depletion when the experiment warrants it, as these methods generally require deeper sequencing for the same differential gene expression analysis power. We have very few users who need to run FFPE RNA, so although we tested the RNA Access kit, we've yet to really use it in a significant project. This is partly because the research groups coming to my lab understand the limitations of FFPE samples and work hard to procure fresh frozen material wherever possible.</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span> <span style="font-family: "georgia" , "times new roman" , serif;"><b>A brief bit about informatics:</b> This article is focussed on the wet lab, but without a good analysis pipeline you'll be stuck with some big but unusable FASTQ files. The analysis requirements are heavily influenced by the biological questions being asked, by the samples available, and by the library preparation and sequencing performed. I'd always recommend that users make sure they know what analysis is likely to be performed before generating data.</span><br />
<br />
<span style="font-family: "georgia" , "times new roman" , serif;">Many others have weighed in on how to use and design RNA-seq experiments (see the list of my favourite references at the bottom of this post). Nearly everyone agrees that replication is key, with most people suggesting 4-6 biological replicates. Most papers agree on keeping read depth under 20M reads per sample. The <a href="https://genome.ucsc.edu/ENCODE/protocols/dataStandards/ENCODE_RNAseq_Standards_V1.0.pdf">ENCODE RNA-seq guidelines</a> are very different, recommending just two biological replicates and 30M paired-end reads per sample; I've never agreed with this, even when it was published in 2011, and have steered people to other resources.
The blogosphere also offers lots of help: a 2013 post by <a href="http://gkno2.tumblr.com/post/24629975632/thinking-about-rna-seq-experimental-design-for">GKNO</a> (Marth lab, U. Utah) and the <a href="http://rnaseq.uoregon.edu/">RNA-seqlopedia</a> (U. Oregon) are two great reads for people who want to know more.</span><br />
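<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Replicate numbers can be reasoned about with a power calculation. Below is a minimal sketch of one published approximation for negative-binomial RNA-seq counts (Hart et al., J Comput Biol 2013): n = 2(z₁₋α/₂ + z₁₋β)²(1/μ + σ²)/(ln Δ)², where μ is the expected read count for a gene, σ its biological coefficient of variation and Δ the fold change to detect. The inputs are hypothetical, and the formula is per gene with no multiple-testing correction, so dedicated power tools are better for real designs.</span></div>

```python
import math
from statistics import NormalDist


def samples_per_group(mean_count: float, cv: float, fold_change: float,
                      alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate biological replicates per group to detect `fold_change`.

    mean_count: expected reads for the gene (mu; the depth term is 1/mu)
    cv: biological coefficient of variation between replicates (sigma)
    Two-sided test at level alpha; multiple testing is NOT accounted for.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * (1 / mean_count + cv ** 2) / math.log(fold_change) ** 2
    return math.ceil(n)


# HYPOTHETICAL gene: biological CV of 0.4, two-fold change, three depths.
for depth in (10, 100, 1_000):
    print(depth, samples_per_group(depth, cv=0.4, fold_change=2.0))
```

<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">With these made-up inputs, a ten-fold increase in counts for a reasonably expressed gene (100 to 1,000) leaves the answer at 6 replicates per group, because biological variability rather than depth dominates the variance term.</span></div>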
<i style="color: #999999; font-family: Georgia, "Times New Roman", serif;"><br /></i>
<i style="color: #999999; font-family: Georgia, "Times New Roman", serif;">All Illumina products listed are for research use only. Not for use in diagnostic procedures (except as specifically noted).</i><br />
<br />
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"><b>Further reading:</b> </span></span></span></span></span></span></span></span></span><br />
<ul style="text-align: left;">
<li style="font-family: Georgia, "Times New Roman", serif; font-size: small; text-align: justify;"><a href="http://www.ncbi.nlm.nih.gov/pubmed/27022035">How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use?</a> <b>RNA. 2016. </b>This paper goes furthest towards answering the question most people want answered. The authors present a very highly replicated study and show that more than 20 biological replicates were needed to detect over 85% of differentially expressed genes regardless of fold change. They recommend a minimum of 6 biological replicates in RNA-seq experiments, and edgeR or DESeq2 as the best tools. They used single-end sequencing and generated 0.8-2.6 million reads per technical replicate, equivalent to about 10M reads per biological sample.</li>
<li style="font-family: Georgia, "Times New Roman", serif; font-size: small; text-align: justify;"><a href="http://www.ncbi.nlm.nih.gov/pubmed/27008024">Experimental Design and Power Calculation for RNA-seq Experiments.</a> <b>Methods Mol Biol. 2016.</b> This book chapter reviews the major factors that influence the statistical power of detecting DGE.</li>
<li style="text-align: justify;"><a href="http://www.ncbi.nlm.nih.gov/pubmed/26220961" style="font-family: Georgia, "Times New Roman", serif; font-size: small;">Designing alternative splicing RNA-seq studies. Beyond generic guidelines.</a><span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;"> <b>Bioinformatics. 2015. </b>This paper describes how sequencing depth, read length, library preparation and the level of replication affect the cost-effectiveness of single-sample and group comparison studies. They present data showing that short reads outperformed long reads for most analyses.</span></li>
<li style="text-align: justify;"><a href="http://www.ncbi.nlm.nih.gov/pubmed/25246651" style="font-family: Georgia, "Times New Roman", serif; font-size: small;">Power analysis and sample size estimation for RNA-Seq differential expression.</a><span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;"> <b>RNA. 2014. </b>In this paper the authors compare and evaluate five differential expression analysis packages (DESeq, edgeR, DESeq2, sSeq and EBSeq). They show that increasing sample size is preferable to increasing sequencing depth past 20 million reads.</span></li>
<li style="text-align: justify;"><a href="http://www.ncbi.nlm.nih.gov/pubmed/24319002" style="font-family: Georgia, "Times New Roman", serif; font-size: small;">RNA-seq differential expression studies: more sequence or more replication?</a> <b style="font-family: Georgia, "Times New Roman", serif; font-size: small;">Bioinformatics. 2014. </b><span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;">This paper describes the explicit trade-off between the number of biological replicates and sequencing depth in increasing the power to detect DGE. The authors suggested that more than 10M reads per sample was unnecessary, and that adding replicates should be the strategy of choice to increase power and accuracy in RNA-seq studies.</span></li>
<li style="text-align: justify;"><a href="http://www.ncbi.nlm.nih.gov/pubmed/24056876" style="font-family: Georgia, "Times New Roman", serif; font-size: small;">Accounting for technical noise in single-cell RNA-seq experiments.</a><span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;"> <b>Nat Methods. 2013. </b>This paper presents a quantitative statistical method to distinguish biological variability from technical noise in single-cell RNA-seq.</span></li>
<li style="text-align: justify;"><a href="http://www.ncbi.nlm.nih.gov/pubmed/23314327" style="font-family: Georgia, "Times New Roman", serif; font-size: small;">Scotty: a web tool for designing RNA-Seq experiments to measure differential gene expression.</a><span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;"> <b>Bioinformatics. 2013.</b> This paper presents a web-based tool, </span><a href="http://euler.bc.edu/marthlab/scotty/scotty.php" style="font-family: Georgia, "Times New Roman", serif; font-size: small;">Scotty</a><span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;">, to assist in designing RNA-seq experiments with an appropriate sample size and read depth.</span></li>
<li style="text-align: justify;"><a href="http://www.ncbi.nlm.nih.gov/pubmed/22539670" style="font-family: Georgia, "Times New Roman", serif; font-size: small;">RNA-SeQC: RNA-seq metrics for quality control and process optimisation.</a><span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;"> <b>Bioinformatics. 2012. </b>Authors from the Broad Institute present the </span><a href="http://www.broadinstitute.org/rna-seqc" style="font-family: Georgia, "Times New Roman", serif; font-size: small;">RNA-SeQC</a><span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;"> tool for quality control of data before DGE analysis. Its metrics include yield, alignment and duplication rates; GC bias, rRNA content, regions of alignment (exon, intron and intergenic), continuity of coverage, 3'/5' bias and the count of detectable transcripts.</span></li>
<li style="text-align: justify;"><a href="http://www.ncbi.nlm.nih.gov/pubmed/21498551" style="font-family: Georgia, "Times New Roman", serif; font-size: small;">Design and validation issues in RNA-seq experiments.</a><span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;"> <b>Brief Bioinform. 2011.</b> This paper reviews the experimental design issues pertinent to RNA-seq.</span></li>
<li style="text-align: justify;"><a href="http://www.ncbi.nlm.nih.gov/pubmed/21645359" style="font-family: Georgia, "Times New Roman", serif; font-size: small;">RNA-seq: technical variability and sampling.</a><span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;"> <b>BMC Genomics. 2011.</b> This paper analysed technical bias in 3 replicated RNA-seq experiments and showed that low coverage (less than 5 reads per base) leads to a significant increase in technical noise, and that sampling bias needs to be understood and taken into account.</span></li>
<li style="text-align: justify;"><a href="http://www.ncbi.nlm.nih.gov/pubmed/20220756" style="font-family: Georgia, "Times New Roman", serif; font-size: small;">Transcriptome genetics using second generation sequencing in a Caucasian population.</a><span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;"><b> Nature. 2010.</b> One of the first papers to suggest that a relatively low read-depth for RNA-seq of just 10 million reads "gave the same dynamic range as microarrays, with better quantification of alternate and highly abundant transcripts". However they used paired-end reads in their analysis.</span></li>
<li style="text-align: justify;"><a href="http://www.ncbi.nlm.nih.gov/pubmed/18550803" style="font-family: Georgia, "Times New Roman", serif; font-size: small;">RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays.</a><span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;"> <b>Genome Res. 2008.</b> In this paper the authors estimated the technical variance in RNA-seq and compared it to arrays for detecting differentially expressed genes.</span></li>
</ul>
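<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Several of the papers above ask whether extra depth or extra replicates buys more power. A textbook normal approximation makes the trade-off visible: assuming negative-binomial counts, the variance of a gene's log count is roughly 1/μ + CV², where μ is the gene's mean read count (which scales with sequencing depth) and CV is the biological coefficient of variation. That gives n = 2(z<sub>1-α/2</sub> + z<sub>1-β</sub>)² (1/μ + CV²) / (ln Δ)² replicates per group to detect a fold change Δ. The Python sketch below illustrates that approximation only; the function name, the CV of 0.4 and the depths are made-up assumptions, and it is not the method used by Scotty or any of the papers listed.</span></div>

```python
import math
from statistics import NormalDist


def replicates_per_group(mean_count, cv, fold_change, alpha=0.05, power=0.8):
    """Approximate biological replicates per group for a two-group comparison.

    n = 2 * (z_{1-alpha/2} + z_{power})**2 * (1/mean_count + cv**2) / ln(fold_change)**2
    i.e. a Wald test on the log fold change, with Poisson counting noise
    (1/mean_count) plus biological variance (cv**2) per sample.
    """
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)
    n = 2 * z_total ** 2 * (1 / mean_count + cv ** 2) / math.log(fold_change) ** 2
    return math.ceil(n)


# Ten times more reads per gene barely helps once cv**2 dominates the variance.
for depth in (10, 100, 1000):
    print(depth, replicates_per_group(depth, cv=0.4, fold_change=2))
```

With these made-up numbers, going from 10 to 100 reads per gene cuts the answer from 9 to 6 replicates per group, but going on to 1,000 reads leaves it at 6, because the biological term CV² dominates once 1/μ is small.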
</div>
</div>
James@cancerhttp://www.blogger.com/profile/02825715598810395734noreply@blogger.com10tag:blogger.com,1999:blog-6334453475526523597.post-8741330146305995532016-07-21T21:41:00.001+01:002016-09-05T13:41:21.486+01:00Core Genomics is going cor-porate (sort of)<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">I've just had my five-year anniversary of starting the Core Genomics blog! Those five years have whizzed by, and NGS technologies have surpassed almost anything I dreamed possible when I started using them in 2007. My blog has also grown beyond anything I imagined, and the feedback I've had has been a real motivation to keep writing. It also stimulated my move onto Twitter, and I now have multiple accounts: <a href="https://twitter.com/CIgenomics">@CIGenomics</a> (me), <a href="https://twitter.com/crukgenomecore">@CRUKgenomecore</a> (my lab), and <a href="https://twitter.com/rna_seq">@RNA_seq</a> and <a href="https://twitter.com/exome_seq">@Exome_seq</a> (PubMed <a href="http://core-genomics.blogspot.co.uk/2014/10/twitterbots-for-ngs.html">Twitter bots</a>).</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">The blog is still running on the Google Blogger site I set up back in 2011, and I feel ready for a change. Moving will let me do a few things I've wanted to do for a while, so over the next few months I'll be migrating Core Genomics to a new WordPress site: <a href="http://enseqlopedia.com/">Enseqlopedia.com</a>. </span></div>
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span> <br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg2HJ07dYbW7-Y99RydCNu5mWDYjDEdItT6HQEuXnG5eOLPiIHhfqBXAqhf9NOs8PlXHRogFg7sJjOXMeOHrDHDgRBZROnl3UrCkB-iVsQYmCi333lf_ZlmHPUBPSO-AAvS3KeAxrMLhyphenhyphenkv/s1600/Screen+Shot+2016-07-14+at+14.43.03.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-family: "georgia" , "times new roman" , serif;"><img border="0" height="118" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg2HJ07dYbW7-Y99RydCNu5mWDYjDEdItT6HQEuXnG5eOLPiIHhfqBXAqhf9NOs8PlXHRogFg7sJjOXMeOHrDHDgRBZROnl3UrCkB-iVsQYmCi333lf_ZlmHPUBPSO-AAvS3KeAxrMLhyphenhyphenkv/s400/Screen+Shot+2016-07-14+at+14.43.03.png" width="400" /></span></a></div>
<span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<a name='more'></a><br />
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>Introducing <a href="http://enseqlopedia.com/">Enseqlopedia</a>: </b>The new home of Core Genomics will be a chance for me to expand on something I've been doing for many years - explaining NGS to users. The same blog content is going to keep flowing, but other stuff will appear alongside, and I hope you'll find it informative and entertaining.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">The Enseqlopedia name was chosen because I'll be adding content describing methods, linking to the best papers that demonstrate or advance them, and hopefully making the new site a useful resource for the community. It will also be somewhere I can serve up more PubMed Twitterbot output in a single place outside of Twitter. I'd also like to reinvigorate the <a href="http://omicsmaps.com/">sequencer map</a> Nick Loman and I put together many years ago. Some of these changes have come about because of my dissatisfaction with sites that serve up NGS news but simply regurgitate press releases from academics or companies in the NGS field; I want to deliver more than this. Hopefully you already agree that my blog posts hit the spot, and I'm hoping the new content will be of real interest too. I aim to make it clear that what appears has been carefully chosen and has an opinion behind it.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<div style="text-align: justify;">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>Core Genomics corp: </b>The biggest change is going to be the appearance of commissioned or sponsored content, i.e. stuff I get paid to post. I've not tried to monetise my blog before, mainly because I don't like unsightly ads all over the place. However, I've been asked reasonably frequently to write about new products in the NGS space, and until now I've always turned the offers down. I have ghost-written other content, but nothing on Core Genomics has been paid for, and all the topics have been chosen by me. The two new types of post will be tagged so you can tell immediately what you're reading:</span></div>
<blockquote>
<div style="text-align: justify;">
<i><span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;"><b>Commissioned posts</b> will be tagged "commissioned content" and </span><span style="font-family: "georgia" , "times new roman" , serif;">labelled at the top of the post so you know who paid me to write the piece. All commissioned content will be taken on with full editorial control i.e. I decide what ends up in the final piece, and </span></span></i><i><span style="font-family: "georgia" , "times new roman" , serif;"><span style="text-align: left;"><u>I will have written the post</u></span></span></i><i><span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-family: "georgia" , "times new roman" , serif;">.</span></span></i></div>
</blockquote>
<blockquote>
<i><span style="font-family: "georgia" , "times new roman" , serif;"><b>Sponsored posts</b> will be tagged "sponsored content" and labelled at the top of the post so you know who wrote the piece. Sponsored content will only be accepted if I think readers of Core Genomics would be interested. The content is likely to be written by the sponsor and should be considered an advert. Although I will decide whether a sponsorship opportunity gets posted, I will NOT have full editorial control, i.e. I get to decide what sponsored content appears on the site, <u>but I will not have written the post</u>.</span></i></blockquote>
</div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">My first commissioned piece will be coming soon. Although the topic has been chosen by someone else, the opinions are very much my own. I'm not expecting to write much more than one commissioned post a month (so any NGS companies reading this had better get their requests in soon), and I'm not going to write about something I really don't believe in. </span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif; text-align: justify;">I'll also be making it clearer what kind of consultancy work I'm happy to take on. Mostly this has been technological consulting for investors who want to understand market reactions to new instruments or developments (with Brexit came a rush of consultancy work). But I've also consulted for technology companies, and for research groups.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Thanks for reading Core Genomics - hopefully you'll be reading for another five years!</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">James.</span></div>
</div>
James@cancerhttp://www.blogger.com/profile/02825715598810395734noreply@blogger.com3tag:blogger.com,1999:blog-6334453475526523597.post-50333135725260185272016-07-16T12:15:00.002+01:002016-09-05T13:40:45.460+01:00Whole genome amplification improved<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="p1" style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">A new genome amplification technology from <a href="http://www.expedeon.com/trueprime-single-cell-wga-kit-version-2-0.html">Expedeon/Sygnis</a>, <a href="http://www.sygnis.com/trueprime/">TruePrime</a>, looks like it might work well for single-cell and low-input analyses, particularly copy number. TruePrime is a primer-free multiple displacement amplification technology. It uses the well-established <a href="https://en.wikipedia.org/wiki/%CE%A629_DNA_polymerase">phi29 DNA polymerase</a> and a new <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3899013/">TthPrimPol primase</a>, which eliminates the need for random primers and therefore avoids their inherent amplification bias. The senior author on the TthPrimPol primase paper, Prof <a href="https://www.cnio.es/ing/publicaciones/spanish-scientists-identify-a-new-ancestral-enzyme-that-facilitates-dna-repair">Luis Blanco</a>, is leading the TruePrime research team.</span></div>
<div class="p1" style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiG89gCxI4YLlPr_rFVtXwAW4dDebbOQE1QYcbXvat3hAj_P72nlGK4-yj-PlNFE1JXkS3ONZglYjI-aIqTr9S9RnKWPriEJ9NFwUsQH-cqYl_rFc-RqUvNMxnBkyKE_u_QWNWav18Hi8mz/s1600/Screen+Shot+2016-07-15+at+17.07.26.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-family: Georgia, Times New Roman, serif;"><img border="0" height="151" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiG89gCxI4YLlPr_rFVtXwAW4dDebbOQE1QYcbXvat3hAj_P72nlGK4-yj-PlNFE1JXkS3ONZglYjI-aIqTr9S9RnKWPriEJ9NFwUsQH-cqYl_rFc-RqUvNMxnBkyKE_u_QWNWav18Hi8mz/s400/Screen+Shot+2016-07-15+at+17.07.26.png" width="400" /></span></a></div>
<div class="p1">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div class="p1">
<span style="font-family: Georgia, Times New Roman, serif;"></span></div>
<a name='more'></a><div>
<div style="text-align: justify;">
<span style="font-family: Georgia, "Times New Roman", serif;">I saw a recent poster with results demonstrating even amplification and homogeneous coverage (see image above), no primer artefacts, and high identification rates for both SNPs and CNVs. In low-coverage whole genome sequencing (12 million reads), TruePrime gave CNV data very similar to unamplified DNA, with very little apparent amplification or coverage bias. Competitors "R" and "G" did not look so good.</span></div>
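<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">One simple way to put numbers on "homogeneous coverage" in low-pass WGS like this is to count reads in fixed-size genomic bins (with 12 million reads and, say, 500 kb bins across a ~3.1 Gb genome, that is roughly 6,200 bins of about 1,900 reads each), then look at the spread of bin counts and at per-bin log2 ratios against unamplified DNA. This minimal Python sketch assumes bin counts have already been produced by an aligner and a binning tool; the function names and toy counts are hypothetical, and real CNV callers add GC and mappability correction plus segmentation.</span></div>

```python
import math
from statistics import mean, median, pstdev


def coverage_cv(bin_counts):
    """Coefficient of variation of reads per bin; 0 means perfectly even coverage."""
    return pstdev(bin_counts) / mean(bin_counts)


def log2_ratios(sample, reference, pseudocount=0.5):
    """Per-bin log2(sample / reference), centred on the median bin.

    Median centring removes the library-size difference between the two
    samples, assuming most bins are copy-number neutral. The pseudocount
    avoids taking log(0) for empty bins.
    """
    raw = [math.log2((s + pseudocount) / (r + pseudocount))
           for s, r in zip(sample, reference)]
    centre = median(raw)
    return [x - centre for x in raw]


# Hypothetical toy counts for four bins.
unamplified = [100, 100, 100, 100]
amplified = [100, 100, 200, 100]  # the third bin looks like a gain
print(coverage_cv(unamplified))
print([round(x, 2) for x in log2_ratios(amplified, unamplified)])
```

A biased amplification shows up as a higher coverage CV and as noisy log2 ratios in bins that should be copy-neutral.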
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><b>What does PrimPol do in the cell: </b>TthPrimPol comes from the bacterium <i>Thermus thermophilus</i>; the cellular roles described here are those of its human counterpart, PrimPol. Human PrimPol is a DNA and RNA primase with DNA-dependent DNA and RNA polymerase activity. It is a unique human enzyme capable of de novo DNA synthesis solely with dNTPs and is found primarily in the nucleus, but it also matters in mitochondria: PrimPol -/- cells show inefficient mtDNA replication, although the protein is not essential. In the mitochondria PrimPol provides the primers for leading-strand mtDNA synthesis in the replication fork. It is important there because the highly oxidative environment leads to replication stress and genome instability. It is also capable of reading through template lesions such as 8oxoG, a common DNA lesion produced by reactive oxygen species that causes G to T and C to A substitutions. If TthPrimPol shares this ability, it may have a useful application in the amplification of FFPE-damaged DNA.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, "Times New Roman", serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, "Times New Roman", serif;"><b>Using TruePrime in single-cell sequencing: </b>I can see several opportunities for using this technology in my lab, including with both of our single-cell systems (</span><a href="http://www.10xgenomics.com/" style="font-family: Georgia, "Times New Roman", serif;">10X Genomics</a><span style="font-family: Georgia, "Times New Roman", serif;"> and </span><a href="https://www.fluidigm.com/products/c1-system" style="font-family: Georgia, "Times New Roman", serif;">Fluidigm C1</a><span style="font-family: Georgia, "Times New Roman", serif;">) for future copy-number methods. It is also likely to be useful for other low-input experiments, and we're likely to couple it with </span><a href="http://www.illumina.com/products/nextera_xt_dna_library_prep_kit.html" style="font-family: Georgia, "Times New Roman", serif;">Nextera XT</a><span style="font-family: Georgia, "Times New Roman", serif;"> or similar.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, "Times New Roman", serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, "Times New Roman", serif;">I'm sure we'll see some great work using this enzyme if it really works as well as the company suggests. If you are using TruePrime, please do let me know how you are getting on!</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
</div>
</div>
James@cancerhttp://www.blogger.com/profile/02825715598810395734noreply@blogger.com1tag:blogger.com,1999:blog-6334453475526523597.post-29264811072426040912016-07-14T16:20:00.000+01:002016-09-05T13:41:21.472+01:00How much time is lost formatting references?<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">I just completed a grant application, and one of the steps required me to list my recent papers in a specific format. This was an electronic submission, and I’m sure it could be made much simpler, perhaps by working from each paper's DOI or PubMed ID. But this got me thinking about the pain of reformatting references and why we have so many formats in the first place. It took me ten minutes to get my references into the required format, and I've spent much longer in the past: all wasted time in a day that is already too full!</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"></span><br />
<a name='more'></a><span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">I use <a href="https://www.mendeley.com/">Mendeley</a> as my reference manager of choice and it has a very good <a href="http://support.mendeley.com/customer/en/portal/articles/168756-installing-and-using-the-word-plugin-in-windows">Word plug-in</a> that makes it easy to add references and build a final reference list when writing papers. I used it for my PhD with over 160 references and it coped pretty well. Mendeley, EndNote, <i>et al </i>make changing reference styles pretty easy, but why do we have to bother at all?</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span> <span style="font-family: "georgia" , "times new roman" , serif;">In digging into this I came across a post by Jay Fitzsimmons at <a href="http://canadianfieldnaturalist.blogspot.co.uk/2013/07/reference-formats-little-method-lots-of.html">the Canadian Field-Naturalist blog</a>. Jay's post is well written and sums up the problem: lots of citation styles, but no real evidence about which is most efficient.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>How did reference styles evolve: </b>Once upon a time the only way to access published information was to go to your university library and find the paper you were looking for (and it wasn't that long ago). Publishers developed house styles as a set of standards for the writing and design of articles in their periodicals. There was little or no effort to determine the most efficient way to communicate the information in a reference. A big reason for abbreviating information, or omitting article titles from references, was to reduce the amount of text, simply to save printing costs for publishers. There is even an <a href="http://www.issn.org/services/online-services/access-to-the-ltwa">ISO standard</a> just for abbreviating journal titles! Even in the electronic age there might still be good reasons to abbreviate references. Who wants to read a 300-author list (unless you're one of the authors, of course)?</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>What do I think is important: </b>It depends on why I’m looking at a reference in the first place, but here are my priorities:</span><br />
<ol>
<li><span style="font-family: "georgia" , "times new roman" , serif;">The title is the main thing I use to decide whether a paper is worth reading, so I’d like to see it every time.</span></li>
<li><span style="font-family: "georgia" , "times new roman" , serif;">Second on my list is the year of publication; there’s sometimes no point looking at old references in a fast-moving field (but beware of culling useful reading this way).</span></li>
<li><span style="font-family: "georgia" , "times new roman" , serif;">Then I’d like a link to the paper - personally I’m happy with the PubMed ID or DOI.</span></li>
<li><span style="font-family: "georgia" , "times new roman" , serif;">Lastly, the lead author(s), as these are likely to be the people with most to gain from the publication in the immediate future.</span></li>
</ol>
</div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">As far as authors go, in the context of my grant application, or perhaps a CV or job application, I’d prefer a simple numbering format: my position in the author list and the total number of authors, perhaps with an asterisk to denote joint-first or corresponding authorship. For example, 2*/17 would mean I am the 2nd author in a list of 17, but a joint-first author.</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;">Finally, I’d put it all in a consistently delimited format, so that a reference grabbed from almost any screen can easily be imported into whatever I need to use it in.</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span> <span style="font-family: "georgia" , "times new roman" , serif;">I don't really care about the journal, and certainly not about volume or page numbers, as I am NOT going to look for the paper in the library!</span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b><br />
</b></span></div>
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b>So here’s my suggestion:</b></span><br />
<span style="font-family: "georgia" , "times new roman" , serif;">Murtaza/Dawson/Tsui. <b>Non-invasive analysis of acquired resistance to cancer therapy by sequencing of plasma DNA.</b> Nature. 2013. DOI:<a href="http://dx.doi.org/10.1038/nature12065">10.1038/nature12065</a>. PMID:<a href="http://www.ncbi.nlm.nih.gov/pubmed/23563269">23563269</a>. 13/18.</span></div>
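<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">A format like this is simple enough for a reference manager to generate automatically. Here is a minimal Python sketch, assuming author names arrive as "Surname Initials" strings in published order and that joint-first or corresponding positions are flagged by hand; <code>Reference</code> and <code>short_format</code> are hypothetical names, not part of Mendeley, EndNote or any other tool.</span></div>

```python
from dataclasses import dataclass, field


@dataclass
class Reference:
    authors: list[str]  # "Surname Initials", in published order
    title: str
    journal: str
    year: int
    doi: str
    pmid: str
    flagged: set[int] = field(default_factory=set)  # 1-based positions to mark "*"


def surname(author: str) -> str:
    """'Dawson SJ' -> 'Dawson' (assumes the initials come last)."""
    return author.rsplit(" ", 1)[0]


def short_format(ref: Reference, me: str, n_lead: int = 3) -> str:
    """Lead authors. Title. Journal. Year. DOI. PMID. my-position/total."""
    lead = "/".join(surname(a) for a in ref.authors[:n_lead])
    pos = ref.authors.index(me) + 1  # raises ValueError if `me` is not an author
    star = "*" if pos in ref.flagged else ""
    return (f"{lead}. {ref.title.rstrip('.')}. {ref.journal}. {ref.year}. "
            f"DOI:{ref.doi}. PMID:{ref.pmid}. {pos}{star}/{len(ref.authors)}.")


murtaza = Reference(
    authors=[
        "Murtaza M", "Dawson SJ", "Tsui DW", "Gale D", "Forshew T", "Piskorz AM",
        "Parkinson C", "Chin SF", "Kingsbury Z", "Wong AS", "Marass F", "Humphray S",
        "Hadfield J", "Bentley D", "Chin TM", "Brenton JD", "Caldas C", "Rosenfeld N",
    ],
    title="Non-invasive analysis of acquired resistance to cancer therapy "
          "by sequencing of plasma DNA.",
    journal="Nature",
    year=2013,
    doi="10.1038/nature12065",
    pmid="23563269",
)
print(short_format(murtaza, "Hadfield J"))
```

Applied to the 18-author list in the full citation below, it places me 13th of 18; swapping the ". " separators for tabs would give the delimited version.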
<div style="text-align: justify;">
<span style="font-family: "georgia" , "times new roman" , serif;"><b><br />
</b></span></div>
<div>
<div style="text-align: justify;">
<b style="font-family: Georgia, "Times New Roman", serif;">Compare this to:</b><br />
<span style="font-family: "georgia" , "times new roman" , serif;">Murtaza M, Dawson SJ, Tsui DW, Gale D, Forshew T, Piskorz AM, Parkinson C, Chin SF, Kingsbury Z, Wong AS, Marass F, Humphray S, Hadfield J, Bentley D, Chin TM, Brenton JD, Caldas C, Rosenfeld N. <b>Non-invasive analysis of acquired resistance to cancer therapy by sequencing of plasma DNA.</b> Nature. 2013 May 2;497(7447):108-12.</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br />
</span> <span style="font-family: "georgia" , "times new roman" , serif;">I guess nothing is going to change in the field anytime soon. But I feel better for getting this off my chest. And </span><span style="font-family: "georgia" , "times new roman" , serif;">I’ve sent feedback to the funder...</span></div>
</div>
</div>
James@cancerhttp://www.blogger.com/profile/02825715598810395734noreply@blogger.com0tag:blogger.com,1999:blog-6334453475526523597.post-56274658871217973092016-07-02T15:38:00.002+01:002016-09-05T13:40:45.469+01:00Comparison of DNA library prep kits by the Sanger Institute<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">A recent paper from <a href="http://www.sanger.ac.uk/people/directory/quail-michael-andrew">Mike Quail's</a> group at the <a href="http://www.sanger.ac.uk/">Sanger Institute</a> compares 9 different library prep kits for WGS. In <a href="http://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-2757-4"><b>Quantitation of next generation sequencing library preparation protocol efficiencies using droplet digital PCR assays - a systematic comparison of DNA library preparation kits for Illumina sequencing</b></a>, the authors used droplet digital PCR (ddPCR) assays to measure the efficiency of ligation and post-ligation steps. They show that a high final library yield can mask poor adapter ligation efficiency, ultimately leading to lower-diversity libraries.</span></div>
<span style="font-family: Georgia, Times New Roman, serif;"><div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
In the paper they state that PCR-free protocols offer obvious benefits in not introducing amplification biases or PCR errors that are impossible to distinguish from true SNVs. They also discuss how the emergence of greatly simplified protocols that merge library prep steps can significantly improve the workflow as well as the chemical efficiency of those merged steps. As a satisfied user of the Rubicon Genomics library prep technology (<a href="http://www.nature.com/nature/journal/v497/n7447/full/nature12065.html">e.g. for ctDNA exomes</a>) I'd like to have seen this included in the comparison*. In a <a href="http://core-genomics.blogspot.co.uk/2014/04/choosing-ngs-library-prep-kit-provider.html">2014 post I listed almost 30 different providers</a>.</div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0rdQkdJTA12QttIkCk75OslFPuY2Uwgek6YlvcF_xN9gIXRAzLDFrM9RAJqXVWQoejDTnNYCBQtDkZ6DMULcg704UNXIvP7PGteo_tpQFZDtGaeqJBslWrBB9zryAv6X7XOwQfUKsQyYT/s1600/Screen+Shot+2016-07-02+at+15.14.22.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="239" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0rdQkdJTA12QttIkCk75OslFPuY2Uwgek6YlvcF_xN9gIXRAzLDFrM9RAJqXVWQoejDTnNYCBQtDkZ6DMULcg704UNXIvP7PGteo_tpQFZDtGaeqJBslWrBB9zryAv6X7XOwQfUKsQyYT/s320/Screen+Shot+2016-07-02+at+15.14.22.png" width="320" /></a></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<b>Hidden ligation inefficiency:</b> The authors' analysis of ligation efficiency sheds light on a question many NGS users have debated: is library yield a useful QC metric? Yield measures how much library a kit can generate from a particular sample, but not how "good" that library is. Only analysis of final library diversity can really act as a sensible QC.</div>
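<div style="text-align: justify;">
Library diversity is usually estimated after sequencing from the duplicate rate. Here is a minimal sketch, assuming reads sample molecules uniformly at random (the Lander-Waterman model, which is also the basis of Picard's EstimateLibraryComplexity). The function name is hypothetical, and real tools additionally handle optical duplicates and read pairing.</div>

```python
import math


def estimate_library_size(total_reads: int, unique_reads: int) -> float:
    """Estimate the number of distinct molecules in a library from its duplicate rate.

    Assumes reads sample molecules uniformly at random (Lander-Waterman), so
    unique_reads = X * (1 - exp(-total_reads / X)); solve for X by bisection.
    """
    if not 0 < unique_reads < total_reads:
        raise ValueError("need 0 < unique_reads < total_reads (i.e. some duplicates)")

    def f(x: float) -> float:
        # Positive when x is smaller than the true library size, negative when larger.
        return unique_reads / x - 1 + math.exp(-total_reads / x)

    lo = float(unique_reads)  # the library holds at least this many molecules
    hi = 10.0 * unique_reads
    while f(hi) > 0:          # widen the bracket until it contains the root
        hi *= 10
    for _ in range(200):      # bisection: halve the bracket each iteration
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # Hypothetical run: 10 million reads, of which 8 million are unique.
    print(f"{estimate_library_size(10_000_000, 8_000_000):.3e} distinct molecules")
```

<div style="text-align: justify;">
Bisection is used because it is simple and guaranteed to converge here. The expected number of unique reads rises monotonically with library size, so the equation has exactly one root.</div>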
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The authors found that kits with high adapter ligation efficiency gave similar yields to kits with low adapter ligation efficiency (Fig. 4, reproduced above). They concluded that the most likely cause is inhibition. An efficient ligation sends a relatively large amount of adapter-ligated DNA into PCR, which inhibits amplification and leads to lower than expected yields. In libraries with low adapter ligation efficiency, much less adapter-ligated DNA makes it into PCR. Because there is no inhibition, amplification proceeds unhindered and yields are higher than expected. The best-performing kits were <a href="http://www.illumina.com/products/truseq-nano-dna-library-prep-kit.html">Illumina TruSeq Nano</a>, <a href="http://www.illumina.com/products/truseq-dna-pcr-free-library-prep-kits.html">TruSeq DNA PCR-Free</a> and the <a href="https://www.kapabiosystems.com/product-applications/products/next-generation-sequencing-2/dna-library-preparation/kapa-hyper-prep-kits/">KAPA Hyper</a> kit, with ligation efficiencies above 30%, while <a href="https://www.kapabiosystems.com/product-applications/products/next-generation-sequencing-2/dna-library-preparation/kapa-hyperplus-kits/">KAPA HyperPlus</a> was fully efficient.</div>
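<div style="text-align: justify;">
A toy model can show why yield hides ligation efficiency. This sketch is an illustration, not the authors' analysis. Ligation efficiency is taken as adapter-ligated copies divided by input copies (the kind of ratio two ddPCR assays could provide). PCR is modelled as logistic amplification that saturates at an arbitrary plateau, standing in for the inhibition the authors describe. All function names and parameter values are hypothetical.</div>

```python
def ligation_efficiency(ligated_copies: float, input_copies: float) -> float:
    """Fraction of input fragments that became adapter-ligated (0 to 1)."""
    return ligated_copies / input_copies


def pcr_yield(template_copies: float, cycles: int,
              cycle_efficiency: float = 0.95, plateau: float = 1e10) -> float:
    """Toy PCR model: near-exponential growth that slows logistically near a plateau.

    The plateau is an arbitrary stand-in for inhibition or reagent exhaustion;
    it illustrates the masking effect and is not the mechanism measured in the paper.
    """
    if not 0 <= template_copies <= plateau:
        raise ValueError("template_copies must lie between 0 and the plateau")
    copies = template_copies
    for _ in range(cycles):
        # Each cycle adds fewer new copies as the reaction approaches the plateau.
        copies += cycle_efficiency * copies * (1 - copies / plateau)
    return copies


if __name__ == "__main__":
    input_copies = 1e8  # hypothetical number of fragments entering ligation
    for kit, ligated in [("kit A", 4e7), ("kit B", 4e6)]:
        eff = ligation_efficiency(ligated, input_copies)
        final = pcr_yield(ligated, cycles=10)
        print(f"{kit}: ligation efficiency {eff:.0%}, post-PCR copies {final:.2e}")
```

<div style="text-align: justify;">
Each cycle's growth slows as the reaction nears the plateau. As a result, the tenfold difference in ligated input between the two hypothetical kits ends up as a much smaller difference in post-PCR yield.</div>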
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<b>Control amplicon bias:</b> To assess bias, three separate PCR amplicons were amplified from the PhiX control. The kits with the lowest bias, below 25% for each fragment size, were <a href="https://www.kapabiosystems.com/product-applications/products/next-generation-sequencing-2/dna-library-preparation/kapa-hyperplus-kits/">KAPA HyperPlus</a> and <a href="https://www.neb.com/products/e7370-nebnext-ultra-dna-library-prep-kit-for-illumina">NEBNext</a>. The Illumina TruSeq Nano kit showed different biases with "Sanger adapters" than with "Illumina adapters". The authors suggest this shows that both adapter and fragment sequence contribute to the bias.</div>
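<div style="text-align: justify;">
One simple way to express amplicon bias is sketched below. The metric is hypothetical and may not match the definition used in the paper. Each amplicon's recovery (copies after prep divided by copies before) is normalised to the mean recovery across amplicons. Bias is then the percentage deviation from that mean, so 0% means perfectly even recovery. The amplicon names and counts in the example are made up.</div>

```python
from statistics import mean


def amplicon_bias_percent(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Percent over- or under-representation of each control amplicon after library prep.

    Hypothetical metric: each amplicon's recovery (after / before) is divided by
    the mean recovery across all amplicons; 0% means perfectly even recovery.
    """
    recovery = {name: after[name] / before[name] for name in before}
    avg = mean(recovery.values())
    return {name: 100 * abs(r / avg - 1) for name, r in recovery.items()}


if __name__ == "__main__":
    # Made-up ddPCR copy counts for three control amplicons of different sizes.
    before = {"short": 1000, "medium": 1000, "long": 1000}
    after = {"short": 1200, "medium": 1000, "long": 800}
    bias = amplicon_bias_percent(before, after)
    for name, pct in bias.items():
        print(f"{name}: {pct:.1f}% bias")
    print("all below 25%:", all(pct < 25 for pct in bias.values()))
```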
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<b>Which kit to choose:</b> Like most kit comparison papers, the authors shied away from making overt claims about which kit was "best". They did, however, highlight the fragmentation method and PCR-free preparation as important points to consider:</div>
<div style="text-align: justify;">
<ul>
<li>If you have plenty of DNA, aim for <a href="http://www.illumina.com/products/truseq-dna-pcr-free-library-prep-kits.html">PCR-free</a> to avoid amplification errors and bias.</li>
<li>If you don't have a Covaris, the newest enzymatic shearing methods (e.g. KAPA fragmentase) show significantly less bias than earlier chemical fragmentation methods.</li>
</ul>
</div>
</span><span style="font-family: Georgia, Times New Roman, serif;"><div style="text-align: justify;">
Ultimately, practicability (the overall time and number of steps required to complete a protocol) will be uppermost in many users' minds. The fastest protocols were the <a href="https://www.neb.com/products/e7370-nebnext-ultra-dna-library-prep-kit-for-illumina">NEBNext Ultra</a> kit, <a href="https://www.kapabiosystems.com/product-applications/products/next-generation-sequencing-2/dna-library-preparation/kapa-hyperplus-kits/">KAPA HyperPlus</a> and <a href="http://www.illumina.com/products/truseq-dna-pcr-free-library-prep-kits.html">Illumina TruSeq DNA PCR-Free</a>.</div>
</span><span style="font-family: Georgia, Times New Roman, serif;"><div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="color: #666666;"><b>*Disclosure: </b>I am a paid member of Rubicon Genomics' SAB.</span></div>
<div style="text-align: justify;">
<br /></div>
</span></div>
James@cancerhttp://www.blogger.com/profile/02825715598810395734noreply@blogger.com2