I've been finding out more about exomes: specifically QC analysis using HS Metrics in Picard. There's loads of useful metrics and I'm hoping to get to a point that I can explain these to users here and also look at the results to try and troubleshoot an experiment. I'm also trying to understand what sort of read length we should be using for exome analysis. An earlier post discussed my thoughts around moving to PE125 or switching to SE125 and running more lanes. In a follow up post (watch this space) I'll try to consider the impact of different run modes: will users/reviewers accept any kind of read for an exome or will they baulk at seeing something different from the paired-end norm?
PS: Your comments on this would be greatly appreciated!
The first thing I really wanted to get my head around was the difference between baits and targets. On the SAMtools mailing list many people suggest simply using the targets only, however that seems to defeat the purpose of having different files in the first place. Baits and targets are different; I'm guessing the companies that manufacture exome capture kits use an in-house algorithm to design baits from a list of targets and that differences in these may be somewhat immaterial if you only care about how well a single gene, e.g. TP53, is captured.
Baits and targets defined: A bait - is the biotinylated oligo-nucleotide or cRNA used to pull out fragments from a whole genome library. A target - is a region of the genome the manufacturer was trying to enrich for, it does not necessarily mean the capture will work a the target region, or that even if it does coverage will be even across different targets or samples!
Ideally we'd sequence the whole of a gene like TP53 as mutations can be found all over the place, however analysis of exons 4–10 covers well over 90% of mutations. The image below shows the genomic coordinates for exons 4-10 of TP53, and if you look closely you can just make out that baits are a little smaller than targets, and that some exons have multiple baits (exon 4).
click the image to navigate to the session. |
The distribution of target sizes is pretty high in the Nextera Rapid capture 1.2 we're using in my lab. The total target size is 451Mb (using 336Mb of 79bp baits). The majority of targets are under 250bp, with almost 50% being under 150bp, or about the size of one exon. But 6% of targets are over 400bp and 10 are over 10Kb in length and very heavily tiled with baits.
HSmetrics: The tools in PICARD are really very useful, but did take a while for me to figure out. A 2010 presentation from the Broad's Kristian Cibulskis and Andrew Kernytsky: Quality Assessment of Hybrid Selection Experiments,
explains quite a lot about the Broad's capture and analysis pipelines; and slides 9-12 cover interpretation of HS_metrics.
From a quality point of view a few of the metrics turn out to be very very useful (I'd appreciate some help filling in the numbers you think should be QC cutoffs!). Here are a few I think are most helpful...
PF_Unique_Reads [x%]: The number of PF reads that are not marked as duplicates
PF_UQ_Bases_Aligned [>90%]: the number of bases left after duplicate removal and alignment.
On_Target_Bases [x%]: the number of bases aligning to the targets file
PCT_Selected_Bases [x%]:On+Near Bait Bases / PF Bases Aligned.
Fold_enrichment [x%]: The fold by which the baited region has been amplified above genomic background.
A poster from AGBT '13: Comparison of Illumina Sequencing Run Lengths and Genome References on Whole Exome Captured Libraries, by the Center for Inherited Disease Research (CIDR) in Baltimore shed some light on the move from PE75-100, and therefore helps me understand how the move to PE125 might impact us. They ran the same libraries on both run types and compared various exome analysis metrics. The PE100 data aligns better resulting in a slight increase in coverage.
I'll be back with some comments on using different read-lengths for exomes: is there a "correct" choice?
WHEN ALL HOPES IS LOST MEDICALLY TO THE CURE OF MY HERPES THE ONLY AWAITING SOLUTION I NEVER KNEW IS DR OGUDU SPELL TEMPLE. FOR THOSE WHO ARE STILL SHARING MY PREVIOUS HOPE IN MEDICAL CURE AS A SOLUTION TO YOUR CURE SHOULD STOP SEARCHING AND GO TO DR.OGUDU FOR INSTANT CURE. CONTACT HIM NOW WITH THE FOLLOWING DETAILS EMAIL ADDRESS [OGUDUSPELLTEMPLE@YAHOO.COM] WHATSAAPP NUMBER +2348106058254 Website---http://oguduspelltemple.wixsite.com/drogudu
ReplyDeleteAm COREY ANN from URUGUAY i was suffering from genital herpes, oral herpes, shingles. Before i came in contact with dr.ogudu
It is no longer news that the Acquired immune deficiency syndrome Herpes Virus is increasing by the day, The fear is that many people living with the sickness are scared of saying it because of the stigma that comes along with it.I am bold enough among many others to state that there is now a potent cure to this sickness but many are unaware of it.I discovered that I was infected with the virus 3 months ago, after a medical check-up. My doctor told me and I was shocked, confused and felt like my world has crumbled. I was dying slowly due to the announcement of my medical practitioner but he assured me that I could leave a normal life if I took my medications, In a bid to look for a lasting solution to my predicament, I sought for solutions from the voodoo world. I went online and searched for every powerful trado-medical practitioner that I could severe, because I heard that the African Voodoo Priests had a cure to the Herpes syndrome. It was after a little time searching the web that I came across one Dr.Ogudu website who offered to help me, He gave me some steps to follow and I meticulously carried out all his instructions. Two days ago to be precise, I went back to the hospital to conduct another test and to my amazement, the results showed that " I am NEGATIVE" You can free yourself of this Herpes virus by consulting this great African Voodoo Priest via this EMAIL [OGUDUSPELLTEMPLE@YAHOO.COM] WHATASPP NUMBER +2348106058254