The GIAB consortium (@GenomeInABottle) took a major step forward today when it released the first NIST reference material for human genome sequencing; the story even made it into the New York Times. It comes at an important time, as we move into an era where millions of people are getting genome-based genetic tests. The GIAB standard will allow labs to demonstrate their capability to detect known variants, and to measure the noise introduced by their tests. GIAB RM 8398 (NA12878) is probably the most sequenced human sample of all time, and has orders of magnitude more confirmed variants than anything else, including reference calls for SNPs, small indels, and homozygous reference genotypes for almost 80% of the genome. NA12878 has already been referenced in almost 250 publications on PubMed Central.
It is intended to be used as a reference in assessing the performance of NGS variant detection. Importantly, this sample is of high molecular weight, with the peak of the nominal length distribution longer than 48.5 kb, as referenced by Lambda DNA. This is likely to be important as we push ahead with long-read technologies like Complete Genomics LFR, Illumina Moleculo, 10X Genomics, BioNano, PacBio and, of course, Oxford Nanopore.
It will allow an assessment of the sensitivity and specificity of a method, an instrument, or a lab (true positives, false positives, true negatives, and false negatives for variant calls). However, one point niggling the GIAB community is that the variants are all based on the GRCh37 reference assembly, so new analysis will be needed as each new reference assembly is released.
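To make the sensitivity and specificity assessment concrete, here is a minimal sketch of how the two figures fall out of the four counts you get when comparing a lab's calls against a truth set. The class name, function names, and all the numbers are hypothetical and purely for illustration; this is not GIAB's own benchmarking tooling.

```python
from dataclasses import dataclass


@dataclass
class CallCounts:
    """Counts from comparing a call set against a truth set (hypothetical names)."""
    tp: int  # variants in the truth set that were called
    fp: int  # variants called that are not in the truth set
    tn: int  # reference positions correctly called as reference
    fn: int  # variants in the truth set that were missed


def sensitivity(c: CallCounts) -> float:
    """Fraction of true variants that were detected: TP / (TP + FN)."""
    total = c.tp + c.fn
    return c.tp / total if total else float("nan")


def specificity(c: CallCounts) -> float:
    """Fraction of true reference positions called as reference: TN / (TN + FP)."""
    total = c.tn + c.fp
    return c.tn / total if total else float("nan")


if __name__ == "__main__":
    # Made-up counts, only to show the arithmetic.
    counts = CallCounts(tp=990, fp=5, tn=99_995, fn=10)
    print(f"sensitivity = {sensitivity(counts):.4f}")
    print(f"specificity = {specificity(counts):.5f}")
```

The true-negative count only makes sense because the reference material includes homozygous reference genotypes, so positions confidently known to match the reference can be counted too.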
Other GIAB genomes: It is not just the pilot genome NA12878 that is getting this loving treatment. At least two other GIAB reference materials are in the works, an Ashkenazi trio and a Han Chinese trio, both expected in 2016.
- NA12878: DNA and the GM12878 cell line were already available from Coriell. Now there are also high-confidence variant calls, public datasets, 300x coverage HiSeq 2500 PE150 sequencing data as FASTQ, and ~44x PacBio data from Mt. Sinai School of Medicine. The methods used for analysis were published in Nature Biotechnology last year.
- Ashkenazi trio: Currently available as BAMs and VCFs from 50x and FASTQ from 300x Illumina sequencing, along with data from the Personal Genome Project, Complete Genomics, PacBio, Ion Exome, and BioNano. Other data is on the way, including Illumina mate-pair, Complete Genomics LFR, and Moleculo sequencing (what about 10X?), and the group expects a paper submission in a few months.
- Han Chinese trio: Currently available as data from the Personal Genome Project (PGP IDs: hu91BD69/huCA017E/hu38168C) and as Complete Genomics and BioNano data.
How do you get hold of it: RM 8398 (NA12878, GIAB) is available now; $450 gets you 10 µg, with an expiry date of Christmas Eve 2024. Buy it on the NIST website.