A few weeks ago I wrote about an idea I had been thinking about for a while: using SNPs as additional content in capture or amplicon assays to provide patient identification at the same time as generating test data. I have had many responses from people who would like to do this, and the question I have been asked most is which SNPs we should be using.
I’m afraid I can’t answer that question. I think a community discussion would be the best way forward, although I am not sure how best to kick-start it. If you have any ideas do let me know! I’ll ask CRUK’s StratMed board for starters.
One person also pointed me to a couple of papers and to Sequenom’s iPlex sample ID panel. The assay runs a 52-SNP multiplex PCR on the MassArray system. I think the biggest thing missing from the SPIA assay on Sequenom is coding SNP information. I suggested in the older post that using coding SNPs would allow the assay to be more widely used in research settings, and even in future clinical RNA-seq-based gene expression panels. PAM50-seq perhaps?
I'm getting some designs done to test different methods in the lab. Feel free to send me suggestions on what criteria should be considered when choosing SNPs.
Paper 1: Sanchez et al.; A multiplex assay with 52 single nucleotide polymorphisms for human identification. The authors are members of the EU SNPforID project, and in this publication they describe their choice of 52 SNP markers and the development and validation of a multiplex PCR assay. The project website has a very nice tool to visualise SNP distribution across different populations.
Their SNP selection criteria were:
i) Maximum 120bp amplicon size
ii) Minimum MAF of 0.28 in one population and 0.17 in at least three populations
iii) Random distribution of SNPs
iv) Minimum 100kb distance from neighbouring marker SNPs or genes
v) Flanking DNA sequence reliably reported and free from interfering polymorphisms, such as nucleotide substitutions in potential primer binding sites.
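To make the criteria concrete, here is a minimal Python sketch of how they might be applied to a table of candidate SNPs. The CandidateSNP record, its field names and the population labels are hypothetical. I have read criterion ii) as "at least one population with MAF ≥ 0.28 and at least three populations with MAF ≥ 0.17", the spacing check only looks at other chosen markers (not genes), and the random-distribution criterion iii) is not modelled.

```python
from dataclasses import dataclass


@dataclass
class CandidateSNP:
    # Hypothetical record for one candidate marker
    rsid: str
    chrom: str
    pos: int
    amplicon_bp: int
    maf_by_pop: dict[str, float]  # minor allele frequency per population
    clean_flanks: bool            # flanks reliably reported, no SNPs in primer sites


def passes_maf(maf_by_pop, high=0.28, low=0.17, min_low_pops=3):
    """Criterion ii): one population >= high and at least min_low_pops >= low."""
    mafs = list(maf_by_pop.values())
    return any(m >= high for m in mafs) and sum(m >= low for m in mafs) >= min_low_pops


def select_snps(candidates, max_amplicon=120, min_spacing=100_000):
    """Apply amplicon size, MAF and flank filters, then enforce marker spacing."""
    passing = [
        s for s in candidates
        if s.amplicon_bp <= max_amplicon and s.clean_flanks and passes_maf(s.maf_by_pop)
    ]
    chosen = []
    last_pos = {}  # chromosome -> position of the last marker kept
    # Walk along each chromosome, keeping a SNP only if it is far enough
    # from the previously kept marker on the same chromosome
    for s in sorted(passing, key=lambda s: (s.chrom, s.pos)):
        prev = last_pos.get(s.chrom)
        if prev is None or s.pos - prev >= min_spacing:
            chosen.append(s)
            last_pos[s.chrom] = s.pos
    return chosen
```

Walking along each chromosome and keeping the first eligible SNP is the simplest way to enforce spacing; a real design would probably prefer the most informative SNP within each window.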
They analysed the SNPs by single-base extension (SBE) using ABI SNaPshot kits; products were detected on a capillary sequencer and analysed with …
Paper 2: Demichelis et al.; SNP panel identification assay (SPIA): a genetic-based assay for the identification of cell lines. They used cell lines typed on Affymetrix Xba 50K arrays to identify candidate SNPs. They commented in their paper that they would have preferred to use whole-genome genotype data from a larger number of samples to rank the SNPs that best distinguish samples, and that an iterative design could then define the “most accurate and parsimonious panel and how many [SNPs] are needed”.
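As an illustration of ranking the SNPs that best distinguish samples, a simple greedy forward selection could work like the sketch below. This is not the procedure Demichelis et al. used; it is a hypothetical approach that, at each step, adds the SNP that most increases the number of discordant genotypes for the worst-separated pair of samples, breaking ties on total discordance. Genotypes are assumed to be coded 0/1/2 (copies of one allele).

```python
from itertools import combinations


def greedy_panel(genotypes, n_snps):
    """Greedily pick SNP indices that best separate every pair of samples.

    genotypes: dict of sample name -> list of 0/1/2 calls, same SNP order in each.
    Returns the chosen SNP indices in the order they were added.
    """
    samples = sorted(genotypes)
    if len(samples) < 2:
        raise ValueError("need at least two samples to compare")
    n_total = len(genotypes[samples[0]])
    pairs = list(combinations(samples, 2))
    diffs = {pair: 0 for pair in pairs}  # discordant panel SNPs per sample pair
    chosen = []
    for _ in range(min(n_snps, n_total)):
        best_j, best_score = None, None
        for j in range(n_total):
            if j in chosen:
                continue
            # Discordance per pair if SNP j were added to the current panel
            trial = [diffs[(a, b)] + (genotypes[a][j] != genotypes[b][j]) for a, b in pairs]
            # Maximise the worst-separated pair first, then total separation
            score = (min(trial), sum(trial))
            if best_score is None or score > best_score:
                best_j, best_score = j, score
        chosen.append(best_j)
        for a, b in pairs:
            diffs[(a, b)] += genotypes[a][best_j] != genotypes[b][best_j]
    return chosen
```

Running this with increasing n_snps and watching when the worst-separated pair stops improving is one crude way to ask "how many SNPs are needed" on a given training set.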
Their SNP selection criteria were:
i) SNPs have an assigned rs identifier
ii) SNPs are not located in intronic regions
iii) SNPs are also represented on the 10K Affymetrix oligonucleotide array.
Iterative testing identified an average of 30-40 SNPs as the optimum. They developed a SPIA test panel to run on Sequenom’s MassArray, which uses multiplex PCR to target specific genomic loci. The SPIA analysis tool is written in R and is available online.
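At its core, a SNP-ID check compares two genotype profiles and asks whether they are concordant enough to come from the same individual. The sketch below is a deliberately simplified version, not the SPIA R tool’s algorithm: it computes the fraction of discordant calls over SNPs called in both profiles and applies thresholds (max_discordance and min_shared) that are hypothetical placeholders, not validated values.

```python
def discordance(profile_a, profile_b):
    """Fraction of SNPs, called in both profiles, whose genotypes differ.

    Profiles map rsID -> genotype string such as 'AG'; a missing call is None.
    Returns (fraction_discordant, number_of_shared_called_snps).
    """
    shared = [
        rs for rs in profile_a
        if rs in profile_b and profile_a[rs] is not None and profile_b[rs] is not None
    ]
    if not shared:
        raise ValueError("no SNPs called in both profiles")
    # Compare unordered genotypes so 'AG' and 'GA' count as the same call
    mismatches = sum(sorted(profile_a[rs]) != sorted(profile_b[rs]) for rs in shared)
    return mismatches / len(shared), len(shared)


def same_individual(profile_a, profile_b, max_discordance=0.1, min_shared=20):
    """Hypothetical decision rule: enough shared calls and low discordance."""
    frac, n = discordance(profile_a, profile_b)
    return n >= min_shared and frac <= max_discordance
```

In a patient-ID setting the two profiles would be the SNP calls from the test assay and from an independent sample from the same patient; a small non-zero discordance tolerance allows for the occasional genotyping error.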
Their conclusion was that any 40 SNPs from their top 100 would produce a good SNP panel for human identification, but that more SNPs equals more confidence! They also showed that a SNP-ID panel can be used to monitor the genetic drift of cell lines during passaging.
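Why do more SNPs give more confidence, and why do Sanchez et al. insist on a high MAF? A back-of-the-envelope random match probability makes this concrete. The sketch below assumes Hardy-Weinberg equilibrium, unlinked SNPs and unrelated individuals; a real panel would need population-specific allele frequencies.

```python
import math


def genotype_match_prob(p):
    """Probability that two unrelated people share a genotype at one biallelic
    SNP with minor allele frequency p, assuming Hardy-Weinberg equilibrium."""
    q = 1 - p
    freqs = (q * q, 2 * p * q, p * p)  # major-hom, het, minor-hom frequencies
    return sum(f * f for f in freqs)


def random_match_prob(mafs):
    """Panel-wide match probability, assuming the SNPs are independent (unlinked)."""
    return math.prod(genotype_match_prob(p) for p in mafs)
```

At MAF 0.5 each SNP contributes a factor of 0.375, so 40 such SNPs give a match probability on the order of 10^-17. At MAF 0.2 the per-SNP factor rises to about 0.51, so you need more SNPs for the same confidence.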