Prioritization of Candidate Nonsynonymous Single Nucleotide Polymorphisms via Sequence Conservation Features

PP. 66-72



Jiaxin WU 1, Wangshu ZHANG 1, Rui JIANG 1,*

1. MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, China

* Corresponding author.


Received: 23 Jun. 2011 / Revised: 22 Jul. 2011 / Accepted: 26 Aug. 2011 / Published: 5 Oct. 2011

Index Terms

Prioritization, Nonsynonymous Single Nucleotide Polymorphisms (nsSNPs), Guilt-by-Association, Euclidean Distance, Manhattan Distance


The detection of rare variants responsible for human complex diseases has been receiving increasing attention. However, most existing computational methods for this purpose require functional variants to be selected before statistical analysis. Based on the assumption that nonsynonymous single nucleotide polymorphisms (nsSNPs) associated with a specific disease should have similar properties, we propose a method that uses conservation scores of nsSNPs and the guilt-by-association principle to prioritize candidate nsSNPs for specific diseases. Systematic validation demonstrates that our approach is effective in recovering the relationship between nsSNPs and diseases, with the Manhattan distance measure achieving the most precise prediction results.
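The ranking idea in the abstract can be illustrated with a minimal sketch. It assumes each nsSNP is represented by a vector of conservation scores and that a candidate is scored by its mean distance to nsSNPs already known to be associated with the disease (called seeds here). The mean aggregation, the function names, and the toy values are illustrative assumptions and are not taken from the paper.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def manhattan(a: Vector, b: Vector) -> float:
    """Manhattan (L1) distance between two score vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between two score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def prioritize(
    candidates: Dict[str, Vector],
    seeds: List[Vector],
    distance: Callable[[Vector, Vector], float] = manhattan,
) -> List[str]:
    """Rank candidate names, closest to the seeds first.

    Assumption: a candidate's score is its mean distance to all seeds;
    the paper may aggregate distances differently.
    """
    def score(vec: Vector) -> float:
        return sum(distance(vec, s) for s in seeds) / len(seeds)

    return sorted(candidates, key=lambda name: score(candidates[name]))


# Hypothetical conservation-score vectors, for illustration only.
seeds = [[1.0, 0.0], [1.0, 0.2]]
candidates = {"cand_far": [0.0, 1.0], "cand_near": [0.9, 0.1]}
ranking = prioritize(candidates, seeds)
```

Passing `distance=euclidean` to `prioritize` switches the measure, which is how the two distances named in the index terms could be compared on the same data.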

Cite This Paper

Jiaxin WU, Wangshu ZHANG, Rui JIANG, "Prioritization of Candidate Nonsynonymous Single Nucleotide Polymorphisms via Sequence Conservation Features", IJEM, vol.1, no.5, pp.66-72, 2011. DOI: 10.5815/ijem.2011.05.09

