On Construction of Gene-PDB Structure Mapping with Applications in Functional Annotation of Human Genes

pp. 53-59



Xi Chen 1,*, Hao Jiang 1, Wai-Ki Ching 1, Limin Li 2

1. Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China

2. Department of Mathematics, Xi’an Jiaotong University, Xi’an, China

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2011.02.08

Received: 12 May 2010 / Revised: 15 Aug. 2010 / Accepted: 5 Dec. 2010 / Published: 8 Mar. 2011

Index Terms

Classification, Gene Functions, Protein Structures, Prediction, Similarity


Protein 3D structure is one of the key factors in recognizing gene functions. The availability of protein structure data in the Protein Data Bank (PDB) enables us to conduct gene function analysis. However, the molecules in the PDB whose structures have been determined do not always correspond to a unique gene; that is, the mapping from genes to PDB entries is not one-to-one. This ambiguity complicates gene function analysis. In this paper, we address this issue by studying the problem of predicting gene function from protein structures based on the gene-PDB mapping. We first obtain the gene-PDB mapping, which allows a gene to be represented by the set of structures of all its corresponding PDB molecules. We then define a new gene-gene similarity measure based on the structural similarity between PDB molecules. We further show that this new measure agrees well with gene functional similarity, which suggests that it can be useful for gene function prediction. Numerical examples are given to support this claim.
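The abstract does not state the exact definition of the gene-gene similarity measure, so the following is only a minimal sketch of one plausible set-based construction: each gene is represented by its set of PDB entries, and two genes are compared by averaging best-match structure similarities in both directions. The gene names, PDB identifiers, similarity scores, and the functions `struct_sim` and `gene_sim` are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from typing import Dict, FrozenSet, List

# Hypothetical gene -> PDB mapping. The mapping is not one-to-one:
# a gene may have several structures, and a structure may be shared.
GENE_TO_PDB: Dict[str, List[str]] = {
    "geneA": ["1abc", "2xyz"],
    "geneB": ["2xyz"],
    "geneC": ["3def"],
}

# Hypothetical pairwise structure similarities in [0, 1].
# Unordered pairs are keyed by frozenset so lookup is symmetric.
STRUCT_SIM: Dict[FrozenSet[str], float] = {
    frozenset(("1abc", "2xyz")): 0.8,
    frozenset(("1abc", "3def")): 0.1,
    frozenset(("2xyz", "3def")): 0.2,
}


def struct_sim(p: str, q: str) -> float:
    """Similarity between two PDB entries (1.0 for identical entries)."""
    if p == q:
        return 1.0
    # Assumption: unknown pairs are treated as having zero similarity.
    return STRUCT_SIM.get(frozenset((p, q)), 0.0)


def gene_sim(g: str, h: str, mapping: Dict[str, List[str]]) -> float:
    """Illustrative set-based similarity between two genes.

    Each structure of one gene is matched to its most similar structure
    of the other gene; the best-match scores from both directions are
    averaged. This is one possible choice, not the paper's definition.
    """
    ps, qs = mapping[g], mapping[h]
    best_g = [max(struct_sim(p, q) for q in qs) for p in ps]
    best_h = [max(struct_sim(p, q) for p in ps) for q in qs]
    return (sum(best_g) + sum(best_h)) / (len(ps) + len(qs))


if __name__ == "__main__":
    for a, b in [("geneA", "geneB"), ("geneA", "geneC"), ("geneB", "geneC")]:
        print(a, b, round(gene_sim(a, b, GENE_TO_PDB), 3))
```

Averaging best matches in both directions keeps the score symmetric and gives a gene a similarity of 1.0 with itself, which are natural properties to expect of any measure that is later compared against functional similarity.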

Cite This Paper

Xi Chen, Hao Jiang, Wai-Ki Ching, Limin Li, "On Construction of Gene-PDB Structure Mapping with Applications in Functional Annotation of Human Genes", International Journal of Information Technology and Computer Science (IJITCS), vol. 3, no. 2, pp. 53-59, 2011. DOI: 10.5815/ijitcs.2011.02.08

