Genomic Analysis and Classification of Exon and Intron Sequences Using DNA Numerical Mapping Techniques

Full Text (PDF, 1606KB), PP.22-36


Mohammed Abo-Zahhad Abo-Zeid 1,* Sabah M. Ahmed 1 Shimaa A. Abd-Elrahman 1

1. Electrical and Electronics Engineering Department, Faculty of Engineering, Assiut University, Assiut, Egypt

* Corresponding author.


Received: 22 Sep. 2011 / Revised: 14 Jan. 2012 / Accepted: 12 Mar. 2012 / Published: 8 Jul. 2012

Index Terms

Genomic Signal Processing, DNA and Protein Sequences, Numerical Mapping, Codon, Exons and Introns, Short Time Fourier Transform


Digital signal processing (DSP) is key to solving many problems in genomics, such as predicting gene locations in a genomic sequence and identifying defective regions in a DNA sequence. DSP can be applied only after the symbol sequences are mapped to numbers. Many techniques for the numerical representation of DNA sequences have been developed in the literature. They fall into two classes: Fixed Mapping (FM) and Physico-Chemical Property Based Mapping (PCPBM). The open question is which of these numerical representations should be used. Answering it requires understanding each representation, since the suitability of a mapping depends on the particular application. This paper addresses that question and compares the techniques in terms of their precision in exon and intron classification. Simulations are carried out using short sequences of the human genome (GRCh37/hg19). The final results indicate that classification performance depends on the numerical representation method.
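
The abstract does not spell out the individual mappings or the classifier, so the following is only a minimal sketch of the general idea. It assumes the Voss binary indicator sequences as an example of a fixed mapping and the electron-ion interaction pseudopotential (EIIP) values as an example of a physico-chemical mapping, and it measures the period-3 spectral component that is commonly used to separate exons from introns. The function names (`voss_indicators`, `eiip_signal`, `period3_power`) are hypothetical and are not taken from the paper.

```python
import cmath

# EIIP value of each nucleotide (a physico-chemical property based mapping).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def voss_indicators(seq):
    """Fixed mapping: one binary indicator sequence per base."""
    seq = seq.upper()
    return {b: [1.0 if s == b else 0.0 for s in seq] for b in "ACGT"}


def eiip_signal(seq):
    """Physico-chemical mapping: one real value per base."""
    return [EIIP[s] for s in seq.upper()]


def dft_power(signal, k):
    """Squared magnitude of the k-th DFT coefficient of a real signal."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(signal))
    return abs(coeff) ** 2


def period3_power(seq, mapping="voss"):
    """Spectral power at frequency index N/3 under the chosen mapping.

    Assumes len(seq) is a multiple of 3 so that N/3 is an integer bin.
    """
    n = len(seq)
    if n == 0 or n % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    k = n // 3
    if mapping == "voss":
        # Sum the power over the four indicator sequences.
        return sum(dft_power(sig, k) for sig in voss_indicators(seq).values())
    if mapping == "eiip":
        return dft_power(eiip_signal(seq), k)
    raise ValueError("unknown mapping: " + mapping)


if __name__ == "__main__":
    codon_like = "ATG" * 10   # strongly 3-periodic toy sequence
    flat = "A" * 30           # no 3-periodicity
    for name in ("voss", "eiip"):
        print(name, period3_power(codon_like, name), period3_power(flat, name))
```

The two mappings give different magnitudes for the same sequence, which illustrates why a classifier built on such a spectral feature can behave differently depending on the numerical representation. A windowed version of the same computation, sliding along a longer sequence, would correspond to a short-time Fourier analysis.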

Cite This Paper

Mohammed Abo-Zahhad, Sabah M. Ahmed, Shimaa A. Abd-Elrahman, "Genomic Analysis and Classification of Exon and Intron Sequences Using DNA Numerical Mapping Techniques", International Journal of Information Technology and Computer Science (IJITCS), vol. 4, no. 8, pp. 22-36, 2012. DOI: 10.5815/ijitcs.2012.08.03

