Hybrid Machine Learning Approaches for DNA Classification: A Stacking Classifier Perspective

Author(s)

Sultanul A. Hamim 1,* Dip Nandi 1 Niloy E. Costa 2

1. Department of Computer Science, American International University-Bangladesh, Dhaka, 1229, Bangladesh

2. Department of Computer Science, York University’s Lassonde School of Engineering, Canada

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2026.03.07

Received: 23 Jan. 2026 / Revised: 16 Mar. 2026 / Accepted: 12 Apr. 2026 / Published: 8 Jun. 2026

Index Terms

DNA Sequence Classification, Machine Learning, Hybrid Models, Bioinformatics, Predictive Analytics

Abstract

This paper presents a hybrid machine learning model for DNA sequence classification that combines six algorithms: K-Nearest Neighbors (KNN), Support Vector Classifier (SVC), Decision Tree, Random Forest, Light Gradient Boosting Machine (LGBM), and XGBoost (XGB). The model is built with the stacking ensemble method, paired with a majority voting mechanism, to improve overall classification accuracy. The study uses the Promoter Gene Sequences dataset from the UCI Machine Learning Repository and focuses on distinguishing promoter from non-promoter sequences. The hybrid model reached an accuracy of 96.25% on this task. The research offers insight into ensemble machine learning techniques for DNA classification, with possible applications in genomics research, medical diagnostics, agricultural biotechnology, and forensic science. The successful implementation of the hybrid model demonstrates the potential for more accurate and reliable DNA sequence classification methods.
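
The abstract names a majority voting mechanism but does not specify how it is combined with the stacking step. As a minimal sketch of hard majority voting only, assuming each base model emits one label per sequence, the idea can be illustrated as follows. The function names, the model keys, the label strings, and the tie-breaking rule are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label in `votes`.

    Tie-breaking is an assumption of this sketch: among labels with the
    same highest count, the one that appears earliest in `votes` wins.
    """
    counts = Counter(votes)
    top = max(counts.values())
    for label in votes:
        if counts[label] == top:
            return label


def combine_predictions(predictions_by_model):
    """Combine per-model label lists into one label per sequence.

    `predictions_by_model` maps a model name to its list of predicted
    labels; all lists are assumed to have the same length and order.
    """
    per_sequence = zip(*predictions_by_model.values())
    return [majority_vote(list(votes)) for votes in per_sequence]


# Hypothetical predictions from six base models for three sequences.
example_predictions = {
    "knn":  ["promoter", "non-promoter", "promoter"],
    "svc":  ["promoter", "non-promoter", "non-promoter"],
    "tree": ["non-promoter", "non-promoter", "promoter"],
    "rf":   ["promoter", "promoter", "non-promoter"],
    "lgbm": ["promoter", "non-promoter", "non-promoter"],
    "xgb":  ["non-promoter", "non-promoter", "promoter"],
}

combined = combine_predictions(example_predictions)
print(combined)
```

With six voters a 3-3 split is possible, as in the third example sequence, so any real use of this scheme needs an explicit tie-breaking rule; the one used here is only a placeholder.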

Cite This Paper

Sultanul A. Hamim, Dip Nandi, Niloy E. Costa, "Hybrid Machine Learning Approaches for DNA Classification: A Stacking Classifier Perspective", International Journal of Intelligent Systems and Applications (IJISA), Vol.18, No.3, pp.104-114, 2026. DOI:10.5815/ijisa.2026.03.07
