IJMECS Vol. 17, No. 6, 8 Dec. 2025
Index Terms: Scientific Article, Classification, Convolutional Neural Network, Ensemble Learning, TFIDF, SBERT
The classification of scientific articles is challenging due to the complexity and diversity of academic content. To address this, a new approach is proposed that combines Ensemble Learning methods, specifically Decision Tree, Random Forest, AdaBoost, and XGBoost, with Convolutional Neural Network (CNN) techniques. The study uses the arXiv dataset and compares the effectiveness of Term Frequency-Inverse Document Frequency (TFIDF) and Sentence-BERT (SBERT) for text representation. To further refine feature extraction, vectors derived from SBERT are passed through the CNN, which reduces their dimensionality and yields more representative feature maps, termed latent feature vectors. The study also examines the impact of using both the title and the abstract as input, showing that richer textual information improves model accuracy. The hybrid model (CNN + Ensemble Learning) achieves a substantial improvement in classification accuracy over traditional Ensemble Learning alone: CNN + SBERT with XGBoost reaches the highest accuracy of 94.62%, demonstrating the benefit of combining advanced feature extraction with powerful classifiers. This research highlights the potential of integrating a CNN within the Ensemble Learning paradigm to improve scientific article classification, and shows that SBERT is the stronger choice for feature extraction, contributing beneficially to the overall model.
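The core of the hybrid approach described above is using a convolution layer over text embeddings to produce a compact "latent feature vector" that is then handed to an ensemble classifier. The following is a minimal NumPy sketch of that feature-extraction step only, under stated assumptions: the input is a toy (seq_len, dim) matrix of random vectors standing in for SBERT output, the filter shapes and the ReLU + global max pooling are illustrative choices, and the function name `latent_features` is hypothetical, not from the paper.

```python
import numpy as np

def latent_features(embeddings, kernels, bias):
    """1D convolution + ReLU + global max pooling over embedding rows.

    embeddings: (seq_len, dim) array of input vectors
    kernels:    (n_filters, width, dim) convolution filters
    bias:       (n_filters,) per-filter bias
    Returns a (n_filters,) latent feature vector: the kind of compact
    representation that can be fed to an ensemble classifier.
    """
    seq_len, dim = embeddings.shape
    n_filters, width, _ = kernels.shape
    out = np.empty((n_filters, seq_len - width + 1))
    for f in range(n_filters):
        for i in range(seq_len - width + 1):
            window = embeddings[i:i + width]          # (width, dim) slice
            out[f, i] = np.sum(window * kernels[f]) + bias[f]
    out = np.maximum(out, 0.0)                        # ReLU activation
    return out.max(axis=1)                            # global max pooling

rng = np.random.default_rng(0)
emb = rng.normal(size=(12, 8))      # 12 rows of 8-dim toy "embeddings"
ker = rng.normal(size=(4, 3, 8))    # 4 filters of width 3
vec = latent_features(emb, ker, np.zeros(4))
```

In a full pipeline along the lines of the paper, `vec` (one per article, from title + abstract text) would become the input features for XGBoost or another ensemble model; in practice the convolution would be a trained layer in a deep-learning framework rather than hand-rolled loops.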
I. Nyoman Switrayana, Neny Sulistianingsih, "Leveraging Convolutional Neural Network to Enhance the Performance of Ensemble Learning in Scientific Article Classification", International Journal of Modern Education and Computer Science (IJMECS), Vol. 17, No. 6, pp. 146-159, 2025. DOI: 10.5815/ijmecs.2025.06.10
[32]S. Minaee, N. Kalchbrenner, E. Cambria, N. Nikzad, M. Chenaghlu, and J. Gao, “Deep Learning-Based Text Classification: A Comprehensive Review,” ACM Comput. Surv., vol. 54, no. 3, pp. 1–40, 2021, doi: 10.1145/3439726.