Reinforcement Learning for Automated Literature Screening: Enhancing E-Learning and University Research Classification in Computer Science

PDF (802 KB), pp. 42-56

Author(s)

Enes Bajrami 1,*, Florim Idrizi 2, Shpend Ismaili 2

1. Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia

2. Faculty of Natural Sciences and Mathematics, University of Tetova, Tetovo, North Macedonia

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2026.01.03

Received: 6 Aug. 2025 / Revised: 26 Sep. 2025 / Accepted: 11 Nov. 2025 / Published: 8 Feb. 2026

Index Terms

Reinforcement Learning, e-Learning, PRISMA, API, Python, Classification

Abstract

Reinforcement Learning (RL) is a well-established Artificial Intelligence (AI) method, particularly in light of recent groundbreaking progress in Deep Reinforcement Learning (DRL). RL is well suited to sequential decision-making tasks, in which an agent learns an optimal policy through repeated interaction with an environment. This paper examines the application of RL to automating literature screening in academic research, particularly in computer science and e-learning. Keyword-filtering techniques, though widely used, are inflexible and unable to capture the dynamic nature of research themes. To overcome these constraints, we present a Deep Q-Network (DQN)-based RL model that integrates with the Semantic Scholar API to classify research papers according to dynamically learned decision rules. The model was trained and tested on a dataset of 8,934 research papers retrieved through systematic searching. The agent filtered this corpus and selected 11 relevant papers based on refined selection criteria such as publication date, keyword relevance, and scholarly topic. Through reward-based learning, the model iteratively optimizes its decision-making process and thereby improves classification accuracy over time. Experiments show that the proposed RL-based framework achieves a classification accuracy of 91.5%, recall of 86.3%, and precision of 89.7%. A comparative test shows that the approach outperforms traditional keyword-filtering methods by 12.5% in recall and 9.8% in accuracy. These findings confirm the model's ability to reduce false positives and false negatives in literature screening, demonstrating the scalability and adaptability of RL for managing large volumes of academic data. This work offers a scalable, cognitive approach to conducting systematic literature reviews by applying RL to automate core screening tasks in academic research. It shows the promise of RL for strengthening research methodology, making literature reviews more efficient, and supporting better-informed decision-making in fast-changing scientific disciplines. Future research will focus on incorporating hybrid AI models and multi-agent RL systems to further improve responsiveness and classification performance.
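The abstract names the key components of the pipeline: papers fetched from the Semantic Scholar API, a state built from criteria such as keyword relevance and publication date, an include/exclude action, and a reward for correct decisions. The following is a minimal, hypothetical sketch of such a loop in Python (the implementation language listed in the index terms). It is not the authors' implementation: a linear Q-learner stands in for the full DQN, and the feature set, reward values, and weak keyword-based relevance labels are illustrative assumptions. Only the Semantic Scholar search endpoint and its query/fields parameters are taken from the public API.

```python
# Sketch of an RL-based screening loop: an agent decides include/exclude for
# each paper fetched from the Semantic Scholar API and receives a reward when
# its decision matches a relevance label. Simplifications: linear Q-learning
# replaces the paper's DQN; features, rewards, and labels are assumptions.
import requests
import numpy as np

API_URL = "https://api.semanticscholar.org/graph/v1/paper/search"
KEYWORDS = {"reinforcement learning", "e-learning", "classification"}

def fetch_papers(query, limit=50):
    """Retrieve candidate papers (title, abstract, year) from Semantic Scholar."""
    resp = requests.get(API_URL, params={
        "query": query, "limit": limit, "fields": "title,abstract,year"})
    resp.raise_for_status()
    return resp.json().get("data", [])

def featurize(paper):
    """State vector: keyword overlap, publication recency, abstract presence."""
    text = ((paper.get("title") or "") + " " + (paper.get("abstract") or "")).lower()
    kw = sum(k in text for k in KEYWORDS) / len(KEYWORDS)
    year = paper.get("year") or 2000
    recency = max(0.0, min(1.0, (year - 2015) / 10))
    has_abstract = 1.0 if paper.get("abstract") else 0.0
    return np.array([kw, recency, has_abstract, 1.0])  # last entry is a bias term

class LinearQAgent:
    """Q(s, a) = w[a] . s with epsilon-greedy exploration; actions: 0=exclude, 1=include."""
    def __init__(self, n_features, lr=0.05, epsilon=0.2):
        self.w = np.zeros((2, n_features))
        self.lr, self.epsilon = lr, epsilon

    def act(self, s):
        if np.random.rand() < self.epsilon:
            return np.random.randint(2)
        return int(np.argmax(self.w @ s))

    def update(self, s, a, reward):
        # One-step TD update; each paper is treated as an independent episode.
        self.w[a] += self.lr * (reward - self.w[a] @ s) * s

def reward_fn(action, relevant):
    """+1 for a correct include/exclude decision, -1 otherwise (assumed shaping)."""
    return 1.0 if action == int(relevant) else -1.0

if __name__ == "__main__":
    agent = LinearQAgent(n_features=4)
    papers = fetch_papers("reinforcement learning e-learning screening")
    # Stand-in labels: a real pipeline would use expert annotations from the
    # systematic-search protocol; here keyword overlap acts as a weak oracle.
    for epoch in range(20):
        for p in papers:
            s = featurize(p)
            a = agent.act(s)
            agent.update(s, a, reward_fn(a, s[0] >= 2 / 3))
    agent.epsilon = 0.0  # greedy policy for the final screening pass
    selected = [p["title"] for p in papers if agent.act(featurize(p)) == 1]
    print(f"Included {len(selected)} of {len(papers)} papers")
```

In the full setting described in the abstract, a neural network would replace the linear weights and expert relevance judgments from the systematic-search protocol would replace the weak keyword oracle; the loop structure of state, action, reward, and update stays the same.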

Cite This Paper

Enes Bajrami, Florim Idrizi, Shpend Ismaili, "Reinforcement Learning for Automated Literature Screening: Enhancing E-Learning and University Research Classification in Computer Science", International Journal of Information Technology and Computer Science (IJITCS), Vol. 18, No. 1, pp. 42-56, 2026. DOI: 10.5815/ijitcs.2026.01.03
