IJMECS Vol. 17, No. 5, 8 Oct. 2025
Agentic RAG, Vector Search, LangChain, GitHub, Redis, LLM, Information Technology, Semantic Embedding, Code Analysis, Intelligent System
This study presents the development and evaluation of a local agent-based Retrieval-Augmented Generation (Agentic RAG) system designed for the intelligent analysis of GitHub repositories in computer science education and IT practice. The novelty of this work lies not in inventing a new RAG algorithm, but in orchestrating multiple existing components (LangChain, Redis, SentenceTransformer, and LLMs) into a multi-stage agent pipeline with integrated relevance evaluation, specifically adapted to offline repository mining. The proposed pipeline consists of four sequential stages: (1) query reformulation by a dedicated LLM agent, (2) semantic retrieval using SentenceTransformer embeddings stored in Redis, (3) response generation by a second LLM, and (4) relevance scoring by a verification agent with retry logic. Relevance is assessed via cosine similarity and LLM-based scoring, allowing iterative refinement of answers.
Experimental testing compared the system against two baselines: keyword search and a non-agentic single-stage RAG pipeline. The proposed system achieved an average MRR@10 of 0.72, compared to 0.48 for keyword search and 0.61 for non-agentic RAG, a 33% relative improvement in retrieval quality. Human evaluators (n = 15, computer science students) rated generated explanations on a 5-point Likert scale; the proposed system scored an average of 4.3/5 for clarity and correctness, compared to 3.6/5 for the baseline. Precision@5 for code retrieval improved from 0.54 (keyword search) and 0.67 (non-agentic RAG) to 0.76. Average query latency in the local environment was 3.8 seconds, acceptable for educational and small-team IT use cases. The system demonstrates a high degree of autonomy, operating fully on-premises with only optional API access to LLMs, which ensures privacy and independence from cloud providers. Ease of use was measured with the System Usability Scale (SUS), yielding a score of 78/100 and reflecting positive user perception of the Streamlit interface and minimal setup requirements.
Nevertheless, several limitations were observed: the high computational cost of running embeddings and LLMs locally, potential hallucinations in generated explanations (particularly for complex or unfamiliar code), and the inability of vector search to fully capture code syntax and control-flow structure. Furthermore, while the Analytic Hierarchy Process (AHP) was applied to select the system architecture, future work should complement it with benchmark-driven evaluations for greater objectivity. The contribution of this study is threefold: (1) a multi-agent orchestration logic tailored to educational code repositories; (2) empirical evidence of measurable gains in retrieval quality and explanation usefulness over the baselines; and (3) an analysis of both the opportunities and limitations of deploying autonomous RAG systems locally. The proposed technology can benefit IT companies seeking secure in-house tools for repository analysis, universities aiming to integrate intelligent assistants into programming courses, and research institutions requiring reproducible, privacy-preserving environments for code exploration.
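As a rough illustration of the four-stage loop summarized above, the following Python sketch wires together query reformulation, embedding-based retrieval, answer generation, and relevance verification with retries. It is a minimal sketch, not the authors' implementation: the generic `llm` callable, the MiniLM embedding model, the 0.7 relevance threshold, and the brute-force in-memory corpus are assumptions standing in for the LangChain agents, local LLMs, and Redis vector index used in the actual system.

```python
# Minimal sketch of the four-stage agentic RAG loop described in the abstract.
# Assumptions (not from the paper's code): the `llm` callable, the MiniLM model,
# the 0.7 threshold, and the in-memory corpus replace the LangChain agents,
# local LLMs, and Redis vector index of the real system.

from typing import Callable, List
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # embedding model is an assumption

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity used for the relevance check."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def retrieve(query: str, corpus: List[str], k: int = 5) -> List[str]:
    """Stage 2: semantic retrieval. The paper stores embeddings in Redis;
    here a brute-force in-memory search plays the same role."""
    q_vec = embedder.encode(query)
    scored = [(cosine(q_vec, embedder.encode(doc)), doc) for doc in corpus]
    return [doc for _, doc in sorted(scored, key=lambda p: p[0], reverse=True)[:k]]

def agentic_rag(question: str, corpus: List[str],
                llm: Callable[[str], str],
                threshold: float = 0.7, max_retries: int = 2) -> str:
    """Stages 1-4 with retry logic triggered by a low relevance score."""
    query = question
    answer = ""
    for _ in range(max_retries + 1):
        # Stage 1: query reformulation by a dedicated agent
        query = llm(f"Reformulate this question for code search: {query}")
        # Stage 2: semantic retrieval over the repository chunks
        context = retrieve(query, corpus)
        # Stage 3: answer generation grounded in the retrieved snippets
        answer = llm("Answer using only this context:\n"
                     + "\n---\n".join(context) + f"\n\nQuestion: {question}")
        # Stage 4: relevance verification (cosine check; the paper also uses LLM scoring)
        if cosine(embedder.encode(answer), embedder.encode(question)) >= threshold:
            return answer
    return answer  # best effort after exhausting retries
```

In the deployed system described in the abstract, the brute-force loop would be replaced by a KNN query against the Redis vector index over pre-computed repository embeddings, and the final check would combine the cosine score with an LLM-based relevance judgement before either returning the answer or triggering another pass.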
Zhengbing Hu, Markiian-Mykhailo Paprotskyi, Victoria Vysotska, Lyubomyr Chyrun, Yuriy Ushenko, Dmytro Uhryn, "Local Agentic RAG-Based Information System Development for Intelligent Analysis of GitHub Code Repositories in Computer Science Education", International Journal of Modern Education and Computer Science (IJMECS), Vol.17, No.5, pp. 109-145, 2025. DOI: 10.5815/ijmecs.2025.05.07
[1]P. Lewis, et al., Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems, vol. 33, pp. 9459–9474, 2020. https://arxiv.org/abs/2005.11401, https://proceedings.neurips.cc/paper_files/paper/2020/file/6b493230205f780e1bc26945df7481e5-Paper.pdf
[2]T. Brown, et al., Language models are few-shot learners. Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020.
[3]E. Cipriano, A. Ferrato, C. Limongelli, D. Schicchi, and D. Taibi, Leveraging Large Language Models to Assist Teachers in Code Grading. In: A.I. Cristea, E. Walker, Y. Lu, O.C. Santos, S. Isotani (eds) Artificial Intelligence in Education. AIED 2025. Lecture Notes in Computer Science, vol. 15880. Springer, Cham, 2025. https://doi.org/10.1007/978-3-031-98459-4_15
[4]S. Yao, et al., Tree of Thoughts: Deliberate Problem Solving with Large Language Models. arXiv preprint, arXiv:2305.10601, 2023. https://arxiv.org/abs/2305.10601
[5]GPT Engineer. Open-source project for building software agents powered by LLMs, 2024. https://github.com/AntonOsika/gpt-engineer
[6]EduCoder, ChatCodeTutor. A minimalistic implementation of an LLM agent, 2025. https://github.com/Antropath/minimal-agent
[7]M. Chen, et al., Evaluating Large Language Models Trained on Code. arXiv preprint, arXiv:2107.03374, 2021. https://arxiv.org/abs/2107.03374
[8]E. Nijkamp, et al., CodeGen2: Lessons for Training LLMs on Programming and Natural Languages. arXiv preprint, arXiv:2305.02309, 2023. https://arxiv.org/abs/2305.02309
[9]Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, et al., CodeBERT: A pre-trained model for programming and natural languages. arXiv preprint, arXiv:2002.08155, 2020. https://arxiv.org/abs/2002.08155
[10]D. Guo, S. Ren, S. Lu, Z. Feng, D. Tang, S. Liu, et al., GraphCodeBERT: Pre-training code representations with data flow. arXiv preprint, arXiv:2009.08366, 2020. https://openreview.net/pdf?id=jLoC4ez43PZ, https://arxiv.org/abs/2009.08366
[11]OpenHands. OpenHands: Code Less, Make More, 2025. https://github.com/All-Hands-AI/OpenHands
[12]AutoGPT for Education. AutoGPT: Build, Deploy, and Run AI Agents, 2025. https://github.com/Significant-Gravitas/AutoGPT
[13]V. Vysotska, Computer linguistic system modelling for Ukrainian language processing. CEUR Workshop Proceedings, vol. 3722, pp. 288–342, 2024. https://ceur-ws.org/Vol-3722/paper18.pdf
[14]V. Vysotska, Computer linguistic system architecture for Ukrainian language content processing based on machine learning. CEUR Workshop Proceedings, vol. 3723, pp. 133–181, 2024. https://ceur-ws.org/Vol-3723/paper9.pdf
[15]V. Vysotska, Modern State and Prospects of Information Technologies Development for Natural Language Content Processing. CEUR Workshop Proceedings, vol. 3668, pp. 198–234, 2024. https://ceur-ws.org/Vol-3668/paper15.pdf
[16]V. Vysotska, Computer Linguistic Systems Design and Development Features for Ukrainian Language Content Processing. CEUR Workshop Proceedings, vol. 3668, pp. 229–271, 2024. https://ceur-ws.org/Vol-3688/paper18.pdf
[17]V. Vysotska, O. Markiv, S. Vladov, V. Sokurenko, L. Chyrun, and Y. Lytvynenko, Embedding Model for Low-Resource Carpathian Ruthenian Language as Western Ukrainian Dialect Based on NLP and Machine Learning. Proc. 2024 IEEE 17th Int. Conf. on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering (TCSET), pp. 1–6, IEEE, 2024. https://ieeexplore.ieee.org/abstract/document/10755854
[18]R. Fedchuk and V. Vysotska, Mathematical model of a decision support system for identification and correction of errors in Ukrainian texts based on machine learning. CEUR Workshop Proceedings, vol. 4005, pp. 29–50, 2025. https://ceur-ws.org/Vol-4005/paper3.pdf
[19]GitHub Copilot. https://github.com/features/copilot.
[20]Cody by Sourcegraph. https://sourcegraph.com/cody.
[21]Tabnine: AI code assistant. https://www.tabnine.com/.
[22]LangChain Documentation. https://docs.langchain.com/.
[23]Redis Vector Similarity Search. https://redis.io/docs/interact/search-and-query/query/vectors/.
[24]Streamlit Documentation. https://docs.streamlit.io/.
[25]H. Husain, H.H. Wu, T. Gazit, M. Allamanis, and M. Brockschmidt, CodeSearchNet challenge: Evaluating the state of semantic code search. arXiv preprint, arXiv:1909.09436, 2019. https://arxiv.org/abs/1909.09436.
[26]D. Guo, S. Lu, N. Duan, Y. Wang, M. Zhou, and J. Yin, UniXcoder: Unified cross-modal pre-training for code representation. arXiv preprint, arXiv:2203.03850, 2022. https://arxiv.org/abs/2203.03850.
[27]L. Di Grazia and M. Pradel, Code search: A survey of techniques for finding code. ACM Computing Surveys, vol. 55, no. 11, pp. 1–31, 2023. https://dl.acm.org/doi/10.1145/3565971.
[28]S. Setty, H. Thakkar, A. Lee, E. Chung, and N. Vidra, Improving retrieval for RAG based question answering models on financial documents. arXiv preprint, arXiv:2404.07221, 2024. https://arxiv.org/html/2404.07221v2.
[29]T. Liu, C. Xu, and J. McAuley, RepoBench: Benchmarking repository-level code auto-completion systems. arXiv preprint, arXiv:2306.03091, 2023. https://openreview.net/forum?id=pPjZIOuQuF.
[30]Leolty, RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024. https://github.com/Leolty/repobench.
[31]W. Cheng, Y. Wu, and W. Hu, Dataflow-guided retrieval augmentation for repository-level code completion. arXiv preprint, arXiv:2405.19782, 2024. https://aclanthology.org/2024.acl-long.431.pdf.
[32]R. Hu, C. Peng, J. Ren, B. Jiang, X. Meng, Q. Wu, et al., CodeRepoQA: A Large-scale Benchmark for Software Engineering Question Answering. arXiv preprint, arXiv:2412.14764, 2024. https://arxiv.org/pdf/2412.14764.
[33]D. Wu, W.U. Ahmad, D. Zhang, M.K. Ramanathan, and X. Ma, Repoformer: Selective retrieval for repository-level code completion. arXiv preprint, arXiv:2403.10059, 2024. https://arxiv.org/abs/2403.10059.
[34]G. Gerganov, llama.cpp: LLM inference in C/C++. https://github.com/ggml-org/llama.cpp.
[35]Which Quantization Method Is Best for You?: GGUF, GPTQ, or AWQ. https://www.e2enetworks.com/blog/which-quantization-method-is-best-for-you-gguf-gptq-or-awq.
[36]M. Grootendorst, Which Quantization Method is Right for You? (GPTQ vs. GGUF vs. AWQ). https://newsletter.maartengrootendorst.com/p/which-quantization-method-is-right.
[37]R. Devine, How to install and use Ollama to run AI LLMs locally on your Windows 11 PC. https://www.windowscentral.com/software-apps/how-to-install-and-use-ollama-to-run-ai-llms-on-your-windows-11-pc.
[38]B. Roziere, J. Gehring, F. Gloeckle, S. Sootla, I. Gat, X.E. Tan, et al., Code Llama: Open foundation models for code. arXiv preprint, arXiv:2308.12950, 2023. https://arxiv.org/pdf/2308.12950.
[39]R. Devine, This Common Mistake in Ollama Could Be Killing Your AI Performance in Windows 11 — Here’s How to Fix It. https://www.windowscentral.com/artificial-intelligence/mistake-with-ollama-on-windows-sucked-away-performance
[40]T.L. Saaty, Decision making with the analytic hierarchy process. International Journal of Services Sciences, vol. 1, no. 1, pp. 83–98, 2008. https://www.inderscienceonline.com/doi/abs/10.1504/IJSSCI.2008.017590
[41]I.B. Botchway, A.A. Emmanuel, N. Solomon, and A.B. Kayode, Evaluating software quality attributes using analytic hierarchy process (AHP). Int. J. of Advanced Computer Science and Applications, vol. 12, no. 3, 2021. https://thesai.org/Publications/ViewPaper?Volume=12&Issue=3&Code=IJACSA&SerialNo=21
[42]C.M. U-Dominic, J.C. Ujam, and N. Igbokwe, Applications of analytical hierarchy process (AHP) and knowledge management (KM) concepts in defect identification: a case of cable manufacturing. Asian Journal of Advanced Research and Reports, pp. 9–21, 2021. https://hal.science/hal-03384475v1/document
[43]UML Use Case Diagram. https://www.uml-diagrams.org/use-case-diagrams.html
[44]UML Class Diagram. https://www.uml-diagrams.org/class-diagrams.html.
[45]UML Sequence Diagram. https://www.uml-diagrams.org/sequence-diagrams.html.
[46]UML State Machine Diagram. https://www.uml-diagrams.org/state-machine-diagrams.html.
[47]UML Activity Diagram. https://www.uml-diagrams.org/activity-diagrams.html.
[48]UML Component Diagram. https://www.uml-diagrams.org/component-diagrams.html.
[49]N. Reimers and I. Gurevych, Sentence-BERT: Sentence embeddings using Siamese BERT-networks. EMNLP 2019. arXiv:1908.10084.
[50]S. Wang, et al., MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. NeurIPS 2020. arXiv:2002.10957.
[51]Y. Wang, W. Wang, S. Joty, and S.C. Hoi, CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. arXiv preprint, arXiv:2109.00859, 2021. https://arxiv.org/abs/2109.00859
[52]ISO/IEC 26514:2008. Systems and software engineering – Requirements for designers and developers of user documentation. International Organization for Standardization, 2008. https://www.iso.org/standard/43110.html.
[53]IEEE Std 1063-2001. Standard for Software User Documentation. Institute of Electrical and Electronics Engineers, 2001. https://standards.ieee.org/standard/1063-2001.html.
[54]S. Vladov, V. Jotsov, A. Sachenko, O. Prokudin, A. Ostapiuk, and V. Vysotska, Neural Network Method of Analysing Sensor Data to Prevent Illegal Cyberattacks. Sensors, vol. 25, no. 17, p. 5235, 2025. https://doi.org/10.3390/s25175235
[55]S. Vladov, L. Chyrun, E. Muzychuk, V. Vysotska, V. Lytvyn, T. Rekunenko, and A. Basko, Intelligent Method for Generating Criminal Community Influence Risk Parameters Using Neural Networks and Regional Economic Analysis. Algorithms, vol. 18, no. 8, p. 523, 2025. https://doi.org/10.3390/a18080523