Performance Evaluation of Various Machine Learning Algorithms for User Story Clustering

PP. 92-105

Author(s)

Bhawnesh Kumar 1,*, Umesh Kumar Tiwari 1, Dinesh C. Dobhal 1

1. Department of Computer Science and Engineering, Graphic Era Deemed to be University, Dehradun, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2025.03.07

Received: 23 Jul. 2024 / Revised: 3 Sep. 2024 / Accepted: 20 Oct. 2024 / Published: 8 Jun. 2025

Index Terms

User Story, Agile Development, Clustering, Standard Deviation, Silhouette Coefficient

Abstract

In agile development, user stories are the primary method for defining requirements. Managing them effectively is difficult because software projects typically contain a large number of them. A project's user stories should be clustered into groups based on functional similarity to support systematic requirements analysis, effective mapping to developed features, and efficient maintenance. Unfortunately, the majority of user story clustering methods in use today require a great deal of manual work, which is error-prone and time-consuming. In this research, we propose an automated framework that uses a family of machine learning algorithms to cluster user stories. First, the data is preprocessed to examine the user stories and extract keywords from them. Features are then extracted, allowing the user stories to be automatically grouped into distinct categories. We employ four feature extraction algorithms and six clustering algorithms. According to our experimental results, K-means and BIRCH clustering outperform the other clustering methods, while cosine similarity and cosine distance are the best feature extraction techniques for forming balanced clusters, both yielding a standard deviation of 3.08. In terms of cluster cohesion, spectral clustering combined with cosine similarity and cosine distance feature extraction achieves the best silhouette coefficient, 0.225, among all the clustering algorithms. This study demonstrates the usefulness and applicability of the proposed framework. It also offers practical recommendations for improving the effectiveness of user story clustering, for example through parameter adjustments for better feature extraction and clustering.
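The pipeline described above (preprocessing, feature extraction, clustering, silhouette evaluation) can be sketched as follows. This is a minimal illustration assuming scikit-learn; the sample stories, the TF-IDF feature representation, and the choice of k = 2 are placeholders for exposition, not the paper's actual dataset, feature extractors, or tuned parameters.

```python
# Sketch of an automated user-story clustering pipeline:
# keyword features -> K-means clustering -> silhouette cohesion score.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Illustrative user stories (placeholder data, not the paper's dataset).
stories = [
    "As a user, I want to log in so that I can access my account",
    "As a user, I want to reset my password so that I can regain access",
    "As an admin, I want to generate reports so that I can track usage",
    "As an admin, I want to export data so that I can archive records",
]

# Preprocessing + feature extraction: TF-IDF keyword weights per story.
X = TfidfVectorizer(stop_words="english").fit_transform(stories)

# Clustering: K-means into k groups (k = 2 chosen for illustration).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Cohesion: silhouette coefficient with a cosine metric,
# mirroring the cosine-based features evaluated in the paper.
score = silhouette_score(X, labels, metric="cosine")
print(labels, round(score, 3))
```

Swapping `KMeans` for `Birch` or `SpectralClustering` (also in `sklearn.cluster`) reproduces the kind of side-by-side comparison the study performs across its six clustering algorithms.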

Cite This Paper

Bhawnesh Kumar, Umesh Kumar Tiwari, Dinesh C. Dobhal, "Performance Evaluation of Various Machine Learning Algorithms for User Story Clustering", International Journal of Modern Education and Computer Science (IJMECS), Vol. 17, No. 3, pp. 92-105, 2025. DOI: 10.5815/ijmecs.2025.03.07
