Work place: Department of Information Technology, Politeknik Caltex Riau, Pekanbaru, Indonesia
E-mail: dadang@pcr.ac.id
ORCID: https://orcid.org/0000-0002-5398-8548
Research Interests:
Biography
Dadang Syarif Sihabudin Sahid received a bachelor’s degree in mathematics from the Bandung Institute of Technology, Indonesia, in 1999, an M.Sc. degree in Information Technology from Universiti Teknologi Malaysia in 2009, and a doctorate from Universitas Gadjah Mada, Indonesia, in 2018. He has been a lecturer at the Information Technology Department of Politeknik Caltex Riau since July 1st, 2000, where he carries out the three main duties of a lecturer: teaching, research, and community service. He teaches Information Systems, Application and Systems Development, IT Project Management, Information Technology Concepts, and IT Strategic Planning. His research focuses on IT Project Management, Soft Computing, Context-Aware Computing, and IT Planning. He has received many research grants, including international collaboration grants and grants from the Higher Education Ministry. He also has experience as deputy director and director of Politeknik Caltex Riau.
By Muhammad Ihsan Zul, Suhaila Mohd. Yasin, Ivan Chatisa, Fikri Muhaffizh Imani, Siti Syahidatul Helma, Dadang Syarif Sihabudin Sahid
DOI: https://doi.org/10.5815/ijmecs.2026.02.05, Pub. Date: 8 Apr. 2026
User stories are essential in agile software development for capturing software requirements, yet concerns over their quality persist globally. While prior studies have evaluated user story quality using practitioners and artificial intelligence, they primarily focus on general settings. This study addresses a gap by evaluating the quality of student-generated user stories in an educational context, specifically in Indonesia. The objective is twofold: to compare evaluations by human evaluators and ChatGPT using the Quality User Story (QUS) Framework, and to assess the quality of student-generated user stories relative to global studies. A total of 951 user stories from 103 student software projects were analyzed. Evaluations were conducted by three human evaluators and ChatGPT (GPT-4o). Percentage Agreement and Cohen’s Kappa measured inter-rater agreement, the McNemar Test assessed statistical significance, and effect sizes were examined using Cohen’s g. Results show generally high agreement between human and ChatGPT evaluations, but lower consistency on several criteria, such as Conceptually Sound, Independent, and Unambiguous. Only four of the thirteen criteria—Conflict-Free, Unique, Well-Formed, and Atomic—showed no significant differences. Most criteria showed small to medium effect sizes, whereas Complete exhibited a large practical difference. Common quality issues among students involved Uniform, Independent, and Complete (set criteria) and Atomic, Conceptually Sound, and Unambiguous (individual criteria), overlapping with findings from global studies. This study shows that ChatGPT can support user story evaluation in educational settings when guided by clear rubrics and validated by humans. It also offers practical insights for educators by identifying criteria that require stronger emphasis in teaching, particularly in software engineering education in Indonesia.