AI vs. Human Writing: Developing a Novel Method for Text Authenticity Detection in Education

PDF (735 KB), pp. 45-58


Author(s)

Vijay H. Kalmani 1,*, Amol C. Adamuthe 2, Arati Premnath Gondil 3, Vaishnavi Prashant Patil 3, Riya Amar Kore 3, Vaishnavi Mahadev Metkari 3

1. Department of Computer Science and Engineering, Rajarambapu Institute of Technology, Rajaramnagar, Affiliated to Shivaji University, Kolhapur, Maharashtra – 415414, India

2. Department of Information Technology, Kasegaon Education Society's Rajarambapu Institute of Technology, Shivaji University, Sakharale, Maharashtra – 415414, India

3. Department of Information Technology, Rajarambapu Institute of Technology, Rajaramnagar, Affiliated to Shivaji University, Kolhapur, Maharashtra – 415414, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2025.03.04

Received: 5 Dec. 2024 / Revised: 11 Feb. 2025 / Accepted: 18 Mar. 2025 / Published: 8 Jun. 2025

Index Terms

AI Detection, Text Classification, Machine Learning, Naturalness Score, Logistic Regression, Academic Integrity, Large Language Models

Abstract

Rapid progress in generative artificial intelligence (AI) technologies has made it significantly harder to differentiate AI-written text from human-written text. This study introduces the Naturalness Score, a composite measure that combines lexical diversity, syntactic complexity, sentiment variability, and grammatical faults. The Naturalness Score forms part of a larger machine learning framework and also drives a dedicated classifier, the Naturalness-Based Logistic Regression Classifier (NLRC). The NLRC model was evaluated on a large, diverse corpus of nearly 45,000 text samples, consisting mostly of student essays, articles, and web-scraped content. The proposed model outperformed all existing baseline models, achieving an accuracy of 96.41%, precision of 0.98, recall of 0.95, and an F1 score of 0.96. The high areas under the receiver operating characteristic curve (AUC = 1.00) and the precision-recall curve (AUC-PR) further indicate the model's effectiveness in distinguishing AI-generated from human-written text. The proposed approach offers several advantages, including increased detection accuracy, resilience against evolving AI-generated content, cross-domain applicability, and interpretability. The research has implications for deploying such models in schools, while also calling for future work that tracks the rapidly changing landscape of AI-generated content. It emphasizes the importance of these findings in developing robust and adaptive detection systems that safeguard the integrity of academic assessments and prevent the misuse of AI tools.
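The abstract describes a pipeline in which naturalness-related features (lexical diversity, syntactic complexity, sentiment variability, grammatical faults) feed a logistic regression classifier. The sketch below illustrates that general idea with simple proxy features and hand-picked weights; the feature formulas, weights, and function names here are illustrative assumptions, not the authors' actual implementation.

```python
import math
import re
import statistics

def naturalness_features(text):
    """Illustrative proxies for Naturalness Score components.
    These formulas are assumptions, not the paper's exact definitions."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Lexical diversity: type-token ratio.
    lexical_diversity = len(set(words)) / len(words) if words else 0.0
    # Syntactic complexity proxy: mean sentence length in words.
    lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    mean_sentence_len = statistics.mean(lengths) if lengths else 0.0
    # Variability proxy: population std. dev. of sentence lengths.
    length_variability = statistics.pstdev(lengths) if len(lengths) > 1 else 0.0
    return {
        "lexical_diversity": lexical_diversity,
        "mean_sentence_len": mean_sentence_len,
        "length_variability": length_variability,
    }

def nlrc_probability(features, weights, bias):
    """Logistic regression over the features: sigmoid of a linear score,
    read here as the probability that the text is human-written."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights, chosen only to demonstrate the pipeline; in practice
# they would be fitted on labeled human/AI text samples.
WEIGHTS = {"lexical_diversity": 2.0, "mean_sentence_len": 0.05,
           "length_variability": 0.3}
BIAS = -1.5

feats = naturalness_features("The cat sat on the mat. It purred loudly, then slept.")
p_human = nlrc_probability(feats, WEIGHTS, BIAS)
print(f"P(human-written) = {p_human:.3f}")
```

In a trained system the weights and bias would come from fitting logistic regression on labeled data rather than being set by hand, and the grammatical-fault and sentiment-variability components would require dedicated tooling not shown here.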

Cite This Paper

Vijay H. Kalmani, Amol C. Adamuthe, Arati Premnath Gondil, Vaishnavi Prashant Patil, Riya Amar Kore, Vaishnavi Mahadev Metkari, "AI vs. Human Writing: Developing a Novel Method for Text Authenticity Detection in Education", International Journal of Modern Education and Computer Science (IJMECS), Vol. 17, No. 3, pp. 45-58, 2025. DOI: 10.5815/ijmecs.2025.03.04
