Work place: Department of Educational Science, Ganesha University of Education (Undiksha), Bali, Indonesia
E-mail: Songmin@nbu.edu.cn
Biography
Min Song was born in Zhejiang Province, China, in 1988. She received the M.S. degree in Marine Biology in 2015. After completing her master’s degree, she began working in higher education and has been engaged in teaching management. She is currently a Ph.D. candidate in Educational Science at Ganesha University of Education (Undiksha), Indonesia. She works in teaching management at the Faculty of Information Engineering, College of Science and Technology, Ningbo University, China. Her research interests include educational science, teaching management, classroom assessment, and multimodal approaches to analyzing students’ learning engagement.
By Min Song, I Gusti Putu Sudiarta, Putu Kerti Nitiasih, Putu Nanci Riastini, Zhang Wang, Junyi Chai
DOI: https://doi.org/10.5815/ijmecs.2026.03.12, Pub. Date: 8 Jun. 2026
An accurate and comprehensive assessment of student engagement in classrooms is crucial for enabling data-driven teaching and personalized education. Current approaches primarily rely on teacher observation or student self-reports, which are often subjective, delayed, and unable to capture cognitive engagement. To address these limitations, this study proposes a Multimodal Cognitive-Attention Fusion (MCA Fusion) framework, grounded in Fredricks’ three-dimensional engagement model. The framework integrates electroencephalography (EEG), facial expressions, and body posture to simultaneously quantify cognitive, emotional, and behavioral engagement. Built on a Transformer architecture, it employs self-attention to extract temporal features within each modality and introduces a cognition-guided cross-attention mechanism to dynamically integrate multimodal signals. To validate the framework, experiments were conducted with 36 undergraduate students in real classroom settings. The results demonstrate that the framework significantly outperforms all single-modality baselines, achieving an accuracy of 92% and an F1-score of 94.87%. Compared with the best single-modality model (EEG), the F1-score improves by 34.58 percentage points. Ablation studies further confirm the critical role of the cognitive modality (EEG) and the MCA Fusion mechanism, the removal of which leads to F1-score reductions of 62.58 and 56.16 percentage points, respectively. The proposed approach not only offers a theoretically informed and technically evaluated framework for engagement recognition but also lays a methodological foundation for future closed-loop “perception–assessment–feedback” systems in intelligent learning environments.