A Review on Student Attrition in Higher Education Using Big Data Analytics and Data Mining Techniques




Syaidatus Syahira Ahmad Tarmizi 1,* Sofianita Mutalib 1 Nurzeatul Hamimah Abdul Hamid 1 Shuzlina Abdul Rahman 1

1. Faculty of Computer and Mathematical Sciences, Universiti Teknologi Mara, 40450 Shah Alam, Selangor, Malaysia

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2019.08.01

Received: 15 Apr. 2019 / Revised: 1 May 2019 / Accepted: 25 May 2019 / Published: 8 Aug. 2019

Index Terms

Student attrition, higher education, big data analytics, data mining


Student attrition among undergraduate students is one of the most pressing concerns for higher educational institutions in Malaysia and abroad. This problem arises when students majoring in the Science, Technology, Engineering, and Mathematics (STEM) fields are unable to complete their studies within the stipulated period. Research findings highlight that numerous factors contribute to student attrition, and suggest that these factors differ from one case to another. Student attrition affects not only the students themselves but also the institutions and the community. It is challenging to classify the factors based on general assumptions. Moreover, the growing volume of student information makes the problem more complicated. This student information can nevertheless provide a useful database for analysis. Methods such as big data analytics and data mining techniques can be deployed to gain insights and discover patterns related to the student attrition problem. The objectives of this paper are to (i) review student attrition in higher education (HE) and its contributing factors; and (ii) review the existing computational models used to analyze and predict student attrition in HE.

Cite This Paper

Syaidatus Syahira Ahmad Tarmizi, Sofianita Mutalib, Nurzeatul Hamimah Abdul Hamid, Shuzlina Abdul Rahman, "A Review on Student Attrition in Higher Education Using Big Data Analytics and Data Mining Techniques", International Journal of Modern Education and Computer Science (IJMECS), Vol. 11, No. 8, pp. 1-14, 2019. DOI: 10.5815/ijmecs.2019.08.01

