Predicting Student Program Completion Using Naïve Bayes Classification Algorithm

PP. 57-67



Joann Galopo Perez 1,*, Eugene S. Perez 1

1. Bulacan State University, Philippines

* Corresponding author.


Received: 25 Jan. 2021 / Revised: 23 Feb. 2021 / Accepted: 14 Mar. 2021 / Published: 8 Jun. 2021

Index Terms

Data Mining, Naïve Bayes Classification Algorithm, Predictive Model, and Program Completion


Data mining approaches give educational institutions opportunities to find hidden patterns in the data stored in their databases. Many researchers have used such data to develop models that assist institution administrators in decision-making. This study predicts student program completion using the Naïve Bayes classifier. The dataset was obtained from Bulacan State University – Sarmiento Campus in the Philippines and consists of five years of graduates' data (Academic Years 2012-2016) from the BS Information Technology program. The dataset was pre-processed, cleansed, transformed, and balanced before the model was constructed. Ten candidate predictors of student completion were considered. A feature selection technique (Weight by Correlation) was used to filter the predictors and evaluate the significance of each: correlation analysis assigned each attribute a weight relative to the label attribute, and the significant attributes became the final parameters of the model. Of the 10 candidate predictors, only four (4) were selected after correlation analysis. In order of significance, these were parents' monthly income, mother's and father's educational attainment, and high school GPA. The data restricted to these attributes were split 70:30 into 447 training records (70%) and 191 testing records (30%), and the Naïve Bayes classifier was applied to predict students' completion. Of the 191 testing samples, 84 (44%) were predicted to complete the program and 107 (56%) were predicted not to complete it.
On the testing dataset (n=191), the model achieved 84% accuracy, with 80.46% class precision and 83.33% class recall. The outcomes of this study are relevant to higher education institutions (HEIs), particularly with regard to college completion rates. The model is expected to benefit university administrators in particular, serving as a tool for identifying students who are likely to complete college based on the variables included in the model.
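
The workflow summarized in the abstract (correlation-based attribute weighting, a 70:30 split, and Naïve Bayes classification scored by accuracy, precision, and recall) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the records are synthetic, the function and variable names are hypothetical, keeping the four highest-weighted attributes stands in for whatever cut-off the authors applied, and a categorical Naïve Bayes with add-one smoothing stands in for the authors' implementation, which the abstract does not detail. Its output is not the study's results.

```python
import math
import random
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


class CategoricalNB:
    """Naive Bayes for categorical attributes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        width = len(rows[0])
        self.values = [set() for _ in range(width)]
        # counts[c][j][v] = number of class-c training rows with value v in column j
        self.counts = {c: [Counter() for _ in range(width)] for c in self.class_counts}
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def predict(self, row):
        def log_posterior(c):
            nc = self.class_counts[c]
            lp = math.log(nc / self.total)  # log prior
            for j, v in enumerate(row):     # plus smoothed log likelihoods
                lp += math.log((self.counts[c][j][v] + 1) / (nc + len(self.values[j])))
            return lp
        return max(self.class_counts, key=log_posterior)


def evaluate(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return correct / len(y_true), precision, recall


# Synthetic stand-in data (NOT the study's records): 638 rows, 10 ordinal
# attributes coded 0-2; by construction only the first four drive the label.
rng = random.Random(42)
rows = [[rng.randint(0, 2) for _ in range(10)] for _ in range(638)]
labels = [1 if sum(r[:4]) + rng.gauss(0, 1) >= 4 else 0 for r in rows]

# Weight each attribute by |correlation| with the label and keep the top four
# (assumed selection rule, following the order of steps in the abstract).
weights = [abs(pearson([r[j] for r in rows], labels)) for j in range(10)]
selected = sorted(range(10), key=lambda j: weights[j], reverse=True)[:4]

# 70:30 split after shuffling the record indices.
order = list(range(len(rows)))
rng.shuffle(order)
cut = round(0.7 * len(order))
train_idx, test_idx = order[:cut], order[cut:]


def project(i):
    """Keep only the selected attributes of record i."""
    return [rows[i][j] for j in selected]


model = CategoricalNB().fit([project(i) for i in train_idx], [labels[i] for i in train_idx])
y_true = [labels[i] for i in test_idx]
y_pred = [model.predict(project(i)) for i in test_idx]
accuracy, precision, recall = evaluate(y_true, y_pred)
print(f"accuracy={accuracy:.2%} precision={precision:.2%} recall={recall:.2%}")
```

With 638 records, rounding 70% of the total gives the same 447/191 partition reported in the abstract. The printed scores reflect only the synthetic data and should not be compared with the reported 84% accuracy.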

Cite This Paper

Joann Galopo Perez, Eugene S. Perez, "Predicting Student Program Completion Using Naïve Bayes Classification Algorithm", International Journal of Modern Education and Computer Science (IJMECS), Vol.13, No.3, pp. 57-67, 2021. DOI: 10.5815/ijmecs.2021.03.05

