Using Machine Learning Algorithms to Predict First-generation College Students’ Six-year Graduation: A Case Study



Zhixin Kang 1,*

1. Dept. of Economics and Decision Sciences, University of North Carolina at Pembroke, Pembroke, U.S.A.

* Corresponding author.


Received: 26 May 2019 / Revised: 9 Jun. 2019 / Accepted: 15 Jun. 2019 / Published: 8 Sep. 2019

Index Terms

Machine learning algorithms, first-generation college students, six-year graduation, forecasting evaluation


This paper examines how ten of the most widely used machine learning algorithms, namely linear discriminant analysis, logistic regression, k-nearest neighbors, random forests, artificial neural networks, naive Bayes, classification and regression trees, support vector machines, adaptive boosting, and a stacking ensemble model, perform in forecasting first-generation college students’ six-year graduation from first-year college data. Five standard evaluation metrics are used to assess the models. The results show that these machine learning models can significantly predict first-generation college students’ six-year graduation, with mean forecasting accuracy ranging from 69.58% to 75.17% and median forecasting accuracy ranging from 70.37% to 74.52%. Among these algorithms, the stacking ensemble model, logistic regression, and linear discriminant analysis rank as the top three by mean forecasting accuracy. Furthermore, the results from the repeated ten-fold cross-validation process reveal that the variation in the five evaluation metrics follows remarkably different patterns across the ten machine learning algorithms.
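
The repeated ten-fold cross-validation procedure mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: the data are synthetic, the two standardized features (first-year GPA and credits earned) are hypothetical, only one of the ten algorithms (k-nearest neighbors) is shown, and the choice of accuracy and Cohen's kappa as metrics is an assumption, since the abstract does not name the five metrics. All function names are illustrative.

```python
import random
import statistics


def make_synthetic_students(n, seed=0):
    """Synthetic (hypothetical) records: two standardized first-year
    features -> graduated within six years (1) or not (0)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z_gpa = rng.gauss(0.0, 1.0)      # assumed feature: standardized GPA
        z_credits = rng.gauss(0.0, 1.0)  # assumed feature: standardized credits
        # Noisy linear rule so the task is learnable but not trivial.
        score = 1.2 * z_gpa + 0.8 * z_credits + rng.gauss(0.0, 1.0)
        rows.append(((z_gpa, z_credits), 1 if score > 0 else 0))
    return rows


def knn_predict(train, x, k=7):
    """Majority vote among the k nearest training points."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary labels: agreement corrected for chance."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in (0, 1))
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 0.0


def repeated_kfold(data, n_folds=10, n_repeats=3, seed=0):
    """Reshuffle and re-split the data n_repeats times; score every fold."""
    rng = random.Random(seed)
    accs, kappas = [], []
    for _ in range(n_repeats):
        idx = list(range(len(data)))
        rng.shuffle(idx)
        folds = [idx[i::n_folds] for i in range(n_folds)]
        for fold in folds:
            held_out = set(fold)
            train = [data[i] for i in idx if i not in held_out]
            y_true = [data[i][1] for i in fold]
            y_pred = [knn_predict(train, data[i][0]) for i in fold]
            accs.append(sum(t == p for t, p in zip(y_true, y_pred)) / len(fold))
            kappas.append(cohen_kappa(y_true, y_pred))
    return accs, kappas


data = make_synthetic_students(300)
accs, kappas = repeated_kfold(data)
print(f"folds evaluated: {len(accs)}")
print(f"mean accuracy:   {statistics.mean(accs):.4f}")
print(f"median accuracy: {statistics.median(accs):.4f}")
print(f"accuracy st.dev: {statistics.stdev(accs):.4f}")
print(f"mean kappa:      {statistics.mean(kappas):.4f}")
```

Each repeat yields ten held-out scores per metric, so the mean, median, and spread of a metric can be compared across algorithms by running the same loop with a different classifier in place of `knn_predict`.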

Cite This Paper

Zhixin Kang, "Using Machine Learning Algorithms to Predict First-generation College Students’ Six-year Graduation: A Case Study", International Journal of Information Technology and Computer Science (IJITCS), Vol.11, No.9, pp.1-8, 2019. DOI:10.5815/ijitcs.2019.09.01

