Building Predictive Model by Using Data Mining and Feature Selection Techniques on Academic Dataset

PP. 16-29



Mukesh Kumar 1,*, Nidhi 2, Bhisham Sharma 3, Disha Handa 4

1. School of Computer Application, Lovely Professional University, Phagwara, Punjab, India

2. Department of Computer Science and Engineering, Chandigarh University, Mohali, Punjab, India

3. Department of Computer Science and Engineering, Chitkara University, Himachal Pradesh, India

4. Department of University Institute of Computing, Chandigarh University, Mohali, Punjab, India

* Corresponding author.


Received: 8 Dec. 2021 / Revised: 15 Mar. 2022 / Accepted: 19 Jun. 2022 / Published: 8 Aug. 2022

Index Terms

Classification Algorithms, Feature Selection, Correlation Attribute Evaluator, Information Gain, Gain Ratio.


In education, institutions store large amounts of digital data on students' academic performance. If this data is analysed correctly to discover patterns related to student learning, it can help an institution achieve better outcomes in the future. Data mining techniques make it much simpler to uncover previously hidden information and to detect patterns in student data. We use several classification algorithms, namely Naive Bayes, Random Forest, Decision Tree, Multilayer Perceptron, and Decision Table, to predict the academic performance of individual students. A real-world dataset may contain many features, yet only some of them are significant for the mining process. Feature selection methods such as the correlation attribute evaluator, the information gain attribute evaluator, and the gain ratio attribute evaluator are used to remove features that are not important for the mining process. In conclusion, combining each classification algorithm with a feature selection method improves its overall predictive performance.
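
To make the information gain and gain ratio evaluators named above concrete, the following is a minimal sketch of how such scores can be computed for categorical features. It is an illustration only, not the authors' implementation: the function names, the toy columns (`attendance`, `gender`), and the `result` labels are all hypothetical, and the paper's actual dataset and tooling are not shown here.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(feature, labels):
    """Reduction in label entropy obtained by splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average of the label entropy inside each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalised by the entropy of the feature itself."""
    split_info = entropy(feature)
    if split_info == 0:
        return 0.0  # a constant feature carries no information
    return information_gain(feature, labels) / split_info


# Hypothetical toy data (assumption): six students, two candidate features.
attendance = ["high", "high", "low", "low", "high", "low"]
gender = ["m", "f", "m", "f", "m", "f"]
result = ["pass", "pass", "fail", "fail", "pass", "fail"]

features = {"attendance": attendance, "gender": gender}
scores = {name: gain_ratio(column, result) for name, column in features.items()}

# Rank features from most to least informative; low-ranked ones are
# candidates for removal before training a classifier.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy data `attendance` separates the two classes completely, so it ranks above `gender`; on real data the ranking would typically be followed by keeping only the top-scoring features before fitting a classifier.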

Cite This Paper

Mukesh Kumar, Nidhi, Bhisham Sharma, Disha Handa, "Building Predictive Model by Using Data Mining and Feature Selection Techniques on Academic Dataset", International Journal of Modern Education and Computer Science (IJMECS), Vol.14, No.4, pp. 16-29, 2022. DOI:10.5815/ijmecs.2022.04.02

