A Unified Model of Clustering and Classification to Improve Students’ Employability Prediction

Full Text (PDF, 594KB), PP.10-18



Pooja Thakar 1,*, Anil Mehta 2, Manisha 1

1. Banasthali University, Jaipur, 304022, India

2. University of Rajasthan, Jaipur, 304022, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2017.09.02

Received: 2 May 2017 / Revised: 15 Jun. 2017 / Accepted: 24 Jul. 2017 / Published: 8 Sep. 2017

Index Terms

Clustering, Classification, Data Mining, Employability, Prediction, Education


Data mining is gaining immense popularity in the field of education due to its predictive capabilities. However, most prior work in this area is directed only towards predicting academic performance. Education has since become employment oriented, yet very few attempts have been made to predict students’ employability. Precise prediction of students’ performance in campus placements at an early stage can identify students who are at risk of unemployment, so that proactive action can be taken to improve their performance.
Existing research on students’ employability prediction is based either on a single type of course or on a single university or institute, and therefore does not scale from one context to another. To address this need, this paper proposes a unified model of clustering and classification.
With the notion of unification, data on students of professional courses, namely Engineering and Masters in Computer Applications, are collected from various universities and institutions across India. The data is large, multivariate, incomplete, heterogeneous and unbalanced in nature. To deal with such data, a unified predictive model is built by integrating clustering and classification techniques. Two-level clustering (k-means kernel) with chi-square analysis is applied at the pre-processing stage for the automated selection of relevant attributes. An ensemble vote classification technique that combines four classifiers, namely K-star, random tree, simple CART and random forest, is then applied to predict students’ employability. The proposed framework provides a generalized solution for student employability prediction. Comparative results show the model's performance against various classification techniques. When the proposed model is applied at the state level, classification accuracy reaches 96.78% with a kappa value of 0.937.
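
The voting step and the two reported measures (accuracy and kappa) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the vote combination rule, so plurality voting is assumed here, and the labels and per-classifier predictions are hypothetical, hard-coded values rather than outputs of trained K-star, random tree, simple CART or random forest models.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one list by plurality vote.

    predictions: list of equal-length label lists, one per base classifier.
    On a tie, the label seen first among the classifiers wins (assumption;
    the combination rule is not given in the abstract).
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy(actual, predicted):
    """Fraction of instances whose predicted label equals the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def cohen_kappa(actual, predicted):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    n = len(actual)
    observed = accuracy(actual, predicted)
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Chance agreement: sum over classes of the product of marginal proportions.
    expected = sum(
        actual_counts[c] * predicted_counts[c] for c in actual_counts
    ) / (n * n)
    if expected == 1.0:
        # Both labelings use one identical class; kappa is undefined, report 1.0.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical placement outcomes for eight students.
P, N = "placed", "not_placed"
actual = [P, P, P, N, N, N, N, P]

# Hypothetical outputs of four base classifiers on the same eight students.
base_predictions = [
    [P, P, N, N, N, P, N, P],  # stands in for K-star
    [P, N, P, N, N, P, N, P],  # stands in for random tree
    [P, P, P, N, P, N, N, N],  # stands in for simple CART
    [P, P, P, N, N, N, N, P],  # stands in for random forest
]

combined = majority_vote(base_predictions)
ensemble_accuracy = accuracy(actual, combined)
ensemble_kappa = cohen_kappa(actual, combined)

print(combined)
print(ensemble_accuracy, ensemble_kappa)
```

On this toy data the ensemble misclassifies one student (a 2-2 tie resolved towards "placed"), giving an accuracy of 0.875 and a kappa of 0.75. Kappa is lower than accuracy because it discounts the agreement expected by chance, which is why it is a useful companion measure for unbalanced data such as placement records.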

Cite This Paper

Pooja Thakar, Anil Mehta, Manisha, "A Unified Model of Clustering and Classification to Improve Students’ Employability Prediction", International Journal of Intelligent Systems and Applications(IJISA), Vol.9, No.9, pp.10-18, 2017. DOI:10.5815/ijisa.2017.09.02

