Visual Association Analytics Approach to Predictive Modelling of Students’ Academic Performance




Udoinyang G. Inyang 1,*, Imo J. Eyoh 1, Samuel A. Robinson 1, Edward N. Udo 1

1. Department of Computer Science, University of Uyo, Nigeria

* Corresponding author.


Received: 19 Aug. 2019 / Revised: 25 Sep. 2019 / Accepted: 19 Oct. 2019 / Published: 8 Dec. 2019

Index Terms

Association rule mining, predictive analytics, students’ performance, hierarchical clustering, at-risk students


Sustained, high-quality graduation rates are increasingly important indicators of progressive and effective educational institutions. Timely analysis of students’ data is vital for academic achievement because it guides instructors in providing academic interventions to students who are at risk of performing poorly in their courses or of dropping out. In addition, there is a need to mine the relationships among performance attributes in order to generate comprehensible patterns. However, there is a dearth of knowledge on predicting students’ performance from such patterns. This paper therefore adopts hierarchical cluster analysis (HCA) to analyze a students’ performance dataset, both to discover the optimal number of failed-course clusters and to partition the courses into groups, and association rule mining to extract interesting course-status associations. Agglomerative HCA with Ward’s linkage method produced the best clustering structure (five clusters), with a coefficient of 92% and a silhouette width of 0.57. The Apriori algorithm, with thresholds of 0.5% for support, 80% for confidence and 1 for lift, was used to extract rules with the student’s status as the consequent. Of the twenty-one courses taken by students in the first year, seven frequently occur together as failed courses, and their impact on the respective students’ performance status was assessed in the rules. It is conjectured that early intervention by instructors and management of educational activities on these seven courses will improve students’ learning outcomes, leading to an increased rate of graduation within the minimum course duration, which is the overarching objective of higher educational institutions. As further work, other machine learning and nature-inspired tools will be integrated for the adaptive learning and the optimization of rules, respectively.
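
To make the rule-extraction step concrete, the following is a minimal sketch of how support, confidence and lift can be computed for rules whose consequent is a student's status. It is not the authors' implementation: the course codes, status labels and toy records are invented for illustration, the function names `support` and `mine_status_rules` are hypothetical, and the rules are enumerated by brute force rather than with Apriori's level-wise candidate pruning. The default thresholds mirror those stated in the abstract (0.5% support, 80% confidence, lift of 1); treating the lift threshold as "greater than or equal to" is an assumption.

```python
from itertools import combinations

# Hypothetical toy records: each set holds the courses a student failed
# plus one status label. All codes and labels are invented for illustration.
records = [
    {"MTH111", "PHY111", "status=at_risk"},
    {"MTH111", "PHY111", "CHM111", "status=at_risk"},
    {"MTH111", "status=good"},
    {"PHY111", "status=good"},
    {"MTH111", "PHY111", "status=at_risk"},
    {"CHM111", "status=good"},
]


def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    return sum(itemset <= r for r in records) / len(records)


def mine_status_rules(records, min_support=0.005, min_confidence=0.8,
                      min_lift=1.0, max_len=2):
    """Enumerate rules {failed courses} -> status that pass all thresholds.

    Brute-force enumeration over antecedents of up to max_len courses;
    Apriori would prune infrequent candidates level by level instead.
    """
    statuses = sorted({i for r in records for i in r if i.startswith("status=")})
    courses = sorted({i for r in records for i in r if not i.startswith("status=")})
    rules = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(courses, k):
            a = frozenset(antecedent)
            sup_a = support(a, records)
            if sup_a == 0:
                continue  # antecedent never occurs, so no rule can be formed
            for s in statuses:
                sup_rule = support(a | {s}, records)   # P(antecedent and status)
                conf = sup_rule / sup_a                # P(status | antecedent)
                lift = conf / support({s}, records)    # conf relative to P(status)
                if (sup_rule >= min_support and conf >= min_confidence
                        and lift >= min_lift):
                    rules.append((antecedent, s, sup_rule, conf, lift))
    return rules


rules = mine_status_rules(records)
for antecedent, status, sup, conf, lift in rules:
    print(f"{set(antecedent)} -> {status}: "
          f"support={sup:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```

On this toy data only pairs of failed courses clear the 80% confidence threshold; no single course does. This illustrates why courses that frequently fail together can be more informative about a student's status than any one course taken alone.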

Cite This Paper

Udoinyang G. Inyang, Imo J. Eyoh, Samuel A. Robinson, Edward N. Udo, "Visual Association Analytics Approach to Predictive Modelling of Students’ Academic Performance", International Journal of Modern Education and Computer Science (IJMECS), Vol. 11, No. 12, pp. 1-13, 2019. DOI: 10.5815/ijmecs.2019.12.01

