Football Match Prediction with Tree Based Model Classification




Yoel F. Alfredo 1,*, Sani M. Isa 1

1. Computer Science Department, BINUS Graduate Program-Master in Computer Science, Bina Nusantara University Jakarta, 11480, Indonesia

* Corresponding author.


Received: 26 Jan. 2019 / Revised: 27 Feb. 2019 / Accepted: 19 Mar. 2019 / Published: 8 Jul. 2019

Index Terms

Football match prediction, supervised machine learning, decision tree, feature selection, classification


This paper presents football match prediction using tree-based algorithms (C5.0, Random Forest, and Extreme Gradient Boosting). A backward wrapper model was applied as the feature selection method to select the features that improve model accuracy. The study used 10 seasons of English Premier League match history (2007/2008 – 2016/2017) with 15 initial features to predict match results. Each model showed improved accuracy after tuning. Random Forest achieved the best accuracy at 68.55%, Extreme Gradient Boosting reached 67.89%, and C5.0 had the lowest accuracy at 64.87%. Based on these results, the study concludes that decision-tree-based algorithms are not good enough for predicting football match results.
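The backward wrapper idea mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a greedy variant that starts from all features and repeatedly drops the single feature whose removal most improves the score, stopping when no removal helps. The feature names and the `toy_accuracy` scorer are hypothetical stand-ins; in a real study the scorer would be the cross-validated accuracy of a tree-based model trained on the given feature subset.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def backward_wrapper_selection(
    features: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
    min_features: int = 1,
) -> Tuple[FrozenSet[str], float]:
    """Greedy backward elimination driven by a model-based score."""
    current = frozenset(features)
    best_score = evaluate(current)
    while len(current) > min_features:
        # Score every subset obtained by removing exactly one feature.
        candidates = [(evaluate(current - {f}), f) for f in sorted(current)]
        score, dropped = max(candidates)
        # Stop as soon as no single removal strictly improves the score.
        if score <= best_score:
            break
        current = current - {dropped}
        best_score = score
    return current, best_score


# Hypothetical feature contributions, in percentage points (assumption).
USEFUL = {"home_form": 10, "away_form": 8, "goal_diff": 5}


def toy_accuracy(subset: FrozenSet[str]) -> float:
    """Toy stand-in for cross-validated accuracy (assumption, not real data)."""
    signal = sum(USEFUL.get(f, 0) for f in subset)
    noise = sum(1 for f in subset if f not in USEFUL)  # each noisy feature costs 1 point
    return (40 + signal - noise) / 100


all_features = ["home_form", "away_form", "goal_diff", "kickoff_hour", "referee_id"]
selected, score = backward_wrapper_selection(all_features, toy_accuracy)
print(sorted(selected), score)
```

In this toy setup the two noisy features are eliminated and the three informative ones are kept. Because the wrapper only calls `evaluate`, the same loop could in principle wrap any of the three classifiers named in the abstract, at the cost of retraining the model for every candidate subset.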

Cite This Paper

Yoel F. Alfredo, Sani M. Isa, "Football Match Prediction with Tree Based Model Classification", International Journal of Intelligent Systems and Applications (IJISA), Vol.11, No.7, pp.20-28, 2019. DOI:10.5815/ijisa.2019.07.03

