Diabetes Mellitus Data Classification by Cascading of Feature Selection Methods and Ensemble Learning Algorithms

Full Text (PDF, 556KB), PP.10-16



Kemal Akyol 1,*, Baha Şen 2

1. Computer Engineering, Kastamonu University, Kastamonu, 37100, Turkey

2. Computer Engineering, Yıldırım Beyazıt University, Ankara, 06500, Turkey

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2018.06.02

Received: 30 Mar. 2018 / Revised: 26 Apr. 2018 / Accepted: 18 May 2018 / Published: 8 Jun. 2018

Index Terms

Diabetes mellitus, feature selection, ensemble learning, AdaBoost, Gradient Boosted Trees, Random Forest


Diabetes is a chronic disease characterized by elevated blood glucose levels. It can seriously damage the heart, blood vessels, eyes, kidneys, and nerves, and it is one of the leading causes of death worldwide. There are two main types of diabetes: Type 1 and Type 2. The former is a chronic condition in which the pancreas produces little or no insulin. The latter, which usually occurs in adults, arises when the insulin level is insufficient. Classifying diabetes mellitus data is therefore important. This study, which distinguishes diabetic from normal persons, consists of two major steps. In the first step, feature selection or weighting methods are analyzed to find the most effective attributes for this disease. In the second step, the performance of the AdaBoost, Gradient Boosted Trees, and Random Forest ensemble learning algorithms is evaluated. According to the experimental results, the combination of the Stability Selection method and the AdaBoost learning algorithm performs slightly better than the other combinations, with a classification accuracy of 73.88%.
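
The two-step cascade described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several pieces are assumptions made only for illustration: the use of scikit-learn, the synthetic data from `make_classification` standing in for the diabetes records, the hand-rolled `stability_scores` helper (repeated subsampling with an L1-penalized logistic regression as a simple form of stability selection), the 0.6 selection threshold, and the 5-fold cross-validation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def stability_scores(X, y, n_rounds=50, sample_frac=0.75, C=0.1, seed=0):
    """Hypothetical helper: fraction of random subsamples in which each
    feature receives a nonzero coefficient from an L1-penalized model."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    counts = np.zeros(n_features)
    for _ in range(n_rounds):
        # Draw a random subsample without replacement.
        idx = rng.choice(n_samples, size=int(sample_frac * n_samples),
                         replace=False)
        model = LogisticRegression(penalty="l1", C=C, solver="liblinear")
        model.fit(X[idx], y[idx])
        # Count the features that survived the L1 penalty in this round.
        counts += np.abs(model.coef_[0]) > 1e-8
    return counts / n_rounds


# Synthetic stand-in data (assumption): 768 samples, 8 features, 2 classes.
X, y = make_classification(n_samples=768, n_features=8, n_informative=4,
                           n_redundant=0, random_state=0)

# Step 1: score the features and keep those selected in most subsamples.
scores = stability_scores(X, y)
selected = np.flatnonzero(scores >= 0.6)  # threshold is an assumption
if selected.size == 0:
    # Fall back to the single best feature if nothing passes the threshold.
    selected = np.array([int(np.argmax(scores))])
X_selected = X[:, selected]

# Step 2: evaluate the three ensemble learners on the selected features.
models = {
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Gradient Boosted Trees": GradientBoostingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
results = {}
for name, model in models.items():
    accuracy = cross_val_score(model, X_selected, y, cv=5, scoring="accuracy")
    results[name] = float(accuracy.mean())
    print(f"{name}: mean accuracy = {results[name]:.4f}")
```

Because the sketch selects features on the full data before cross-validating, its accuracy estimates can be optimistic. Wrapping both steps in a single pipeline that is refit inside each fold would avoid that. The numbers it prints relate to the synthetic data only and are not comparable to the 73.88% reported in the study.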

Cite This Paper

Kemal Akyol, Baha Şen, "Diabetes Mellitus Data Classification by Cascading of Feature Selection Methods and Ensemble Learning Algorithms", International Journal of Modern Education and Computer Science (IJMECS), Vol.10, No.6, pp. 10-16, 2018. DOI: 10.5815/ijmecs.2018.06.02

