A Study on Liver Disease Diagnosis based on Assessing the Importance of Attributes

Full Text (PDF, 468KB), PP.1-9



Kemal Akyol 1,*, Yasemin Gultepe 1

1. Department of Computer Engineering, Kastamonu University, Kastamonu, 37100, Turkey

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2017.11.01

Received: 25 Jul. 2017 / Revised: 24 Aug. 2017 / Accepted: 11 Sep. 2017 / Published: 8 Nov. 2017

Index Terms

Classification, liver disease, under-sampling, stability selection, random forest


The liver is a vital organ that forms an important barrier between the gastrointestinal blood, which contains large amounts of toxins and antigens, and the rest of the body. Liver diseases, including hepatitis B and hepatitis C virus infections, alcoholic liver disease, nonalcoholic fatty liver disease and associated cirrhosis, liver failure, and hepatocellular carcinoma, are primary causes of death. The main purpose of this study is to investigate which attributes are important for the effective diagnosis of liver disorders, using a machine learning approach that combines the Stability Selection and Random Forest methods. To improve accuracy, the dataset was balanced with the Random Under-Sampling method. The important attributes were identified by applying the Stability Selection method to sub-datasets obtained with the 5-fold cross-validation technique. These datasets were then passed to the Random Forest algorithm, and the performance of the proposed approach was evaluated in terms of the accuracy and sensitivity metrics. The experimental results indicate that the Random Under-Sampling method can improve the performance of the combination of Stability Selection and Random Forest. Moreover, the combination of these methods offers new perspectives for the diagnosis of liver disease and of other diseases.
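The balancing and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `random_under_sample` and `accuracy_and_sensitivity`, the toy data, and the seed are assumptions made for illustration. Under-sampling is assumed here to cut every class down to the size of the smallest one, and sensitivity is computed as TP / (TP + FN). The Stability Selection and Random Forest steps are omitted, since they would normally come from a machine learning library.

```python
import random
from collections import defaultdict


def random_under_sample(rows, labels, seed=0):
    """Balance classes by randomly discarding samples from the larger classes."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    # Assumption: every class is cut down to the size of the smallest class.
    target = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, target))
    kept.sort()  # keep the retained samples in their original order
    return [rows[i] for i in kept], [labels[i] for i in kept]


def accuracy_and_sensitivity(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity) for a binary classification result."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # Sensitivity is the share of actual positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, sensitivity


# Hypothetical toy data: 7 positive samples and 3 negative samples.
rows = [[float(i)] for i in range(10)]
labels = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
balanced_rows, balanced_labels = random_under_sample(rows, labels, seed=42)
```

After the call, `balanced_labels` holds three samples of each class. In a pipeline like the one described, the balanced data would then be split into folds for attribute selection and classification.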

Cite This Paper

Kemal Akyol, Yasemin Gültepe, "A Study on Liver Disease Diagnosis based on Assessing the Importance of Attributes", International Journal of Intelligent Systems and Applications (IJISA), Vol.9, No.11, pp.1-9, 2017. DOI:10.5815/ijisa.2017.11.01

