Medical Big Data Classification Using a Combination of Random Forest Classifier and K-Means Clustering

Full Text (PDF, 391KB), PP.11-19



R. Saravana kumar 1,*, P. Manikandan 2

1. Department of Computer Science and Engineering, Dayananda Sagar Academy of Technology and Management, Bangalore

2. Department of Computer Science and Engineering, Malla Reddy Engineering College for Women, Maisammaguda, Secunderabad, Telangana

* Corresponding author.


Received: 19 Feb. 2018 / Revised: 15 Apr. 2018 / Accepted: 20 May 2018 / Published: 8 Nov. 2018

Index Terms

Decision trees, k-means clustering, medical big data, random forest, classification


The random forest classifier is an efficient classification algorithm that has recently been used in many big data applications. Medical big data comprises large, complex data such as patient records, medicine details, and staff data. Such massive data is not easy to classify and handle efficiently: traditional methods such as Linear Classifier K-Nearest Neighbor and Random Clustering K-Nearest Neighbor offer lower accuracy and carry a risk of deleted or missing data. Hence we adopt random forest classification combined with the k-means clustering algorithm to address the complexity and accuracy issues. In this paper, the medical big data is first partitioned into clusters by the k-means algorithm based on some dimension. Each cluster is then classified by the random forest classifier, which generates decision trees and classifies the data according to the specified criteria. Compared with existing systems, the experimental results indicate that the proposed algorithm increases classification accuracy.
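The two-stage idea described in the abstract (cluster first, then classify each cluster with its own random forest) can be illustrated with a minimal, standard-library Python sketch. This is not the authors' implementation: the function names (`kmeans`, `build_forest`, `fit_clustered_forest`, `predict`), the tree depth, the number of trees, and the toy records of (age, systolic blood pressure) are all assumptions made for illustration, and the values are synthetic with no clinical meaning.

```python
import random
from collections import Counter

def dist2(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (cluster index per point, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k random records
    assignment = [0] * len(points)
    for _ in range(iters):
        assignment = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centers

def gini(labels):
    """Gini impurity of a list of class labels."""
    return 1.0 - sum((n / len(labels)) ** 2 for n in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, rng, depth=0, max_depth=4):
    """Decision tree that tries a random subset of features at every split."""
    if depth == max_depth or len(set(labels)) == 1:
        return majority(labels)  # leaf node
    n_feat = len(rows[0])
    best = None
    for f in rng.sample(range(n_feat), max(1, int(n_feat ** 0.5))):
        for t in {r[f] for r in rows}:
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no usable split on the sampled features
        return majority(labels)
    _, f, t, left, right = best
    grow = lambda idx: build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                  rng, depth + 1, max_depth)
    return (f, t, grow(left), grow(right))

def tree_predict(node, row):
    while isinstance(node, tuple):  # internal nodes are tuples, leaves are labels
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def build_forest(rows, labels, n_trees=15, seed=0):
    """Bagging: each tree is trained on a bootstrap sample of the cluster."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest

def fit_clustered_forest(rows, labels, k=2, seed=0):
    """Stage 1: k-means partition. Stage 2: one random forest per cluster."""
    assignment, centers = kmeans(rows, k, seed=seed)
    forests = {}
    for c in range(k):
        idx = [i for i, a in enumerate(assignment) if a == c]
        if idx:
            forests[c] = build_forest([rows[i] for i in idx],
                                      [labels[i] for i in idx], seed=seed + c)
    return centers, forests

def predict(model, row):
    """Route the record to its nearest cluster, then take that forest's majority vote."""
    centers, forests = model
    cluster = min(forests, key=lambda c: dist2(row, centers[c]))
    return majority([tree_predict(t, row) for t in forests[cluster]])

# Synthetic (age, systolic_bp) records; the labels are hypothetical.
rows = [(20, 100), (22, 102), (21, 101), (28, 118), (30, 120), (29, 119),
        (70, 150), (72, 152), (71, 151), (78, 168), (80, 170), (79, 169)]
labels = ["low", "low", "low", "high", "high", "high"] * 2
model = fit_clustered_forest(rows, labels, k=2)
for query in [(19, 98), (27, 115), (69, 149), (77, 166)]:
    print(query, predict(model, query))
```

In this toy data the boundary between "low" and "high" differs between the two groups, so a forest trained per cluster can learn a criterion specific to that cluster. In practice an established library would normally replace the hand-written k-means and forest.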

Cite This Paper

R. Saravana kumar, P. Manikandan, "Medical Big Data Classification Using a Combination of Random Forest Classifier and K-Means Clustering", International Journal of Intelligent Systems and Applications (IJISA), Vol. 10, No. 11, pp. 11-19, 2018. DOI: 10.5815/ijisa.2018.11.02

