MMeMeR: An Algorithm for Clustering Heterogeneous Data using Rough Set Theory

Pages: 25-33



B.K. Tripathy 1,*, Akarsh Goyal 1, Rahul Chowdhury 1, Patra Anupam Sourav 1

1. School of Computing Science and Engineering, VIT University, Vellore-632014, Tamil Nadu, India

* Corresponding author.


Received: 13 Feb. 2017 / Revised: 10 Mar. 2017 / Accepted: 17 Mar. 2017 / Published: 8 Aug. 2017

Index Terms

Categorical data, clustering, uncertainty, MMR, MMeR, SDR, SSDR


In recent years, a large number of clustering algorithms have been developed whose main function is to group objects having almost the same features. However, the presence of categorical data values makes these algorithms difficult to apply. Moreover, some algorithms that can handle categorical data cannot process uncertainty in the values and therefore have stability issues. Handling categorical data together with uncertainty has thus become necessary. To this end, the MMR algorithm, based on basic rough set theory, was developed in 2007. MMeR, proposed in 2009, surpassed the results of MMR on categorical data and could also handle heterogeneous values. SDR and SSDR, proposed in 2011, were able to handle hybrid data and showed higher accuracy than MMR and MMeR. In this paper, we make further improvements and propose an algorithm, which we call MMeMeR or Min-Mean-Mean-Roughness. It takes care of uncertainty and also handles heterogeneous data. Standard data sets have been used to gauge its effectiveness relative to the other methods.
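
For readers unfamiliar with the roughness measure that this family of algorithms builds on, the following is a minimal sketch of Pawlak's roughness (one minus the ratio of lower to upper approximation size) and of the mean roughness of one categorical attribute with respect to another. It is an illustration under stated assumptions, not the authors' implementation: the function names and the toy table are hypothetical, and the abstract does not specify how MMeMeR aggregates these values, so the sketch stops at the shared building block.

```python
from collections import defaultdict


def partition(rows, attr):
    """Group row indices into equivalence classes by their value of `attr`."""
    classes = defaultdict(set)
    for idx, row in enumerate(rows):
        classes[row[attr]].add(idx)
    return classes


def roughness(target, classes):
    """Pawlak roughness of `target` w.r.t. a partition: 1 - |lower| / |upper|."""
    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:      # block lies entirely inside the target set
            lower |= block
        if block & target:       # block overlaps the target set
            upper |= block
    return 1.0 - len(lower) / len(upper)


def mean_roughness(rows, attr_i, attr_j):
    """Average roughness of every value class of attr_i with respect to attr_j."""
    value_classes = partition(rows, attr_i)
    blocks_j = partition(rows, attr_j)
    scores = [roughness(x, blocks_j) for x in value_classes.values()]
    return sum(scores) / len(scores)


# Hypothetical toy table with two categorical attributes.
rows = [
    {"colour": "red", "shape": "round"},
    {"colour": "red", "shape": "round"},
    {"colour": "blue", "shape": "square"},
    {"colour": "blue", "shape": "round"},
]

print(mean_roughness(rows, "colour", "shape"))  # 0.875
print(mean_roughness(rows, "shape", "colour"))  # 0.75
```

A lower mean roughness indicates that the second attribute describes the value classes of the first more crisply. Rough-set clustering methods of this kind use such scores to decide which attribute to split on.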

Cite This Paper

B.K. Tripathy, Akarsh Goyal, Rahul Chowdhury, Patra Anupam Sourav, "MMeMeR: An Algorithm for Clustering Heterogeneous Data using Rough Set Theory", International Journal of Intelligent Systems and Applications (IJISA), Vol.9, No.8, pp.25-33, 2017. DOI: 10.5815/ijisa.2017.08.03

