Clustering Techniques in Bioinformatics




Muhammad Ali Masood 1,* M. N. A. Khan 1

1. Shaheed Zulfikar Ali Bhutto Institute of Science and Technologies, Islamabad, Pakistan

* Corresponding author.


Received: 2 Oct. 2014 / Revised: 16 Nov. 2014 / Accepted: 6 Dec. 2014 / Published: 8 Jan. 2015

Index Terms

Clustering Techniques, Data Mining, DBSCAN, Hierarchical Clustering, Performance Analysis


Dealing with data means grouping information into a set of categories, either to learn new artifacts or to understand new domains. For this purpose, researchers have always looked for hidden patterns in data that can be defined and compared with other known notions based on the similarity or dissimilarity of their attributes according to well-defined rules. Data mining, with its tools of data classification and data clustering, is one of the most powerful techniques for handling data in a way that helps researchers identify the required information. To address this challenge, experts have used clustering techniques as a means of exploring hidden structure and patterns in the underlying data. By improving the stability, robustness and accuracy of unsupervised data classification in many fields, including pattern recognition, machine learning, information retrieval, image analysis and bioinformatics, clustering has proven itself a reliable tool. To identify the clusters in a dataset, algorithms are used to partition the dataset into several groups based on the similarity within each group. No single clustering algorithm suits every case; instead, various algorithms are chosen based on the domain of the data that constitutes a cluster and the level of efficiency required. Clustering techniques are categorized according to their underlying approach. This paper surveys five of the most common clustering techniques in data mining: K-medoids, K-means, Fuzzy C-means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Self-Organizing Map (SOM) clustering.
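As a rough illustration of what partitioning a dataset "based on the similarity within each group" can look like, the sketch below runs a plain K-means iteration on a toy two-dimensional dataset. It is not the authors' implementation: the function name `kmeans`, the `seeds` parameter (initial centroids supplied by the caller), the squared Euclidean distance, and the sample points are all assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, seeds, max_iterations=100):
    """Group points around centroids by alternating assignment and update steps.

    `seeds` holds the initial centroids (an assumption of this sketch; in
    practice they are often chosen at random or by a seeding heuristic).
    Returns (labels, centroids).
    """
    centroids = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels, centroids


# Toy data: two visibly separated groups (hypothetical values).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = kmeans(points, seeds=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The other surveyed techniques differ mainly in how group membership is decided: K-medoids restricts each center to an actual data point, Fuzzy C-means assigns graded memberships rather than a single label, DBSCAN grows clusters from dense neighborhoods, and SOM maps points onto a grid of prototype units.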

Cite This Paper

Muhammad Ali Masood, M. N. A. Khan, "Clustering Techniques in Bioinformatics", International Journal of Modern Education and Computer Science (IJMECS), vol.7, no.1, pp.38-46, 2015. DOI:10.5815/ijmecs.2015.01.06

