Work place: Maharishi Dayanand University, Rohtak; Manav Rachna College of Engineering, Faridabad, India

E-mail: pushp12lata@gmail.com


Research Interests: Data Mining, Data Compression, Data Structures and Algorithms


Pushplata received her Bachelor's degree in Computer Science and Engineering from Maharishi Dayanand University, Rohtak, India, in 2010. She is pursuing a Master's degree in Computer Engineering at Maharishi Dayanand University, Rohtak (Manav Rachna College of Engineering). Her research interest is data mining (clustering), including its theory and techniques.

Author Articles
An Analytical Assessment on Document Clustering

By Pushplata, Ram Chatterjee

DOI: https://doi.org/10.5815/ijcnis.2012.05.08, Pub. Date: 8 Jun. 2012

Clustering is a data mining technique used in information retrieval. Clustering documents allows relevant information to be retrieved quickly: it organizes the documents into groups, each containing documents with similar content. Document clustering is an unsupervised data mining approach. Various algorithms are used to cluster documents, such as partitional clustering (K-means) and hierarchical clustering (Agglomerative Hierarchical Clustering, AHC). This paper analyzes the Suffix Tree Clustering (STC) algorithm and other clustering techniques (K-means, AHC) covered in a literature survey. It also examines the traditional Vector Space Model (VSM) used to measure similarity between documents for clustering, and compares different clustering algorithms. According to the papers reviewed in the survey, STC improves search performance compared with other clustering algorithms. The paper applies STC to search-result documents stored in a dataset. It articulates the key requirements for web document clustering, in which clusters are created from the full text of the web documents. STC forms clusters based on the phrases shared between documents, and it is a fast algorithm for document clustering.
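The abstract describes STC as grouping documents by the phrases they share. The Python sketch below illustrates the two steps usually attributed to STC: first find base clusters (sets of documents sharing a phrase), then merge base clusters whose document sets overlap heavily. It is not the paper's implementation. A dictionary of word sequences up to `max_len` words stands in for the generalized suffix tree, and the stopword list, the function names (`phrases`, `base_clusters`, `merge`, `stc`), and the 0.5 overlap threshold are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical stopword list for this sketch; phrases made only of these words are ignored.
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def phrases(doc, max_len=3):
    """All contiguous word sequences of 1..max_len words.

    A bounded stand-in for the phrases a suffix tree would expose (assumption)."""
    words = doc.lower().split()
    found = set()
    for i in range(len(words)):
        for n in range(1, max_len + 1):
            if i + n <= len(words):
                found.add(tuple(words[i:i + n]))
    return found


def base_clusters(docs, max_len=3, min_docs=2):
    """Step 1: map each phrase shared by at least min_docs documents to its document ids."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for p in phrases(doc, max_len):
            if not all(w in STOPWORDS for w in p):
                index[p].add(doc_id)
    return {p: ids for p, ids in index.items() if len(ids) >= min_docs}


def merge(base, threshold=0.5):
    """Step 2: link two base clusters when their overlap exceeds `threshold`
    relative to BOTH clusters' sizes, then return the connected components."""
    keys = list(base)
    parent = list(range(len(keys)))  # union-find forest over base clusters

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            a, b = base[keys[i]], base[keys[j]]
            common = len(a & b)
            if common / len(a) > threshold and common / len(b) > threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(keys)):
        groups[find(i)].append(i)

    clusters = []
    for members in groups.values():
        doc_ids = set().union(*(base[keys[i]] for i in members))
        labels = sorted(" ".join(keys[i]) for i in members)
        clusters.append((sorted(doc_ids), labels))
    return sorted(clusters)  # each cluster: (document ids, shared-phrase labels)


def stc(docs):
    return merge(base_clusters(docs))


if __name__ == "__main__":
    docs = [
        "suffix tree clustering is fast",
        "the suffix tree clustering algorithm",
        "k means is iterative",
        "the k means method",
    ]
    for ids, labels in stc(docs):
        print(ids, labels)
```

With these four sample documents, the sketch yields one cluster for documents 0 and 1 (labelled by phrases such as "suffix tree clustering") and one for documents 2 and 3 (labelled "k means"). The original STC formulation also scores base clusters, roughly by document count and phrase length, and keeps only the highest-scoring ones before merging. This sketch omits that step, and it gives up the suffix tree's linear-time construction in exchange for brevity.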
