Ram Chatterjee

Workplace: Maharishi Dayanand University, Rohtak; Manav Rachna College of Engineering, Faridabad, India

E-mail: ramchatterjee.mrce@mrei.ac.in


Research Interests: Software Engineering, Software Organization and Properties, Computer Architecture and Organization, Data Mining


Mr. Ram Chatterjee received his Master of Computer Application and M.Tech (Computer Science and Engineering) from CDAC, Noida. He is working as an Assistant Professor in the Computer Science Department, Manav Rachna College of Engineering, Faridabad - 121004, Haryana, India. His areas of interest are data mining and software engineering.

Author Articles
An Analytical Assessment on Document Clustering

By Pushplata, Ram Chatterjee

DOI: https://doi.org/10.5815/ijcnis.2012.05.08, Pub. Date: 8 Jun. 2012

Clustering is a data mining technique used in information retrieval: grouping documents allows relevant information to be retrieved more quickly. It organizes documents into groups so that each group contains documents with similar content. Document clustering is an unsupervised data mining approach. Different algorithms are used to cluster documents, such as partitional clustering (K-means) and hierarchical clustering (Agglomerative Hierarchical Clustering, AHC). This paper presents an analysis of the Suffix Tree Clustering (STC) algorithm and of other clustering techniques (K-means, AHC), based on a literature survey. It also covers the traditional Vector Space Model (VSM) for similarity measures, which is used for clustering documents, and compares the different clustering algorithms. According to the papers studied in the literature survey, the STC algorithm improves search performance compared with other clustering algorithms. The paper presents the STC algorithm applied to search result documents stored in a dataset. It articulates the key requirements for web document clustering, with clusters created from the full text of the web documents. STC performs the clustering and forms clusters based on phrases shared between documents. STC is a faster algorithm for document clustering.
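As a rough illustration of the two ideas named in the abstract, the Python sketch below computes a VSM cosine similarity and groups documents by shared phrases. It is not the paper's implementation: the function names, the whitespace tokenizer, the raw term-frequency weighting, and the phrase length limits are all assumptions made for this example, and the brute-force n-gram index only imitates the base clusters that STC would read from a suffix tree.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity of two documents under a raw term-frequency VSM."""
    vec_a, vec_b = Counter(tokenize(doc_a)), Counter(tokenize(doc_b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def shared_phrase_clusters(docs, min_words=2, max_words=3):
    """Map each phrase (word n-gram) to the set of documents containing it.

    Only phrases found in at least two documents are kept. This is a
    hypothetical stand-in for STC base clusters, not a suffix tree.
    """
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        words = tokenize(text)
        for n in range(min_words, max_words + 1):
            for i in range(len(words) - n + 1):
                index[" ".join(words[i:i + n])].add(doc_id)
    return {phrase: ids for phrase, ids in index.items() if len(ids) >= 2}


# Toy documents chosen for this example only.
docs = [
    "cat ate cheese",
    "mouse ate cheese too",
    "cat ate mouse too",
]
clusters = shared_phrase_clusters(docs)
for phrase, ids in sorted(clusters.items()):
    print(phrase, sorted(ids))
```

The VSM function treats each document as a bag of words, whereas the phrase index keeps word order, which is the distinction the abstract draws between VSM-based clustering and STC.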
