A Novel Hierarchical Document Clustering Framework on Large TREC Biomedical Documents



Pilli. Lalitha Kumari 1,*, M. Jeeva 2, Ch. Satyanarayana 3

1. Department of CSE, Malla Reddy Institute of Technology, Secunderabad, Telangana, India

2. Computer Science and Engineering, Knowledge Institute of Technology, Tamilnadu, India

3. Computer Science and Engineering, University College of Engineering, JNTUK, Kakinada, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2022.03.02

Received: 12 Aug. 2021 / Revised: 28 Oct. 2021 / Accepted: 1 Dec. 2021 / Published: 8 Jun. 2022

Index Terms

Similarity, Retrieval, Clustering and Classification, Hierarchical Methods, Phrase patterns


The growth of text repositories such as microblogging sites, biomedical databases, and defect or bug databases makes it difficult for web users to identify, in context, sequential key phrases and their categories in text clustering applications. In traditional document classification and clustering models, the features associated with TREC texts are complex to analyze. Finding relevant feature-based key phrase patterns in a large collection of unstructured documents becomes increasingly difficult as the repository grows. The purpose of this study is to develop and implement a new hierarchical document clustering framework for a large TREC data repository. A document feature selection and clustering model is used to identify and extract MeSH-related documents from TREC biomedical clinical benchmark datasets. Experimental results indicate the efficiency of the proposed model in terms of computational memory, accuracy, and error rate.

Cite This Paper

Pilli. Lalitha Kumari, M. Jeeva, Ch. Satyanarayana, "A Novel Hierarchical Document Clustering Framework on Large TREC Biomedical Documents", International Journal of Information Technology and Computer Science (IJITCS), Vol.14, No.3, pp.16-22, 2022. DOI: 10.5815/ijitcs.2022.03.02

