Key Term Extraction using a Sentence based Weighted TF-IDF Algorithm

Full Text (PDF, 390KB), PP.11-19



T. Vetriselvi 1, N. P. Gopalan 2, G. Kumaresan 2,*

1. Department of Computer Science and Engineering, K. Ramakrishnan College of Technology, Tiruchirappalli, India

2. Department of Computer Applications, National Institute of Technology, Tiruchirappalli, India

* Corresponding author.


Received: 6 Nov. 2018 / Revised: 25 Jan. 2019 / Accepted: 15 Feb. 2019 / Published: 8 Jul. 2019

Index Terms

Similarity Matrix, Term Count, WordNet


Keyword ranking with similarity identification is an approach for finding the significant keywords in a corpus using a Variant Term Frequency Inverse Document Frequency (VTF-IDF) algorithm. Some of these keywords may be similar in meaning, and they are reduced to a single term when WordNet is used. The proposed approach, which does not require any test or training set, assigns sentence-based weightage to the keywords (terms) and is found to be effective. Its suitability is analyzed on several data sets using precision and recall as metrics.
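The abstract does not give the VTF-IDF formula, so the following is only a minimal Python sketch of the general idea, not the authors' method. It assumes that the sentence-based weight of a term is the fraction of a document's sentences containing it, uses a common smoothed IDF, and replaces the WordNet lookup with a hypothetical `synonyms` dictionary that maps similar terms to one representative. The function names `tokenize`, `split_sentences`, and `sentence_weighted_tfidf` are illustrative and do not come from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?'."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def sentence_weighted_tfidf(docs, synonyms=None):
    """Score each term per document as tf * idf * sentence_weight.

    ASSUMPTION: sentence_weight is the fraction of the document's
    sentences that contain the term; the paper's exact weighting
    is not stated in the abstract.
    ASSUMPTION: `synonyms` is a hypothetical stand-in for a WordNet
    lookup; it maps a term to the single term that represents it.
    """
    synonyms = synonyms or {}

    def normalize(word):
        return synonyms.get(word, word)

    # Each document becomes a list of sentences, each a list of terms.
    doc_sentences = [
        [[normalize(w) for w in tokenize(s)] for s in split_sentences(d)]
        for d in docs
    ]

    # Document frequency: number of documents containing each term.
    df = Counter()
    for sentences in doc_sentences:
        df.update({w for s in sentences for w in s})

    n_docs = len(docs)
    scores = []
    for sentences in doc_sentences:
        tokens = [w for s in sentences for w in s]
        tf = Counter(tokens)
        doc_scores = {}
        for term, count in tf.items():
            # Smoothed IDF, so a term found in every document keeps a
            # nonzero score.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            containing = sum(1 for s in sentences if term in s)
            sentence_weight = containing / len(sentences)
            doc_scores[term] = (count / len(tokens)) * idf * sentence_weight
        scores.append(doc_scores)
    return scores


if __name__ == "__main__":
    corpus = ["Cats purr. Cats sleep. Dogs bark.", "Dogs bark loudly."]
    for doc_scores in sentence_weighted_tfidf(corpus):
        ranked = sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)
        print(ranked[:3])
```

Because the scores are computed from the corpus itself, the sketch needs no training or test set, in line with the abstract. A term that recurs across many sentences of a document is ranked above a term with the same raw count concentrated in one sentence.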

Cite This Paper

T. Vetriselvi, N. P. Gopalan, G. Kumaresan, "Key Term Extraction using a Sentence based Weighted TF-IDF Algorithm", International Journal of Education and Management Engineering (IJEME), Vol. 9, No. 4, pp. 11-19, 2019. DOI: 10.5815/ijeme.2019.04.02

