An Efficient Approach for Keyphrase Extraction from English Document


Imtiaz Hossain Emu 1,* Asraf Uddin Ahmed 1 Manowarul Islam 2 Selim Al Mamun 2 Ashraf Uddin 3

1. Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh

2. Department of Electrical and Communication Engineering, Okayama University, Okayama, Japan

3. Department of Information Technology, Federation University, Australia

* Corresponding author.


Received: 9 Apr. 2017 / Revised: 4 Aug. 2017 / Accepted: 13 Sep. 2017 / Published: 8 Dec. 2017

Index Terms

Keyphrase, Stemming, Keyphrase Nomination, Term Frequency, Inverse Document Frequency


Keyphrases are sets of words that reflect the main topics of a document. They play vital roles in document summarization, text mining, and the retrieval of web content. Because keyphrases are closely tied to a document, they reflect its contents and act as indices for it. Extracting the right keyphrases is therefore important for understanding the main contents of a document. In this work, we present a keyphrase extraction method that efficiently finds keyphrases in English documents. The method uses several features of the document for this purpose, namely TF (term frequency), TF*IDF (term frequency times inverse document frequency), GF, GF*IDF, and TF*GF*IDF. Finally, the performance of the proposed method is evaluated on a well-known document corpus.
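To make the TF and TF*IDF features concrete, here is a minimal Python sketch of how candidate phrases might be scored with them. It is an illustration under stated assumptions, not the authors' implementation. Candidates are assumed to be 1- to 3-word n-grams that neither start nor end with a word from a small hypothetical stopword list, and IDF is assumed to use a common smoothed form. The GF-based features are not defined in this abstract and are omitted, as is stemming. The function names `candidate_phrases`, `tf_idf_scores`, and `top_keyphrases` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the paper's actual list is not given in the abstract.
STOPWORDS = {"a", "an", "the", "of", "and", "for", "from", "in", "is", "are", "to", "on", "with"}


def candidate_phrases(text, max_n=3):
    """Return every 1..max_n-gram that neither starts nor ends with a stopword (assumed rule)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    phrases = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            phrases.append(" ".join(gram))
    return phrases


def tf_idf_scores(doc, corpus, max_n=3):
    """Map each candidate in doc to (TF, TF*IDF), with document frequency counted over corpus."""
    counts = Counter(candidate_phrases(doc, max_n))
    total = sum(counts.values())
    doc_sets = [set(candidate_phrases(d, max_n)) for d in corpus]
    n_docs = len(corpus)
    scores = {}
    for phrase, count in counts.items():
        tf = count / total  # relative frequency of the phrase in this document
        df = sum(1 for s in doc_sets if phrase in s)  # documents containing the phrase
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF (assumed form)
        scores[phrase] = (tf, tf * idf)
    return scores


def top_keyphrases(doc, corpus, k=5, max_n=3):
    """Rank candidates by TF*IDF (descending), breaking ties alphabetically."""
    scores = tf_idf_scores(doc, corpus, max_n)
    ranked = sorted(scores.items(), key=lambda item: (-item[1][1], item[0]))
    return [phrase for phrase, _ in ranked[:k]]
```

Usage on a toy three-document corpus:

```python
corpus = [
    "keyphrase extraction finds keyphrase candidates in a document",
    "document clustering groups similar documents",
    "web retrieval ranks web pages",
]
print(top_keyphrases(corpus[0], corpus, k=3))  # ['keyphrase', 'candidates', 'extraction']
```

"keyphrase" ranks first because it is frequent in the first document and absent from the others. "document" scores lower than any phrase unique to that document, because it also occurs in the second document.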

Cite This Paper

Imtiaz Hossain Emu, Asraf Uddin Ahmed, Manowarul Islam, Selim Al Mamun, Ashraf Uddin, "An Efficient Approach for Keyphrase Extraction from English Document", International Journal of Intelligent Systems and Applications (IJISA), Vol.9, No.12, pp.59-66, 2017. DOI:10.5815/ijisa.2017.12.06
