Semi-Supervised Personal Name Disambiguation Technique for the Web

P. Selvaperumal (1,*), A. Suruliandi (1)

1. Department of Computer Science and Engineering, Manonmaniam Sundaranar University, Tirunelveli, India

* Corresponding author.


Received: 20 Nov. 2015 / Revised: 28 Dec. 2015 / Accepted: 23 Jan. 2016 / Published: 8 Mar. 2016

Index Terms

Personal name disambiguation, Entity name disambiguation, Web page clustering


Personal name ambiguity on the web arises when more than one person shares the same name. Personal name disambiguation resolves this ambiguity by clustering a collection of web pages so that each cluster represents one person bearing the ambiguous name. In this paper, a personal name disambiguation technique is proposed that uses a rich set of features such as nouns, noun phrases, and frequent keywords. The proposed method consists of two phases: clustering seed pages, and then clustering the actual web page collection. In the first phase, seed pages representing different namesakes are clustered; in the second phase, each web page in the collection is grouped with the most similar seed page cluster. Using seed pages increases the accuracy of the clustering process. Since the number of clusters to be formed is difficult to predict beforehand, the proposed technique uses the Elbow method to determine it. The efficiency of the proposed technique is tested on both synthetic and organic datasets. Experimental results show that the proposed method achieves robust results across different datasets and outperforms many existing methods.
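The two-phase flow described in the abstract (cluster seed pages with the number of clusters chosen by the Elbow method, then attach each collection page to the most similar seed cluster) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: plain lowercase word tokens stand in for the nouns, noun phrases, and frequent keywords; the ambiguous name itself is assumed to be stripped from the text; seed pages are clustered with k-means using a deterministic farthest-first initialization; the elbow is read off the SSE curve as the point farthest from the line joining its first and last points (one common heuristic among several); and pages sharing no vocabulary with any seed cluster are left unassigned (`None`). All function names (`tokens`, `vectorize`, `kmeans`, `elbow_k`, `disambiguate`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    # Assumption: plain lowercase word tokens stand in for the paper's
    # nouns, noun phrases, and frequent keywords.
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(texts, vocab):
    """Term-count vectors over a fixed vocabulary, L2-normalized."""
    known = set(vocab)
    vecs = []
    for text in texts:
        counts = Counter(t for t in tokens(text) if t in known)
        v = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v))
        vecs.append([x / norm for x in v] if norm else v)
    return vecs


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest(p, centroids):
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def kmeans(points, k, iters=50):
    """Lloyd's k-means; returns (labels, centroids, SSE)."""
    # Assumption: deterministic farthest-first seeding instead of random init.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    labels = [nearest(p, centroids) for p in points]
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


def elbow_k(points, k_max):
    """Elbow method: pick the k whose (k, SSE) point lies farthest from the
    straight line joining the first and last points of the SSE curve."""
    ks = list(range(1, min(k_max, len(points)) + 1))
    sse = [kmeans(points, k)[2] for k in ks]
    if len(ks) < 3:
        return ks[-1]
    x1, y1, x2, y2 = ks[0], sse[0], ks[-1], sse[-1]

    # Numerator of the point-to-line distance; the denominator is constant.
    def gap(i):
        return abs((y2 - y1) * ks[i] - (x2 - x1) * sse[i] + x2 * y1 - y2 * x1)

    return ks[max(range(len(ks)), key=gap)]


def disambiguate(seed_texts, page_texts, k_max=5):
    # Phase 1: cluster the seed pages, with k chosen by the elbow method.
    vocab = sorted({t for text in seed_texts for t in tokens(text)})
    seeds = vectorize(seed_texts, vocab)
    k = elbow_k(seeds, k_max)
    seed_labels, centroids, _ = kmeans(seeds, k)
    # Phase 2: attach each collection page to its most similar seed cluster.
    page_labels = []
    for v in vectorize(page_texts, vocab):
        sims = [cosine(v, c) for c in centroids]
        best = max(range(k), key=lambda j: sims[j])
        # Assumption: pages sharing no vocabulary with any seed stay unassigned.
        page_labels.append(best if sims[best] > 0 else None)
    return k, seed_labels, page_labels
```

For example, with two seed pages each for an academic, a footballer, and a painter who share a name, the SSE curve bends at k = 3, and a later page mentioning "professor university seminar" is attached to the academic's seed cluster. Only the seed vocabulary is used for the collection pages here, which keeps the seed clusters fixed in phase 2; a fuller implementation might re-estimate centroids as pages are added.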

Cite This Paper

P.Selvaperumal, A.Suruliandi, "Semi-Supervised Personal Name Disambiguation Technique for the Web", International Journal of Modern Education and Computer Science (IJMECS), Vol.8, No.3, pp.28-36, 2016. DOI:10.5815/ijmecs.2016.03.04


