Work place: Department of Computer Science & Engineering, Dr. B R Ambedkar National Institute of Technology, Jalandhar, India



Research Interests: Computer systems and computational processes, Computer Architecture and Organization, Data Mining, Data Structures and Algorithms, Analysis of Algorithms


Bhupinderjit Singh received his M.Tech. degree in Computer Science and Engineering from Dr. B R Ambedkar National Institute of Technology, Jalandhar. His research interests include Web Mining and Data Structures and Algorithms.

Improved Architecture of Focused Crawler on the basis of Content and Link Analysis

By Bhupinderjit Singh, Deepak Kumar Gupta, Raj Mohan Singh

DOI:, Pub. Date: 8 Nov. 2017

The World Wide Web is a vast, dynamic and continuously growing collection of web documents. Because of its size, it is difficult for users to find relevant information on a particular topic of interest. In this paper, an improved architecture of a focused crawler is proposed, which is a hybrid of several techniques used in earlier crawlers. The main goal of a focused crawler is to fetch web documents related to a predefined set of topics or domains and to ignore irrelevant web pages. To check the relevance of a web page, a Page Score is computed from the content similarity of the page to the topic keywords. A URL priority queue is maintained by computing a Link Score for each extracted URL from its attributes. The URL queue is further optimized by removing duplicate content. The Topic Keywords Weight Table is expanded by extracting additional keywords from the database of relevant pages and recalculating the keyword weights. Experimental results show that the proposed crawler is more efficient than earlier crawlers.
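
The Page Score, Link Score and URL priority queue described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the cosine-similarity Page Score, the 0.3/0.5/0.2 Link Score weights over URL tokens, anchor text and parent Page Score, the SHA-256 content fingerprint used for duplicate removal, and all names (`TOPIC_WEIGHTS`, `RELEVANCE_THRESHOLD`, `page_score`, `link_score`, `content_fingerprint`, `Frontier`) are hypothetical.

```python
import hashlib
import heapq
import math
import re
from collections import Counter

# Hypothetical Topic Keywords Weight Table (keyword -> weight); values are illustrative only.
TOPIC_WEIGHTS = {"crawler": 1.0, "focused": 0.8, "web": 0.5, "relevance": 0.6}
RELEVANCE_THRESHOLD = 0.3  # hypothetical cut-off for treating a page as relevant


def tokenize(text):
    """Lower-case alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def page_score(text, weights=TOPIC_WEIGHTS):
    """Assumed Page Score: cosine similarity of page term counts and keyword weights (0..1)."""
    tf = Counter(tokenize(text))
    dot = sum(tf[k] * w for k, w in weights.items())
    page_norm = math.sqrt(sum(c * c for c in tf.values()))
    topic_norm = math.sqrt(sum(w * w for w in weights.values()))
    if page_norm == 0 or topic_norm == 0:
        return 0.0
    return dot / (page_norm * topic_norm)


def link_score(url, anchor_text, parent_score, weights=TOPIC_WEIGHTS):
    """Assumed Link Score: keyword weight found in the URL and anchor text, plus parent Page Score."""
    url_hits = sum(weights.get(t, 0.0) for t in tokenize(url))
    anchor_hits = sum(weights.get(t, 0.0) for t in tokenize(anchor_text))
    return 0.3 * url_hits + 0.5 * anchor_hits + 0.2 * parent_score


def content_fingerprint(text):
    """SHA-256 of the normalized token stream; equal fingerprints signal duplicate content."""
    return hashlib.sha256(" ".join(tokenize(text)).encode("utf-8")).hexdigest()


class Frontier:
    """URL priority queue ordered by Link Score; each URL is enqueued at most once."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, score):
        if url in self._seen:
            return False  # duplicate URL: ignore it
        self._seen.add(url)
        heapq.heappush(self._heap, (-score, url))  # negate because heapq is a min-heap
        return True

    def pop(self):
        """Remove and return (url, score) with the highest Link Score."""
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    ps = page_score("A focused crawler fetches web pages by relevance")
    print(ps >= RELEVANCE_THRESHOLD)  # True: the page is kept as relevant
    frontier = Frontier()
    for url, anchor in [("http://example.com/focused-crawler", "focused crawler"),
                        ("http://example.com/sports", "sports news")]:
        frontier.push(url, link_score(url, anchor, ps))
    print(frontier.pop()[0])  # http://example.com/focused-crawler
```

Negating the score keeps the highest Link Score at the front of Python's min-heap `heapq`. A URL is enqueued at most once, and a fetched page whose fingerprint has already been stored can be discarded before its links are extracted.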

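The expansion of the Topic Keywords Weight Table from the relevant pages database can be illustrated in the same hedged way. The abstract does not specify the weighting scheme, so this sketch assumes that candidate keywords are ranked by document frequency across the relevant pages, that every weight is recalculated as the mean of its old weight and the fraction of relevant pages containing the keyword, and that the table is then rescaled so its largest weight is 1.0. The function name `expand_weight_table`, the stop-word list and the `top_k`/`min_len` parameters are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real crawler would use a much larger one.
STOPWORDS = {"the", "and", "for", "each", "with", "from", "that", "this"}


def tokenize(text):
    """Lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_weight_table(weights, relevant_pages, top_k=3, min_len=3):
    """Return a new keyword -> weight table expanded with terms mined from relevant pages."""
    n = len(relevant_pages)
    if n == 0:
        return dict(weights)
    # Document frequency: number of relevant pages containing each candidate term.
    df = Counter()
    for page in relevant_pages:
        df.update({t for t in tokenize(page) if len(t) >= min_len and t not in STOPWORDS})
    # New keywords: most widespread terms not already in the table (ties broken alphabetically).
    new_terms = sorted((t for t in df if t not in weights), key=lambda t: (-df[t], t))[:top_k]
    # Recalculate every weight as the mean of its old weight (0 for new terms)
    # and the fraction of relevant pages that contain it.
    expanded = {t: 0.5 * weights.get(t, 0.0) + 0.5 * df[t] / n for t in list(weights) + new_terms}
    # Rescale so that the strongest keyword has weight 1.0.
    top = max(expanded.values(), default=0.0)
    if top == 0:
        return expanded
    return {t: w / top for t, w in expanded.items()}


if __name__ == "__main__":
    table = {"crawler": 1.0, "focused": 0.8}
    pages = [
        "focused crawler ranks urls by link score",
        "crawler computes page score for each page",
        "link score and page score guide the crawler",
    ]
    print(expand_weight_table(table, pages))
```

In this example, "crawler" appears in every relevant page and keeps weight 1.0, "focused" drops because it appears in only one page, and the terms "score", "link" and "page" are added to the table.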
