Improving the Efficiency of Term Weighting in Set of Dynamic Documents

PP. 42-47



Mehdi Jabalameli 1,* Ala Arman 2 Mohammadali Nematbakhsh 1

1. Department of Computer Engineering, University of Isfahan, Isfahan, 8174673441, Iran

2. KTH Royal Institute of Technology, Stockholm, SE-100 44, Sweden

* Corresponding author.


Received: 5 Nov. 2014 / Revised: 3 Dec. 2014 / Accepted: 13 Jan. 2015 / Published: 8 Feb. 2015

Index Terms

Document Revision, Term Frequency, Term Weightings, Ranked Terms, Information Retrieval Process


In real information systems, few documents are static. Most documents change over time, and these changes can serve as signals for improving the quality of information retrieval. Unfortunately, analyzing all of these changes can be time-consuming. This paper proposes a method that significantly reduces the time needed to analyze them. The main idea is to analyze only a selected part of the changes, chosen so that the quality of information retrieval is not noticeably affected while the analysis time is reduced. To evaluate the proposed method, three different datasets were selected from Wikipedia. Several term-weighting factors were assessed, and the effect of the proposed method on these factors was investigated. The experimental results show that the proposed method keeps the quality of the retrieved information at an acceptable level while reducing document analysis time.
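The general idea of weighting terms from only part of a document's revision history can be illustrated with a minimal sketch. The abstract does not state how the analyzed changes are chosen or which weighting formula is used, so everything below is an assumption made for illustration: the function names, the `step` parameter, the rule of keeping every `step`-th revision plus the latest one, and the use of averaged relative term frequency as the weight. It is not the authors' method.

```python
from collections import Counter


def term_frequencies(text):
    """Relative frequency of each lowercase, whitespace-separated token."""
    tokens = text.lower().split()
    if not tokens:
        return {}
    counts = Counter(tokens)
    return {term: count / len(tokens) for term, count in counts.items()}


def revision_weights(revisions, step=1):
    """Average each term's relative frequency over a subset of revisions.

    Hypothetical selection rule (an assumption, not taken from the paper):
    keep revisions 0, step, 2*step, ... and always keep the latest one.
    With step=1 every revision is analyzed.
    """
    if not revisions:
        return {}
    selected = list(range(0, len(revisions), step))
    if selected[-1] != len(revisions) - 1:
        selected.append(len(revisions) - 1)
    totals = Counter()
    for index in selected:
        for term, tf in term_frequencies(revisions[index]).items():
            totals[term] += tf
    return {term: total / len(selected) for term, total in totals.items()}


def ranked_terms(weights):
    """Terms ordered by descending weight, ties broken alphabetically."""
    return sorted(weights, key=lambda term: (-weights[term], term))


# Toy revision history of a single document (invented for illustration).
revisions = [
    "term weighting",
    "term weighting in retrieval",
    "dynamic term weighting in retrieval",
    "dynamic documents and term weighting in retrieval",
]

full = revision_weights(revisions, step=1)     # analyzes all 4 revisions
sampled = revision_weights(revisions, step=2)  # analyzes revisions 0, 2 and 3
print(ranked_terms(full))
print(ranked_terms(sampled))
```

Comparing the two rankings gives a rough sense of the trade-off the abstract describes: fewer revisions are processed, and the question is how far the resulting term ranking drifts from the one computed on the full history.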

Cite This Paper

Mehdi Jabalameli, Ala Arman, Mohammadali Nematbakhsh, "Improving the Efficiency of Term Weighting in Set of Dynamic Documents", International Journal of Modern Education and Computer Science (IJMECS), vol.7, no.2, pp.42-47, 2015. DOI:10.5815/ijmecs.2015.02.06

