A Genetic Programming Framework for Topic Discovery from Online Digital Library

Full Text (PDF, 551KB), PP.32-39


Yinxing Li 1,* Ning Li 2

1. Province Research Institution of Regional Economy, Beihua University, Jilin, P.R. China

2. School of Economics and Management, China University of Petroleum, Dongying, China

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2010.01.05

Received: 15 Apr. 2010 / Revised: 22 Jun. 2010 / Accepted: 26 Aug. 2010 / Published: 8 Nov. 2010

Index Terms

Genetic Algorithms, Non-linear Matrix Factorization, Web-click Data, Convex Optimization, Interior Point Method


Various topic extraction techniques for digital libraries have been proposed over the past decade. A topic extraction system generally requires a large number of features and complicated lexical analysis. While these features and analyses are effective at representing the statistical characteristics of a document, they do not capture its high-level semantics. In this paper, we present a new approach to topic extraction that combines users' click-stream data with traditional lexical analysis. In our view, the click stream directly reflects human understanding of the high-level semantics of a document. Furthermore, we propose a simple yet effective piecewise linear model for topic evolution. We apply a genetic algorithm to estimate the model and extract topics. Experiments on a set of US Congress digital library documents demonstrate that our approach achieves better topic extraction accuracy than traditional methods.
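To make the idea of estimating a piecewise linear model with a genetic algorithm concrete, the following is a minimal sketch, not the method of the paper. The abstract does not specify the model's form, the chromosome encoding, or the fitness function, so everything here is an assumption made for illustration: a continuous two-segment model with parameters (a, b, c, d), a sum-of-squared-errors fitness, tournament selection, blend crossover, Gaussian mutation, and elitism. The function names `piecewise`, `sse`, and `fit_ga` and the synthetic series are hypothetical.

```python
import random


def piecewise(t, params):
    """Assumed model: continuous two-segment line with a breakpoint at c."""
    a, b, c, d = params
    if t <= c:
        return a + b * t
    return a + b * c + d * (t - c)


def sse(params, series):
    """Sum of squared errors between the model and observed (t, y) pairs."""
    return sum((piecewise(t, params) - y) ** 2 for t, y in series)


def fit_ga(series, pop_size=60, generations=200, seed=0):
    """Fit (a, b, c, d) with a small real-coded genetic algorithm.

    Returns the best individual and the best error seen in each generation.
    """
    rng = random.Random(seed)
    t_min = min(t for t, _ in series)
    t_max = max(t for t, _ in series)

    def random_individual():
        # Search ranges for intercept and slopes are arbitrary assumptions.
        return [rng.uniform(-5, 5), rng.uniform(-5, 5),
                rng.uniform(t_min, t_max), rng.uniform(-5, 5)]

    def tournament(pop):
        # Binary tournament: keep the better of two random individuals.
        x, y = rng.sample(pop, 2)
        return x if sse(x, series) < sse(y, series) else y

    pop = [random_individual() for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda p: sse(p, series))
        history.append(sse(pop[0], series))
        next_pop = [pop[0][:]]  # elitism: carry the best individual over unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            w = rng.random()
            # Blend crossover followed by Gaussian mutation on every gene.
            child = [w * g1 + (1 - w) * g2 for g1, g2 in zip(p1, p2)]
            child = [g + rng.gauss(0, 0.1) for g in child]
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=lambda p: sse(p, series))
    history.append(sse(best, series))
    return best, history


if __name__ == "__main__":
    # Synthetic data (not from the paper): a topic's popularity rises, then declines.
    true_params = (1.0, 2.0, 5.0, -1.0)
    series = [(t, piecewise(t, true_params)) for t in range(11)]
    best, history = fit_ga(series)
    print("estimated parameters:", [round(g, 2) for g in best])
    print("best error, first vs last generation:", history[0], history[-1])
```

Because the best individual is copied into each new generation, the best error can never get worse from one generation to the next. In a click-stream setting, the observed series might be a topic's share of clicks per time window, but that mapping is likewise an assumption rather than something the abstract states.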

Cite This Paper

Yinxing Li, Ning Li, "A Genetic Programming Framework for Topic Discovery from Online Digital Library", International Journal of Information Technology and Computer Science (IJITCS), vol. 2, no. 1, pp. 32-39, 2010. DOI: 10.5815/ijitcs.2010.01.05

