Challenges with Sentiment Analysis of On-line Micro-texts




Ritesh Srivastava 1,*, M.P.S. Bhatia 1

1. Computer Engineering Division, NSIT, Delhi University, New Delhi, India

* Corresponding author.


Received: 5 Dec. 2016 / Revised: 20 Mar. 2017 / Accepted: 11 May 2017 / Published: 8 Jul. 2017

Index Terms

Sentiment analysis, On-line micro-texts, Natural language processing, Text Mining, Machine learning


With the evolution of the World Wide Web (WWW) 2.0 and the emergence of many micro-blogging and social networking sites such as Twitter, the internet has become a massive source of short textual messages called on-line micro-texts, which are limited to a small number of characters (e.g. 140 characters on Twitter). These on-line micro-texts are treated as real-time text streams. They are highly subjective and contain opinions about events, social issues, personalities, and products. However, despite their large volume, the quality of these micro-texts is very inconsistent. These inconsistencies in raw on-line micro-texts pose many challenges when established methods for sentiment analysis of unstructured reviews are applied to them. This paper presents the challenges and issues observed during sentiment analysis of on-line micro-texts.

Cite This Paper

Ritesh Srivastava, M.P.S. Bhatia, "Challenges with Sentiment Analysis of On-line Micro-texts", International Journal of Intelligent Systems and Applications (IJISA), Vol.9, No.7, pp.31-40, 2017. DOI: 10.5815/ijisa.2017.07.04

