Word Clustering as a Feature for Arabic Sentiment Classification




Saud Alotaibi 1,* Charles Anderson 2

1. Umm Alqura University, Al Taif Road, Makkah 24382, Saudi Arabia

2. Colorado State University, 1100 Center Ave, Fort Collins, CO 80521, US

* Corresponding author.

DOI: https://doi.org/10.5815/ijeme.2017.01.01

Received: 23 Sep. 2016 / Revised: 1 Nov. 2016 / Accepted: 7 Dec. 2016 / Published: 8 Jan. 2017

Index Terms

Sentiment Classification, Polarity Classification, Arabic Natural Language Processing, Arabic Sentiment Sentence Classification, Machine Learning Classifier, Word Clustering


Morphologically rich languages such as Arabic require further investigation and methods targeted at improving sentiment analysis. Word clustering is one source of external knowledge that can expose semantic relationships within a text. This article reports ongoing work that uses word clustering for Arabic sentiment analysis. The proposed method performs supervised sentiment classification and enriches the feature space with word cluster information. The experiments and evaluations conducted in this study show that adding the clustering feature improved the performance of the Arabic sentiment classifier.
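The idea of enriching the feature space with word cluster information can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `WORD_CLUSTERS` mapping, the cluster IDs `c1` and `c2`, and the `extract_features` helper are all hypothetical names chosen for illustration, and the clusters are made up rather than induced from a corpus. The sketch only shows how two different words that share a cluster end up sharing a feature, which is what lets a supervised classifier generalize across them.

```python
from collections import Counter

# Hypothetical word-to-cluster mapping (an assumption for illustration).
# In practice it would be induced from a large unlabeled corpus by a
# word clustering algorithm.
WORD_CLUSTERS = {
    "جميل": "c1",  # "beautiful"
    "رائع": "c1",  # "wonderful"
    "سيء": "c2",   # "bad"
}


def extract_features(tokens, clusters=WORD_CLUSTERS):
    """Return bag-of-words counts enriched with cluster-ID counts."""
    features = Counter()
    for tok in tokens:
        # Standard lexical feature: the surface word itself.
        features["word=" + tok] += 1
        # Extra feature: the ID of the word's cluster, if it has one.
        cluster = clusters.get(tok)
        if cluster is not None:
            features["cluster=" + cluster] += 1
    return dict(features)


# Two sentences with different sentiment words from the same cluster.
a = extract_features(["الفيلم", "جميل"])  # "the film (is) beautiful"
b = extract_features(["الفيلم", "رائع"])  # "the film (is) wonderful"
```

Here `a` and `b` differ in their word features but both contain `cluster=c1`, so a classifier trained on one sentence has evidence that applies to the other. The resulting dictionaries could be passed to any supervised learner that accepts sparse feature dictionaries.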

Cite This Paper

Saud Alotaibi, Charles Anderson, "Word Clustering as a Feature for Arabic Sentiment Classification", International Journal of Education and Management Engineering (IJEME), Vol.7, No.1, pp.1-13, 2017. DOI: 10.5815/ijeme.2017.01.01

