Data Mining for Cyberbullying and Harassment Detection in Arabic Texts




Eman Bashir 1,*, Mohamed Bouguessa 2

1. College of Computer Sciences and Information Technology, Sudan University of Science and Technology, Khartoum, Sudan

2. Department of Computer Science, University of Quebec at Montreal, Montreal, QC, Canada

* Corresponding author.


Received: 31 Jul. 2021 / Revised: 7 Aug. 2021 / Accepted: 23 Aug. 2021 / Published: 8 Oct. 2021

Index Terms

Cyberbullying, Social network, Arabic text and Deep learning


Abstract

Cyberbullying is broadly viewed as a severe social danger that affects many individuals around the globe, particularly young people and teenagers. The Arab world has embraced technology and continues to use it in different ways to communicate on social media platforms. However, Arabic text is difficult to process because of the complexity of the language and the scarcity of its resources. This paper investigates the question of how to detect cyberbullying and harassment in Arabic text using information posted on Twitter. To answer this question, we collected an Arabic corpus covering the relevant topics using specific keywords, which we explain in detail. We devised experiments in which we investigated several learning approaches. Our results suggest that deep learning models such as LSTM outperform traditional cyberbullying classifiers, reaching an accuracy of 72%.

Cite This Paper

Eman Bashir, Mohamed Bouguessa, "Data Mining for Cyberbullying and Harassment Detection in Arabic Texts", International Journal of Information Technology and Computer Science (IJITCS), Vol.13, No.5, pp.41-50, 2021. DOI: 10.5815/ijitcs.2021.05.04

