Lexicon-Based Approach in Sentiment Analysis of Yemeni Dialect for Social Media Network

https://doi.org/10.59628/jast.v2i5.1058

Authors

  • Alaa Abdulkareem Hameed Brihi, Department of Computer Science, Faculty of Computer and Information Technology, University of Sana’a, Sana’a, Yemen
  • Mossa Ghurab, Department of Computer Science, Faculty of Computer and Information Technology, University of Sana’a, Sana’a, Yemen

Keywords:

Arabic Sentiment Analysis, Yemeni Dialect, Lexicon-Based, Yemeni Lexicon and Corpus

Abstract

The number of Yemeni users on social media platforms has been growing quickly. Most research in Arabic sentiment analysis has focused on Modern Standard Arabic (MSA) and a few specific dialects, such as Egyptian, Levantine, and Gulf, leaving a noticeable gap in sentiment analysis research for the Yemeni dialect. The reason is the lack of a reliable Yemeni lexicon and corpus and of a real dataset for social media sentiment analysis. This research addresses that lack by presenting a Yemeni dialect sentiment lexicon and corpus. These resources are intended for researchers and practitioners who analyze sentiment in Yemeni dialect social media content, and they can contribute to a better understanding of Yemeni public opinion, social media monitoring, marketing, cultural understanding, and efforts to respond to crises in Yemen. The Yemeni dialect sentiment lexicon contains a reasonable number of words and phrases categorized by positive or negative sentiment. In addition, we constructed a corpus of more than 54,000 comments collected from the Facebook platform. The comments, initially unlabeled, concern the main telecommunications companies in Yemen (Yemen Telecom, Yemen Mobile, YOU, and Sabafon) and were written by people commenting on public issues related to the services those companies provide. The lexicon-based approach is used to extract the sentiment polarity of each comment and label it as positive, negative, or neutral, which yields the labeled corpus. The evaluation metrics are accuracy, recall, precision, F-measure, and the confusion matrix. The accuracy of the lexicon-based labeling was calculated by comparing its output with labels assigned manually by three Yemeni experts. In this evaluation, the lexicon-based approach achieved an accuracy of 90.05%.
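The labeling and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the scoring rule, so the sketch assumes a simple count of positive hits minus negative hits, with a zero score mapped to neutral. The lexicon entries are hypothetical English placeholders standing in for Yemeni dialect words and phrases, and the function names `label_comment` and `evaluate` are invented for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon (assumption): English placeholders stand in for
# the Yemeni dialect words and phrases of the real lexicon.
POSITIVE = {"excellent", "fast", "thank you"}
NEGATIVE = {"slow", "bad", "no signal"}


def label_comment(text, max_phrase_len=2):
    """Label a comment by counting lexicon hits over word n-grams.

    Assumed rule: score = positive hits - negative hits; a score of zero
    is labeled neutral. Tokenization is a plain whitespace split.
    """
    tokens = text.lower().split()
    score = 0
    for n in range(1, max_phrase_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if gram in POSITIVE:
                score += 1
            elif gram in NEGATIVE:
                score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def evaluate(gold, predicted, labels=("positive", "negative", "neutral")):
    """Compare predicted labels with manual labels.

    Returns accuracy, a confusion matrix keyed by (gold, predicted), and
    per-class (precision, recall, F-measure).
    """
    confusion = Counter(zip(gold, predicted))
    accuracy = sum(confusion[(label, label)] for label in labels) / len(gold)
    per_class = {}
    for label in labels:
        true_pos = confusion[(label, label)]
        predicted_total = sum(confusion[(g, label)] for g in labels)
        gold_total = sum(confusion[(label, p)] for p in labels)
        precision = true_pos / predicted_total if predicted_total else 0.0
        recall = true_pos / gold_total if gold_total else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = (precision, recall, f_measure)
    return accuracy, confusion, per_class


# Illustrative comments and manual labels (made up for this sketch).
comments = [
    "the network is excellent and fast",
    "internet is slow and bad",
    "i have a question",
    "thank you but no signal",
]
manual_labels = ["positive", "negative", "neutral", "negative"]

predicted_labels = [label_comment(c) for c in comments]
accuracy, confusion, per_class = evaluate(manual_labels, predicted_labels)
```

In this toy run the last comment contains one positive phrase and one negative phrase, so the assumed rule scores it zero and labels it neutral, which disagrees with the manual label. Cases like this are what the comparison against expert labels is meant to surface.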


Published

2024-10-31

How to Cite

Brihi, A. A. H., & Ghurab, M. (2024). Lexicon-Based Approach in Sentiment Analysis of Yemeni Dialect for Social Media Network. Sana’a University Journal of Applied Sciences and Technology, 2(5), 422–431. https://doi.org/10.59628/jast.v2i5.1058
