Multi-Label Classification of Qur’anic Similes: A Computational Approach to Arabic Rhetorical Theory
Similes (tashbīh) are a cornerstone of Qur’anic eloquence, functioning as a cognitive tool for conveying complex theological concepts through accessible imagery. Despite the sophisticated taxonomies established by classical Arabic rhetoricians, contemporary computational approaches often rely on single-label classification. This reduction fails to capture the inherent “rhetorical overlap” in which a single verse embodies multiple, non-mutually exclusive categories, such as being simultaneously explicit (mursal) and representational (tamthīlī). This study addresses this gap by formalizing Qur’anic simile classification as a multi-label learning task, bridging classical linguistic theory and modern Natural Language Processing. Using an expert-annotated dataset of 364 verses grounded in authoritative classical exegeses, we evaluated several Arabic-specific Transformer models, including AraBERT, CamelBERT, and MARBERT. MARBERT achieved the best performance, reaching a Micro F1-score of 0.7685 and a Macro F1-score of 0.6003 and significantly outperforming traditional statistical baselines. The corpus also exhibited a high label density, providing empirical support for the synergistic and multidimensional nature of Qur’anic rhetorical figures. Beyond technical metrics, this study contributes to “computational hermeneutics” by demonstrating that classical rhetorical categories function as structured, learnable knowledge. By modeling overlapping categories, it offers a novel methodology for Digital Humanities and a scalable framework for the automated analysis of highly sophisticated Classical Arabic texts.
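To make the reported metrics concrete, the following is a minimal sketch of how Micro F1, Macro F1, and label density can be computed for a multi-label task. It is an illustration only, not the study's evaluation code: the four toy verses, the predictions, and the third label name (`label_c`) are hypothetical, and only `mursal` and `tamthili` correspond to categories named above.

```python
from typing import List, Tuple

# Hypothetical label set: only "mursal" and "tamthili" come from the abstract.
LABELS = ["mursal", "tamthili", "label_c"]

def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> Tuple[float, float]:
    """Return (micro_f1, macro_f1) for binary indicator matrices."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    # Micro: pool the counts over all labels, then compute one F1.
    micro = f1_from_counts(
        sum(c[0] for c in counts),
        sum(c[1] for c in counts),
        sum(c[2] for c in counts),
    )
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1_from_counts(*c) for c in counts) / n_labels
    return micro, macro

def label_density(y: List[List[int]]) -> float:
    """Average fraction of the label set that is active per instance."""
    return sum(sum(row) for row in y) / (len(y) * len(y[0]))

# Toy gold annotations and predictions (made up for illustration).
y_true = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 0]]
y_pred = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]

micro, macro = micro_macro_f1(y_true, y_pred)
density = label_density(y_true)
print(f"micro={micro:.4f} macro={macro:.4f} density={density:.4f}")
```

In this toy case the rare third label is never predicted, which lowers Macro F1 well below Micro F1. A gap of that kind, as between the reported 0.7685 and 0.6003, is commonly a sign that less frequent categories are harder to predict, although the abstract itself does not state the cause.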

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.