
MuDRiC: Multi-Dialect Reasoning for Arabic Commonsense Validation

Main: 4 pages, Bibliography: 4 pages, Appendix: 3 pages, 3 figures, 4 tables
Abstract

Commonsense validation evaluates whether a sentence aligns with everyday human understanding, a critical capability for robust natural language understanding systems. While substantial progress has been made in English, the task remains underexplored in Arabic, particularly given the language's rich linguistic diversity. Existing Arabic resources focus primarily on Modern Standard Arabic (MSA), leaving regional dialects underrepresented despite their prevalence in spoken contexts. To bridge this gap, we make two key contributions. First, we introduce MuDRiC, an extended Arabic commonsense dataset that incorporates multiple dialects; to the best of our knowledge, it is the first Arabic multi-dialect commonsense reasoning dataset. Second, we propose a novel method that adapts Graph Convolutional Networks (GCNs) to Arabic commonsense reasoning, improving the modeling of semantic relationships for commonsense validation. Our experimental results show that this approach consistently outperforms the baseline of direct language model fine-tuning. Overall, our work advances Arabic natural language understanding by providing a foundational dataset and a new method for handling the language's complex variation. Data and code are available at this https URL.
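The abstract does not say which GCN variant is used or how the graph is built. For readers unfamiliar with GCNs, here is a minimal sketch of the standard graph-convolution propagation rule (Kipf and Welling), H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W). It is not the authors' implementation. The graph, the node features, and the helper names (`normalized_adjacency`, `gcn_layer`) are hypothetical, chosen only for illustration. One could imagine the nodes as tokens of a sentence linked by semantic relations, but that construction is an assumption.

```python
import math
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain nested-loop matrix product: (n x k) @ (k x m) -> (n x m)
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def normalized_adjacency(adj: Matrix) -> Matrix:
    # Hypothetical helper: add self-loops (A + I), then apply the
    # symmetric normalization D^{-1/2} (A + I) D^{-1/2}
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    # Hypothetical helper: one propagation step, ReLU(A_norm @ H @ W).
    # Each node's new features mix its own and its neighbours' features.
    out = matmul(matmul(normalized_adjacency(adj), h), w)
    return [[max(0.0, x) for x in row] for row in out]

if __name__ == "__main__":
    # Toy 3-node path graph (0 - 1 - 2) with 2-dimensional node features
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    w = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, for illustration only
    for row in gcn_layer(adj, h, w):
        print([round(x, 3) for x in row])
```

In a full model, W would be learned, several such layers would be stacked, and the resulting node representations would be pooled and passed to a binary classifier that decides whether a sentence is commonsensical. All of these details are assumptions about a typical setup, not claims about MuDRiC.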
