Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English

6 March 2025
Runtao Zhou, Guangya Wan, Saadia Gabriel, Sheng R. Li, Alexander J. Gates, Maarten Sap, Thomas Hartvigsen
Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities in reasoning tasks, leading to their widespread deployment. However, recent studies have highlighted concerning biases in these models, particularly in their handling of dialectal variations like African American English (AAE). In this work, we systematically investigate dialectal disparities in LLM reasoning tasks. We develop an experimental framework comparing LLM performance given Standard American English (SAE) and AAE prompts, combining LLM-based dialect conversion with established linguistic analyses. We find that LLMs consistently produce less accurate responses and simpler reasoning chains and explanations for AAE inputs compared to equivalent SAE questions, with disparities most pronounced in social science and humanities domains. These findings highlight systematic differences in how LLMs process and reason about different language varieties, raising important questions about the development and deployment of these systems in our multilingual and multidialectal world. Our code repository is publicly available at this https URL.
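The abstract describes the experimental framework only at a high level. As a rough illustration, and not the authors' implementation, the sketch below shows one way a paired SAE/AAE evaluation could be wired up: each SAE question is converted to AAE with an LLM, the model is queried with both versions, and per-dialect accuracy is compared. The model name, prompts, and scoring rule are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code) of a paired SAE/AAE evaluation.
# Assumptions: OpenAI chat API, an illustrative model name, and naive
# substring scoring. The real study also applies established linguistic
# analyses to validate the dialect conversions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def ask(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Send a single-turn prompt and return the model's text response."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()


def to_aae(sae_question: str) -> str:
    """LLM-based dialect conversion from SAE to AAE (meaning-preserving)."""
    return ask(
        "Rewrite the following question in African American English, "
        "preserving its meaning exactly:\n" + sae_question
    )


def evaluate(questions: list[dict]) -> dict:
    """questions: [{"sae": <question text>, "answer": <gold answer>}, ...]"""
    correct = {"SAE": 0, "AAE": 0}
    for q in questions:
        aae_question = to_aae(q["sae"])
        for dialect, text in (("SAE", q["sae"]), ("AAE", aae_question)):
            # Crude substring match against the gold answer, for illustration.
            if q["answer"].lower() in ask(text).lower():
                correct[dialect] += 1
    n = len(questions)
    return {dialect: count / n for dialect, count in correct.items()}
```

A gap between the returned SAE and AAE accuracies on the same question set is the kind of dialectal disparity the paper reports, though the authors additionally analyze reasoning-chain length and explanation quality rather than accuracy alone.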

@article{zhou2025_2503.04099,
  title={Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English},
  author={Runtao Zhou and Guangya Wan and Saadia Gabriel and Sheng Li and Alexander J Gates and Maarten Sap and Thomas Hartvigsen},
  journal={arXiv preprint arXiv:2503.04099},
  year={2025}
}