From n-gram to Attention: How Model Architectures Learn and Propagate Bias in Language Modeling

Current research on bias in language models (LMs) focuses predominantly on data quality, with significantly less attention paid to model architecture and to the temporal provenance of training data. Even more critically, few studies systematically investigate the origins of bias. We propose a methodology grounded in comparative behavioral theory to interpret the complex interaction between training data and model architecture in bias propagation during language modeling. Building on recent work that relates transformers to n-gram LMs, we evaluate how data, model design choices, and temporal dynamics affect bias propagation. Our findings reveal that: (1) n-gram LMs are highly sensitive to context window size in bias propagation, while transformers demonstrate architectural robustness; (2) the temporal provenance of training data significantly affects bias; and (3) different model architectures respond differentially to controlled bias injection, with certain biases (e.g., sexual orientation) disproportionately amplified. As language models become ubiquitous, our findings highlight the need for a holistic approach that traces bias to its origins across both data and model dimensions, addressing causes rather than symptoms, to mitigate harm.
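To make finding (1) concrete, the following is a minimal Python sketch, not the paper's actual protocol: the toy corpus, the probe phrases, the maximum-likelihood estimator, and the bias ratio are all illustrative assumptions. It shows how the association an n-gram LM assigns between a gendered context and an occupation word can change purely as a function of the model order n.

    from collections import Counter

    def ngram_counts(tokens, n):
        # Count n-grams and their (n-1)-token prefixes over the corpus.
        grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        prefixes = Counter(tuple(tokens[i:i + n - 1]) for i in range(len(tokens) - n + 2))
        return grams, prefixes

    def cond_prob(tokens, context, word, n):
        # MLE estimate of P(word | last n-1 tokens of context) under an order-n model.
        grams, prefixes = ngram_counts(tokens, n)
        prefix = tuple(context[-(n - 1):]) if n > 1 else ()
        numerator = grams[prefix + (word,)]
        denominator = prefixes[prefix] if n > 1 else len(tokens)
        return numerator / denominator if denominator else 0.0

    # Hypothetical toy corpus with a mild gendered occupation skew (illustration only).
    corpus = ("she is a nurse . he is a doctor . she is a doctor . "
              "he is a nurse . she is a nurse .").split()

    for n in (2, 3, 4):
        p_she = cond_prob(corpus, ["she", "is", "a"], "nurse", n)
        p_he = cond_prob(corpus, ["he", "is", "a"], "nurse", n)
        ratio = p_she / p_he if p_he else float("inf")
        print(f"n={n}: P(nurse | she is a)={p_she:.2f}  "
              f"P(nurse | he is a)={p_he:.2f}  ratio={ratio:.2f}")

With n=2 or n=3, the gendered pronoun falls outside the context window, so both probes yield identical probabilities (ratio 1.00); only at n=4 does the window reach the pronoun and the ratio departs from 1. This is the sense in which an n-gram LM's measured bias is sensitive to context window size, whereas a transformer's attention can reach the full context regardless of such a fixed window.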
@article{kabir2025_2505.12381,
  title={From n-gram to Attention: How Model Architectures Learn and Propagate Bias in Language Modeling},
  author={Mohsinul Kabir and Tasfia Tahsin and Sophia Ananiadou},
  journal={arXiv preprint arXiv:2505.12381},
  year={2025}
}