Attention Pruning: Automated Fairness Repair of Language Models via Surrogate Simulated Annealing

20 March 2025

Vishnu Asutosh Dasu


Main: 11 pages, Bibliography: 2 pages, 4 figures, 9 tables

Abstract

This paper explores pruning attention heads as a post-processing bias mitigation method for large language models (LLMs). Modern AI systems such as LLMs are expanding into sensitive social contexts where fairness concerns are especially crucial. Because LLMs develop decision-making patterns by training on massive datasets of human-generated content, they naturally encode and perpetuate societal biases. Modifying training datasets and algorithms is expensive and resource-intensive, whereas post-processing techniques, such as selectively deactivating neurons and attention heads in pre-trained LLMs, can provide feasible and effective ways to improve fairness. However, identifying the optimal subset of parameters to prune is a combinatorial problem within the immense parameter space of LLMs, requiring search methods that efficiently balance the competing objectives of model fairness and utility.
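To make the combinatorial framing concrete: with N attention heads there are 2^N possible pruning masks, so a hypothetical 12-layer model with 12 heads per layer (144 heads) already has 2^144, roughly 2.2 × 10^43, candidates. The sketch below shows plain simulated annealing over a binary head mask. It is an illustration under stated assumptions, not the authors' surrogate-based method: the objective `toy_objective`, the "biased" head indices, and the per-head utility cost are all hypothetical stand-ins for real fairness and utility measurements.

```python
import math
import random

def anneal_head_mask(num_heads, objective, steps=2000, t_start=1.0, t_end=1e-3, seed=0):
    """Minimise objective(mask) over binary pruning masks (1 = head pruned).

    Plain simulated annealing with single-bit-flip moves; no surrogate model.
    """
    rng = random.Random(seed)
    mask = [0] * num_heads                # start with no heads pruned
    cost = objective(mask)
    best_mask, best_cost = list(mask), cost
    for step in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(num_heads)      # propose flipping one head's state
        mask[i] ^= 1
        new_cost = objective(mask)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with prob exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost
            if cost < best_cost:
                best_mask, best_cost = list(mask), cost
        else:
            mask[i] ^= 1                  # reject: undo the flip
    return best_mask, best_cost

def toy_objective(mask, biased=frozenset({2, 5, 11}), utility_cost=0.2):
    # Hypothetical objective: each active "biased" head adds 1.0 to a bias score,
    # and every pruned head costs `utility_cost` in lost utility.
    bias = sum(1.0 for i, pruned in enumerate(mask) if i in biased and not pruned)
    utility_loss = utility_cost * sum(mask)
    return bias + utility_loss

if __name__ == "__main__":
    mask, cost = anneal_head_mask(16, toy_objective)
    print([i for i, m in enumerate(mask) if m], round(cost, 3))
```

In this toy setting, pruning a biased head lowers the cost by 0.8 while pruning any other head raises it by 0.2, so the search should settle on pruning heads 2, 5 and 11. A real fairness objective is expensive to evaluate on an LLM, which is presumably why the title refers to a surrogate; that component is not modelled here.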
