2504.14492
FairSteer: Inference Time Debiasing for LLMs with Dynamic Activation Steering
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
20 April 2025
Yongbin Li
Zhiting Fan
Ruizhe Chen
Xiaotang Gai
Luqi Gong
Yan Zhang
Zuozhu Liu
LLMSV
Papers citing
"FairSteer: Inference Time Debiasing for LLMs with Dynamic Activation Steering"
12 / 12 papers shown
Title
Silenced Biases: The Dark Side LLMs Learned to Refuse
Rom Himelstein
Amit Levi
Brit Youngmann
Yaniv Nemcovsky
A. Mendelson
74
1
0
05 Nov 2025
Can SAEs reveal and mitigate racial biases of LLMs in healthcare?
Hiba Ahsan
Byron C. Wallace
LLMSV
121
0
0
31 Oct 2025
Angular Steering: Behavior Control via Rotation in Activation Space
Hieu M. Vu
T. Nguyen
LLMSV
288
3
0
30 Oct 2025
Robust Preference Alignment via Directional Neighborhood Consensus
Ruochen Mao
Yuling Shi
Xiaodong Gu
Jiaheng Wei
135
0
0
23 Oct 2025
Debiasing LLMs by Masking Unfairness-Driving Attention Heads
Tingxu Han
Wei Song
Ziqi Ding
Z. Li
Chunrong Fang
Yuekang Li
Dongfang Liu
Zhenyu Chen
Zhenting Wang
151
0
0
11 Oct 2025
BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses
Xin Xu
Xunzhi He
Churan Zhi
Ruizhe Chen
Julian McAuley
Zexue He
50
0
0
30 Sep 2025
BiasGym: Fantastic LLM Biases and How to Find (and Remove) Them
Sekh Mainul Islam
Nadav Borenstein
Siddhesh Pawar
Haeun Yu
Arnav Arora
Isabelle Augenstein
154
1
0
12 Aug 2025
BiasFilter: An Inference-Time Debiasing Framework for Large Language Models
Xiaoqing Cheng
Ruizhe Chen
Hongying Zan
Yuxiang Jia
Min Peng
247
1
0
28 May 2025
SAEs Are Good for Steering -- If You Select the Right Features
Dana Arad
Aaron Mueller
Yonatan Belinkov
LLMSV
179
19
0
26 May 2025
SAE-SSV: Supervised Steering in Sparse Representation Spaces for Reliable Control of Language Models
Zirui He
Haoyang Ling
Bo Shen
Ali Payani
Zelong Li
Mengnan Du
LLMSV
348
7
0
22 May 2025
Guiding Giants: Lightweight Controllers for Weighted Activation Steering in LLMs
Amr Hegazy
Mostafa Elhoushi
Amr Alanwar
LLMSV
265
2
0
22 May 2025
BiasGuard: A Reasoning-enhanced Bias Detection Tool For Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhiting Fan
Ruizhe Chen
Zuozhu Liu
333
4
0
30 Apr 2025