FairSteer: Inference Time Debiasing for LLMs with Dynamic Activation Steering
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
20 April 2025
Yongbin Li
Zhiting Fan
Ruizhe Chen
Xiaotang Gai
Luqi Gong
Yan Zhang
Zuozhu Liu
    LLMSV

Papers citing "FairSteer: Inference Time Debiasing for LLMs with Dynamic Activation Steering"

12 / 12 papers shown
Silenced Biases: The Dark Side LLMs Learned to Refuse
Rom Himelstein
Amit Levi
Brit Youngmann
Yaniv Nemcovsky
A. Mendelson
74
1
0
05 Nov 2025
Can SAEs reveal and mitigate racial biases of LLMs in healthcare?
Hiba Ahsan
Byron C. Wallace
LLMSV
121
0
0
31 Oct 2025
Angular Steering: Behavior Control via Rotation in Activation Space
Hieu M. Vu
T. Nguyen
LLMSV
288
3
0
30 Oct 2025
Robust Preference Alignment via Directional Neighborhood Consensus
Ruochen Mao
Yuling Shi
Xiaodong Gu
Jiaheng Wei
135
0
0
23 Oct 2025
Debiasing LLMs by Masking Unfairness-Driving Attention Heads
Tingxu Han
Wei Song
Ziqi Ding
Z. Li
Chunrong Fang
Yuekang Li
Dongfang Liu
Zhenyu Chen
Zhenting Wang
151
0
0
11 Oct 2025
BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses
Xin Xu
Xunzhi He
Churan Zhi
Ruizhe Chen
Julian McAuley
Zexue He
50
0
0
30 Sep 2025
BiasGym: Fantastic LLM Biases and How to Find (and Remove) Them
Sekh Mainul Islam
Nadav Borenstein
Siddhesh Pawar
Haeun Yu
Arnav Arora
Isabelle Augenstein
154
1
0
12 Aug 2025
BiasFilter: An Inference-Time Debiasing Framework for Large Language Models
Xiaoqing Cheng
Ruizhe Chen
Hongying Zan
Yuxiang Jia
Min Peng
247
1
0
28 May 2025
SAEs Are Good for Steering -- If You Select the Right Features
Dana Arad
Aaron Mueller
Yonatan Belinkov
LLMSV
179
19
0
26 May 2025
SAE-SSV: Supervised Steering in Sparse Representation Spaces for Reliable Control of Language Models
Zirui He
Haoyang Ling
Bo Shen
Ali Payani
Zelong Li
Mengnan Du
LLMSV
348
7
0
22 May 2025
Guiding Giants: Lightweight Controllers for Weighted Activation Steering in LLMs
Amr Hegazy
Mostafa Elhoushi
Amr Alanwar
LLMSV
265
2
0
22 May 2025
BiasGuard: A Reasoning-enhanced Bias Detection Tool For Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhiting Fan
Ruizhe Chen
Zuozhu Liu
333
4
0
30 Apr 2025