2505.20309
Guiding Giants: Lightweight Controllers for Weighted Activation Steering in LLMs
22 May 2025
Amr Hegazy
Mostafa Elhoushi
Amr Alanwar
LLMSV
Papers citing "Guiding Giants: Lightweight Controllers for Weighted Activation Steering in LLMs"
8 / 8 papers shown
The Anatomy of Alignment: Decomposing Preference Optimization by Steering Sparse Features
Jeremias Lino Ferrao
Matthijs van der Lende
Ilija Lichkovski
Clement Neo
LLMSV
60
0
0
16 Sep 2025
MASteer: Multi-Agent Adaptive Steer Strategy for End-to-End LLM Trustworthiness Repair
Changqing Li
Tianlin Li
Xiaohan Zhang
Aishan Liu
Li Pan
KELM
LLMSV
48
0
0
09 Aug 2025
FairSteer: Inference Time Debiasing for LLMs with Dynamic Activation Steering
Yongbin Li
Zhiting Fan
Ruizhe Chen
Xiaotang Gai
Luqi Gong
Yan Zhang
Zuozhu Liu
LLMSV
161
8
0
20 Apr 2025
Improving Instruction-Following in Language Models through Activation Steering
Alessandro Stolfo
Vidhisha Balachandran
Safoora Yousefi
Eric Horvitz
Besmira Nushi
LLMSV
259
48
0
15 Oct 2024
Robust LLM safeguarding via refusal feature adversarial training
L. Yu
Virginie Do
Karen Hambardzumyan
Nicola Cancedda
AAML
213
33
0
30 Sep 2024
Programming Refusal with Conditional Activation Steering
Bruce W. Lee
Inkit Padhi
Karthikeyan N. Ramamurthy
Erik Miehling
Pierre Dognin
Manish Nagireddy
Amit Dhurandhar
LLMSV
265
52
0
06 Sep 2024
Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories
Tianlong Wang
Xianfeng Jiao
Yifan He
Zhongzhi Chen
Yinghao Zhu
Xu Chu
Junyi Gao
Yasha Wang
Liantao Ma
LLMSV
205
33
0
26 May 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois
Balázs Galambosi
Percy Liang
Tatsunori Hashimoto
ALM
254
502
0
06 Apr 2024