ResearchTrend.AI

Guiding Giants: Lightweight Controllers for Weighted Activation Steering in LLMs

22 May 2025
Amr Hegazy
Mostafa Elhoushi
Amr Alanwar
    LLMSV

Papers citing "Guiding Giants: Lightweight Controllers for Weighted Activation Steering in LLMs"

8 / 8 papers shown
The Anatomy of Alignment: Decomposing Preference Optimization by Steering Sparse Features
Jeremias Lino Ferrao
Matthijs van der Lende
Ilija Lichkovski
Clement Neo
LLMSV
60
0
0
16 Sep 2025
MASteer: Multi-Agent Adaptive Steer Strategy for End-to-End LLM Trustworthiness Repair
Changqing Li
Tianlin Li
Xiaohan Zhang
Aishan Liu
Li Pan
KELM, LLMSV
48
0
0
09 Aug 2025
FairSteer: Inference Time Debiasing for LLMs with Dynamic Activation Steering
Yongbin Li
Zhiting Fan
Ruizhe Chen
Xiaotang Gai
Luqi Gong
Yan Zhang
Zuozhu Liu
LLMSV
161
8
0
20 Apr 2025
Improving Instruction-Following in Language Models through Activation Steering
Alessandro Stolfo
Vidhisha Balachandran
Safoora Yousefi
Eric Horvitz
Besmira Nushi
LLMSV
259
48
0
15 Oct 2024
Robust LLM safeguarding via refusal feature adversarial training
L. Yu
Virginie Do
Karen Hambardzumyan
Nicola Cancedda
AAML
213
33
0
30 Sep 2024
Programming Refusal with Conditional Activation Steering
Bruce W. Lee
Inkit Padhi
Karthikeyan N. Ramamurthy
Erik Miehling
Pierre Dognin
Manish Nagireddy
Amit Dhurandhar
LLMSV
265
52
0
06 Sep 2024
Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories
Tianlong Wang
Xianfeng Jiao
Yifan He
Zhongzhi Chen
Yinghao Zhu
Xu Chu
Junyi Gao
Yasha Wang
Liantao Ma
LLMSV
205
33
0
26 May 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois
Balázs Galambosi
Percy Liang
Tatsunori Hashimoto
ALM
254
502
0
06 Apr 2024