Improving Instruction-Following in Language Models through Activation Steering

15 October 2024
Alessandro Stolfo
Vidhisha Balachandran
Safoora Yousefi
Eric Horvitz
Besmira Nushi
    LLMSV
arXiv: 2410.12877

Papers citing "Improving Instruction-Following in Language Models through Activation Steering"

50 / 97 papers shown
Does higher interpretability imply better utility? A Pairwise Analysis on Sparse Autoencoders
Xu Wang
Yan Hu
Benyou Wang
Difan Zou
LLMSV
28
0
0
04 Oct 2025
Exposing Hallucinations To Suppress Them: VLMs Representation Editing With Generative Anchors
Youxu Shi
Suorong Yang
Dong Liu
MLLM, VLM
20
0
0
26 Sep 2025
The Rogue Scalpel: Activation Steering Compromises LLM Safety
Anton Korznikov
Andrey V. Galichin
Alexey Dontsov
Oleg Y. Rogov
Ivan Oseledets
Elena Tutubalina
LLMSV, AAML
0
0
0
26 Sep 2025
IA2: Alignment with ICL Activations Improves Supervised Fine-Tuning
Aayush Mishra
Daniel Khashabi
Anqi Liu
36
0
0
26 Sep 2025
Regulating the Agency of LLM-based Agents
Seán Boddy
Joshua Joseph
ELM
8
0
0
25 Sep 2025
When Instructions Multiply: Measuring and Estimating LLM Capabilities of Multiple Instructions Following
Keno Harada
Yudai Yamazaki
Masachika Taniguchi
Edison Marrese-Taylor
Takeshi Kojima
Yusuke Iwasawa
Yutaka Matsuo
ALM
24
0
0
25 Sep 2025
A Comparative Analysis of Sparse Autoencoder and Activation Difference in Language Model Steering
Jiaqing Xie
LLMSV
27
0
0
24 Sep 2025
DISCO: Disentangled Communication Steering for Large Language Models
Max Torop
A. Masoomi
Masih Eskandar
Jennifer Dy
LLMSV
24
0
0
20 Sep 2025
ReCoVeR the Target Language: Language Steering without Sacrificing Task Performance
Hannah Sterz
Fabian David Schmidt
Goran Glavaš
Ivan Vulić
MoMe, LLMSV
24
0
0
18 Sep 2025
Towards High Data Efficiency in Reinforcement Learning with Verifiable Reward
Xinyu Tang
Zhenduo Zhang
Y. Liu
Wayne Xin Zhao
Zujie Wen
Zhiqiang Zhang
Jun Zhou
OffRL
24
0
0
01 Sep 2025
AHAMask: Reliable Task Specification for Large Audio Language Models without Instructions
Yiwei Guo
Bohan Li
Hankun Wang
Zhihan Li
Shuai Wang
Xie Chen
K. Yu
AuLLM
48
0
0
01 Sep 2025
Steering When Necessary: Flexible Steering Large Language Models with Backtracking
Jinwei Gan
Zifeng Cheng
Zhiwei Jiang
Cong Wang
Yafeng Yin
Xiang Luo
Yuchen Fu
Qing Gu
KELM, LLMSV
44
0
0
25 Aug 2025
Enhancing Supervised Composed Image Retrieval via Reasoning-Augmented Representation Engineering
Jun Li
Kai Li
Shaoguo Liu
Tingting Gao
LRM
24
0
0
15 Aug 2025
KV Cache Steering for Controlling Frozen LLMs
Max Belitsky
D. J. Kopiczko
Michael Dorkenwald
M. Jehanzeb Mirza
James R. Glass
Cees G. M. Snoek
Yuki M. Asano
LLMSV, LRM
77
0
0
11 Jul 2025
MemOS: A Memory OS for AI System
Z. Li
Shichao Song
Chenyang Xi
Hanyu Wang
Chen Tang
...
Hongkang Yang
Wentao Zhang
Zhi-Qin John Xu
S. Chen
Feiyu Xiong
KELM, RALM
173
10
0
04 Jul 2025
Generalizing Verifiable Instruction Following
Valentina Pyatkin
Saumya Malik
Victoria Graf
Hamish Ivison
Shengyi Huang
Pradeep Dasigi
Nathan Lambert
Hannaneh Hajishirzi
ALM
74
12
0
03 Jul 2025
Transferring Features Across Language Models With Model Stitching
Alan Chen
Jack Merullo
Alessandro Stolfo
Ellie Pavlick
103
1
0
07 Jun 2025
ReGA: Representation-Guided Abstraction for Model-based Safeguarding of LLMs
Zeming Wei
Chengcan Wu
Meng Sun
124
2
0
02 Jun 2025
SafeSteer: Interpretable Safety Steering with Refusal-Evasion in LLMs
Shaona Ghosh
Amrita Bhattacharjee
Yftah Ziser
Christopher Parisien
LLMSV
83
4
0
01 Jun 2025
Linear Representation Transferability Hypothesis: Leveraging Small Models to Steer Large Models
Femi Bello
Anubrata Das
Fanzhi Zeng
Fangcong Yin
Liu Leqi
LLMSV
184
1
0
31 May 2025
Evaluating and Steering Modality Preferences in Multimodal Large Language Model
Yu Zhang
Jinlong Ma
Yongshuai Hou
Xuefeng Bai
Kehai Chen
Yang Xiang
Jun Yu
Min Zhang
162
4
0
27 May 2025
Guiding Giants: Lightweight Controllers for Weighted Activation Steering in LLMs
Amr Hegazy
Mostafa Elhoushi
Amr Alanwar
LLMSV
135
2
0
22 May 2025
Denoising Concept Vectors with Sparse Autoencoders for Improved Language Model Steering
Haiyan Zhao
Xuansheng Wu
Fan Yang
Bo Shen
Ninghao Liu
Mengnan Du
LLMSV
115
2
0
21 May 2025
ExpertSteer: Intervening in LLMs through Expert Knowledge
Weixuan Wang
Minghao Wu
Barry Haddow
Alexandra Birch
LLMSV
303
1
0
18 May 2025
Spotlight Your Instructions: Instruction-following with Dynamic Attention Steering
Prince Kumar
Danish Contractor
LLMSV, LRM
120
4
0
17 May 2025
Do different prompting methods yield a common task representation in language models?
Guy Davidson
Todd M. Gureckis
Brenden M. Lake
Adina Williams
119
3
0
17 May 2025
Steerable Chatbots: Personalizing LLMs with Preference-Based Activation Steering
Jessica Y. Bo
Tianyu Xu
Ishan Chatterjee
Katrina Passarella-Ward
Achin Kulshrestha
D Shin
LLMSV
199
3
0
07 May 2025
EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models
Ziwen Xu
Shuxun Wang
Kewei Xu
Haoming Xu
Mengru Wang
Xinle Deng
Yunzhi Yao
Guozhou Zheng
Ningyu Zhang
Xin Xu
KELM, LLMSV
734
4
0
21 Apr 2025
Unlocking General Long Chain-of-Thought Reasoning Capabilities of Large Language Models via Representation Engineering
Xinyu Tang
Xiaolei Wang
Zhihao Lv
Yingqian Min
Wayne Xin Zhao
Binbin Hu
Ziqi Liu
Qing Cui
LRM
233
16
0
14 Mar 2025
SAKE: Steering Activations for Knowledge Editing
Marco Scialanga
Thibault Laugel
Vincent Grari
Marcin Detyniecki
KELM, LLMSV
206
2
0
03 Mar 2025
The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence
Tom Wollschlager
Jannes Elstner
Simon Geisler
Vincent Cohen-Addad
Stephan Günnemann
Johannes Gasteiger
LLMSV
145
16
0
24 Feb 2025
Representation Engineering for Large-Language Models: Survey and Research Challenges
Lukasz Bartoszcze
Sarthak Munshi
Bryan Sukidi
Jennifer Yen
Zejia Yang
David Williams-King
Linh Le
Kosi Asuzu
Carsten Maple
244
2
0
24 Feb 2025
Steering LLMs for Formal Theorem Proving
Shashank Kirtania
Arun Shankar Iyer
LLMSV
752
0
0
21 Feb 2025
SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models
Z. He
Haiyan Zhao
Yiran Qiao
Fan Yang
Ali Payani
Jing Ma
Jundong Li
LLMSV
174
14
0
17 Feb 2025
Steering Language Model Refusal with Sparse Autoencoders
Kyle O'Brien
David Majercak
Xavier Fernandes
Richard Edgar
Blake Bullwinkel
Jingya Chen
Harsha Nori
Dean Carignan
Eric Horvitz
Forough Poursabzi-Sangde
LLMSV
236
31
0
18 Nov 2024
All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling
Emanuele Marconato
Sébastien Lachapelle
Sebastian Weichwald
Luigi Gresele
220
5
0
30 Oct 2024
Attention Speaks Volumes: Localizing and Mitigating Bias in Language Models
Rishabh Adiga
Besmira Nushi
Varun Chandrasekaran
135
2
0
29 Oct 2024
Survey of User Interface Design and Interaction Techniques in Generative AI Applications
Reuben Luera
Ryan Rossi
Alexa F. Siu
Franck Dernoncourt
Tong Yu
...
Hanieh Salehy
Jian Zhao
Samyadeep Basu
Puneet Mathur
Nedim Lipka
AI4TS
186
2
0
28 Oct 2024
Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
Róbert Csordás
Christopher Potts
Christopher D. Manning
Atticus Geiger
GAN
106
30
0
20 Aug 2024
Multi-property Steering of Large Language Models with Dynamic Activation Composition
Daniel Scalena
Gabriele Sarti
Malvina Nissim
KELM, LLMSV, AI4CE
119
16
0
25 Jun 2024
Steering Without Side Effects: Improving Post-Deployment Control of Language Models
Asa Cooper Stickland
Alexander Lyzhov
Jacob Pfau
Salsabila Mahdi
Samuel R. Bowman
LLMSV, AAML
152
30
0
21 Jun 2024
Refusal in Language Models Is Mediated by a Single Direction
Andy Arditi
Oscar Obeso
Aaquib Syed
Daniel Paleka
Nina Panickssery
Wes Gurnee
Neel Nanda
214
313
0
17 Jun 2024
Controlling Large Language Model Agents with Entropic Activation Steering
Nate Rahn
P. D'Oro
Marc G. Bellemare
LLMSV
108
14
0
01 Jun 2024
Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization
Yuanpu Cao
Tianrong Zhang
Bochuan Cao
Ziyi Yin
Lu Lin
Fenglong Ma
Jinghui Chen
LLMSV
155
55
0
28 May 2024
Spectral Editing of Activations for Large Language Model Alignment
Yifu Qiu
Zheng Zhao
Yftah Ziser
Anna Korhonen
Edoardo Ponti
Shay B. Cohen
KELM, LLMSV
159
31
0
15 May 2024
Extending Activation Steering to Broad Skills and Multiple Behaviours
Teun van der Weij
Massimo Poesio
Nandi Schoots
LLMSV
127
21
0
09 Mar 2024
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
Nikhil Prakash
Tamar Rott Shaham
Tal Haklay
Yonatan Belinkov
David Bau
126
84
0
22 Feb 2024
Investigating Bias Representations in Llama 2 Chat via Activation Steering
Dawn Lu
Nina Rimsky
LLMSV
80
13
0
01 Feb 2024
On Prompt-Driven Safeguarding for Large Language Models
Chujie Zheng
Fan Yin
Hao Zhou
Fandong Meng
Jie Zhou
Kai-Wei Chang
Minlie Huang
Nanyun Peng
AAML
217
82
0
31 Jan 2024
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee
Xiaoyan Bai
Itamar Pres
Martin Wattenberg
Jonathan K. Kummerfeld
Rada Mihalcea
194
143
0
03 Jan 2024