Improving Activation Steering in Language Models with Mean-Centring

6 December 2023
Ole Jorgensen
Dylan R. Cope
Nandi Schoots
Murray Shanahan
    LLMSV
arXiv: 2312.03813

Papers citing "Improving Activation Steering in Language Models with Mean-Centring"

27 / 27 papers shown
To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models
Anna Hedström
Salim I. Amoukou
Tom Bewley
Saumitra Mishra
Manuela Veloso
LLMSV
286
10
0
15 Oct 2025
BILLY: Steering Large Language Models via Merging Persona Vectors for Creative Generation
Tsung-Min Pai
Jui-I Wang
Li-Chun Lu
Shao-Hua Sun
Hung-yi Lee
Kai-Wei Chang
MoMe
278
4
0
11 Oct 2025
Multimodal Function Vectors for Spatial Relations
Shuhao Fu
Esther Goldberg
Ying Nian Wu
Hongjing Lu
134
0
0
02 Oct 2025
Who is In Charge? Dissecting Role Conflicts in Instruction Following
Siqi Zeng
176
0
0
23 Sep 2025
ReCoVeR the Target Language: Language Steering without Sacrificing Task Performance
Hannah Sterz
Fabian David Schmidt
Goran Glavaš
Ivan Vulić
MoMe LLMSV
203
4
0
18 Sep 2025
Guardians and Offenders: A Survey on Harmful Content Generation and Safety Mitigation of LLM
Chi Zhang
Changjia Zhu
Junjie Xiong
Xiaoran Xu
Jinkui Chi
Yao Liu
Zhuo Lu
ELM
310
5
0
07 Aug 2025
LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers
Jingze Zhu
Y. Wu
Wenbo Zhu
Jiawang Cao
Y. Zheng
Jiawei Chen
Xu Yang
Bernt Schiele
Jonas Fischer
Xinting Hu
OffRL
217
1
0
06 Jul 2025
From Concepts to Components: Concept-Agnostic Attention Module Discovery in Transformers
Jingtong Su
Julia Kempe
Karen Ullrich
406
3
0
20 Jun 2025
Probing the Robustness of Large Language Models Safety to Latent Perturbations
Tianle Gu
Kexin Huang
Zongqi Wang
Yixu Wang
Jie Li
Yuanqi Yao
Yang Yao
Yujiu Yang
Yan Teng
Yingchun Wang
AAML LLMSV
326
4
0
19 Jun 2025
Linear Spatial World Models Emerge in Large Language Models
Matthieu Tehenan
Christian Moya
Tenghai Long
Guang Lin
LRM
240
1
0
03 Jun 2025
IF-GUIDE: Influence Function-Guided Detoxification of LLMs
Zachary Coalson
Juhan Bae
Nicholas Carlini
Sanghyun Hong
TDI
513
1
0
02 Jun 2025
SAE-SSV: Supervised Steering in Sparse Representation Spaces for Reliable Control of Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Zirui He
Haoyang Ling
Bo Shen
Ali Payani
Zelong Li
Mengnan Du
LLMSV
610
11
0
22 May 2025
Denoising Concept Vectors with Sparse Autoencoders for Improved Language Model Steering
Haiyan Zhao
Xuansheng Wu
Fan Yang
Bo Shen
Ninghao Liu
Mengnan Du
LLMSV
375
5
0
21 May 2025
Risk Assessment Framework for Code LLMs via Leveraging Internal States
Yuheng Huang
Lei Ma
Keizaburo Nishikino
Takumi Akazaki
279
5
0
20 Apr 2025
Representation Bending for Large Language Model Safety
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ashkan Yousefpour
Taeheon Kim
Ryan S. Kwon
Seungbeen Lee
Wonje Jeung
Seungju Han
Alvin Wan
Harrison Ngan
Youngjae Yu
Jonghyun Choi
AAML ALM KELM
479
17
0
02 Apr 2025
Inference-Time Intervention in Large Language Models for Reliable Requirement Verification
Paul Darm
James Xie
A. Riccardi
232
0
0
18 Mar 2025
Towards Understanding Distilled Reasoning Models: A Representational Approach
David D. Baek
Max Tegmark
LRM
400
21
0
05 Mar 2025
SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models
Z. He
Haiyan Zhao
Yiran Qiao
Fan Yang
Ali Payani
Jing Ma
Jundong Li
LLMSV
378
18
0
17 Feb 2025
Designing Role Vectors to Improve LLM Inference Behaviour
Daniele Potertì
Andrea Seveso
Fabio Mercorio
LLMSV
330
5
0
17 Feb 2025
Enhancing Semantic Consistency of Large Language Models through Model Editing: An Interpretability-Oriented Approach
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
J. Yang
Dapeng Chen
Yajing Sun
Rongjun Li
Zhiyong Feng
Wei Peng
358
17
0
19 Jan 2025
Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering
Joris Postmus
Steven Abreu
LLMSV
826
18
0
09 Oct 2024
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
International Conference on Learning Representations (ICLR), 2024
Haiyan Zhao
Heng Zhao
Bo Shen
Ali Payani
Fan Yang
Mengnan Du
517
20
0
30 Sep 2024
Programming Refusal with Conditional Activation Steering
International Conference on Learning Representations (ICLR), 2024
Bruce W. Lee
Inkit Padhi
Karthikeyan N. Ramamurthy
Erik Miehling
Pierre Dognin
Manish Nagireddy
Amit Dhurandhar
LLMSV
569
106
0
06 Sep 2024
Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories
Tianlong Wang
Xianfeng Jiao
Yifan He
Zhongzhi Chen
Yinghao Zhu
Xu Chu
Junyi Gao
Yasha Wang
Liantao Ma
LLMSV
531
63
0
26 May 2024
Defending Against Unforeseen Failure Modes with Latent Adversarial Training
Stephen Casper
Lennart Schulze
Oam Patel
Dylan Hadfield-Menell
AAML
817
69
0
08 Mar 2024
Tradeoffs Between Alignment and Helpfulness in Language Models with Steering Methods
Yotam Wolf
Noam Wies
Dorin Shteyman
Binyamin Rothberg
Yoav Levine
Amnon Shashua
LLMSV
810
22
0
29 Jan 2024
LEACE: Perfect linear concept erasure in closed form
Neural Information Processing Systems (NeurIPS), 2023
Nora Belrose
David Schneider-Joseph
Shauli Ravfogel
Robert Bamler
Edward Raff
Stella Biderman
KELM MU
937
189
0
06 Jun 2023