2312.03813
Improving Activation Steering in Language Models with Mean-Centring
6 December 2023
Ole Jorgensen, Dylan R. Cope, Nandi Schoots, Murray Shanahan
LLMSV
Papers citing "Improving Activation Steering in Language Models with Mean-Centring"
5 / 5 papers shown
Enhancing Semantic Consistency of Large Language Models through Model Editing: An Interpretability-Oriented Approach
J. Yang, Dapeng Chen, Yajing Sun, Rongjun Li, Zhiyong Feng, Wei Peng
35 5 0
19 Jan 2025

Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering
Joris Postmus, Steven Abreu
LLMSV
44 1 0
09 Oct 2024

Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
Haiyan Zhao, Heng Zhao, Bo Shen, Ali Payani, Fan Yang, Mengnan Du
55 2 0
30 Sep 2024

Programming Refusal with Conditional Activation Steering
Bruce W. Lee, Inkit Padhi, K. Ramamurthy, Erik Miehling, Pierre L. Dognin, Manish Nagireddy, Amit Dhurandhar
LLMSV
87 13 0
06 Sep 2024

Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories
Tianlong Wang, Xianfeng Jiao, Yifan He, Zhongzhi Chen, Yinghao Zhu, Xu Chu, Junyi Gao, Yasha Wang, Liantao Ma
LLMSV
34 7 0
26 May 2024