Denoising Concept Vectors with Sparse Autoencoders for Improved Language Model Steering

21 May 2025

Papers citing "Denoising Concept Vectors with Sparse Autoencoders for Improved Language Model Steering"

Enhancing LLM Steering through Sparse Autoencoder-Based Vector Refinement
Anyi Wang, Xuansheng Wu, Dong Shu, Yunpu Ma, Ninghao Liu
LLMSV | 0 0 0 | 28 Sep 2025

SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models
Z. He, Haiyan Zhao, Yiran Qiao, Fan Yang, Ali Payani, Jing Ma, Jundong Li
LLMSV | 174 14 0 | 17 Feb 2025

Improving Instruction-Following in Language Models through Activation Steering
Alessandro Stolfo, Vidhisha Balachandran, Safoora Yousefi, Eric Horvitz, Besmira Nushi
LLMSV | 247 48 0 | 15 Oct 2024

Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
Haiyan Zhao, Heng Zhao, Bo Shen, Ali Payani, Fan Yang, Mengnan Du
221 12 0 | 30 Sep 2024

Uncovering Latent Chain of Thought Vectors in Language Models
Jason Zhang, Scott Viteri
LLMSV, LRM | 227 7 0 | 21 Sep 2024