arXiv:2505.15038 (v2, latest)

Denoising Concept Vectors with Sparse Autoencoders for Improved Language Model Steering

21 May 2025
Haiyan Zhao
Xuansheng Wu
Fan Yang
Bo Shen
Ninghao Liu
Mengnan Du
Main: 4 pages; Bibliography: 3 pages; Appendix: 5 pages; 4 figures; 4 tables
Abstract

Linear concept vectors effectively steer LLMs, but existing methods suffer from noisy features in diverse datasets that undermine steering robustness. We propose Sparse Autoencoder-Denoised Concept Vectors (SDCV), which selectively keeps the most discriminative SAE latents while reconstructing hidden representations. Our key insight is that concept-relevant signals can be explicitly separated from dataset noise by scaling up activations of top-k latents that best differentiate positive and negative samples. Applied to linear probing and difference-in-mean, SDCV consistently improves steering success rates by 4-16% across six challenging concepts, while maintaining topic relevance.
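
The pipeline the abstract describes can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the SAE is a toy ReLU autoencoder with identity weights standing in for a trained one, latents are ranked by the absolute gap in mean activation between positive and negative samples (the abstract does not state the ranking criterion), and the names `sae_encode`, `sae_decode`, `top_k_discriminative`, `denoise`, `diff_in_mean`, and the `scale` parameter are hypothetical.

```python
import numpy as np

def sae_encode(h, W_enc, b_enc):
    # Toy ReLU encoder: hidden states -> sparse latents (assumed SAE form).
    return np.maximum(h @ W_enc + b_enc, 0.0)

def sae_decode(z, W_dec, b_dec):
    # Linear decoder: latents -> reconstructed hidden states.
    return z @ W_dec + b_dec

def top_k_discriminative(z_pos, z_neg, k):
    # Assumed criterion: rank latents by |mean(pos) - mean(neg)|.
    gap = np.abs(z_pos.mean(axis=0) - z_neg.mean(axis=0))
    return np.argsort(gap)[-k:]

def denoise(h, sae, idx, scale):
    # Scale up the selected latents, then reconstruct the hidden states.
    W_enc, b_enc, W_dec, b_dec = sae
    z = sae_encode(h, W_enc, b_enc)  # fresh array, safe to modify in place
    z[:, idx] *= scale
    return sae_decode(z, W_dec, b_dec)

def diff_in_mean(h_pos, h_neg):
    # Difference-in-mean concept vector, normalized to unit length.
    v = h_pos.mean(axis=0) - h_neg.mean(axis=0)
    return v / np.linalg.norm(v)

# Toy data: dimension 0 carries the concept, dimension 2 carries dataset noise.
h_pos = np.array([[1.0, 0.2, 0.9, 0.1],
                  [1.2, 0.1, 0.1, 0.3]])
h_neg = np.array([[0.1, 0.2, 0.1, 0.1],
                  [0.1, 0.1, 0.3, 0.3]])

# Identity weights stand in for a trained SAE (illustration only).
d = h_pos.shape[1]
sae = (np.eye(d), np.zeros(d), np.eye(d), np.zeros(d))

z_pos = sae_encode(h_pos, sae[0], sae[1])
z_neg = sae_encode(h_neg, sae[0], sae[1])
idx = top_k_discriminative(z_pos, z_neg, k=1)

v_raw = diff_in_mean(h_pos, h_neg)
v_denoised = diff_in_mean(denoise(h_pos, sae, idx, scale=4.0),
                          denoise(h_neg, sae, idx, scale=4.0))

# Cosine similarity of each unit vector with the true concept direction.
concept_dir = np.eye(d)[0]
cos_raw = float(v_raw @ concept_dir)
cos_denoised = float(v_denoised @ concept_dir)
```

In this toy setup the denoised vector aligns more closely with the concept direction than the raw difference-in-mean vector, because the noisy dimension contributes relatively less after the selected latent is scaled up. With a real SAE, the same steps would apply to latents of the trained dictionary rather than to raw dimensions.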
