SAE-SSV: Supervised Steering in Sparse Representation Spaces for Reliable Control of Language Models

22 May 2025
Zirui He
Haoyang Ling
Bo Shen
Ali Payani
Zelong Li
Mengnan Du
    LLMSV
arXiv:2505.16188 (abs | PDF | HTML) | GitHub (4★)

Papers citing "SAE-SSV: Supervised Steering in Sparse Representation Spaces for Reliable Control of Language Models"

15 / 15 papers shown
SAGE: An Agentic Explainer Framework for Interpreting SAE Features in Language Models
Jiaojiao Han
Wujiang Xu
Mingyu Jin
Mengnan Du
LRM
105
0
0
25 Nov 2025
SDA: Steering-Driven Distribution Alignment for Open LLMs without Fine-Tuning
Wei Xia
Zhi-Hong Deng
ALM
266
0
0
20 Nov 2025
Enhancing LLM Steering through Sparse Autoencoder-Based Vector Refinement
Anyi Wang
Xuansheng Wu
Dong Shu
Yunpu Ma
Ninghao Liu
LLMSV
176
0
0
28 Sep 2025
Evaluating Sparse Autoencoders for Monosemantic Representation
Moghis Fereidouni
Muhammad Umair Haider
Peizhong Ju
A.B. Siddique
136
0
0
20 Aug 2025
Bridging Compositional and Distributional Semantics: A Survey on Latent Semantic Geometry via AutoEncoder
Yingji Zhang
Danilo S. Carvalho
André Freitas
CoGe
394
0
0
25 Jun 2025
Improving LLM Reasoning through Interpretable Role-Playing Steering
Anyi Wang
Dong Shu
Yifan Wang
Yunpu Ma
Mengnan Du
LLMSV, LRM
219
3
0
09 Jun 2025
FairSteer: Inference Time Debiasing for LLMs with Dynamic Activation Steering
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yongbin Li
Zhiting Fan
Ruizhe Chen
Xiaotang Gai
Luqi Gong
Yan Zhang
Zuozhu Liu
LLMSV
317
18
0
20 Apr 2025
Interpreting and Steering LLMs with Mutual Information-based Explanations on Sparse Autoencoders
Xuansheng Wu
Jiayi Yuan
Wenlin Yao
Xiaoming Zhai
Ninghao Liu
LLMSV
430
19
0
24 Feb 2025
Are Sparse Autoencoders Useful? A Case Study in Sparse Probing
Subhash Kantamneni
Joshua Engels
Senthooran Rajamanoharan
Max Tegmark
Neel Nanda
348
44
0
23 Feb 2025
Sparse Autoencoder Features for Classifications and Transferability
Jack Gallifant
Shan Chen
Kuleen Sasse
Hugo J. W. L. Aerts
Thomas Hartvigsen
Danielle S. Bitterman
283
13
0
17 Feb 2025
SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models
Z. He
Haiyan Zhao
Yiran Qiao
Fan Yang
Ali Payani
Jing Ma
Jundong Li
LLMSV
298
16
0
17 Feb 2025
A Unified Understanding and Evaluation of Steering Methods
Shawn Im
Yixuan Li
LLMSV
268
19
0
04 Feb 2025
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
International Conference on Learning Representations (ICLR), 2024
Javier Ferrando
Oscar Obeso
Senthooran Rajamanoharan
Neel Nanda
484
71
0
21 Nov 2024
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
International Conference on Learning Representations (ICLR), 2024
Haiyan Zhao
Heng Zhao
Bo Shen
Ali Payani
Fan Yang
Mengnan Du
418
16
0
30 Sep 2024
Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories
Tianlong Wang
Xianfeng Jiao
Yifan He
Zhongzhi Chen
Yinghao Zhu
Xu Chu
Junyi Gao
Yasha Wang
Liantao Ma
LLMSV
424
51
0
26 May 2024