arXiv:2503.18878
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
24 March 2025
Andrey V. Galichin
Alexey Dontsov
Polina Druzhinina
Anton Razzhigaev
Oleg Y. Rogov
Elena Tutubalina
Ivan Oseledets
LRM
Papers citing "I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders"
AV-EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Omni-modal LLMS with Audio-visual Cues
Krish Patel
Dingkun Zhou
Ajay Kankipati
Akshaj Gupta
Zeyi Austin Li
...
Guan-Ting Lin
Kan Jen Cheng
Huang-Cheng Chou
Jiachen Lian
Gopala Anumanchipalli
AuLLM
132
3
0
08 Oct 2025
On Predictability of Reinforcement Learning Dynamics for Large Language Models
Yuchen Cai
Ding Cao
Xin Xu
Zijun Yao
Yuqing Huang
Zhenyu Tan
Benyi Zhang
Guiquan Liu
Junfeng Fang
119
0
0
01 Oct 2025
From Reasoning to Answer: Empirical, Attention-Based and Mechanistic Insights into Distilled DeepSeek R1 Models
Jue Zhang
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
LRM
101
0
0
28 Sep 2025
The Rogue Scalpel: Activation Steering Compromises LLM Safety
Anton Korznikov
Andrey V. Galichin
Alexey Dontsov
Oleg Y. Rogov
Ivan Oseledets
Elena Tutubalina
LLMSV
AAML
128
0
0
26 Sep 2025
Mechanistic Interpretability with SAEs: Probing Religion, Violence, and Geography in Large Language Models
Katharina Simbeck
Mariam Mahran
MILM
LLMSV
124
1
0
22 Sep 2025
Meta-R1: Empowering Large Reasoning Models with Metacognition
Haonan Dong
Haoran Ye
Wenhao Zhu
Kehan Jiang
Guojie Song
ReLM
LRM
AI4CE
112
2
0
24 Aug 2025
Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
Huizhen Shu
Xuying Li
Qirui Wang
Yuji Kosuga
Mengqiu Tian
Zhuo Li
AAML
SILM
137
0
0
14 Aug 2025
Sparse Autoencoder-guided Supervised Finetuning to Mitigate Unexpected Code-Switching in LLMs
Boyi Deng
Yu Wan
Baosong Yang
Fei Huang
Wenjie Wang
Fuli Feng
136
0
0
20 Jul 2025
KV Cache Steering for Controlling Frozen LLMs
Max Belitsky
D. J. Kopiczko
Michael Dorkenwald
M. Jehanzeb Mirza
James R. Glass
Cees G. M. Snoek
Yuki M. Asano
LLMSV
LRM
251
0
0
11 Jul 2025
Get Experience from Practice: LLM Agents with Record & Replay
Erhu Feng
Wenbo Zhou
Zibin Liu
Le Chen
Yunpeng Dong
...
Yisheng Zhao
Dong Du
Zhichao Hua
Yubin Xia
Haibo Chen
359
6
0
23 May 2025
Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models
Zihao Li
Xu Wang
Yuzhe Yang
Ziyu Yao
Haoyi Xiong
Jundong Li
LLMSV
LRM
514
11
0
21 May 2025
Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders
Dong Shu
Xuansheng Wu
Haiyan Zhao
Jundong Li
Ninghao Liu
LLMSV
365
2
0
12 May 2025