ResearchTrend.AI

I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders

24 March 2025
Andrey V. Galichin
Alexey Dontsov
Polina Druzhinina
Anton Razzhigaev
Oleg Y. Rogov
Elena Tutubalina
Ivan Oseledets
    LRM
ArXiv (abs) | PDF | HTML | HuggingFace (120 upvotes)

Papers citing "I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders"

12 / 12 papers shown
AV-EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Omni-modal LLMS with Audio-visual Cues
Krish Patel
Dingkun Zhou
Ajay Kankipati
Akshaj Gupta
Zeyi Austin Li
...
Guan-Ting Lin
Kan Jen Cheng
Huang-Cheng Chou
Jiachen Lian
Gopala Anumanchipalli
AuLLM
132
3
0
08 Oct 2025
On Predictability of Reinforcement Learning Dynamics for Large Language Models
Yuchen Cai
Ding Cao
Xin Xu
Zijun Yao
Yuqing Huang
Zhenyu Tan
Benyi Zhang
Guiquan Liu
Junfeng Fang
119
0
0
01 Oct 2025
From Reasoning to Answer: Empirical, Attention-Based and Mechanistic Insights into Distilled DeepSeek R1 Models
Jue Zhang
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
LRM
101
0
0
28 Sep 2025
The Rogue Scalpel: Activation Steering Compromises LLM Safety
Anton Korznikov
Andrey V. Galichin
Alexey Dontsov
Oleg Y. Rogov
Ivan Oseledets
Elena Tutubalina
LLMSV AAML
128
0
0
26 Sep 2025
Mechanistic Interpretability with SAEs: Probing Religion, Violence, and Geography in Large Language Models
Katharina Simbeck
Mariam Mahran
MILM LLMSV
124
1
0
22 Sep 2025
Meta-R1: Empowering Large Reasoning Models with Metacognition
Haonan Dong
Haoran Ye
Wenhao Zhu
Kehan Jiang
Guojie Song
ReLM LRM AI4CE
112
2
0
24 Aug 2025
Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
Huizhen Shu
Xuying Li
Qirui Wang
Yuji Kosuga
Mengqiu Tian
Zhuo Li
AAML SILM
137
0
0
14 Aug 2025
Sparse Autoencoder-guided Supervised Finetuning to Mitigate Unexpected Code-Switching in LLMs
Boyi Deng
Yu Wan
Baosong Yang
Fei Huang
Wenjie Wang
Fuli Feng
136
0
0
20 Jul 2025
KV Cache Steering for Controlling Frozen LLMs
Max Belitsky
D. J. Kopiczko
Michael Dorkenwald
M. Jehanzeb Mirza
James R. Glass
Cees G. M. Snoek
Yuki M. Asano
LLMSV LRM
251
0
0
11 Jul 2025
Get Experience from Practice: LLM Agents with Record & Replay
Erhu Feng
Wenbo Zhou
Zibin Liu
Le Chen
Yunpeng Dong
...
Yisheng Zhao
Dong Du
Zhichao Hua
Yubin Xia
Haibo Chen
359
6
0
23 May 2025
Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models
Zihao Li
Xu Wang
Yuzhe Yang
Ziyu Yao
Haoyi Xiong
Jundong Li
LLMSV LRM
514
11
0
21 May 2025
Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders
Dong Shu
Xuansheng Wu
Haiyan Zhao
Jundong Li
Ninghao Liu
LLMSV
365
2
0
12 May 2025