2506.18167
Cited By
v4 (latest)
Understanding Reasoning in Thinking Language Models via Steering Vectors
22 June 2025
Constantin Venhoff
Iván Arcuschin
Philip Torr
Arthur Conmy
Neel Nanda
LLMSV
LRM
Github (29★)
Papers citing
"Understanding Reasoning in Thinking Language Models via Steering Vectors"
32 / 32 papers shown
SALT: Steering Activations towards Leakage-free Thinking in Chain of Thought
Shourya Batra
Pierce Tillman
Samarth Gaggar
Shashank Kesineni
Kevin Zhu
Sunishchal Dev
Ashwinee Panda
Vasu Sharma
Maheep Chaudhary
KELM
PILM
LLMSV
LRM
ELM
580
1
0
11 Nov 2025
Rank-1 LoRAs Encode Interpretable Reasoning Signals
Jake Ward
P. Riechers
A. Shai
ReLM
LRM
347
0
0
10 Nov 2025
MONICA: Real-Time Monitoring and Calibration of Chain-of-Thought Sycophancy in Large Reasoning Models
Jingyu Hu
Shu Yang
Xilin Gong
H. Wang
Weiru Liu
Di Wang
LRM
138
2
0
09 Nov 2025
Can Aha Moments Be Fake? Identifying True and Decorative Thinking Steps in Chain-of-Thought
Jiachen Zhao
Yiyou Sun
Weiyan Shi
Dawn Song
LRM
103
0
0
28 Oct 2025
Modeling Hierarchical Thinking in Large Reasoning Models
G M Shahariar
Ali Nazari
Erfan Shayegani
Nael B. Abu-Ghazaleh
LRM
AI4CE
126
0
0
25 Oct 2025
Mapping Faithful Reasoning in Language Models
Jiazheng Li
Andreas Damianou
J Rosser
José Luis Redondo García
Konstantina Palla
LRM
104
0
0
25 Oct 2025
Can Small and Reasoning Large Language Models Score Journal Articles for Research Quality and Do Averaging and Few-shot Help?
Mike Thelwall
Ehsan Mohammadi
LRM
80
1
0
25 Oct 2025
Stream: Scaling up Mechanistic Interpretability to Long Context in LLMs via Sparse Attention
J Rosser
José Luis Redondo García
Gustavo Penha
Konstantina Palla
Hugues Bouchard
96
0
0
22 Oct 2025
A Concrete Roadmap towards Safety Cases based on Chain-of-Thought Monitoring
Julian Schulz
LRM
130
0
0
22 Oct 2025
Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization
Yang Li
Z. Dong
Yuhan Sun
Weixun Wang
Shaopan Xiong
...
Han Lu
Jiamang Wang
Wenbo Su
Bo Zheng
Junchi Yan
LRM
113
4
0
15 Oct 2025
ThinkPilot: Steering Reasoning Models via Automated Think-prefixes Optimization
Sunzhu Li
Zhiyu Lin
Shuling Yang
Jiale Zhao
Wei Chen
LRM
101
0
0
14 Oct 2025
The Geometry of Reasoning: Flowing Logics in Representation Space
Yufa Zhou
Yixiao Wang
Xunjian Yin
Shuyan Zhou
Anru R. Zhang
LRM
AI4CE
120
2
0
10 Oct 2025
AV-EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Omni-modal LLMS with Audio-visual Cues
Krish Patel
Dingkun Zhou
Ajay Kankipati
Akshaj Gupta
Zeyi Austin Li
...
Guan-Ting Lin
Kan Jen Cheng
Huang-Cheng Chou
Jiachen Lian
Gopala Anumanchipalli
AuLLM
155
5
0
08 Oct 2025
Internal states before wait modulate reasoning patterns
Dmitrii Troitskii
Koyena Pal
Chris Wendler
Callum McDougall
Neel Nanda
LRM
101
1
1
05 Oct 2025
MLLMEraser: Achieving Test-Time Unlearning in Multimodal Large Language Models through Activation Steering
Chenlu Ding
Jiancan Wu
Leheng Sheng
Fan Zhang
Yancheng Yuan
Xiang Wang
Xiangnan He
MU
KELM
249
0
0
05 Oct 2025
ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models
Akshat Ramachandran
Marina Neseem
Charbel Sakr
Rangharajan Venkatesan
Brucek Khailany
Tushar Krishna
MQ
LRM
VLM
150
1
1
01 Oct 2025
Enhancing LLM Steering through Sparse Autoencoder-Based Vector Refinement
Anyi Wang
Xuansheng Wu
Dong Shu
Yunpu Ma
Ninghao Liu
LLMSV
183
0
0
28 Sep 2025
From Reasoning to Answer: Empirical, Attention-Based and Mechanistic Insights into Distilled DeepSeek R1 Models
Jue Zhang
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
LRM
108
0
0
28 Sep 2025
Bridging the Knowledge-Prediction Gap in LLMs on Multiple-Choice Questions
Yoonah Park
Haesung Pyun
Yohan Jo
KELM
375
0
0
28 Sep 2025
RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs
Kohsei Matsutani
Shota Takashiro
Gouki Minegishi
Takeshi Kojima
Yusuke Iwasawa
Yutaka Matsuo
OffRL
ReLM
LRM
207
6
0
25 Sep 2025
DISCO: Disentangled Communication Steering for Large Language Models
Max Torop
A. Masoomi
Masih Eskandar
Jennifer Dy
LLMSV
182
0
0
20 Sep 2025
Small Vectors, Big Effects: A Mechanistic Study of RL-Induced Reasoning via Steering Vectors
Viacheslav Sinii
Nikita Balagansky
Gleb Gerasimov
Daniil Laptev
Yaroslav Aksenov
Vadim Kurochkin
Alexey Gorbatovski
Boris Shaposhnikov
Daniil Gavrilov
LLMSV
183
1
0
08 Sep 2025
Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models
Yik Siu Chan
Zheng-Xin Yong
Stephen H. Bach
LRM
254
8
0
16 Jul 2025
PII Jailbreaking in LLMs via Activation Steering Reveals Personal Information Leakage
Krishna Kanth Nakka
Xue Jiang
Dmitrii Usynin
Xuebing Zhou
LLMSV
250
0
0
03 Jul 2025
Adversarial Manipulation of Reasoning Models using Internal Representations
Kureha Yamaguchi
Benjamin Etheridge
Andy Arditi
AAML
LRM
141
3
0
03 Jul 2025
Thought Anchors: Which LLM Reasoning Steps Matter?
Paul C. Bogdan
Uzay Macar
Neel Nanda
Arthur Conmy
LRM
364
50
0
23 Jun 2025
Latent Concept Disentanglement in Transformer-based Language Models
Guan Zhe Hong
Bhavya Vasudeva
Willie Neiswanger
Cyrus Rashtchian
Prabhakar Raghavan
Rina Panigrahy
ReLM
LRM
341
2
0
20 Jun 2025
AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint
Leheng Sheng
Changshuo Shen
Weixiang Zhao
Junfeng Fang
Xiaohao Liu
Zhenkai Liang
Xiang Wang
An Zhang
Tat-Seng Chua
LLMSV
154
7
0
08 Jun 2025
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
Gouki Minegishi
Hiroki Furuta
Takeshi Kojima
Yusuke Iwasawa
Y. Matsuo
LRM
1.1K
12
0
06 Jun 2025
Steering LLM Reasoning Through Bias-Only Adaptation
Viacheslav Sinii
Alexey Gorbatovski
Artem Cherepanov
Boris Shaposhnikov
Nikita Balagansky
Daniil Gavrilov
LLMSV
LRM
278
2
0
24 May 2025
The Geometry of Self-Verification in a Task-Specific Reasoning Model
Andrew Lee
Lihao Sun
Chris Wendler
Fernanda Viégas
Martin Wattenberg
LRM
423
3
0
19 Apr 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
OffRL
AI4TS
LRM
ReLM
VLM
1.2K
5,342
0
22 Jan 2025