An Investigation of Neuron Activation as a Unified Lens to Explain Chain-of-Thought Eliciting Arithmetic Reasoning of LLMs
arXiv:2406.12288 · 18 June 2024
Daking Rai, Ziyu Yao
Tags: LRM
Papers citing "An Investigation of Neuron Activation as a Unified Lens to Explain Chain-of-Thought Eliciting Arithmetic Reasoning of LLMs" (8 papers)
Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes
Bryan R Christ, Zack Gottesman, Jonathan Kropko, Thomas Hartvigsen
LRM · 49 · 2 · 0 · 20 Feb 2025

A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao
73 · 18 · 0 · 02 Jul 2024

Universal Neurons in GPT2 Language Models
Wes Gurnee, Theo Horsley, Zifan Carl Guo, Tara Rezaei Kheirkhah, Qinyi Sun, Will Hathaway, Neel Nanda, Dimitris Bertsimas
MILM · 92 · 37 · 0 · 22 Jan 2024

Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas
MILM · 153 · 186 · 0 · 02 May 2023

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
210 · 486 · 0 · 01 Nov 2022

Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango
Aman Madaan, Amir Yazdanbakhsh
LRM · 141 · 115 · 0 · 16 Sep 2022

Large Language Models are Zero-Shot Reasoners
Takeshi Kojima, S. Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa
ReLM, LRM · 291 · 4,048 · 0 · 24 May 2022

Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
221 · 402 · 0 · 24 Feb 2021