How does GPT-2 compute greater-than?: Interpreting mathematical
abilities in a pre-trained language model

How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model

30 April 2023

Alexandre Variengien

Papers citing "How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model"

13 / 13 papers shown

Title
Round and Round We Go! What makes Rotary Positional Encodings useful? Federico Barbero Alex Vitvitskyi Christos Perivolaropoulos Razvan Pascanu Petar Veličković 38 16 0 14 May 2025
Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition Zhengfu He J. Wang Rui Lin Xuyang Ge Wentao Shu Qiong Tang J. Zhang Xipeng Qiu 68 0 0 29 Apr 2025
Unraveling Arithmetic in Large Language Models: The Role of Algebraic Structures Fu-Chieh Chang Pei-Yuan Wu Pei-Yuan Wu LRM 73 1 0 25 Nov 2024
Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering Zeping Yu Sophia Ananiadou 35 0 0 17 Nov 2024
JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit Zeqing He Zhibo Wang Zhixuan Chu Huiyu Xu Rui Zheng Kui Ren Chun Chen 34 3 0 17 Nov 2024
ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability ZhongXiang Sun Xiaoxue Zang Kai Zheng Yang Song Jun Xu Xiao Zhang Weijie Yu Yang Song Han Li 32 6 0 15 Oct 2024
From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning Wei Chen Zhen Huang Liang Xie Binbin Lin Houqiang Li ... Deng Cai Yonggang Zhang Wenxiao Wang Xu Shen Jieping Ye 20 6 0 03 Sep 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models Daking Rai Yilun Zhou Shi Feng Abulhair Saparov Ziyu Yao 47 18 0 02 Jul 2024
Finding Transformer Circuits with Edge Pruning Adithya Bhaskar Alexander Wettig Dan Friedman Danqi Chen 41 14 0 24 Jun 2024
Talking Heads: Understanding Inter-layer Communication in Transformer Language Models Jack Merullo Carsten Eickhoff Ellie Pavlick 30 2 0 13 Jun 2024
Uncovering Intermediate Variables in Transformers using Circuit Probing Michael A. Lepori Thomas Serre Ellie Pavlick 49 7 0 07 Nov 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small Kevin Wang Alexandre Variengien Arthur Conmy Buck Shlegeris Jacob Steinhardt 205 486 0 01 Nov 2022
Probing Classifiers: Promises, Shortcomings, and Advances Yonatan Belinkov 216 291 0 24 Feb 2021