ResearchTrend.AI
Circuit Component Reuse Across Tasks in Transformer Language Models
12 October 2023
Jack Merullo, Carsten Eickhoff, Ellie Pavlick

Papers citing "Circuit Component Reuse Across Tasks in Transformer Language Models"

50 of 58 citing papers shown:
Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism
Aviv Bick, Eric P. Xing, Albert Gu
RALM · 22 Apr 2025

Bigram Subnetworks: Mapping to Next Tokens in Transformer Language Models
Tyler A. Chang, Benjamin Bergen
21 Apr 2025

Deep Learning with Pretrained 'Internal World' Layers: A Gemma 3-Based Modular Architecture for Wildfire Prediction
Ayoub Jadouli, Chaker El Amrani
KELM, AI4TS · 20 Apr 2025

MIB: A Mechanistic Interpretability Benchmark
Aaron Mueller, Atticus Geiger, Sarah Wiegreffe, Dana Arad, Iván Arcuschin, ..., Alessandro Stolfo, Martin Tutek, Amir Zur, David Bau, Yonatan Belinkov
17 Apr 2025

Towards Understanding and Improving Refusal in Compressed Models via Mechanistic Interpretability
Vishnu Kabir Chhabra, Mohammad Mahdi Khalili
AI4CE · 05 Apr 2025

Shared Global and Local Geometry of Language Model Embeddings
Andrew Lee, Melanie Weber, F. Viégas, Martin Wattenberg
FedML · 27 Mar 2025

LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates
Ying Shen, Lifu Huang
20 Mar 2025

Are formal and functional linguistic mechanisms dissociated in language models?
Michael Hanna, Sandro Pezzelle, Yonatan Belinkov
14 Mar 2025

(How) Do Language Models Track State?
Belinda Z. Li, Zifan Carl Guo, Jacob Andreas
LRM · 04 Mar 2025

Re-evaluating Theory of Mind evaluation in large language models
Jennifer Hu, Felix Sosa, T. Ullman
28 Feb 2025
Neuroplasticity and Corruption in Model Mechanisms: A Case Study Of Indirect Object Identification
Vishnu Kabir Chhabra, Ding Zhu, Mohammad Mahdi Khalili
27 Feb 2025

Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
Tianyi Lorena Yan, Robin Jia
KELM, MU · 27 Feb 2025

Repetition Neurons: How Do Language Models Produce Repetitions?
Tatsuya Hiraoka, Kentaro Inui
MILM · 21 Feb 2025

LLMs as a synthesis between symbolic and continuous approaches to language
Gemma Boleda
SyDa · 17 Feb 2025

How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
Yixin Ou, Yunzhi Yao, N. Zhang, Hui Jin, Jiacheng Sun, Shumin Deng, Z. Li, H. Chen
KELM, CLL · 16 Feb 2025

Mechanistic Interpretability of Emotion Inference in Large Language Models
Ala Nekouvaght Tak, Amin Banayeeanzade, Anahita Bolourani, Mina Kian, Robin Jia, Jonathan Gratch
08 Feb 2025

Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Jiajun Song, Zhuoyan Xu, Yiqiao Zhong
31 Dec 2024

Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering
Zeping Yu, Sophia Ananiadou
17 Nov 2024

How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis
Guan Zhe Hong, Nishanth Dikkala, Enming Luo, Cyrus Rashtchian, Xin Wang, Rina Panigrahy
OffRL, LRM, NAI · 06 Nov 2024

Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts
Guorui Zheng, Xidong Wang, Juhao Liang, Nuo Chen, Yuping Zheng, Benyou Wang
MoE · 14 Oct 2024
The Same But Different: Structural Similarities and Differences in Multilingual Language Modeling
Ruochen Zhang, Qinan Yu, Matianyu Zang, Carsten Eickhoff, Ellie Pavlick
11 Oct 2024

Round and Round We Go! What makes Rotary Positional Encodings useful?
Federico Barbero, Alex Vitvitskyi, Christos Perivolaropoulos, Razvan Pascanu, Petar Velickovic
08 Oct 2024

Mechanistic?
Naomi Saphra, Sarah Wiegreffe
AI4CE · 07 Oct 2024

Activation Scaling for Steering and Interpreting Language Models
Niklas Stoehr, Kevin Du, Vésteinn Snæbjarnarson, Robert West, Ryan Cotterell, Aaron Schein
LLMSV, LRM · 07 Oct 2024

Listening to the Wise Few: Select-and-Copy Attention Heads for Multiple-Choice QA
Eduard Tulchinskii, Laida Kushnareva, Kristian Kuznetsov, Anastasia Voznyuk, Andrei Andriiainen, Irina Piontkovskaya, Evgeny Burnaev, Serguei Barannikov
03 Oct 2024

Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models
Philipp Mondorf, Sondre Wold, Barbara Plank
02 Oct 2024

PEAR: Position-Embedding-Agnostic Attention Re-weighting Enhances Retrieval-Augmented Generation with Zero Inference Overhead
Tao Tan, Yining Qian, Ang Lv, Hongzhan Lin, Songhao Wu, Yongbo Wang, Feng Wang, Jingtong Wu, Xin Lu, Rui Yan
29 Sep 2024

Optimal ablation for interpretability
Maximilian Li, Lucas Janson
FAtt · 16 Sep 2024
Attention Heads of Large Language Models: A Survey
Zifan Zheng, Yezhaohui Wang, Yuxin Huang, Shichao Song, Mingchuan Yang, Bo Tang, Feiyu Xiong, Zhiyu Li
LRM · 05 Sep 2024

The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability
Aaron Mueller, Jannik Brinkmann, Millicent Li, Samuel Marks, Koyena Pal, ..., Arnab Sen Sharma, Jiuding Sun, Eric Todd, David Bau, Yonatan Belinkov
CML · 02 Aug 2024

Dissecting Multiplication in Transformers: Insights into LLMs
Luyu Qiu, Jianing Li, Chi Su, C. Zhang, Lei Chen
22 Jul 2024

LLM Circuit Analyses Are Consistent Across Training and Scale
Curt Tigges, Michael Hanna, Qinan Yu, Stella Biderman
15 Jul 2024

A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao
02 Jul 2024

The Remarkable Robustness of LLMs: Stages of Inference?
Vedang Lad, Wes Gurnee, Max Tegmark
27 Jun 2024

What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation
Michal Golovanevsky, William Rudman, Vedant Palit, Ritambhara Singh, Carsten Eickhoff
24 Jun 2024

Beyond the Doors of Perception: Vision Transformers Represent Relations Between Objects
Michael A. Lepori, Alexa R. Tartaglini, Wai Keen Vong, Thomas Serre, Brenden Lake, Ellie Pavlick
22 Jun 2024

When Parts are Greater Than Sums: Individual LLM Components Can Outperform Full Models
Ting-Yun Chang, Jesse Thomason, Robin Jia
19 Jun 2024

Talking Heads: Understanding Inter-layer Communication in Transformer Language Models
Jack Merullo, Carsten Eickhoff, Ellie Pavlick
13 Jun 2024
Position: An Inner Interpretability Framework for AI Inspired by Lessons from Cognitive Neuroscience
Martina G. Vilas, Federico Adolfi, David Poeppel, Gemma Roig
03 Jun 2024

Knowledge Circuits in Pretrained Transformers
Yunzhi Yao, Ningyu Zhang, Zekun Xi, Meng Wang, Ziwen Xu, Shumin Deng, Huajun Chen
KELM · 28 May 2024

From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks
Jacob Russin, Sam Whitman McGrath, Danielle J. Williams, Lotem Elber-Dorozko
AI4CE · 24 May 2024

Learned feature representations are biased by complexity, learning order, position, and more
Andrew Kyle Lampinen, Stephanie C. Y. Chan, Katherine Hermann
AI4CE, FaML, SSL, OOD · 09 May 2024

Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions
Ruizhe Li, Yanjun Gao
KELM · 06 May 2024

How to use and interpret activation patching
Stefan Heimersheim, Neel Nanda
23 Apr 2024

Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models
Ang Lv, Yuhan Chen, Kaiyi Zhang, Yulong Wang, Lifeng Liu, Ji-Rong Wen, Jian Xie, Rui Yan
KELM · 28 Mar 2024

Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms
Michael Hanna, Sandro Pezzelle, Yonatan Belinkov
26 Mar 2024

AtP*: An efficient and scalable method for localizing LLM behaviour to components
János Kramár, Tom Lieberum, Rohin Shah, Neel Nanda
KELM · 01 Mar 2024

A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task
Jannik Brinkmann, Abhay Sheshadri, Victor Levoso, Paul Swoboda, Christian Bartelt
LRM · 19 Feb 2024

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson
AAML · 07 Feb 2024

Rethinking Interpretability in the Era of Large Language Models
Chandan Singh, J. Inala, Michel Galley, Rich Caruana, Jianfeng Gao
LRM, AI4CE · 30 Jan 2024