Thinking Like Transformers (arXiv:2106.06981)
Gail Weiss, Yoav Goldberg, Eran Yahav
13 June 2021 · AI4CE

Papers citing "Thinking Like Transformers"

50 / 109 papers shown
Lost in Transmission: When and Why LLMs Fail to Reason Globally
Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan, Jennifer Neville · 13 May 2025 · LRM
Exploring Compositional Generalization (in ReCOGS_pos) by Transformers using Restricted Access Sequence Processing (RASP)
William Bruns · 21 Apr 2025
TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper, Roland Fernandez, P. Smolensky, Jianfeng Gao · 29 Mar 2025
Function Alignment: A New Theory of Mind and Intelligence, Part I: Foundations
Gus G. Xia · 27 Mar 2025
Meta-Learning Neural Mechanisms rather than Bayesian Priors
Michael Goodale, Salvador Mascarenhas, Yair Lakretz · 20 Mar 2025
Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More
Arvid Frydenlund · 13 Mar 2025 · LRM
Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
Michael Y. Hu, Jackson Petty, Chuan Shi, William Merrill, Tal Linzen · 26 Feb 2025 · AI4CE
The Role of Sparsity for Length Generalization in Transformers
Noah Golowich, Samy Jelassi, David Brandfonbrener, Sham Kakade, Eran Malach · 24 Feb 2025
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao-quan Song, Yufa Zhou · 21 Feb 2025
Emergent Stack Representations in Modeling Counter Languages Using Transformers
Utkarsh Tiwari, Aviral Gupta, Michael Hahn · 03 Feb 2025
An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models
Yunzhe Hu, Difan Zou, Dong Xu · 26 Nov 2024
Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi, Clara Mohri, David Brandfonbrener, Alex Gu, Nikhil Vyas, Nikhil Anand, David Alvarez-Melis, Yuanzhi Li, Sham Kakade, Eran Malach · 24 Oct 2024 · MoE
Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs
Xin Ma, Yang Liu, J. Liu, Xiaoxu Ma · 21 Oct 2024
How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs
Guhao Feng, Kai-Bo Yang, Yuntian Gu, Xinyue Ai, Shengjie Luo, Jiacheng Sun, Di He, Z. Li, Liwei Wang · 17 Oct 2024 · LRM
The Mystery of the Pathological Path-star Task for Language Models
Arvid Frydenlund · 17 Oct 2024 · LRM
Hypothesis Testing the Circuit Hypothesis in LLMs
Claudia Shi, Nicolas Beltran-Velez, Achille Nazaret, Carolina Zheng, Adrià Garriga-Alonso, Andrew Jesson, Maggie Makar, David M. Blei · 16 Oct 2024
Learning Linear Attention in Polynomial Time
Morris Yau, Ekin Akyürek, Jiayuan Mao, Joshua B. Tenenbaum, Stefanie Jegelka, Jacob Andreas · 14 Oct 2024
Can Transformers Reason Logically? A Study in SAT Solving
Leyan Pan, Vijay Ganesh, Jacob Abernethy, Chris Esposo, Wenke Lee · 09 Oct 2024 · ReLM, LRM
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, Mehrdad Farajtabar · 07 Oct 2024 · AIMat, LRM
Mechanistic?
Naomi Saphra, Sarah Wiegreffe · 07 Oct 2024 · AI4CE
Autoregressive Large Language Models are Computationally Universal
Dale Schuurmans, Hanjun Dai, Francesco Zanini · 04 Oct 2024
Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
George Wang, Jesse Hoogland, Stan van Wingerden, Zach Furman, Daniel Murfet · 03 Oct 2024 · OffRL
Bayes' Power for Explaining In-Context Learning Generalizations
Samuel G. Müller, Noah Hollmann, Frank Hutter · 02 Oct 2024 · BDL
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models
Philipp Mondorf, Sondre Wold, Barbara Plank · 02 Oct 2024
ENTP: Encoder-only Next Token Prediction
Ethan Ewer, Daewon Chae, Thomas Zeng, Jinkyu Kim, Kangwook Lee · 02 Oct 2024
Neural Decompiling of Tracr Transformers
Hannes Thurnherr, Kaspar Riesen · 29 Sep 2024 · ViT
Towards Narrowing the Generalization Gap in Deep Boolean Networks
Youngsung Kim · 06 Sep 2024 · NAI, AI4CE
How transformers learn structured data: insights from hierarchical filtering
Jerome Garnier-Brun, Marc Mézard, Emanuele Moscato, Luca Saglietti · 27 Aug 2024
Learning Randomized Algorithms with Transformers
J. Oswald, Seijin Kobayashi, Yassir Akram, Angelika Steger · 20 Aug 2024 · AAML
Your Context Is Not an Array: Unveiling Random Access Limitations in Transformers
MohammadReza Ebrahimi, Sunny Panchal, Roland Memisevic · 10 Aug 2024
How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression
Xingwu Chen, Lei Zhao, Difan Zou · 08 Aug 2024
Transformers on Markov Data: Constant Depth Suffices
Nived Rajaraman, Marco Bondaschi, Kannan Ramchandran, Michael C. Gastpar, Ashok Vardhan Makkuva · 25 Jul 2024
InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques
Rohan Gupta, Iván Arcuschin, Thomas Kwa, Adrià Garriga-Alonso · 19 Jul 2024
Mechanistically Interpreting a Transformer-based 2-SAT Solver: An Axiomatic Approach
Nils Palumbo, Ravi Mangal, Zifan Wang, Saranya Vijayakumar, Corina S. Pasareanu, Somesh Jha · 18 Jul 2024
Representing Rule-based Chatbots with Transformers
Dan Friedman, Abhishek Panigrahi, Danqi Chen · 15 Jul 2024
Transformer Circuit Faithfulness Metrics are not Robust
Joseph Miller, Bilal Chughtai, William Saunders · 11 Jul 2024
Algorithmic Language Models with Neurally Compiled Libraries
Lucas Saldyt, Subbarao Kambhampati · 06 Jul 2024 · LRM
Universal Length Generalization with Turing Programs
Kaiying Hou, David Brandfonbrener, Sham Kakade, Samy Jelassi, Eran Malach · 03 Jul 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao · 02 Jul 2024
$\text{Memory}^3$: Language Modeling with Explicit Memory
Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, ..., Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang, Weinan E · 01 Jul 2024
Revisiting Random Walks for Learning on Graphs
Jinwoo Kim, Olga Zaghen, Ayhan Suleymanzade, Youngmin Ryou, Seunghoon Hong · 01 Jul 2024
MatText: Do Language Models Need More than Text & Scale for Materials Modeling?
Nawaf Alampara, Santiago Miret, K. Jablonka · 25 Jun 2024
Finding Transformer Circuits with Edge Pruning
Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen · 24 Jun 2024
Separations in the Representational Capabilities of Transformers and Recurrent Architectures
S. Bhattamishra, Michael Hahn, Phil Blunsom, Varun Kanade · 13 Jun 2024 · GNN
Beyond the Frontier: Predicting Unseen Walls from Occupancy Grids by Learning from Floor Plans
Ludvig Ericson, Patric Jensfelt · 13 Jun 2024
Universal In-Context Approximation By Prompting Fully Recurrent Models
Aleksandar Petrov, Tom A. Lamb, Alasdair Paren, Philip H. S. Torr, Adel Bibi · 03 Jun 2024 · LRM
Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits
Andis Draguns, Andrew Gritsevskiy, S. Motwani, Charlie Rogers-Smith, Jeffrey Ladish, Christian Schroeder de Witt · 03 Jun 2024
Language Models Need Inductive Biases to Count Inductively
Yingshan Chang, Yonatan Bisk · 30 May 2024 · LRM
Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models
Charles O'Neill, Thang Bui · 21 May 2024
Natural Language Processing RELIES on Linguistics
Juri Opitz, Shira Wein, Nathan Schneider · 09 May 2024 · AI4CE