Saturated Transformers are Constant-Depth Threshold Circuits

Transactions of the Association for Computational Linguistics (TACL), 2021
30 June 2021
William Merrill, Ashish Sabharwal, Noah A. Smith
arXiv: 2106.16213 · abs / PDF / HTML · GitHub

Papers citing "Saturated Transformers are Constant-Depth Threshold Circuits"

Showing 50 of 97 citing papers (page 1 of 2).

Rectifying LLM Thought from Lens of Optimization
J. Liu, Hongwei Liu, Songyang Zhang, Kai Chen
LRM
01 Dec 2025

Generalizable Insights for Graph Transformers in Theory and Practice
Timo Stoll, Luis Muller, Christopher Morris
11 Nov 2025

ImagebindDC: Compressing Multi-modal Data with Imagebind-based Condensation
Yue Min, Shaobo Wang, Jiaze Li, Tianle Niu, Junxin Fan, Yongliang Miao, Lijin Yang, Linfeng Zhang
DD
11 Nov 2025

Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
Sean McLeish, Ang Li, John Kirchenbauer, Dayal Singh Kalra, Brian Bartoldson, B. Kailkhura, Avi Schwarzschild, Jonas Geiping, Tom Goldstein, Micah Goldblum
10 Nov 2025

Scaling Laws and In-Context Learning: A Unified Theoretical Framework
Sushant Mehta, Ishan Gupta
09 Nov 2025

Next-Latent Prediction Transformers Learn Compact World Models
Jayden Teoh, Manan Tomar, Kwangjun Ahn, E. Hu, Pratyusha Sharma, Riashat Islam, Alex Lamb, John Langford
08 Nov 2025

Allocation of Parameters in Transformers
Ruoxi Yu, Haotian Jiang, Jingpu Cheng, Penghao Yu, Qianxiao Li, Zhong Li
MoE
04 Oct 2025

The Transformer Cookbook
Andy Yang, Christopher Watson, Anton Xue, S. Bhattamishra, Jose Llarena, William Merrill, Emile Dos Santos Ferreira, Anej Svete, David Chiang
01 Oct 2025

Realizable Circuit Complexity: Embedding Computation in Space-Time
Benjamin Prada, Ankur Mali
23 Sep 2025

Fast attention mechanisms: a tale of parallelism
Jingwen Liu, Hantao Yu, Clayton Sanford, Alexandr Andoni, Daniel J. Hsu
10 Sep 2025

Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling
Ivan Rodkin, Daniil Orel, Konstantin Smirnov, Arman Bolatov, Bilal Elbouardi, ..., Aydar Bulatov, Preslav Nakov, Timothy Baldwin, Artem Shelmanov, Mikhail Burtsev
ReLM, ELM, LRM
22 Aug 2025

Towards High-Order Mean Flow Generative Models: Feasibility, Expressivity, and Provably Efficient Criteria
Yang Cao, Yubin Chen, Zhao Song, Jiahao Zhang
09 Aug 2025

A Rose by Any Other Name Would Smell as Sweet: Categorical Homotopy Theory for Large Language Models
Sridhar Mahadevan
07 Aug 2025

Topos Theory for Generative AI and LLMs
Sridhar Mahadevan
05 Aug 2025

BAR Conjecture: the Feasibility of Inference Budget-Constrained LLM Services with Authenticity and Reasoning
Jinan Zhou, Rajat Ghosh, Vaishnavi Bhargava, Debojyoti Dutta, Aryan Singhal
31 Jul 2025

The Serial Scaling Hypothesis
Yuxi Liu, Konpat Preechakul, Kananart Kuwaranancharoen, Yutong Bai
LRM
16 Jul 2025

Two Heads Are Better than One: Simulating Large Transformers with Small Ones
Hantao Yu, Josh Alman
13 Jun 2025

Data Shifts Hurt CoT: A Theoretical Study
Lang Yin, Debangshu Banerjee, Gagandeep Singh
12 Jun 2025

Comparison of different Unique hard attention transformer models by the formal languages they can recognize
Leonid Ryvkin
03 Jun 2025

Characterizing the Expressivity of Fixed-Precision Transformer Language Models
Jiaoda Li, Robert Bamler
29 May 2025

Minimalist Softmax Attention Provably Learns Constrained Boolean Functions
Jerry Yao-Chieh Hu, Xiwen Zhang, Maojiang Su, Zhao Song, Han Liu
MLT
26 May 2025

Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective
Junnan Liu, Hongwei Liu, Linchen Xiao, Shudong Liu, Taolin Zhang, Zihan Ma, Songyang Zhang, Kai Chen
LRM
26 May 2025

Exact Expressive Power of Transformers with Padding
William Merrill, Ashish Sabharwal
25 May 2025

The Counting Power of Transformers
Marco Sälzer, Chris Köcher, Anthony Widjaja Lin, Georg Zetzsche
16 May 2025

Provable Failure of Language Models in Learning Majority Boolean Logic via Gradient Descent
Bo Chen, Zhenmei Shi, Zhao Song, Jiahao Zhang
NAI, LRM, AI4CE
07 Apr 2025

TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper, Roland Fernandez, P. Smolensky, Jianfeng Gao
29 Mar 2025

Generative Linguistics, Large Language Models, and the Social Nature of Scientific Success
Sophie Hao
ELM, AI4CE
25 Mar 2025

Unique Hard Attention: A Tale of Two Sides
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Selim Jerad, Anej Svete, Jiaoda Li, Robert Bamler
18 Mar 2025

A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
William Merrill, Ashish Sabharwal
05 Mar 2025

(How) Do Language Models Track State?
Belinda Z. Li, Zifan Carl Guo, Jacob Andreas
LRM
04 Mar 2025

Compositional Reasoning with Transformers, RNNs, and Chain of Thought
Gilad Yehudai, Noah Amsel, Joan Bruna
LRM
03 Mar 2025

On Computational Limits of FlowAR Models: Expressivity and Efficiency
Chengyue Gong, Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song
23 Feb 2025

Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou
21 Feb 2025

MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
Da Xiao, Qingye Meng, Shengping Li, Xingyuan Yuan
MoE, AI4CE
13 Feb 2025

Circuit Complexity Bounds for Visual Autoregressive Model
Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
08 Jan 2025

Theoretical Constraints on the Expressive Power of $\mathsf{RoPE}$-based Tensor Attention Transformers
Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Mingda Wan
23 Dec 2024

Training Neural Networks as Recognizers of Formal Languages
International Conference on Learning Representations (ICLR), 2024
Alexandra Butoi, Ghazal Khalighinejad, Anej Svete, Josef Valvoda, Robert Bamler, Brian DuSell
NAI
11 Nov 2024

How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Guhao Feng, Kai-Bo Yang, Yuntian Gu, Xinyue Ai, Shengjie Luo, Jiacheng Sun, Di He, Hao Sun, Liwei Wang
LRM
17 Oct 2024

Learning Linear Attention in Polynomial Time
Morris Yau, Ekin Akyürek, Jiayuan Mao, Joshua B. Tenenbaum, Stefanie Jegelka, Jacob Andreas
14 Oct 2024

14 Oct 2024
Can Transformers Reason Logically? A Study in SAT Solving
Can Transformers Reason Logically? A Study in SAT Solving
Leyan Pan
Vijay Ganesh
Jacob Abernethy
Chris Esposo
Wenke Lee
ReLMLRM
518
13
0
09 Oct 2024
Mechanistic?
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackBoxNLP), 2024
Naomi Saphra, Sarah Wiegreffe
AI4CE
07 Oct 2024

Fundamental Limitations on Subquadratic Alternatives to Transformers
International Conference on Learning Representations (ICLR), 2024
Josh Alman, Hantao Yu
05 Oct 2024

ENTP: Encoder-only Next Token Prediction
Ethan Ewer, Daewon Chae, Thomas Zeng, Jinkyu Kim, Kangwook Lee
02 Oct 2024

Transformers in Uniform TC$^0$
David Chiang
20 Sep 2024

Representing Rule-based Chatbots with Transformers
Dan Friedman, Abhishek Panigrahi, Danqi Chen
15 Jul 2024

Algorithmic Language Models with Neurally Compiled Libraries
Lucas Saldyt, Subbarao Kambhampati
LRM
06 Jul 2024

On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning
Franz Nowak, Anej Svete, Alexandra Butoi, Robert Bamler
ReLM, LRM
20 Jun 2024

What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Nadav Borenstein, Anej Svete, R. Chan, Josef Valvoda, Franz Nowak, Isabelle Augenstein, Eleanor Chodroff, Robert Bamler
06 Jun 2024

Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task
Siavash Golkar, Alberto Bietti, Mariel Pettee, Michael Eickenberg, M. Cranmer, ..., Ruben Ohana, Liam Parker, Bruno Régaldo-Saint Blancard, Kyunghyun Cho, Shirley Ho
30 May 2024

Language Models Need Inductive Biases to Count Inductively
Yingshan Chang, Yonatan Bisk
LRM
30 May 2024