Saturated Transformers are Constant-Depth Threshold Circuits

Transactions of the Association for Computational Linguistics (TACL), 2021
30 June 2021
William Merrill, Ashish Sabharwal, Noah A. Smith
arXiv: 2106.16213 · abs / PDF / HTML · GitHub

Papers citing "Saturated Transformers are Constant-Depth Threshold Circuits"

Showing 50 of 97 citing papers (page 1 of 2).

Rectifying LLM Thought from Lens of Optimization
J. Liu, Hongwei Liu, Songyang Zhang, Kai Chen
LRM
01 Dec 2025

Generalizable Insights for Graph Transformers in Theory and Practice
Timo Stoll, Luis Muller, Christopher Morris
11 Nov 2025

ImagebindDC: Compressing Multi-modal Data with Imagebind-based Condensation
Yue Min, Shaobo Wang, Jiaze Li, Tianle Niu, Junxin Fan, Yongliang Miao, Lijin Yang, Linfeng Zhang
DD
11 Nov 2025

Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
Sean McLeish, Ang Li, John Kirchenbauer, Dayal Singh Kalra, Brian Bartoldson, B. Kailkhura, Avi Schwarzschild, Jonas Geiping, Tom Goldstein, Micah Goldblum
10 Nov 2025

Scaling Laws and In-Context Learning: A Unified Theoretical Framework
Sushant Mehta, Ishan Gupta
09 Nov 2025

Next-Latent Prediction Transformers Learn Compact World Models
Jayden Teoh, Manan Tomar, Kwangjun Ahn, E. Hu, Pratyusha Sharma, Riashat Islam, Alex Lamb, John Langford
08 Nov 2025

Allocation of Parameters in Transformers
Ruoxi Yu, Haotian Jiang, Jingpu Cheng, Penghao Yu, Qianxiao Li, Zhong Li
MoE
04 Oct 2025

The Transformer Cookbook
Andy Yang, Christopher Watson, Anton Xue, S. Bhattamishra, Jose Llarena, William Merrill, Emile Dos Santos Ferreira, Anej Svete, David Chiang
01 Oct 2025

Realizable Circuit Complexity: Embedding Computation in Space-Time
Benjamin Prada, Ankur Mali
23 Sep 2025

Fast attention mechanisms: a tale of parallelism
Jingwen Liu, Hantao Yu, Clayton Sanford, Alexandr Andoni, Daniel J. Hsu
10 Sep 2025

Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling
Ivan Rodkin, Daniil Orel, Konstantin Smirnov, Arman Bolatov, Bilal Elbouardi, ..., Aydar Bulatov, Preslav Nakov, Timothy Baldwin, Artem Shelmanov, Mikhail Burtsev
ReLM, ELM, LRM
22 Aug 2025

Towards High-Order Mean Flow Generative Models: Feasibility, Expressivity, and Provably Efficient Criteria
Yang Cao, Yubin Chen, Zhao Song, Jiahao Zhang
09 Aug 2025

A Rose by Any Other Name Would Smell as Sweet: Categorical Homotopy Theory for Large Language Models
Sridhar Mahadevan
07 Aug 2025

Topos Theory for Generative AI and LLMs
Sridhar Mahadevan
05 Aug 2025

BAR Conjecture: the Feasibility of Inference Budget-Constrained LLM Services with Authenticity and Reasoning
Jinan Zhou, Rajat Ghosh, Vaishnavi Bhargava, Debojyoti Dutta, Aryan Singhal
31 Jul 2025

The Serial Scaling Hypothesis
Yuxi Liu, Konpat Preechakul, Kananart Kuwaranancharoen, Yutong Bai
LRM
16 Jul 2025

Two Heads Are Better than One: Simulating Large Transformers with Small Ones
Hantao Yu, Josh Alman
13 Jun 2025

Data Shifts Hurt CoT: A Theoretical Study
Lang Yin, Debangshu Banerjee, Gagandeep Singh
12 Jun 2025

Comparison of different Unique hard attention transformer models by the formal languages they can recognize
Leonid Ryvkin
03 Jun 2025

Characterizing the Expressivity of Fixed-Precision Transformer Language Models
Jiaoda Li, Robert Bamler
29 May 2025

Minimalist Softmax Attention Provably Learns Constrained Boolean Functions
Jerry Yao-Chieh Hu, Xiwen Zhang, Maojiang Su, Zhao Song, Han Liu
MLT
26 May 2025

Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective
Junnan Liu, Hongwei Liu, Linchen Xiao, Shudong Liu, Taolin Zhang, Zihan Ma, Songyang Zhang, Kai Chen
LRM
26 May 2025

Exact Expressive Power of Transformers with Padding
William Merrill, Ashish Sabharwal
25 May 2025

The Counting Power of Transformers
Marco Sälzer, Chris Köcher, Anthony Widjaja Lin, Georg Zetzsche
16 May 2025

Provable Failure of Language Models in Learning Majority Boolean Logic via Gradient Descent
Bo Chen, Zhenmei Shi, Zhao Song, Jiahao Zhang
NAI, LRM, AI4CE
07 Apr 2025

TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper, Roland Fernandez, P. Smolensky, Jianfeng Gao
29 Mar 2025

Generative Linguistics, Large Language Models, and the Social Nature of Scientific Success
Sophie Hao
ELM, AI4CE
25 Mar 2025

Unique Hard Attention: A Tale of Two Sides
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Selim Jerad, Anej Svete, Jiaoda Li, Robert Bamler
18 Mar 2025

A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
William Merrill, Ashish Sabharwal
05 Mar 2025

(How) Do Language Models Track State?
Belinda Z. Li, Zifan Carl Guo, Jacob Andreas
LRM
04 Mar 2025

Compositional Reasoning with Transformers, RNNs, and Chain of Thought
Gilad Yehudai, Noah Amsel, Joan Bruna
LRM
03 Mar 2025

On Computational Limits of FlowAR Models: Expressivity and Efficiency
Chengyue Gong, Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song
23 Feb 2025

Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou
21 Feb 2025

MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
Da Xiao, Qingye Meng, Shengping Li, Xingyuan Yuan
MoE, AI4CE
13 Feb 2025

Circuit Complexity Bounds for Visual Autoregressive Model
Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
08 Jan 2025

Theoretical Constraints on the Expressive Power of $\mathsf{RoPE}$-based Tensor Attention Transformers
Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Mingda Wan
23 Dec 2024

Training Neural Networks as Recognizers of Formal Languages
International Conference on Learning Representations (ICLR), 2024
Alexandra Butoi, Ghazal Khalighinejad, Anej Svete, Josef Valvoda, Robert Bamler, Brian DuSell
NAI
11 Nov 2024

How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Guhao Feng, Kai-Bo Yang, Yuntian Gu, Xinyue Ai, Shengjie Luo, Jiacheng Sun, Di He, Hao Sun, Liwei Wang
LRM
17 Oct 2024

Learning Linear Attention in Polynomial Time
Morris Yau, Ekin Akyürek, Jiayuan Mao, Joshua B. Tenenbaum, Stefanie Jegelka, Jacob Andreas
14 Oct 2024

14 Oct 2024
Can Transformers Reason Logically? A Study in SAT Solving
Can Transformers Reason Logically? A Study in SAT Solving
Leyan Pan
Vijay Ganesh
Jacob Abernethy
Chris Esposo
Wenke Lee
ReLMLRM
518
13
0
09 Oct 2024
Mechanistic?
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackBoxNLP), 2024
Naomi Saphra, Sarah Wiegreffe
AI4CE
07 Oct 2024

Fundamental Limitations on Subquadratic Alternatives to Transformers
International Conference on Learning Representations (ICLR), 2024
Josh Alman, Hantao Yu
05 Oct 2024

ENTP: Encoder-only Next Token Prediction
Ethan Ewer, Daewon Chae, Thomas Zeng, Jinkyu Kim, Kangwook Lee
02 Oct 2024

Transformers in Uniform TC$^0$
David Chiang
20 Sep 2024

Representing Rule-based Chatbots with Transformers
Dan Friedman, Abhishek Panigrahi, Danqi Chen
15 Jul 2024

Algorithmic Language Models with Neurally Compiled Libraries
Lucas Saldyt, Subbarao Kambhampati
LRM
06 Jul 2024

On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning
Franz Nowak, Anej Svete, Alexandra Butoi, Robert Bamler
ReLM, LRM
20 Jun 2024

What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Nadav Borenstein, Anej Svete, R. Chan, Josef Valvoda, Franz Nowak, Isabelle Augenstein, Eleanor Chodroff, Robert Bamler
06 Jun 2024

Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task
Siavash Golkar, Alberto Bietti, Mariel Pettee, Michael Eickenberg, M. Cranmer, ..., Ruben Ohana, Liam Parker, Bruno Régaldo-Saint Blancard, Kyunghyun Cho, Shirley Ho
30 May 2024

Language Models Need Inductive Biases to Count Inductively
Yingshan Chang, Yonatan Bisk
LRM
30 May 2024