A Transformer with Stack Attention
Jiaoda Li, Jennifer C. White, Mrinmaya Sachan, Ryan Cotterell
7 May 2024 · arXiv:2405.04515

Papers citing "A Transformer with Stack Attention"

6 / 6 papers shown

1. The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve? (LRM)
   Zhenheng Tang, Xiang Liu, Qian Wang, Peijie Dong, Bingsheng He, Xiaowen Chu, Bo Li
   24 Feb 2025

2. Investigating Recurrent Transformers with Dynamic Halt
   Jishnu Ray Chowdhury, Cornelia Caragea
   01 Feb 2024

3. A Logic for Expressing Log-Precision Transformers (ReLM, NAI, LRM)
   William Merrill, Ashish Sabharwal
   06 Oct 2022

4. Neural Networks and the Chomsky Hierarchy (UQCV)
   Grégoire Delétang, Anian Ruoss, Jordi Grau-Moya, Tim Genewein, L. Wenliang, ..., Chris Cundy, Marcus Hutter, Shane Legg, Joel Veness, Pedro A. Ortega
   05 Jul 2022

5. Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
   Ofir Press, Noah A. Smith, M. Lewis
   27 Aug 2021

6. A Decomposable Attention Model for Natural Language Inference
   Ankur P. Parikh, Oscar Täckström, Dipanjan Das, Jakob Uszkoreit
   06 Jun 2016