arXiv: 2310.01749
Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns
3 October 2023
Brian DuSell, David Chiang
Papers citing "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" (6 of 6 papers shown)
TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper, Roland Fernandez, P. Smolensky, Jianfeng Gao
29 Mar 2025

Training Neural Networks as Recognizers of Formal Languages
Alexandra Butoi, Ghazal Khalighinejad, Anej Svete, Josef Valvoda, Ryan Cotterell, Brian DuSell
11 Nov 2024

Finding path and cycle counting formulae in graphs with Deep Reinforcement Learning
Jason Piquenot, Maxime Bérar, Pierre Héroux, Jean-Yves Ramel, R. Raveaux, Sébastien Adam
02 Oct 2024

Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury, Cornelia Caragea
01 Feb 2024

The Surprising Computational Power of Nondeterministic Stack RNNs
Brian DuSell, David Chiang
04 Oct 2022

Transformers Generalize Linearly
Jackson Petty, Robert Frank
24 Sep 2021