Softmax Attention with Constant Cost per Token

8 April 2024

Papers citing "Softmax Attention with Constant Cost per Token"

3 / 3 papers shown

Title
Repeat After Me: Transformers are Better than State Space Models at Copying Samy Jelassi David Brandfonbrener Sham Kakade Eran Malach 95 78 0 01 Feb 2024
The Pile: An 800GB Dataset of Diverse Text for Language Modeling Leo Gao Stella Biderman Sid Black Laurence Golding Travis Hoppe ... Horace He Anish Thite Noa Nabeshima Shawn Presser Connor Leahy AIMat 248 1,986 0 31 Dec 2020
Efficient Content-Based Sparse Attention with Routing Transformers Aurko Roy M. Saffar Ashish Vaswani David Grangier MoE 238 579 0 12 Mar 2020