Linear Transformers with Learnable Kernel Functions are Better In-Context Models

16 February 2024

Papers citing "Linear Transformers with Learnable Kernel Functions are Better In-Context Models"


Title | Authors | Tag | (unlabeled counts) | Date
Simple linear attention language models balance the recall-throughput tradeoff | Simran Arora, Sabri Eyuboglu, Michael Zhang, Aman Timalsina, Silas Alberti, Dylan Zinsley, James Zou, Atri Rudra, Christopher Ré | - | 39, 18, 0 | 28 Feb 2024
Repeat After Me: Transformers are Better than State Space Models at Copying | Samy Jelassi, David Brandfonbrener, Sham Kakade, Eran Malach | - | 92, 77, 0 | 01 Feb 2024
Zoology: Measuring and Improving Recall in Efficient Language Models | Simran Arora, Sabri Eyuboglu, Aman Timalsina, Isys Johnson, Michael Poli, James Zou, Atri Rudra, Christopher Ré | - | 56, 65, 0 | 08 Dec 2023
In-context Learning and Induction Heads | Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah | - | 240, 453, 0 | 24 Sep 2022
How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections | Albert Gu, Isys Johnson, Aman Timalsina, Atri Rudra, Christopher Ré | Mamba | 93, 88, 0 | 24 Jun 2022
Expected Validation Performance and Estimation of a Random Variable's Maximum | Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith | - | 33, 9, 0 | 01 Oct 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling | Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy | AIMat | 242, 1,977, 0 | 31 Dec 2020