Pretraining Without Attention
Junxiong Wang, J. Yan, Albert Gu, Alexander M. Rush
arXiv:2212.10544 · 20 December 2022
Papers citing "Pretraining Without Attention" (6 of 6 shown)
Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks
  Jerome Sieber, Carmen Amo Alonso, A. Didier, M. Zeilinger, Antonio Orvieto (24 May 2024)

Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models
  Mohammad Shahab Sepehri, Zalan Fabian, Mahdi Soltanolkotabi (26 Mar 2024)

Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
  Mahdi Karami, Ali Ghodsi (28 Feb 2024)

Focus Your Attention (with Adaptive IIR Filters)
  Shahar Lutati, Itamar Zimerman, Lior Wolf (24 May 2023)

Transformer Quality in Linear Time
  Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le (21 Feb 2022)

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
  Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman (20 Apr 2018)