Pretraining Without Attention
Junxiong Wang, J. Yan, Albert Gu, Alexander M. Rush
arXiv:2212.10544 · 20 December 2022
Papers citing "Pretraining Without Attention" (6 of 6 shown)
Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks
  Jerome Sieber, Carmen Amo Alonso, A. Didier, M. Zeilinger, Antonio Orvieto (24 May 2024)

Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models
  Mohammad Shahab Sepehri, Zalan Fabian, Mahdi Soltanolkotabi (26 Mar 2024)

Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
  Mahdi Karami, Ali Ghodsi (28 Feb 2024)

Focus Your Attention (with Adaptive IIR Filters)
  Shahar Lutati, Itamar Zimerman, Lior Wolf (24 May 2023)

Transformer Quality in Linear Time
  Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le (21 Feb 2022)

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
  Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman (20 Apr 2018)