When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute
Tao Lei
24 February 2021
arXiv: 2102.12459
Tags: RALM, VLM
Papers citing "When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute" (11 papers)
| Title | Authors | Tags | Date |
|---|---|---|---|
| Transformer-VQ: Linear-Time Transformers via Vector Quantization | Albert Mohwald | | 28 Sep 2023 |
| Circling Back to Recurrent Models of Language | Gábor Melis | | 03 Nov 2022 |
| Adapting Pretrained Text-to-Text Models for Long Text Sequences | Wenhan Xiong, Anchit Gupta, Shubham Toshniwal, Yashar Mehdad, Wen-tau Yih | RALM, VLM | 21 Sep 2022 |
| Long Range Language Modeling via Gated State Spaces | Harsh Mehta, Ankit Gupta, Ashok Cutkosky, Behnam Neyshabur | Mamba | 27 Jun 2022 |
| Simple Recurrence Improves Masked Language Models | Tao Lei, Ran Tian, Jasmijn Bastings, Ankur P. Parikh | | 23 May 2022 |
| Block-Recurrent Transformers | DeLesley S. Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur | | 11 Mar 2022 |
| SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition | Jing Pan, Tao Lei, Kwangyoun Kim, Kyu Jeong Han, Shinji Watanabe | VLM | 11 Oct 2021 |
| Iterative Refinement Graph Neural Network for Antibody Sequence-Structure Co-design | Wengong Jin, Jeremy Wohlwend, Regina Barzilay, Tommi Jaakkola | | 09 Oct 2021 |
| Shortformer: Better Language Modeling using Shorter Inputs | Ofir Press, Noah A. Smith, M. Lewis | | 31 Dec 2020 |
| Big Bird: Transformers for Longer Sequences | Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed | VLM | 28 Jul 2020 |
| Efficient Content-Based Sparse Attention with Routing Transformers | Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier | MoE | 12 Mar 2020 |