When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
24 February 2021
Tao Lei
arXiv: 2102.12459

Papers citing "When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute"

32 citing papers shown.

Recurrence Meets Transformers for Universal Multimodal Retrieval
Davide Caffagni, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
10 Sep 2025

Energy-Based Models for Predicting Mutational Effects on Proteins
Patrick Soga, Zhenyu Lei, Yinhan He, Camille Bilodeau, Jundong Li
14 Aug 2025

Thought calibration: Efficient and confident test-time scaling
Menghua Wu, Cai Zhou, Stephen Bates, Tommi Jaakkola
23 May 2025

Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures
Gabriel Lindenmaier, Sean Papay, Sebastian Padó
02 Feb 2025

SkipSNN: Efficiently Classifying Spike Trains with Event-attention
BigData Congress [Services Society] (BSS), 2024
Hang Yin, Yao Su, Liping Liu, Thomas Hartvigsen, Xin Dai, Xiangnan Kong
29 Oct 2024

Cottention: Linear Transformers With Cosine Attention
Gabriel Mongaras, Trevor Dohm, Eric C. Larson
27 Sep 2024

DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models
Wei He, Kai Han, Yehui Tang, Chengcheng Wang, Yujie Yang, Tianyu Guo, Yunhe Wang
26 Feb 2024

Improving Machine Translation with Large Language Models: A Preliminary Study with Cooperative Decoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jiali Zeng, Fandong Meng, Yongjing Yin, Jie Zhou
06 Nov 2023

Transformer-VQ: Linear-Time Transformers via Vector Quantization
International Conference on Learning Representations (ICLR), 2023
Albert Mohwald
28 Sep 2023

On "Scientific Debt" in NLP: A Case for More Rigour in Language Model
  Pre-Training Research
On "Scientific Debt" in NLP: A Case for More Rigour in Language Model Pre-Training ResearchAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Made Nindyatama Nityasya
Haryo Akbarianto Wibowo
Alham Fikri Aji
Genta Indra Winata
Radityo Eko Prasojo
Phil Blunsom
A. Kuncoro
190
9
0
05 Jun 2023
Multi-Head State Space Model for Speech Recognition
Interspeech, 2023
Yassir Fathullah, Chunyang Wu, Yuan Shangguan, Junteng Jia, Wenhan Xiong, ..., Chunxi Liu, Yangyang Shi, Ozlem Kalinli, M. Seltzer, Mark Gales
21 May 2023

Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference
Neural Information Processing Systems (NeurIPS), 2023
Tao Lei, Junwen Bai, Siddhartha Brahma, Joshua Ainslie, Kenton Lee, ..., Vincent Zhao, Yuexin Wu, Yue Liu, Yu Zhang, Ming-Wei Chang
11 Apr 2023

Unsupervised Protein-Ligand Binding Energy Prediction via Neural Euler's Rotation Equation
Neural Information Processing Systems (NeurIPS), 2023
Wengong Jin, Siranush Sarkizova, Xun Chen, N. Hacohen, Caroline Uhler
25 Jan 2023

Circling Back to Recurrent Models of Language
Gábor Melis
03 Nov 2022

Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
H. H. Mao
09 Oct 2022

Reprogramming Pretrained Language Models for Antibody Sequence Infilling
International Conference on Machine Learning (ICML), 2022
Igor Melnyk, Vijil Chenthamarakshan, Pin-Yu Chen, Payel Das, Amit Dhurandhar, Inkit Padhi, Devleena Das
05 Oct 2022

Mega: Moving Average Equipped Gated Attention
International Conference on Learning Representations (ICLR), 2022
Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, Luke Zettlemoyer
21 Sep 2022

Adapting Pretrained Text-to-Text Models for Long Text Sequences
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Wenhan Xiong, Anchit Gupta, Shubham Toshniwal, Yashar Mehdad, Anuj Kumar
21 Sep 2022

Exploiting Expert Knowledge for Assigning Firms to Industries: A Novel Deep Learning Method
Xiaohang Zhao, Xiao Fang, Jing He, Lihua Huang
11 Sep 2022

Confident Adaptive Language Modeling
Neural Information Processing Systems (NeurIPS), 2022
Tal Schuster, Adam Fisch, Jai Gupta, Mostafa Dehghani, Dara Bahri, Vinh Q. Tran, Yi Tay, Donald Metzler
14 Jul 2022

Antibody-Antigen Docking and Design via Hierarchical Equivariant Refinement
Wengong Jin, Regina Barzilay, Tommi Jaakkola
14 Jul 2022

Long Range Language Modeling via Gated State Spaces
International Conference on Learning Representations (ICLR), 2022
Harsh Mehta, Ankit Gupta, Ashok Cutkosky, Behnam Neyshabur
27 Jun 2022

Training Language Models with Memory Augmentation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zexuan Zhong, Tao Lei, Danqi Chen
25 May 2022

Simple Recurrence Improves Masked Language Models
Tao Lei, Ran Tian, Jasmijn Bastings, Ankur P. Parikh
23 May 2022

Implicit N-grams Induced by Recurrence
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Xiaobing Sun, Wei Lu
05 May 2022

Block-Recurrent Transformers
Neural Information Processing Systems (NeurIPS), 2022
DeLesley S. Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur
11 Mar 2022

Mukayese: Turkish NLP Strikes Back
Findings, 2022
Ali Safaya, Emirhan Kurtuluş, Arda Göktoğan, Deniz Yuret
02 Mar 2022

Simple Local Attentions Remain Competitive for Long-Context Tasks
Wenhan Xiong, Barlas Oğuz, Anchit Gupta, Xilun Chen, Diana Liskovich, Omer Levy, Anuj Kumar, Yashar Mehdad
14 Dec 2021

SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Jing Pan, Tao Lei, Kwangyoun Kim, Kyu Jeong Han, Shinji Watanabe
11 Oct 2021

Iterative Refinement Graph Neural Network for Antibody Sequence-Structure Co-design
International Conference on Learning Representations (ICLR), 2021
Wengong Jin, Jeremy Wohlwend, Regina Barzilay, Tommi Jaakkola
09 Oct 2021

Efficient Inference for Multilingual Neural Machine Translation
Alexandre Berard, Dain Lee, Stéphane Clinchant, K. Jung, Vassilina Nikoulina
14 Sep 2021

Finetuning Pretrained Transformers into RNNs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith
24 Mar 2021