When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
24 February 2021
Tao Lei
arXiv: 2102.12459

Papers citing "When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute"

32 citing papers shown.

Recurrence Meets Transformers for Universal Multimodal Retrieval
Davide Caffagni, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
10 Sep 2025

Energy-Based Models for Predicting Mutational Effects on Proteins
Patrick Soga, Zhenyu Lei, Yinhan He, Camille Bilodeau, Jundong Li
14 Aug 2025

Thought calibration: Efficient and confident test-time scaling
Menghua Wu, Cai Zhou, Stephen Bates, Tommi Jaakkola
23 May 2025

Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures
Gabriel Lindenmaier, Sean Papay, Sebastian Padó
02 Feb 2025

SkipSNN: Efficiently Classifying Spike Trains with Event-attention
BigData Congress [Services Society] (BSS), 2024
Hang Yin, Yao Su, Liping Liu, Thomas Hartvigsen, Xin Dai, Xiangnan Kong
29 Oct 2024

Cottention: Linear Transformers With Cosine Attention
Gabriel Mongaras, Trevor Dohm, Eric C. Larson
27 Sep 2024

DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models
Wei He, Kai Han, Yehui Tang, Chengcheng Wang, Yujie Yang, Tianyu Guo, Yunhe Wang
26 Feb 2024

Improving Machine Translation with Large Language Models: A Preliminary Study with Cooperative Decoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jiali Zeng, Fandong Meng, Yongjing Yin, Jie Zhou
06 Nov 2023

Transformer-VQ: Linear-Time Transformers via Vector Quantization
International Conference on Learning Representations (ICLR), 2023
Albert Mohwald
28 Sep 2023

On "Scientific Debt" in NLP: A Case for More Rigour in Language Model
  Pre-Training Research
On "Scientific Debt" in NLP: A Case for More Rigour in Language Model Pre-Training ResearchAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Made Nindyatama Nityasya
Haryo Akbarianto Wibowo
Alham Fikri Aji
Genta Indra Winata
Radityo Eko Prasojo
Phil Blunsom
A. Kuncoro
190
9
0
05 Jun 2023
Multi-Head State Space Model for Speech Recognition
Interspeech, 2023
Yassir Fathullah, Chunyang Wu, Yuan Shangguan, Junteng Jia, Wenhan Xiong, ..., Chunxi Liu, Yangyang Shi, Ozlem Kalinli, M. Seltzer, Mark Gales
21 May 2023

Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference
Neural Information Processing Systems (NeurIPS), 2023
Tao Lei, Junwen Bai, Siddhartha Brahma, Joshua Ainslie, Kenton Lee, ..., Vincent Zhao, Yuexin Wu, Yue Liu, Yu Zhang, Ming-Wei Chang
11 Apr 2023

Unsupervised Protein-Ligand Binding Energy Prediction via Neural Euler's Rotation Equation
Neural Information Processing Systems (NeurIPS), 2023
Wengong Jin, Siranush Sarkizova, Xun Chen, N. Hacohen, Caroline Uhler
25 Jan 2023

Circling Back to Recurrent Models of Language
Gábor Melis
03 Nov 2022

Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
H. H. Mao
09 Oct 2022

Reprogramming Pretrained Language Models for Antibody Sequence Infilling
International Conference on Machine Learning (ICML), 2022
Igor Melnyk, Vijil Chenthamarakshan, Pin-Yu Chen, Payel Das, Amit Dhurandhar, Inkit Padhi, Devleena Das
05 Oct 2022

Mega: Moving Average Equipped Gated Attention
International Conference on Learning Representations (ICLR), 2022
Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, Luke Zettlemoyer
21 Sep 2022

Adapting Pretrained Text-to-Text Models for Long Text Sequences
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Wenhan Xiong, Anchit Gupta, Shubham Toshniwal, Yashar Mehdad, Anuj Kumar
21 Sep 2022

Exploiting Expert Knowledge for Assigning Firms to Industries: A Novel Deep Learning Method
Xiaohang Zhao, Xiao Fang, Jing He, Lihua Huang
11 Sep 2022

Confident Adaptive Language Modeling
Neural Information Processing Systems (NeurIPS), 2022
Tal Schuster, Adam Fisch, Jai Gupta, Mostafa Dehghani, Dara Bahri, Vinh Q. Tran, Yi Tay, Donald Metzler
14 Jul 2022

Antibody-Antigen Docking and Design via Hierarchical Equivariant Refinement
Wengong Jin, Regina Barzilay, Tommi Jaakkola
14 Jul 2022

Long Range Language Modeling via Gated State Spaces
International Conference on Learning Representations (ICLR), 2022
Harsh Mehta, Ankit Gupta, Ashok Cutkosky, Behnam Neyshabur
27 Jun 2022

Training Language Models with Memory Augmentation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zexuan Zhong, Tao Lei, Danqi Chen
25 May 2022

Simple Recurrence Improves Masked Language Models
Tao Lei, Ran Tian, Jasmijn Bastings, Ankur P. Parikh
23 May 2022

Implicit N-grams Induced by Recurrence
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Xiaobing Sun, Wei Lu
05 May 2022

Block-Recurrent Transformers
Neural Information Processing Systems (NeurIPS), 2022
DeLesley S. Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur
11 Mar 2022

Mukayese: Turkish NLP Strikes Back
Findings, 2022
Ali Safaya, Emirhan Kurtuluş, Arda Göktoğan, Deniz Yuret
02 Mar 2022

Simple Local Attentions Remain Competitive for Long-Context Tasks
Wenhan Xiong, Barlas Oğuz, Anchit Gupta, Xilun Chen, Diana Liskovich, Omer Levy, Anuj Kumar, Yashar Mehdad
14 Dec 2021

SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Jing Pan, Tao Lei, Kwangyoun Kim, Kyu Jeong Han, Shinji Watanabe
11 Oct 2021

Iterative Refinement Graph Neural Network for Antibody Sequence-Structure Co-design
International Conference on Learning Representations (ICLR), 2021
Wengong Jin, Jeremy Wohlwend, Regina Barzilay, Tommi Jaakkola
09 Oct 2021

Efficient Inference for Multilingual Neural Machine Translation
Alexandre Berard, Dain Lee, Stéphane Clinchant, K. Jung, Vassilina Nikoulina
14 Sep 2021

Finetuning Pretrained Transformers into RNNs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith
24 Mar 2021