Adaptive Attention Span in Transformers
Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, Armand Joulin
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
19 May 2019 · arXiv:1905.07799
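
The cited paper's core mechanism: each attention head learns its own span z, and the attention weight for a key at distance x from the query is scaled by the soft mask m_z(x) = min(max((R + z - x)/R, 0), 1) before renormalization, where R controls the ramp width; an L1 penalty on z pushes heads toward short spans. Below is a minimal, hypothetical PyTorch sketch of that masking step; the class name, the sigmoid parameterization of z, and the tensor layout are illustrative choices here, not the authors' released code.

```python
import torch
import torch.nn as nn

class AdaptiveSpanMask(nn.Module):
    """Soft span mask m_z(x) = clamp((R + z - x) / R, 0, 1).

    Each attention head learns its own span z; keys farther than roughly
    z positions from the query are smoothly masked out over a ramp of
    width R, and an L1 penalty on z pushes heads toward short spans.
    Illustrative sketch only, not the paper's released implementation.
    """

    def __init__(self, n_heads: int, max_span: int, ramp: int = 32):
        super().__init__()
        self.max_span = max_span
        self.ramp = ramp
        # Learned span per head, kept in (0, 1) via sigmoid and scaled by
        # max_span below (a parameterization choice for this sketch; the
        # paper clamps a raw learned scalar instead).
        self.z_logit = nn.Parameter(torch.zeros(n_heads, 1, 1))

    def forward(self, attn: torch.Tensor) -> torch.Tensor:
        # attn: (n_heads, query_len, span) of nonnegative attention
        # weights, where index span-1 is the current token and index 0
        # is the oldest key in the attention window.
        span = attn.size(-1)
        dist = torch.arange(span - 1, -1, -1.0, device=attn.device)
        z = torch.sigmoid(self.z_logit) * self.max_span
        # m_z(x) = clamp((R + z - x) / R, 0, 1), broadcast over heads.
        mask = torch.clamp((self.ramp + z - dist) / self.ramp, 0.0, 1.0)
        masked = attn * mask
        # Renormalize so each query's weights still sum to one.
        return masked / masked.sum(dim=-1, keepdim=True).clamp_min(1e-8)

    def span_penalty(self) -> torch.Tensor:
        # L1 regularizer on the spans; scale by a loss weight outside.
        return (torch.sigmoid(self.z_logit) * self.max_span).sum()
```

In training, a term such as lambda * mask.span_penalty() added to the language-modeling loss lets each head trade context length against compute, which is how the paper gets long effective spans in only a few heads.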

Papers citing "Adaptive Attention Span in Transformers"

50 / 201 papers shown
Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems
Subhabrata Dutta, Tanya Gautam, Soumen Chakrabarti, Tanmoy Chakraborty
315 · 25 · 0 · 30 Sep 2021

UFO-ViT: High Performance Linear Vision Transformer without Softmax
Jeonggeun Song
ViT
325 · 27 · 0 · 29 Sep 2021

Do Long-Range Language Models Actually Use Long-Range Context?
Simeng Sun, Kalpesh Krishna, Andrew Mattarella-Micke, Mohit Iyyer
RALM
259 · 100 · 0 · 19 Sep 2021

Adaptive Multi-Resolution Attention with Linear Complexity
IEEE International Joint Conference on Neural Networks (IJCNN), 2021
Yao Zhang, Yunpu Ma, T. Seidl, Volker Tresp
118 · 2 · 0 · 10 Aug 2021

Making Transformers Solve Compositional Tasks
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Santiago Ontañón, Joshua Ainslie, Vaclav Cvicek, Zachary Kenneth Fisher
270 · 85 · 0 · 09 Aug 2021

Lyapunov-based uncertainty-aware safe reinforcement learning
Ashkan B. Jeddi, Nariman L. Dehghani, A. Shafieezadeh
147 · 10 · 0 · 29 Jul 2021

Long-Short Transformer: Efficient Transformers for Language and Vision
Chen Zhu, Ming-Yu Liu, Chaowei Xiao, Mohammad Shoeybi, Tom Goldstein, Anima Anandkumar, Bryan Catanzaro
ViT, VLM
442 · 162 · 0 · 05 Jul 2021

Can Transformers Jump Around Right in Natural Language? Assessing Performance Transfer from SCAN
Rahma Chaabouni, Roberto Dessì, Eugene Kharitonov
262 · 20 · 0 · 03 Jul 2021

XCiT: Cross-Covariance Image Transformers
Neural Information Processing Systems (NeurIPS), 2021
Alaaeldin El-Nouby, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, ..., Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Edouard Grave
ViT
446 · 614 · 0 · 17 Jun 2021

An Automated Quality Evaluation Framework of Psychotherapy Conversations with Local Quality Estimates
Computer Speech and Language (CSL), 2021
Zhuohao Chen, Nikolaos Flemotomos, Karan Singla, Torrey A. Creed, David C. Atkins, Shrikanth Narayanan
141 · 9 · 0 · 15 Jun 2021

A Survey of Transformers
AI Open (AO), 2021
Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu
ViT
456 · 1,396 · 0 · 08 Jun 2021

Staircase Attention for Recurrent Processing of Sequences
Neural Information Processing Systems (NeurIPS), 2021
Da Ju, Stephen Roller, Sainbayar Sukhbaatar, Jason Weston
160 · 13 · 0 · 08 Jun 2021

An Attention Free Transformer
Shuangfei Zhai, Walter A. Talbott, Nitish Srivastava, Chen Huang, Hanlin Goh, Ruixiang Zhang, J. Susskind
ViT
409 · 164 · 0 · 28 May 2021

Sound Event Detection with Adaptive Frequency Selection
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2021
Zhepei Wang, Jonah Casebeer, Adam Clemmitt, Efthymios Tzinis, Paris Smaragdis
202 · 2 · 0 · 17 May 2021

Not All Memories are Created Equal: Learning to Forget by Expiring
International Conference on Machine Learning (ICML), 2021
Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, Angela Fan
CLL
239 · 36 · 0 · 13 May 2021

Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents
AI Open (AO), 2021
Chaojun Xiao, Xueyu Hu, Zhiyuan Liu, Cunchao Tu, Maosong Sun
AILaw, ELM
257 · 302 · 0 · 09 May 2021

Adapting Long Context NLM for ASR Rescoring in Conversational Agents
Interspeech, 2021
Ashish Shenoy, S. Bodapati, Monica Sunkara, S. Ronanki, Katrin Kirchhoff
241 · 21 · 0 · 21 Apr 2021

Improving Transformer-Kernel Ranking Model Using Conformer and Query Term Independence
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2021
Bhaskar Mitra, Sebastian Hofstätter, Hamed Zamani, Nick Craswell
189 · 9 · 0 · 19 Apr 2021

Go Forth and Prosper: Language Modeling with Ancient Textual History
Rik Koncel-Kedziorski, Noah A. Smith
KELM
127 · 0 · 0 · 18 Apr 2021

Revisiting Simple Neural Probabilistic Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Simeng Sun, Mohit Iyyer
167 · 15 · 0 · 08 Apr 2021

Efficient Attentions for Long Document Summarization
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
L. Huang, Shuyang Cao, Nikolaus Nova Parulian, Heng Ji, Lu Wang
330 · 366 · 0 · 05 Apr 2021

Attention, please! A survey of Neural Attention Models in Deep Learning
Artificial Intelligence Review (AIR), 2021
Alana de Santana Correia, Esther Luna Colombini
HAI
337 · 259 · 0 · 31 Mar 2021

A Practical Survey on Faster and Lighter Transformers
ACM Computing Surveys (CSUR), 2021
Quentin Fournier, G. Caron, Daniel Aloise
387 · 139 · 0 · 26 Mar 2021

Mask Attention Networks: Rethinking and Strengthen Transformer
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Zhihao Fan, Yeyun Gong, Dayiheng Liu, Zhongyu Wei, Siyuan Wang, Jian Jiao, Nan Duan, Ruofei Zhang, Xuanjing Huang
154 · 78 · 0 · 25 Mar 2021

Finetuning Pretrained Transformers into RNNs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith
325 · 81 · 0 · 24 Mar 2021

ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
International Conference on Machine Learning (ICML), 2021
Stéphane d'Ascoli, Hugo Touvron, Matthew L. Leavitt, Ari S. Morcos, Giulio Biroli, Levent Sagun
ViT
447 · 963 · 0 · 19 Mar 2021

Perceiver: General Perception with Iterative Attention
International Conference on Machine Learning (ICML), 2021
Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, João Carreira
VLM, ViT, MDE
585 · 1,273 · 0 · 04 Mar 2021

Random Feature Attention
International Conference on Learning Representations (ICLR), 2021
Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong
350 · 409 · 0 · 03 Mar 2021

When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Tao Lei
RALM, VLM
337 · 54 · 0 · 24 Feb 2021

Provably Improved Context-Based Offline Meta-RL with Attention and Contrastive Learning
Lanqing Li, Yuanhao Huang, Mingzhe Chen, Siteng Luo, Dijun Luo, Junzhou Huang
OffRL
165 · 3 · 0 · 22 Feb 2021

Evolving Attention with Residual Convolutions
International Conference on Machine Learning (ICML), 2021
Yujing Wang, Yaming Yang, Jiangang Bai, Mingliang Zhang, Jing Bai, Jiahao Yu, Ce Zhang, Gao Huang, Yunhai Tong
ViT
219 · 41 · 0 · 20 Feb 2021

Transformer Language Models with LSTM-based Cross-utterance Information Representation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
G. Sun, Chuxu Zhang, P. Woodland
234 · 35 · 0 · 12 Feb 2021

Dancing along Battery: Enabling Transformer with Run-time Reconfigurability on Mobile Devices
Design Automation Conference (DAC), 2021
Yuhong Song, Weiwen Jiang, Bingbing Li, Panjie Qi, Qingfeng Zhuge, E. Sha, Sakyasingha Dasgupta, Yiyu Shi, Caiwen Ding
159 · 21 · 0 · 12 Feb 2021

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Journal of Machine Learning Research (JMLR), 2021
W. Fedus, Barret Zoph, Noam M. Shazeer
MoE
577 · 3,178 · 0 · 11 Jan 2021

Shortformer: Better Language Modeling using Shorter Inputs
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Ofir Press, Noah A. Smith, M. Lewis
667 · 96 · 0 · 31 Dec 2020

Multi-Scale 2D Temporal Adjacent Networks for Moment Localization with Natural Language
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Songyang Zhang, Houwen Peng, Jianlong Fu, Yijuan Lu, Jiebo Luo
198 · 64 · 0 · 04 Dec 2020

EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference
MICRO, 2020
Thierry Tambe, Coleman Hooper, Lillian Pentecost, Tianyu Jia, En-Yu Yang, ..., Victor Sanh, P. Whatmough, Alexander M. Rush, David Brooks, Gu-Yeon Wei
437 · 149 · 0 · 28 Nov 2020

General Multi-label Image Classification with Transformers
Computer Vision and Pattern Recognition (CVPR), 2020
Jack Lanchantin, Tianlu Wang, Vicente Ordonez, Yanjun Qi
ViT
223 · 326 · 0 · 27 Nov 2020

Training Transformers for Information Security Tasks: A Case Study on Malicious URL Prediction
Ethan M. Rudd, Ahmed Abdallah
140 · 7 · 0 · 05 Nov 2020

Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers
Zhaoshuo Li, Xingtong Liu, Nathan G. Drenkow, Andy S Ding, Francis X. Creighton, Russell H. Taylor, Mathias Unberath
MDE, ViT
596 · 353 · 0 · 05 Nov 2020

Long Document Ranking with Query-Directed Sparse Transformer
Findings of EMNLP, 2020
Jyun-Yu Jiang, Chenyan Xiong, Chia-Jung Lee, Wei Wang
182 · 27 · 0 · 23 Oct 2020

Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries
Xiaofei Sun, Zijun Sun, Yuxian Meng, Jiwei Li, Chun Fan
226 · 24 · 0 · 14 Oct 2020

Zero-shot Entity Linking with Efficient Long Range Sequence Modeling
Zonghai Yao, Liangliang Cao, Huapu Pan
VLM
230 · 24 · 0 · 12 Oct 2020

SMYRF: Efficient Attention using Asymmetric Clustering
Giannis Daras, Nikita Kitaev, Augustus Odena, A. Dimakis
250 · 49 · 0 · 11 Oct 2020

Vector-Vector-Matrix Architecture: A Novel Hardware-Aware Framework for Low-Latency Inference in NLP Applications
Matthew Khoury, Rumen Dangovski, L. Ou, Preslav Nakov, Yichen Shen, L. Jing
108 · 0 · 0 · 06 Oct 2020

Transformers for Modeling Physical Systems
Neural Networks (NN), 2020
N. Geneva, N. Zabaras
AI4CE
614 · 194 · 0 · 04 Oct 2020

Which *BERT? A Survey Organizing Contextualized Encoders
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Patrick Xia, Shijie Wu, Benjamin Van Durme
227 · 53 · 0 · 02 Oct 2020

Grounded Compositional Outputs for Adaptive Language Modeling
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Nikolaos Pappas, Phoebe Mulcaire, Noah A. Smith
KELM
238 · 8 · 0 · 24 Sep 2020

Current Limitations of Language Models: What You Need is Retrieval
Aran Komatsuzaki
LRM
129 · 3 · 0 · 15 Sep 2020

Efficient Transformers: A Survey
ACM Computing Surveys (CSUR), 2020
Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
VLM
876 · 1,370 · 0 · 14 Sep 2020

Page 3 of 5