Adaptive Attention Span in Transformers
Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, Armand Joulin
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
19 May 2019 · arXiv:1905.07799
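
The cited paper's core mechanism: each attention head learns its own span z, and the attention weight for a key at distance x from the query is scaled by the soft mask m_z(x) = min(max((R + z - x)/R, 0), 1) before renormalization, where R controls the ramp width; an L1 penalty on z pushes heads toward short spans. Below is a minimal, hypothetical PyTorch sketch of that masking step; the class name, the sigmoid parameterization of z, and the tensor layout are illustrative choices here, not the authors' released code.

```python
import torch
import torch.nn as nn

class AdaptiveSpanMask(nn.Module):
    """Soft span mask m_z(x) = clamp((R + z - x) / R, 0, 1).

    Each attention head learns its own span z; keys farther than roughly
    z positions from the query are smoothly masked out over a ramp of
    width R, and an L1 penalty on z pushes heads toward short spans.
    Illustrative sketch only, not the paper's released implementation.
    """

    def __init__(self, n_heads: int, max_span: int, ramp: int = 32):
        super().__init__()
        self.max_span = max_span
        self.ramp = ramp
        # Learned span per head, kept in (0, 1) via sigmoid and scaled by
        # max_span below (a parameterization choice for this sketch; the
        # paper clamps a raw learned scalar instead).
        self.z_logit = nn.Parameter(torch.zeros(n_heads, 1, 1))

    def forward(self, attn: torch.Tensor) -> torch.Tensor:
        # attn: (n_heads, query_len, span) of nonnegative attention
        # weights, where index span-1 is the current token and index 0
        # is the oldest key in the attention window.
        span = attn.size(-1)
        dist = torch.arange(span - 1, -1, -1.0, device=attn.device)
        z = torch.sigmoid(self.z_logit) * self.max_span
        # m_z(x) = clamp((R + z - x) / R, 0, 1), broadcast over heads.
        mask = torch.clamp((self.ramp + z - dist) / self.ramp, 0.0, 1.0)
        masked = attn * mask
        # Renormalize so each query's weights still sum to one.
        return masked / masked.sum(dim=-1, keepdim=True).clamp_min(1e-8)

    def span_penalty(self) -> torch.Tensor:
        # L1 regularizer on the spans; scale by a loss weight outside.
        return (torch.sigmoid(self.z_logit) * self.max_span).sum()
```

In training, a term such as lambda * mask.span_penalty() added to the language-modeling loss lets each head trade context length against compute, which is how the paper gets long effective spans in only a few heads.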

Papers citing "Adaptive Attention Span in Transformers"

50 / 201 papers shown
Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems
Subhabrata Dutta, Tanya Gautam, Soumen Chakrabarti, Tanmoy Chakraborty
315 · 25 · 0 · 30 Sep 2021

UFO-ViT: High Performance Linear Vision Transformer without Softmax
Jeonggeun Song
ViT
325 · 27 · 0 · 29 Sep 2021

Do Long-Range Language Models Actually Use Long-Range Context?
Simeng Sun, Kalpesh Krishna, Andrew Mattarella-Micke, Mohit Iyyer
RALM
259 · 100 · 0 · 19 Sep 2021

Adaptive Multi-Resolution Attention with Linear Complexity
IEEE International Joint Conference on Neural Networks (IJCNN), 2021
Yao Zhang, Yunpu Ma, T. Seidl, Volker Tresp
118 · 2 · 0 · 10 Aug 2021

Making Transformers Solve Compositional Tasks
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Santiago Ontañón, Joshua Ainslie, Vaclav Cvicek, Zachary Kenneth Fisher
270 · 85 · 0 · 09 Aug 2021

Lyapunov-based uncertainty-aware safe reinforcement learning
Ashkan B. Jeddi, Nariman L. Dehghani, A. Shafieezadeh
147 · 10 · 0 · 29 Jul 2021

Long-Short Transformer: Efficient Transformers for Language and Vision
Chen Zhu, Ming-Yu Liu, Chaowei Xiao, Mohammad Shoeybi, Tom Goldstein, Anima Anandkumar, Bryan Catanzaro
ViT, VLM
442 · 162 · 0 · 05 Jul 2021

Can Transformers Jump Around Right in Natural Language? Assessing Performance Transfer from SCAN
Rahma Chaabouni, Roberto Dessì, Eugene Kharitonov
262 · 20 · 0 · 03 Jul 2021

XCiT: Cross-Covariance Image Transformers
Neural Information Processing Systems (NeurIPS), 2021
Alaaeldin El-Nouby, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, ..., Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Edouard Grave
ViT
446 · 614 · 0 · 17 Jun 2021

An Automated Quality Evaluation Framework of Psychotherapy Conversations with Local Quality Estimates
Computer Speech and Language (CSL), 2021
Zhuohao Chen, Nikolaos Flemotomos, Karan Singla, Torrey A. Creed, David C. Atkins, Shrikanth Narayanan
141 · 9 · 0 · 15 Jun 2021

A Survey of Transformers
AI Open (AO), 2021
Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu
ViT
456 · 1,396 · 0 · 08 Jun 2021

Staircase Attention for Recurrent Processing of Sequences
Neural Information Processing Systems (NeurIPS), 2021
Da Ju, Stephen Roller, Sainbayar Sukhbaatar, Jason Weston
160 · 13 · 0 · 08 Jun 2021

An Attention Free Transformer
Shuangfei Zhai, Walter A. Talbott, Nitish Srivastava, Chen Huang, Hanlin Goh, Ruixiang Zhang, J. Susskind
ViT
409 · 164 · 0 · 28 May 2021

Sound Event Detection with Adaptive Frequency Selection
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2021
Zhepei Wang, Jonah Casebeer, Adam Clemmitt, Efthymios Tzinis, Paris Smaragdis
202 · 2 · 0 · 17 May 2021

Not All Memories are Created Equal: Learning to Forget by Expiring
International Conference on Machine Learning (ICML), 2021
Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, Angela Fan
CLL
239 · 36 · 0 · 13 May 2021

Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents
AI Open (AO), 2021
Chaojun Xiao, Xueyu Hu, Zhiyuan Liu, Cunchao Tu, Maosong Sun
AILaw, ELM
257 · 302 · 0 · 09 May 2021

Adapting Long Context NLM for ASR Rescoring in Conversational Agents
Interspeech, 2021
Ashish Shenoy, S. Bodapati, Monica Sunkara, S. Ronanki, Katrin Kirchhoff
241 · 21 · 0 · 21 Apr 2021

Improving Transformer-Kernel Ranking Model Using Conformer and Query Term Independence
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2021
Bhaskar Mitra, Sebastian Hofstätter, Hamed Zamani, Nick Craswell
189 · 9 · 0 · 19 Apr 2021

Go Forth and Prosper: Language Modeling with Ancient Textual History
Rik Koncel-Kedziorski, Noah A. Smith
KELM
127 · 0 · 0 · 18 Apr 2021

Revisiting Simple Neural Probabilistic Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Simeng Sun, Mohit Iyyer
167 · 15 · 0 · 08 Apr 2021

Efficient Attentions for Long Document Summarization
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
L. Huang, Shuyang Cao, Nikolaus Nova Parulian, Heng Ji, Lu Wang
330 · 366 · 0 · 05 Apr 2021

Attention, please! A survey of Neural Attention Models in Deep Learning
Artificial Intelligence Review (AIR), 2021
Alana de Santana Correia, Esther Luna Colombini
HAI
337 · 259 · 0 · 31 Mar 2021

A Practical Survey on Faster and Lighter Transformers
ACM Computing Surveys (CSUR), 2021
Quentin Fournier, G. Caron, Daniel Aloise
387 · 139 · 0 · 26 Mar 2021

Mask Attention Networks: Rethinking and Strengthen Transformer
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Zhihao Fan, Yeyun Gong, Dayiheng Liu, Zhongyu Wei, Siyuan Wang, Jian Jiao, Nan Duan, Ruofei Zhang, Xuanjing Huang
154 · 78 · 0 · 25 Mar 2021

Finetuning Pretrained Transformers into RNNs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith
325 · 81 · 0 · 24 Mar 2021

ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
International Conference on Machine Learning (ICML), 2021
Stéphane d'Ascoli, Hugo Touvron, Matthew L. Leavitt, Ari S. Morcos, Giulio Biroli, Levent Sagun
ViT
447 · 963 · 0 · 19 Mar 2021

Perceiver: General Perception with Iterative Attention
International Conference on Machine Learning (ICML), 2021
Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, João Carreira
VLM, ViT, MDE
585 · 1,273 · 0 · 04 Mar 2021

Random Feature Attention
International Conference on Learning Representations (ICLR), 2021
Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong
350 · 409 · 0 · 03 Mar 2021

When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Tao Lei
RALM, VLM
337 · 54 · 0 · 24 Feb 2021

Provably Improved Context-Based Offline Meta-RL with Attention and Contrastive Learning
Lanqing Li, Yuanhao Huang, Mingzhe Chen, Siteng Luo, Dijun Luo, Junzhou Huang
OffRL
165 · 3 · 0 · 22 Feb 2021

Evolving Attention with Residual Convolutions
International Conference on Machine Learning (ICML), 2021
Yujing Wang, Yaming Yang, Jiangang Bai, Mingliang Zhang, Jing Bai, Jiahao Yu, Ce Zhang, Gao Huang, Yunhai Tong
ViT
219 · 41 · 0 · 20 Feb 2021

Transformer Language Models with LSTM-based Cross-utterance Information Representation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
G. Sun, Chuxu Zhang, P. Woodland
234 · 35 · 0 · 12 Feb 2021

Dancing along Battery: Enabling Transformer with Run-time Reconfigurability on Mobile Devices
Design Automation Conference (DAC), 2021
Yuhong Song, Weiwen Jiang, Bingbing Li, Panjie Qi, Qingfeng Zhuge, E. Sha, Sakyasingha Dasgupta, Yiyu Shi, Caiwen Ding
159 · 21 · 0 · 12 Feb 2021

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Journal of Machine Learning Research (JMLR), 2021
W. Fedus, Barret Zoph, Noam M. Shazeer
MoE
577 · 3,178 · 0 · 11 Jan 2021

Shortformer: Better Language Modeling using Shorter Inputs
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Ofir Press, Noah A. Smith, M. Lewis
667 · 96 · 0 · 31 Dec 2020

Multi-Scale 2D Temporal Adjacent Networks for Moment Localization with Natural Language
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Songyang Zhang, Houwen Peng, Jianlong Fu, Yijuan Lu, Jiebo Luo
198 · 64 · 0 · 04 Dec 2020

EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference
MICRO, 2020
Thierry Tambe, Coleman Hooper, Lillian Pentecost, Tianyu Jia, En-Yu Yang, ..., Victor Sanh, P. Whatmough, Alexander M. Rush, David Brooks, Gu-Yeon Wei
437 · 149 · 0 · 28 Nov 2020

General Multi-label Image Classification with Transformers
Computer Vision and Pattern Recognition (CVPR), 2020
Jack Lanchantin, Tianlu Wang, Vicente Ordonez, Yanjun Qi
ViT
223 · 326 · 0 · 27 Nov 2020

Training Transformers for Information Security Tasks: A Case Study on Malicious URL Prediction
Ethan M. Rudd, Ahmed Abdallah
140 · 7 · 0 · 05 Nov 2020

Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers
Zhaoshuo Li, Xingtong Liu, Nathan G. Drenkow, Andy S Ding, Francis X. Creighton, Russell H. Taylor, Mathias Unberath
MDE, ViT
596 · 353 · 0 · 05 Nov 2020

Long Document Ranking with Query-Directed Sparse Transformer
Findings of EMNLP, 2020
Jyun-Yu Jiang, Chenyan Xiong, Chia-Jung Lee, Wei Wang
182 · 27 · 0 · 23 Oct 2020

Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries
Xiaofei Sun, Zijun Sun, Yuxian Meng, Jiwei Li, Chun Fan
226 · 24 · 0 · 14 Oct 2020

Zero-shot Entity Linking with Efficient Long Range Sequence Modeling
Zonghai Yao, Liangliang Cao, Huapu Pan
VLM
230 · 24 · 0 · 12 Oct 2020

SMYRF: Efficient Attention using Asymmetric Clustering
Giannis Daras, Nikita Kitaev, Augustus Odena, A. Dimakis
250 · 49 · 0 · 11 Oct 2020

Vector-Vector-Matrix Architecture: A Novel Hardware-Aware Framework for Low-Latency Inference in NLP Applications
Matthew Khoury, Rumen Dangovski, L. Ou, Preslav Nakov, Yichen Shen, L. Jing
108 · 0 · 0 · 06 Oct 2020

Transformers for Modeling Physical Systems
Neural Networks (NN), 2020
N. Geneva, N. Zabaras
AI4CE
614 · 194 · 0 · 04 Oct 2020

Which *BERT? A Survey Organizing Contextualized Encoders
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Patrick Xia, Shijie Wu, Benjamin Van Durme
227 · 53 · 0 · 02 Oct 2020

Grounded Compositional Outputs for Adaptive Language Modeling
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Nikolaos Pappas, Phoebe Mulcaire, Noah A. Smith
KELM
238 · 8 · 0 · 24 Sep 2020

Current Limitations of Language Models: What You Need is Retrieval
Aran Komatsuzaki
LRM
129 · 3 · 0 · 15 Sep 2020

Efficient Transformers: A Survey
ACM Computing Surveys (CSUR), 2020
Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
VLM
876 · 1,370 · 0 · 14 Sep 2020

Page 3 of 5