Adaptive Attention Span in Transformers
Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, Armand Joulin
Annual Meeting of the Association for Computational Linguistics (ACL), 2019 · 19 May 2019 · arXiv:1905.07799

Papers citing "Adaptive Attention Span in Transformers"

Showing 50 of 201 citing papers.

A Quantitative Review on Language Model Efficiency Research
Meng Jiang, Hy Dang, Lingbo Tong · 28 May 2023

Landmark Attention: Random-Access Infinite Context Length for Transformers
Amirkeivan Mohtashami, Martin Jaggi · Neural Information Processing Systems (NeurIPS), 2023 · 25 May 2023

Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator
Ziwei He, Meng Yang, Minwei Feng, Jingcheng Yin, Xiang Wang, Jingwen Leng, Zhouhan Lin · Annual Meeting of the Association for Computational Linguistics (ACL), 2023 · 24 May 2023

Leveraging Synthetic Targets for Machine Translation
Sarthak Mittal, Oleksii Hrinchuk, Oleksii Kuchaiev · Annual Meeting of the Association for Computational Linguistics (ACL), 2023 · 07 May 2023

Leveraging BERT Language Model for Arabic Long Document Classification
Muhammad Al-Qurishi · 04 May 2023

Improving Autoregressive NLP Tasks via Modular Linearized Attention
Victor Agostinelli, Lizhong Chen · 17 Apr 2023

Accelerating Trajectory Generation for Quadrotors Using Transformers
Srinath Tankasala, Mitch Pryor · Conference on Learning for Dynamics & Control (L4DC), 2023 · 27 Mar 2023

Real-time speech enhancement with dynamic attention span
Chengyu Zheng, Yuan-yuan Zhou, Xiulian Peng, Yuanyuan Zhang, Yan Lu · IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 · 21 Feb 2023

HRVQA: A Visual Question Answering Benchmark for High-Resolution Aerial Images
Kun Li, G. Vosselman, M. Yang · ISPRS Journal of Photogrammetry and Remote Sensing (ISPRS J. Photogramm. Remote Sens.), 2023 · 23 Jan 2023

AttMEMO: Accelerating Transformers with Memoization on Big Memory Systems
Yuan Feng, Hyeran Jeon, F. Blagojevic, Cyril Guyot, Qing Li, Dong Li · 23 Jan 2023

Cramming: Training a Language Model on a Single GPU in One Day
Jonas Geiping, Tom Goldstein · International Conference on Machine Learning (ICML), 2022 · 28 Dec 2022

EIT: Enhanced Interactive Transformer
Tong Zheng, Bei Li, Huiwen Bao, Tong Xiao, Jingbo Zhu · Annual Meeting of the Association for Computational Linguistics (ACL), 2022 · 20 Dec 2022

Convolution-enhanced Evolving Attention Networks
Yujing Wang, Yaming Yang, Zhuowan Li, Jiangang Bai, Mingliang Zhang, Xiangtai Li, Jiahao Yu, Ce Zhang, Gao Huang, Yu Tong · IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022 · 16 Dec 2022

Efficient Long Sequence Modeling via State Space Augmented Transformer
Simiao Zuo, Xiaodong Liu, Jian Jiao, Denis Xavier Charles, Eren Manavoglu, Tuo Zhao, Jianfeng Gao · 15 Dec 2022

Transformers for End-to-End InfoSec Tasks: A Feasibility Study
Ethan M. Rudd, Mohammad Saidur Rahman, Philip Tully · 05 Dec 2022

Fast Inference from Transformers via Speculative Decoding
Yaniv Leviathan, Matan Kalman, Yossi Matias · International Conference on Machine Learning (ICML), 2022 · 30 Nov 2022

Evade the Trap of Mediocrity: Promoting Diversity and Novelty in Text Generation via Concentrating Attention
Wenhao Li, Xiaoyuan Yi, Jinyi Hu, Maosong Sun, Xing Xie · Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 · 14 Nov 2022

Efficiently Scaling Transformer Inference
Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Anselm Levskaya, Jonathan Heek, Kefan Xiao, Shivani Agrawal, J. Dean · Conference on Machine Learning and Systems (MLSys), 2022 · 09 Nov 2022

Conversation-oriented ASR with multi-look-ahead CBS architecture
Huaibo Zhao, S. Fujie, Tetsuji Ogawa, Jin Sakuma, Yusuke Kida, Tetsunori Kobayashi · IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022 · 02 Nov 2022

Salience Allocation as Guidance for Abstractive Summarization
Fei Wang, Kaiqiang Song, Hongming Zhang, Lifeng Jin, Sangwoo Cho, Wenlin Yao, Xiaoyang Wang, Muhao Chen, Dong Yu · Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 · 22 Oct 2022

Breaking BERT: Evaluating and Optimizing Sparsified Attention
Siddhartha Brahma, Polina Zablotskaia, David M. Mimno · 07 Oct 2022

Document-aware Positional Encoding and Linguistic-guided Encoding for Abstractive Multi-document Summarization
Congbo Ma, Wei Emma Zhang, Pitawelayalage Dasun Dileepa Pitawela, Yutong Qu, Haojie Zhuang, Hu Wang · IEEE International Joint Conference on Neural Networks (IJCNN), 2022 · 13 Sep 2022

Horizontal and Vertical Attention in Transformers
Litao Yu, Shuai Liu · 10 Jul 2022

Efficient Representation Learning via Adaptive Context Pooling
Chen Huang, Walter A. Talbott, Navdeep Jaitly, J. Susskind · International Conference on Machine Learning (ICML), 2022 · 05 Jul 2022

Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
Jun Chen, Ming Hu, Boyang Albert Li, Mohamed Elhoseiny · 01 Jun 2022

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré · Neural Information Processing Systems (NeurIPS), 2022 · 27 May 2022

X-ViT: High Performance Linear Vision Transformer without Softmax
Jeonggeun Song, Heung-Chang Lee · 27 May 2022

Training Language Models with Memory Augmentation
Zexuan Zhong, Tao Lei, Danqi Chen · Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 · 25 May 2022

Adaptable Adapters
N. Moosavi, Quentin Delfosse, Kristian Kersting, Iryna Gurevych · North American Chapter of the Association for Computational Linguistics (NAACL), 2022 · 03 May 2022

A survey on attention mechanisms for medical applications: are we moving towards better algorithms?
Tiago Gonçalves, Isabel Rio-Torto, Luís F. Teixeira, J. S. Cardoso · IEEE Access, 2022 · 26 Apr 2022

Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks
Gen Luo, Weihao Ye, Xiaoshuai Sun, Yan Wang, Liujuan Cao, Yongjian Wu, Feiyue Huang, Rongrong Ji · IEEE Transactions on Image Processing (IEEE TIP), 2022 · 16 Apr 2022

LaMemo: Language Modeling with Look-Ahead Memory
Haozhe Ji, Rongsheng Zhang, Zhenyu Yang, Zhipeng Hu, Shiyu Huang · North American Chapter of the Association for Computational Linguistics (NAACL), 2022 · 15 Apr 2022

A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir R. Radev, Yejin Choi, Noah A. Smith · International Conference on Language Resources and Evaluation (LREC), 2022 · 11 Apr 2022

COOL, a Context Outlooker, and its Application to Question Answering and other Natural Language Processing Tasks
Fangyi Zhu, See-Kiong Ng, S. Bressan · International Joint Conference on Artificial Intelligence (IJCAI), 2022 · 01 Apr 2022

Linearizing Transformer with Key-Value Memory
Yizhe Zhang, Deng Cai · Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 · 23 Mar 2022

DuMLP-Pin: A Dual-MLP-dot-product Permutation-invariant Network for Set Feature Extraction
Jiajun Fei, Ziyu Zhu, Wenlei Liu, Zhidong Deng, Mingyang Li, Huanjun Deng, Shuo Zhang · AAAI Conference on Artificial Intelligence (AAAI), 2022 · 08 Mar 2022

Mukayese: Turkish NLP Strikes Back
Ali Safaya, Emirhan Kurtuluş, Arda Göktoğan, Deniz Yuret · Findings, 2022 · 02 Mar 2022

Benchmark Assessment for DeepSpeed Optimization Library
G. Liang, I. Alsmadi · 12 Feb 2022

Learning strides in convolutional neural networks
Rachid Riad, O. Teboul, David Grangier, Neil Zeghidour · International Conference on Learning Representations (ICLR), 2022 · 03 Feb 2022

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
Chao-Yuan Wu, Yanghao Li, K. Mangalam, Haoqi Fan, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer · Computer Vision and Pattern Recognition (CVPR), 2022 · 20 Jan 2022

SMDT: Selective Memory-Augmented Neural Document Translation
Xu Zhang, Jian Yang, Haoyang Huang, Shuming Ma, Dongdong Zhang, Jinlong Li, Furu Wei · 05 Jan 2022

Adaptive Token Sampling For Efficient Vision Transformers
Mohsen Fayyaz, Soroush Abbasi Koohpayegani, F. Jafari, Sunando Sengupta, Hamid Reza Vaezi Joze, Eric Sommerlade, Hamed Pirsiavash, Juergen Gall · 30 Nov 2021

Sparse is Enough in Scaling Transformers
Sebastian Jaszczur, Aakanksha Chowdhery, Afroz Mohiuddin, Lukasz Kaiser, Wojciech Gajewski, Henryk Michalewski, Jonni Kanerva · 24 Nov 2021

Local Multi-Head Channel Self-Attention for Facial Expression Recognition
Roberto Pecoraro, Valerio Basile, Viviana Bono, Sara Gallo · 14 Nov 2021

Scatterbrain: Unifying Sparse and Low-rank Attention Approximation
Beidi Chen, Tri Dao, Eric Winsor, Zhao Song, Atri Rudra, Christopher Ré · Neural Information Processing Systems (NeurIPS), 2021 · 28 Oct 2021

Hierarchical Transformers Are More Efficient Language Models
Piotr Nawrot, Szymon Tworkowski, Michał Tyrolski, Lukasz Kaiser, Yuhuai Wu, Christian Szegedy, Henryk Michalewski · 26 Oct 2021

An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR
Huaibo Zhao, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi · Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021 · 20 Oct 2021

GNN-LM: Language Modeling based on Global Contexts via GNN
Yuxian Meng, Shi Zong, Xiaoya Li, Xiaofei Sun, Tianwei Zhang, Leilei Gan, Jiwei Li · 17 Oct 2021

Efficient Training of Audio Transformers with Patchout
Khaled Koutini, Jan Schlüter, Hamid Eghbalzadeh, Gerhard Widmer · Interspeech, 2021 · 11 Oct 2021

Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling
Kyuhong Shim, Iksoo Choi, Wonyong Sung, Jungwook Choi · 07 Oct 2021

Page 2 of 5