arXiv:1905.07799
Adaptive Attention Span in Transformers
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
19 May 2019
Sainbayar Sukhbaatar
Edouard Grave
Piotr Bojanowski
Armand Joulin
Papers citing "Adaptive Attention Span in Transformers"
Showing 50 of 201 papers
A Quantitative Review on Language Model Efficiency Research
Meng Jiang
Hy Dang
Lingbo Tong
206
0
0
28 May 2023
Landmark Attention: Random-Access Infinite Context Length for Transformers
Neural Information Processing Systems (NeurIPS), 2023
Amirkeivan Mohtashami
Martin Jaggi
LLMAG
344
197
0
25 May 2023
Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Ziwei He
Meng Yang
Minwei Feng
Jingcheng Yin
Xiang Wang
Jingwen Leng
Zhouhan Lin
ViT
346
21
0
24 May 2023
Leveraging Synthetic Targets for Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Sarthak Mittal
Oleksii Hrinchuk
Oleksii Kuchaiev
147
2
0
07 May 2023
Leveraging BERT Language Model for Arabic Long Document Classification
Muhammad Al-Qurishi
182
1
0
04 May 2023
Improving Autoregressive NLP Tasks via Modular Linearized Attention
Victor Agostinelli
Lizhong Chen
290
2
0
17 Apr 2023
Accelerating Trajectory Generation for Quadrotors Using Transformers
Conference on Learning for Dynamics & Control (L4DC), 2023
Srinath Tankasala
Mitch Pryor
125
2
0
27 Mar 2023
Real-time speech enhancement with dynamic attention span
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Chengyu Zheng
Yuan-yuan Zhou
Xiulian Peng
Yuanyuan Zhang
Yan Lu
187
4
0
21 Feb 2023
HRVQA: A Visual Question Answering Benchmark for High-Resolution Aerial Images
ISPRS Journal of Photogrammetry and Remote Sensing (ISPRS J. Photogramm. Remote Sens.), 2023
Kun Li
G. Vosselman
M. Yang
219
19
0
23 Jan 2023
AttMEMO: Accelerating Transformers with Memoization on Big Memory Systems
Yuan Feng
Hyeran Jeon
F. Blagojevic
Cyril Guyot
Qing Li
Dong Li
GNN
230
7
0
23 Jan 2023
Cramming: Training a Language Model on a Single GPU in One Day
International Conference on Machine Learning (ICML), 2022
Jonas Geiping
Tom Goldstein
MoE
276
103
0
28 Dec 2022
EIT: Enhanced Interactive Transformer
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Tong Zheng
Bei Li
Huiwen Bao
Tong Xiao
Jingbo Zhu
295
3
0
20 Dec 2022
Convolution-enhanced Evolving Attention Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Yujing Wang
Yaming Yang
Zhuowan Li
Jiangang Bai
Mingliang Zhang
Xiangtai Li
Jiahao Yu
Ce Zhang
Gao Huang
Yu Tong
ViT
309
9
0
16 Dec 2022
Efficient Long Sequence Modeling via State Space Augmented Transformer
Simiao Zuo
Xiaodong Liu
Jian Jiao
Denis Xavier Charles
Eren Manavoglu
Tuo Zhao
Jianfeng Gao
332
37
0
15 Dec 2022
Transformers for End-to-End InfoSec Tasks: A Feasibility Study
Ethan M. Rudd
Mohammad Saidur Rahman
Philip Tully
212
6
0
05 Dec 2022
Fast Inference from Transformers via Speculative Decoding
International Conference on Machine Learning (ICML), 2022
Yaniv Leviathan
Matan Kalman
Yossi Matias
LRM
688
1,191
0
30 Nov 2022
Evade the Trap of Mediocrity: Promoting Diversity and Novelty in Text Generation via Concentrating Attention
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Wenhao Li
Xiaoyuan Yi
Jinyi Hu
Maosong Sun
Xing Xie
241
2
0
14 Nov 2022
Efficiently Scaling Transformer Inference
Conference on Machine Learning and Systems (MLSys), 2022
Reiner Pope
Sholto Douglas
Aakanksha Chowdhery
Jacob Devlin
James Bradbury
Anselm Levskaya
Jonathan Heek
Kefan Xiao
Shivani Agrawal
J. Dean
351
492
0
09 Nov 2022
Conversation-oriented ASR with multi-look-ahead CBS architecture
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Huaibo Zhao
S. Fujie
Tetsuji Ogawa
Jin Sakuma
Yusuke Kida
Tetsunori Kobayashi
247
3
0
02 Nov 2022
Salience Allocation as Guidance for Abstractive Summarization
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Fei Wang
Kaiqiang Song
Hongming Zhang
Lifeng Jin
Sangwoo Cho
Wenlin Yao
Xiaoyang Wang
Muhao Chen
Dong Yu
180
43
0
22 Oct 2022
Breaking BERT: Evaluating and Optimizing Sparsified Attention
Siddhartha Brahma
Polina Zablotskaia
David M. Mimno
163
1
0
07 Oct 2022
Document-aware Positional Encoding and Linguistic-guided Encoding for Abstractive Multi-document Summarization
IEEE International Joint Conference on Neural Network (IJCNN), 2022
Congbo Ma
Wei Emma Zhang
Pitawelayalage Dasun Dileepa Pitawela
Yutong Qu
Haojie Zhuang
Hu Wang
262
3
0
13 Sep 2022
Horizontal and Vertical Attention in Transformers
Litao Yu
Shuai Liu
ViT
148
1
0
10 Jul 2022
Efficient Representation Learning via Adaptive Context Pooling
International Conference on Machine Learning (ICML), 2022
Chen Huang
Walter A. Talbott
Navdeep Jaitly
J. Susskind
206
9
0
05 Jul 2022
Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
Jun Chen
Ming Hu
Boyang Albert Li
Mohamed Elhoseiny
341
40
0
01 Jun 2022
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Neural Information Processing Systems (NeurIPS), 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
851
3,482
0
27 May 2022
X-ViT: High Performance Linear Vision Transformer without Softmax
Jeonggeun Song
Heung-Chang Lee
ViT
111
3
0
27 May 2022
Training Language Models with Memory Augmentation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zexuan Zhong
Tao Lei
Danqi Chen
RALM
744
145
0
25 May 2022
Adaptable Adapters
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
N. Moosavi
Quentin Delfosse
Kristian Kersting
Iryna Gurevych
203
20
0
03 May 2022
A survey on attention mechanisms for medical applications: are we moving towards better algorithms?
IEEE Access (IEEE Access), 2022
Tiago Gonçalves
Isabel Rio-Torto
Luís F. Teixeira
J. S. Cardoso
OOD
MedIm
214
54
0
26 Apr 2022
Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks
IEEE Transactions on Image Processing (IEEE TIP), 2022
Gen Luo
Weihao Ye
Xiaoshuai Sun
Yan Wang
Liujuan Cao
Yongjian Wu
Feiyue Huang
Rongrong Ji
ViT
158
57
0
16 Apr 2022
LaMemo: Language Modeling with Look-Ahead Memory
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Haozhe Ji
Rongsheng Zhang
Zhenyu Yang
Zhipeng Hu
Shiyu Huang
KELM
RALM
CLL
168
4
0
15 Apr 2022
A Call for Clarity in Beam Search: How It Works and When It Stops
International Conference on Language Resources and Evaluation (LREC), 2022
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Dragomir R. Radev
Yejin Choi
Noah A. Smith
295
10
0
11 Apr 2022
COOL, a Context Outlooker, and its Application to Question Answering and other Natural Language Processing Tasks
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Fangyi Zhu
See-Kiong Ng
S. Bressan
LRM
164
1
0
01 Apr 2022
Linearizing Transformer with Key-Value Memory
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yizhe Zhang
Deng Cai
326
6
0
23 Mar 2022
DuMLP-Pin: A Dual-MLP-dot-product Permutation-invariant Network for Set Feature Extraction
AAAI Conference on Artificial Intelligence (AAAI), 2022
Jiajun Fei
Ziyu Zhu
Wenlei Liu
Zhidong Deng
Mingyang Li
Huanjun Deng
Shuo Zhang
3DPC
248
6
0
08 Mar 2022
Mukayese: Turkish NLP Strikes Back
Findings of the Association for Computational Linguistics (ACL Findings), 2022
Ali Safaya
Emirhan Kurtuluş
Arda Göktoğan
Deniz Yuret
239
28
0
02 Mar 2022
Benchmark Assessment for DeepSpeed Optimization Library
G. Liang
I. Alsmadi
168
3
0
12 Feb 2022
Learning strides in convolutional neural networks
International Conference on Learning Representations (ICLR), 2022
Rachid Riad
O. Teboul
David Grangier
Neil Zeghidour
164
50
0
03 Feb 2022
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
Computer Vision and Pattern Recognition (CVPR), 2022
Chao-Yuan Wu
Yanghao Li
K. Mangalam
Haoqi Fan
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
489
245
0
20 Jan 2022
SMDT: Selective Memory-Augmented Neural Document Translation
Xu Zhang
Jian Yang
Haoyang Huang
Shuming Ma
Dongdong Zhang
Jinlong Li
Furu Wei
121
2
0
05 Jan 2022
Adaptive Token Sampling For Efficient Vision Transformers
Mohsen Fayyaz
Soroush Abbasi Koohpayegani
F. Jafari
Sunando Sengupta
Hamid Reza Vaezi Joze
Eric Sommerlade
Hamed Pirsiavash
Juergen Gall
ViT
379
222
0
30 Nov 2021
Sparse is Enough in Scaling Transformers
Sebastian Jaszczur
Aakanksha Chowdhery
Afroz Mohiuddin
Lukasz Kaiser
Wojciech Gajewski
Henryk Michalewski
Jonni Kanerva
MoE
160
120
0
24 Nov 2021
Local Multi-Head Channel Self-Attention for Facial Expression Recognition
Roberto Pecoraro
Valerio Basile
Viviana Bono
Sara Gallo
ViT
316
62
0
14 Nov 2021
Scatterbrain: Unifying Sparse and Low-rank Attention Approximation
Neural Information Processing Systems (NeurIPS), 2021
Beidi Chen
Tri Dao
Eric Winsor
Zhao Song
Atri Rudra
Christopher Ré
177
152
0
28 Oct 2021
Hierarchical Transformers Are More Efficient Language Models
Piotr Nawrot
Szymon Tworkowski
Michał Tyrolski
Lukasz Kaiser
Yuhuai Wu
Christian Szegedy
Henryk Michalewski
296
97
0
26 Oct 2021
An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021
Huaibo Zhao
Yosuke Higuchi
Tetsuji Ogawa
Tetsunori Kobayashi
97
4
0
20 Oct 2021
GNN-LM: Language Modeling based on Global Contexts via GNN
Yuxian Meng
Shi Zong
Xiaoya Li
Xiaofei Sun
Tianwei Zhang
Leilei Gan
Jiwei Li
LRM
544
45
0
17 Oct 2021
Efficient Training of Audio Transformers with Patchout
Interspeech (Interspeech), 2021
Khaled Koutini
Jan Schluter
Hamid Eghbalzadeh
Gerhard Widmer
ViT
552
357
0
11 Oct 2021
Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling
Kyuhong Shim
Iksoo Choi
Wonyong Sung
Jungwook Choi
126
20
0
07 Oct 2021
Page 2 of 5