Adaptive Attention Span in Transformers
arXiv: 1905.07799 (v2, latest)
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
19 May 2019
Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, Armand Joulin

Papers citing "Adaptive Attention Span in Transformers" (50 of 201 papers shown)
Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems. Subhabrata Dutta, Tanya Gautam, Soumen Chakrabarti, Tanmoy Chakraborty. 315 / 25 / 0. 30 Sep 2021.

UFO-ViT: High Performance Linear Vision Transformer without Softmax. Jeonggeun Song. Tags: ViT. 325 / 27 / 0. 29 Sep 2021.

Do Long-Range Language Models Actually Use Long-Range Context? Simeng Sun, Kalpesh Krishna, Andrew Mattarella-Micke, Mohit Iyyer. Tags: RALM. 259 / 100 / 0. 19 Sep 2021.

Adaptive Multi-Resolution Attention with Linear Complexity. IEEE International Joint Conference on Neural Networks (IJCNN), 2021. Yao Zhang, Yunpu Ma, T. Seidl, Volker Tresp. 118 / 2 / 0. 10 Aug 2021.
Making Transformers Solve Compositional Tasks. Annual Meeting of the Association for Computational Linguistics (ACL), 2021. Santiago Ontañón, Joshua Ainslie, Vaclav Cvicek, Zachary Kenneth Fisher. 270 / 85 / 0. 09 Aug 2021.

Lyapunov-based uncertainty-aware safe reinforcement learning. Ashkan B. Jeddi, Nariman L. Dehghani, A. Shafieezadeh. 147 / 10 / 0. 29 Jul 2021.

Long-Short Transformer: Efficient Transformers for Language and Vision. Chen Zhu, Ming-Yu Liu, Chaowei Xiao, Mohammad Shoeybi, Tom Goldstein, Anima Anandkumar, Bryan Catanzaro. Tags: ViT, VLM. 442 / 162 / 0. 05 Jul 2021.

Can Transformers Jump Around Right in Natural Language? Assessing Performance Transfer from SCAN. Rahma Chaabouni, Roberto Dessì, Eugene Kharitonov. 262 / 20 / 0. 03 Jul 2021.
XCiT: Cross-Covariance Image Transformers. Neural Information Processing Systems (NeurIPS), 2021. Alaaeldin El-Nouby, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, ..., Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Edouard Grave. Tags: ViT. 446 / 614 / 0. 17 Jun 2021.

An Automated Quality Evaluation Framework of Psychotherapy Conversations with Local Quality Estimates. Computer Speech and Language (CSL), 2021. Zhuohao Chen, Nikolaos Flemotomos, Karan Singla, Torrey A. Creed, David C. Atkins, Shrikanth Narayanan. 141 / 9 / 0. 15 Jun 2021.

A Survey of Transformers. AI Open (AO), 2021. Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu. Tags: ViT. 456 / 1,396 / 0. 08 Jun 2021.

Staircase Attention for Recurrent Processing of Sequences. Neural Information Processing Systems (NeurIPS), 2021. Da Ju, Stephen Roller, Sainbayar Sukhbaatar, Jason Weston. 160 / 13 / 0. 08 Jun 2021.
An Attention Free Transformer. Shuangfei Zhai, Walter A. Talbott, Nitish Srivastava, Chen Huang, Hanlin Goh, Ruixiang Zhang, J. Susskind. Tags: ViT. 409 / 164 / 0. 28 May 2021.

Sound Event Detection with Adaptive Frequency Selection. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2021. Zhepei Wang, Jonah Casebeer, Adam Clemmitt, Efthymios Tzinis, Paris Smaragdis. 202 / 2 / 0. 17 May 2021.

Not All Memories are Created Equal: Learning to Forget by Expiring. International Conference on Machine Learning (ICML), 2021. Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, Angela Fan. Tags: CLL. 239 / 36 / 0. 13 May 2021.

Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents. AI Open (AO), 2021. Chaojun Xiao, Xueyu Hu, Zhiyuan Liu, Cunchao Tu, Maosong Sun. Tags: AILaw, ELM. 257 / 302 / 0. 09 May 2021.
Adapting Long Context NLM for ASR Rescoring in Conversational Agents. Interspeech, 2021. Ashish Shenoy, S. Bodapati, Monica Sunkara, S. Ronanki, Katrin Kirchhoff. 241 / 21 / 0. 21 Apr 2021.

Improving Transformer-Kernel Ranking Model Using Conformer and Query Term Independence. Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2021. Bhaskar Mitra, Sebastian Hofstatter, Hamed Zamani, Nick Craswell. 189 / 9 / 0. 19 Apr 2021.

Go Forth and Prosper: Language Modeling with Ancient Textual History. Rik Koncel-Kedziorski, Noah A. Smith. Tags: KELM. 127 / 0 / 0. 18 Apr 2021.

Revisiting Simple Neural Probabilistic Language Models. North American Chapter of the Association for Computational Linguistics (NAACL), 2021. Simeng Sun, Mohit Iyyer. 167 / 15 / 0. 08 Apr 2021.

Efficient Attentions for Long Document Summarization. North American Chapter of the Association for Computational Linguistics (NAACL), 2021. L. Huang, Shuyang Cao, Nikolaus Nova Parulian, Heng Ji, Lu Wang. 330 / 366 / 0. 05 Apr 2021.
Attention, please! A survey of Neural Attention Models in Deep Learning. Artificial Intelligence Review (AIR), 2021. Alana de Santana Correia, Esther Luna Colombini. Tags: HAI. 337 / 259 / 0. 31 Mar 2021.

A Practical Survey on Faster and Lighter Transformers. ACM Computing Surveys (CSUR), 2021. Quentin Fournier, G. Caron, Daniel Aloise. 387 / 139 / 0. 26 Mar 2021.

Mask Attention Networks: Rethinking and Strengthen Transformer. North American Chapter of the Association for Computational Linguistics (NAACL), 2021. Zhihao Fan, Yeyun Gong, Dayiheng Liu, Zhongyu Wei, Siyuan Wang, Jian Jiao, Nan Duan, Ruofei Zhang, Xuanjing Huang. 154 / 78 / 0. 25 Mar 2021.

Finetuning Pretrained Transformers into RNNs. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021. Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith. 325 / 81 / 0. 24 Mar 2021.
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases. International Conference on Machine Learning (ICML), 2021. Stéphane d'Ascoli, Hugo Touvron, Matthew L. Leavitt, Ari S. Morcos, Giulio Biroli, Levent Sagun. Tags: ViT. 447 / 963 / 0. 19 Mar 2021.

Perceiver: General Perception with Iterative Attention. International Conference on Machine Learning (ICML), 2021. Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, João Carreira. Tags: VLM, ViT, MDE. 585 / 1,273 / 0. 04 Mar 2021.

Random Feature Attention. International Conference on Learning Representations (ICLR), 2021. Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong. 350 / 409 / 0. 03 Mar 2021.

When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021. Tao Lei. Tags: RALM, VLM. 337 / 54 / 0. 24 Feb 2021.
Provably Improved Context-Based Offline Meta-RL with Attention and Contrastive Learning. Lanqing Li, Yuanhao Huang, Mingzhe Chen, Siteng Luo, Dijun Luo, Junzhou Huang. Tags: OffRL. 165 / 3 / 0. 22 Feb 2021.

Evolving Attention with Residual Convolutions. International Conference on Machine Learning (ICML), 2021. Yujing Wang, Yaming Yang, Jiangang Bai, Mingliang Zhang, Jing Bai, Jiahao Yu, Ce Zhang, Gao Huang, Yunhai Tong. Tags: ViT. 219 / 41 / 0. 20 Feb 2021.

Transformer Language Models with LSTM-based Cross-utterance Information Representation. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021. G. Sun, Chuxu Zhang, P. Woodland. 234 / 35 / 0. 12 Feb 2021.

Dancing along Battery: Enabling Transformer with Run-time Reconfigurability on Mobile Devices. Design Automation Conference (DAC), 2021. Yuhong Song, Weiwen Jiang, Bingbing Li, Panjie Qi, Qingfeng Zhuge, E. Sha, Sakyasingha Dasgupta, Yiyu Shi, Caiwen Ding. 159 / 21 / 0. 12 Feb 2021.
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. Journal of Machine Learning Research (JMLR), 2021. W. Fedus, Barret Zoph, Noam M. Shazeer. Tags: MoE. 577 / 3,178 / 0. 11 Jan 2021.

Shortformer: Better Language Modeling using Shorter Inputs. Annual Meeting of the Association for Computational Linguistics (ACL), 2021. Ofir Press, Noah A. Smith, M. Lewis. 667 / 96 / 0. 31 Dec 2020.

Multi-Scale 2D Temporal Adjacent Networks for Moment Localization with Natural Language. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020. Songyang Zhang, Houwen Peng, Jianlong Fu, Yijuan Lu, Jiebo Luo. 198 / 64 / 0. 04 Dec 2020.

EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference. Micro (MICRO), 2020. Thierry Tambe, Coleman Hooper, Lillian Pentecost, Tianyu Jia, En-Yu Yang, ..., Victor Sanh, P. Whatmough, Alexander M. Rush, David Brooks, Gu-Yeon Wei. 437 / 149 / 0. 28 Nov 2020.
General Multi-label Image Classification with Transformers. Computer Vision and Pattern Recognition (CVPR), 2020. Jack Lanchantin, Tianlu Wang, Vicente Ordonez, Yanjun Qi. Tags: ViT. 223 / 326 / 0. 27 Nov 2020.

Training Transformers for Information Security Tasks: A Case Study on Malicious URL Prediction. Ethan M. Rudd, Ahmed Abdallah. 140 / 7 / 0. 05 Nov 2020.

Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers. Zhaoshuo Li, Xingtong Liu, Nathan G. Drenkow, Andy S Ding, Francis X. Creighton, Russell H. Taylor, Mathias Unberath. Tags: MDE, ViT. 596 / 353 / 0. 05 Nov 2020.

Long Document Ranking with Query-Directed Sparse Transformer. Findings, 2020. Jyun-Yu Jiang, Chenyan Xiong, Chia-Jung Lee, Wei Wang. 182 / 27 / 0. 23 Oct 2020.

Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries. Xiaofei Sun, Zijun Sun, Yuxian Meng, Jiwei Li, Chun Fan. 226 / 24 / 0. 14 Oct 2020.
Zero-shot Entity Linking with Efficient Long Range Sequence Modeling. Zonghai Yao, Liangliang Cao, Huapu Pan. Tags: VLM. 230 / 24 / 0. 12 Oct 2020.

SMYRF: Efficient Attention using Asymmetric Clustering. Giannis Daras, Nikita Kitaev, Augustus Odena, A. Dimakis. 250 / 49 / 0. 11 Oct 2020.

Vector-Vector-Matrix Architecture: A Novel Hardware-Aware Framework for Low-Latency Inference in NLP Applications. Matthew Khoury, Rumen Dangovski, L. Ou, Preslav Nakov, Yichen Shen, L. Jing. 108 / 0 / 0. 06 Oct 2020.

Transformers for Modeling Physical Systems. Neural Networks (NN), 2020. N. Geneva, N. Zabaras. Tags: AI4CE. 614 / 194 / 0. 04 Oct 2020.
Which *BERT? A Survey Organizing Contextualized Encoders. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020. Patrick Xia, Shijie Wu, Benjamin Van Durme. 227 / 53 / 0. 02 Oct 2020.

Grounded Compositional Outputs for Adaptive Language Modeling. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020. Nikolaos Pappas, Phoebe Mulcaire, Noah A. Smith. Tags: KELM. 238 / 8 / 0. 24 Sep 2020.

Current Limitations of Language Models: What You Need is Retrieval. Aran Komatsuzaki. Tags: LRM. 129 / 3 / 0. 15 Sep 2020.

Efficient Transformers: A Survey. ACM Computing Surveys (CSUR), 2020. Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler. Tags: VLM. 876 / 1,370 / 0. 14 Sep 2020.
Page 3 of 5