arXiv:1905.07799
Adaptive Attention Span in Transformers
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
19 May 2019
Sainbayar Sukhbaatar
Edouard Grave
Piotr Bojanowski
Armand Joulin
Papers citing "Adaptive Attention Span in Transformers"
50 / 201 papers shown
Learning to Focus: Focal Attention for Selective and Scalable Transformers
Dhananjay Ram
Wei Xia
Stefano Soatto
10 Nov 2025
Learning to Focus: Prioritizing Informative Histories with Structured Attention Mechanisms in Partially Observable Reinforcement Learning
Daniel De Dios Allegue
J. He
F. Oliehoek
OffRL
10 Nov 2025
BiSparse-AAS: Bilinear Sparse Attention and Adaptive Spans Framework for Scalable and Efficient Text Summarization
D. Hagos
Legand L. Burge
Anietie Andy
Anis Yazidi
Vladimir Vlassov
31 Oct 2025
Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference
Nikhil Bhendawade
K. Nishu
Arnav Kundu
Chris Bartels
Minsik Cho
Irina Belousova
LRM
15 Oct 2025
Language Model Planning from an Information Theoretic Perspective
Muhammed Ustaomeroglu
Baris Askin
Gauri Joshi
Carlee Joe-Wong
Guannan Qu
28 Sep 2025
HSGM: Hierarchical Segment-Graph Memory for Scalable Long-Text Semantics
Dong Liu
Yanxuan Yu
VLM
17 Sep 2025
MoGU V2: Toward a Higher Pareto Frontier Between Model Usability and Security
Yanrui Du
Fenglei Fan
Sendong Zhao
Jiawei Cao
Ting Liu
Bing Qin
08 Sep 2025
Lost in Transmission: When and Why LLMs Fail to Reason Globally
Tobias Schnabel
Kiran Tomlinson
Adith Swaminathan
Jennifer Neville
LRM
13 May 2025
AttentionDrop: A Novel Regularization Method for Transformer Models
Mirza Samad Ahmed Baig
Syeda Anshrah Gillani
Abdul Akbar Khan
Shahid Munir Shah
Muhammad Omer Khan
16 Apr 2025
L²M: Mutual Information Scaling Law for Long-Context Language Modeling
Zhuo Chen
Oriol Mayné i Comas
Zhuotao Jin
Di Luo
Marin Soljacic
06 Mar 2025
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Computer Vision and Pattern Recognition (CVPR), 2025
Saeed Ranjbar Alvar
Gursimran Singh
Mohammad Akbari
Yong Zhang
VLM
04 Mar 2025
Composable Strategy Framework with Integrated Video-Text based Large Language Models for Heart Failure Assessment
Jianzhou Chen
Xiumei Wang
Jinyang Sun
Xi Chen
Heyu Chu
Guo Song
Yuji Luo
Xingping Zhou
Rong Gu
23 Feb 2025
Enhancing RWKV-based Language Models for Long-Sequence Text Generation
Xinghan Pan
21 Feb 2025
Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures
Gabriel Lindenmaier
Sean Papay
Sebastian Padó
02 Feb 2025
Advances in Transformers for Robotic Applications: A Review
Nikunj Sanghai
Nik Bear Brown
AI4CE
13 Dec 2024
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Model
Qianhan Feng
Wenshuo Li
Tong Lin
Xinghao Chen
VLM
02 Dec 2024
On Fine-Grained I/O Complexity of Attention Backward Passes
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao Song
Yufa Zhou
Jiahao Zhang
12 Oct 2024
Inf-MLLM: Efficient Streaming Inference of Multimodal Large Language Models on a Single GPU
Zhenyu Ning
Jieru Zhao
Qihao Jin
Wenchao Ding
Minyi Guo
11 Sep 2024
Pre-Trained Language Models for Keyphrase Prediction: A Review
ICT Express (IE), 2024
Muhammad Umair
Tangina Sultana
Young-Koo Lee
02 Sep 2024
HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization
European Conference on Computer Vision (ECCV), 2024
Sakib Reza
Yuexi Zhang
Mohsen Moghaddam
Mario Sznaier
12 Aug 2024
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
Chao Lou
Zixia Jia
Zilong Zheng
Kewei Tu
ODL
24 Jun 2024
"Forgetting" in Machine Learning and Beyond: A Survey
Alyssa Shuang Sha
Bernardo Pereira Nunes
Armin Haller
MU
KELM
31 May 2024
Transformers Can Do Arithmetic with the Right Embeddings
Sean McLeish
Arpit Bansal
Alex Stein
Neel Jain
John Kirchenbauer
...
B. Kailkhura
A. Bhatele
Jonas Geiping
Avi Schwarzschild
Tom Goldstein
27 May 2024
Dynamic Context Adaptation and Information Flow Control in Transformers: Introducing the Evaluator Adjuster Unit and Gated Residual Connections
Sahil Rajesh Dhayalkar
22 May 2024
Whole Genome Transformer for Gene Interaction Effects in Microbiome Habitat Specificity
AAAI Conference on Artificial Intelligence (AAAI), 2024
Zhufeng Li
S. S. Cranganore
Nicholas D. Youngblut
Niki Kilbertus
09 May 2024
Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection
Zhiwei Yang
Jing Liu
Peng Wu
12 Apr 2024
SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget
Zihao Wang
Shaoduo Gan
07 Apr 2024
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
Linjiang Huang
Rongyao Fang
Aiping Zhang
Guanglu Song
Si Liu
Yu Liu
Hongsheng Li
DiffM
19 Mar 2024
Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference
Conference on Machine Learning and Systems (MLSys), 2024
Muhammad Adnan
Akhil Arunkumar
Gaurav Jain
Shiyang Chen
Ilya Soloveychik
Purushotham Kamath
14 Mar 2024
xT: Nested Tokenization for Larger Context in Large Images
Ritwik Gupta
Shufan Li
Tyler Lixuan Zhu
Jitendra Malik
Trevor Darrell
K. Mangalam
ViT
04 Mar 2024
Exploiting Adaptive Contextual Masking for Aspect-Based Sentiment Analysis
S M Rafiuddin
Mohammed Rakib
Sadia Kamal
A. Bagavathi
21 Feb 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang
Wei Chen
Yicong Luo
Yongliu Long
Zhengkai Lin
Liye Zhang
Binbin Lin
Deng Cai
Xiaofei He
MQ
15 Feb 2024
Sample-based Dynamic Hierarchical Transformer with Layer and Head Flexibility via Contextual Bandit
Fanfei Meng
Lele Zhang
Yu Chen
Yuxin Wang
05 Dec 2023
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
Yunpeng Huang
Jingwei Xu
Junyu Lai
Zixu Jiang
Taolue Chen
...
Xiaoxing Ma
Lijuan Yang
Zhou Xin
Shupeng Li
Penghao Zhao
LLMAG
KELM
21 Nov 2023
Memory-efficient Stochastic methods for Memory-based Transformers
Vishwajit Kumar Vishnu
C. Sekhar
14 Nov 2023
Large Human Language Models: A Need and the Challenges
Nikita Soni
H. Andrew Schwartz
João Sedoc
Niranjan Balasubramanian
ALM
AI4CE
09 Nov 2023
Ultra-Long Sequence Distributed Transformer
Xiao Wang
Isaac Lyngaas
A. Tsaris
Peng Chen
Sajal Dash
Mayanka Chandra Shekar
Tao Luo
Hong-Jun Yoon
Mohamed Wahib
Ravi Tandon
04 Nov 2023
The Expressibility of Polynomial based Attention Scheme
Zhao Song
Guangyi Xu
Junze Yin
30 Oct 2023
TRAMS: Training-free Memory Selection for Long-range Language Modeling
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Haofei Yu
Cunxiang Wang
Yue Zhang
Wei Bi
RALM
24 Oct 2023
A Framework for Inference Inspired by Human Memory Mechanisms
International Conference on Learning Representations (ICLR), 2023
Xiangyu Zeng
Jie Lin
Piao Hu
Ruizheng Huang
Zhicheng Zhang
01 Oct 2023
Transformer-VQ: Linear-Time Transformers via Vector Quantization
International Conference on Learning Representations (ICLR), 2023
Albert Mohwald
28 Sep 2023
Reasonable Anomaly Detection in Long Sequences
Yalong Jiang
Changkang Li
AI4TS
06 Sep 2023
Fast Training of NMT Model with Data Sorting
Daniela N. Rim
Kimera Richard
Heeyoul Choi
16 Aug 2023
Bayesian Flow Networks
Alex Graves
R. Srivastava
Timothy James Atkinson
Faustino J. Gomez
BDL
14 Aug 2023
RecycleGPT: An Autoregressive Language Model with Recyclable Module
Yu Jiang
Qiaozhi He
Xiaomin Zhuang
Zhihua Wu
Kunpeng Wang
Wenlai Zhao
Guangwen Yang
KELM
07 Aug 2023
Learning to Group Auxiliary Datasets for Molecule
Neural Information Processing Systems (NeurIPS), 2023
Ting Huang
Ziniu Hu
Rex Ying
08 Jul 2023
Sparse Modular Activation for Efficient Sequence Modeling
Neural Information Processing Systems (NeurIPS), 2023
Liliang Ren
Yang Liu
Shuohang Wang
Yichong Xu
Chenguang Zhu
Chengxiang Zhai
19 Jun 2023
FSUIE: A Novel Fuzzy Span Mechanism for Universal Information Extraction
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Tianshuo Peng
Z. Li
Lefei Zhang
Bo Du
Hai Zhao
19 Jun 2023
Improving Long Context Document-Level Machine Translation
Christian Herold
Hermann Ney
08 Jun 2023
Recasting Self-Attention with Holographic Reduced Representations
International Conference on Machine Learning (ICML), 2023
Mohammad Mahmudul Alam
Edward Raff
Stella Biderman
Tim Oates
James Holt
31 May 2023
Page 1 of 5