Adaptive Attention Span in Transformers

Annual Meeting of the Association for Computational Linguistics (ACL), 2019

19 May 2019

Papers citing "Adaptive Attention Span in Transformers"

50 / 201 papers shown

Learning to Focus: Focal Attention for Selective and Scalable Transformers

Dhananjay Ram

Wei Xia

Stefano Soatto

293

10 Nov 2025

Learning to Focus: Prioritizing Informative Histories with Structured Attention Mechanisms in Partially Observable Reinforcement Learning

Daniel De Dios Allegue

J. He

F. Oliehoek

OffRL

288

10 Nov 2025

BiSparse-AAS: Bilinear Sparse Attention and Adaptive Spans Framework for Scalable and Efficient Text Summarization

190

31 Oct 2025

Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference

336

15 Oct 2025

Language Model Planning from an Information Theoretic Perspective

Muhammed Ustaomeroglu

143

28 Sep 2025

HSGM: Hierarchical Segment-Graph Memory for Scalable Long-Text Semantics

Dong Liu

Yanxuan Yu

VLM

137

17 Sep 2025

MoGU V2: Toward a Higher Pareto Frontier Between Model Usability and Security

121

08 Sep 2025

Lost in Transmission: When and Why LLMs Fail to Reason Globally

690

13 May 2025

AttentionDrop: A Novel Regularization Method for Transformer Models

Mirza Samad Ahmed Baig

Syeda Anshrah Gillani

Abdul Akbar Khan

Shahid Munir Shah

Muhammad Omer Khan

256

16 Apr 2025

L²M: Mutual Information Scaling Law for Long-Context Language Modeling

328

06 Mar 2025

DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models

Computer Vision and Pattern Recognition (CVPR), 2025

567

04 Mar 2025

Composable Strategy Framework with Integrated Video-Text based Large Language Models for Heart Failure Assessment

146

23 Feb 2025

Enhancing RWKV-based Language Models for Long-Sequence Text Generation

Xinghan Pan

332

21 Feb 2025

Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures

Gabriel Lindenmaier

Sean Papay

Sebastian Padó

366

02 Feb 2025

Advances in Transformers for Robotic Applications: A Review

Nikunj Sanghai

Nik Bear Brown

AI4CE

384

13 Dec 2024

Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Model

331

02 Dec 2024

On Fine-Grained I/O Complexity of Attention Backward Passes

Jiahao Zhang

259

12 Oct 2024

Inf-MLLM: Efficient Streaming Inference of Multimodal Large Language Models on a Single GPU

Zhenyu Ning

Jieru Zhao

Qihao Jin

Wenchao Ding

Minyi Guo

11 Sep 2024

Pre-Trained Language Models for Keyphrase Prediction: A Review

ICT Express (IE), 2024

Muhammad Umair

Tangina Sultana

Young-Koo Lee

319

02 Sep 2024

HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization

European Conference on Computer Vision (ECCV), 2024

233

12 Aug 2024

Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers

Zilong Zheng

236

24 Jun 2024

"Forgetting" in Machine Learning and Beyond: A Survey

Alyssa Shuang Sha

Bernardo Pereira Nunes

Armin Haller

MU KELM

297

31 May 2024

Transformers Can Do Arithmetic with the Right Embeddings

...

209

27 May 2024

Dynamic Context Adaptation and Information Flow Control in Transformers: Introducing the Evaluator Adjuster Unit and Gated Residual Connections

Sahil Rajesh Dhayalkar

134

22 May 2024

Whole Genome Transformer for Gene Interaction Effects in Microbiome Habitat Specificity

AAAI Conference on Artificial Intelligence (AAAI), 2024

Zhufeng Li

S. S. Cranganore

Nicholas D. Youngblut

Niki Kilbertus

330

09 May 2024

Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection

Zhiwei Yang

Jing Liu

Peng Wu

256

12 Apr 2024

SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget

Zihao Wang

Shaoduo Gan

251

07 Apr 2024

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

265

19 Mar 2024

Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference

Conference on Machine Learning and Systems (MLSys), 2024

352

112

14 Mar 2024

xT: Nested Tokenization for Larger Context in Large Images

240

04 Mar 2024

Exploiting Adaptive Contextual Masking for Aspect-Based Sentiment Analysis

367

21 Feb 2024

Model Compression and Efficient Inference for Large Language Models: A Survey

301

15 Feb 2024

Sample-based Dynamic Hierarchical Transformer with Layer and Head Flexibility via Contextual Bandit

231

05 Dec 2023

Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey

...

385

102

21 Nov 2023

Memory-efficient Stochastic methods for Memory-based Transformers

Vishwajit Kumar Vishnu

C. Sekhar

119

14 Nov 2023

Large Human Language Models: A Need and the Challenges

Nikita Soni

H. Andrew Schwartz

João Sedoc

Niranjan Balasubramanian

ALM AI4CE

277

09 Nov 2023

Ultra-Long Sequence Distributed Transformer

Mayanka Chandra Shekar

346

04 Nov 2023

The Expressibility of Polynomial based Attention Scheme

Zhao Song

Guangyi Xu

Junze Yin

329

30 Oct 2023

TRAMS: Training-free Memory Selection for Long-range Language Modeling

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Haofei Yu

Cunxiang Wang

Yue Zhang

Wei Bi

RALM

304

24 Oct 2023

A Framework for Inference Inspired by Human Memory Mechanisms

International Conference on Learning Representations (ICLR), 2023

204

01 Oct 2023

Transformer-VQ: Linear-Time Transformers via Vector Quantization

International Conference on Learning Representations (ICLR), 2023

Albert Mohwald

258

28 Sep 2023

Reasonable Anomaly Detection in Long Sequences

Yalong Jiang

Changkang Li

AI4TS

235

06 Sep 2023

Fast Training of NMT Model with Data Sorting

Daniela N. Rim

Kimera Richard

Heeyoul Choi

109

16 Aug 2023

Bayesian Flow Networks

Alex Graves

R. Srivastava

Timothy James Atkinson

Faustino J. Gomez

BDL

659

14 Aug 2023

RecycleGPT: An Autoregressive Language Model with Recyclable Module

297

07 Aug 2023

Learning to Group Auxiliary Datasets for Molecule

Neural Information Processing Systems (NeurIPS), 2023

Ting Huang

Ziniu Hu

Rex Ying

262

08 Jul 2023

Sparse Modular Activation for Efficient Sequence Modeling

Neural Information Processing Systems (NeurIPS), 2023

Liliang Ren

Yang Liu

Shuohang Wang

Yichong Xu

Chenguang Zhu

Chengxiang Zhai

282

19 Jun 2023

FSUIE: A Novel Fuzzy Span Mechanism for Universal Information Extraction

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Bo Du

194

19 Jun 2023

Improving Long Context Document-Level Machine Translation

Christian Herold

Hermann Ney

175

08 Jun 2023

Recasting Self-Attention with Holographic Reduced Representations

International Conference on Machine Learning (ICML), 2023

Mohammad Mahmudul Alam

188

31 May 2023