Generating Long Sequences with Sparse Transformers

23 April 2019

Papers citing "Generating Long Sequences with Sparse Transformers"

50 / 1,282 papers shown

Efficient Pretraining Length Scaling

1.1K

21 Apr 2025

CacheFormer: High Attention-Based Segment Caching

Applied Informatics (AI), 2025

Sushant Singh

A. Mahmood

221

18 Apr 2025

AttentionDrop: A Novel Regularization Method for Transformer Models

Mirza Samad Ahmed Baig

Syeda Anshrah Gillani

Abdul Akbar Khan

Shahid Munir Shah

Muhammad Omer Khan

244

16 Apr 2025

Analysis of Attention in Video Diffusion Transformers

278

14 Apr 2025

Local Temporal Feature Enhanced Transformer with ROI-rank Based Masking for Diagnosis of ADHD

Byunggun Kim

Younghun Kwon

MedIm

12 Apr 2025

A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives

564

01 Apr 2025

SQuat: Subspace-orthogonal KV Cache Quantization

381

31 Mar 2025

TRA: Better Length Generalisation with Threshold Relative Attention

547

29 Mar 2025

DiTFastAttnV2: Head-wise Attention Compression for Multi-Modality Diffusion Transformers

Mingzhu Shen

Yibo Fan

Shengen Yan

Guohao Dai

Yu Wang

303

28 Mar 2025

Exploring the Roles of Large Language Models in Reshaping Transportation Systems: A Survey, Framework, and Roadmap

Tong Nie

Jian Sun

Wei Ma

564

27 Mar 2025

DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding

297

20 Mar 2025

XAttention: Block Sparse Attention with Antidiagonal Scoring

336

20 Mar 2025

Intra-neuronal attention within language models: Relationships between activation and semantics

Corbet Alois Georgeon

Michael Veillet-Guillem

MILM

256

17 Mar 2025

CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences

International Conference on Learning Representations (ICLR), 2025

281

16 Mar 2025

Ensemble Learning for Large Language Models in Text and Code Generation: A Survey

332

13 Mar 2025

Learning to Inference Adaptively for Multimodal Large Language Models

431

13 Mar 2025

Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

332

11 Mar 2025

TokenButler: Token Importance is Predictable

Yash Akhauri

Ahmed F. AbouElhamayed

Yifei Gao

Chi-chih Chang

Nilesh Jain

Mohamed S. Abdelfattah

196

10 Mar 2025

eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference

854

10 Mar 2025

Multimodal Emotion Recognition and Sentiment Analysis in Multi-Party Conversation Contexts

286

09 Mar 2025

Spectral Informed Mamba for Robust Point Cloud Processing

Computer Vision and Pattern Recognition (CVPR), 2025

Milad Cheraghalikhani

325

06 Mar 2025

SED2AM: Solving Multi-Trip Time-Dependent Vehicle Routing Problem using Deep Reinforcement Learning

ACM Transactions on Knowledge Discovery from Data (TKDD), 2025

407

06 Mar 2025

L²M: Mutual Information Scaling Law for Long-Context Language Modeling

311

06 Mar 2025

Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer

361

04 Mar 2025

Boltzmann Attention Sampling for Image Analysis with Small Objects

Computer Vision and Pattern Recognition (CVPR), 2025

444

04 Mar 2025

Attention Condensation via Sparsity Induced Regularized Training

1.0K

03 Mar 2025

SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures

Computer Vision and Pattern Recognition (CVPR), 2025

476

03 Mar 2025

Prior-Fitted Networks Scale to Larger Datasets When Treated as Weak Learners

International Conference on Artificial Intelligence and Statistics (AISTATS), 2025

215

03 Mar 2025

Training-free and Adaptive Sparse Attention for Efficient Long Video Generation

372

28 Feb 2025

Reasoning is Periodicity? Improving Large Language Models Through Effective Periodicity Modeling

...

562

28 Feb 2025

FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference

International Conference on Learning Representations (ICLR), 2025

303

28 Feb 2025

Transformers with Joint Tokens and Local-Global Attention for Efficient Human Pose Estimation

K. A. Kinfu

René Vidal

ViT

278

28 Feb 2025

Beyond Worst-Case Dimensionality Reduction for Sparse Vectors

International Conference on Learning Representations (ICLR), 2025

Sandeep Silwal

David P. Woodruff

Qiuyi Zhang

260

27 Feb 2025

Sliding Window Attention Training for Efficient Large Language Models

472

26 Feb 2025

296

25 Feb 2025

The Role of Sparsity for Length Generalization in Transformers

237

24 Feb 2025

Protein Large Language Models: A Comprehensive Survey

...

426

21 Feb 2025

RhythmFormer: Extracting Patterned rPPG Signals based on Periodic Sparse Attention

Pattern Recognition (Pattern Recogn.), 2024

354

21 Feb 2025

Compression Barriers for Autoregressive Transformers

Themistoklis Haris

Krzysztof Onak

170

21 Feb 2025

Neural Attention Search

Difan Deng

Marius Lindauer

543

18 Feb 2025

Continuous Diffusion Model for Language Modeling

Jaehyeong Jo

Sung Ju Hwang

213

17 Feb 2025

Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization

277

14 Feb 2025

A Survey on Mamba Architecture for Vision Applications

432

11 Feb 2025

LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

618

10 Feb 2025

Context-Aware Hierarchical Merging for Long Document Summarization

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Litu Ou

Mirella Lapata

MoMe

1.1K

03 Feb 2025

Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques

IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2025

Nathaniel Tomczak

Sanmukh Kuppannagari

608

31 Jan 2025

ZETA: Leveraging Z-order Curves for Efficient Top-k Attention

International Conference on Learning Representations (ICLR), 2025

570

24 Jan 2025

Parallel Sequence Modeling via Generalized Spatial Propagation Network

Computer Vision and Pattern Recognition (CVPR), 2025

837

21 Jan 2025

Episodic Memories Generation and Evaluation Benchmark for Large Language Models

International Conference on Learning Representations (ICLR), 2025

221

21 Jan 2025

ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models

International Conference on Computational Linguistics (COLING), 2024

468

20 Jan 2025