Generating Long Sequences with Sparse Transformers

23 April 2019
Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever
ArXiv (abs) · PDF · HTML
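
As context for the listing below: the cited paper factorizes dense causal self-attention into sparse patterns whose cost grows roughly as O(n·√n) rather than O(n²). The snippet below is a minimal NumPy sketch of the strided pattern the paper describes (each position attends to a recent local window plus every stride-th earlier position); the sequence length and stride values here are illustrative choices, not the paper's settings.

```python
import numpy as np

def strided_sparse_mask(seq_len: int, stride: int) -> np.ndarray:
    """Boolean causal mask combining the two strided-attention components
    from the paper: a local window over the previous `stride` positions,
    and a periodic component hitting every `stride`-th earlier position."""
    i = np.arange(seq_len)[:, None]   # query positions (rows)
    j = np.arange(seq_len)[None, :]   # key positions (columns)
    causal = j <= i                   # no attention to future positions
    local = (i - j) < stride          # recent-window component
    periodic = (i - j) % stride == 0  # every stride-th position component
    return causal & (local | periodic)

# Illustrative values only; the paper picks the stride near sqrt(seq_len).
mask = strided_sparse_mask(seq_len=16, stride=4)
print(mask.astype(int))
```

In the paper the two components are typically split across separate attention heads rather than merged into a single mask as done here; merging them keeps the sketch compact while preserving the overall connectivity.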

Papers citing "Generating Long Sequences with Sparse Transformers"

Showing 50 of 1,282 citing papers.

PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation
Xiaolong Li, Youping Gu, Xi Lin, Weijie Wang, Bohan Zhuang
03 Dec 2025

Nexus: Higher-Order Attention Mechanisms in Transformers
Hanting Chen, Chong Zhu, Kai Han, Yuchuan Tian, Yuchen Liang, Tianyu Guo, Xinghao Chen, Dacheng Tao, Yunhe Wang
03 Dec 2025

HTTM: Head-wise Temporal Token Merging for Faster VGGT
Weitian Wang, Lukas Meiner, Rai Shubham, Cecilia De La Parra, Akash Kumar
26 Nov 2025

Length-MAX Tokenizer for Language Models
Dong Dong, Weijie Su
Tags: VLM
25 Nov 2025

SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space
Zhenyi Shen, Junru Lu, Lin Gui, Jiazheng Li, Yulan He, D. Yin, Xing Sun
25 Nov 2025

Re-Key-Free, Risky-Free: Adaptable Model Usage Control
Zihan Wang, Zhongkui Ma, Xinguo Feng, Chuan Yan, Dongge Liu, Ruoxi Sun, Derui Wang, Minhui Xue, Guangdong Bai
Tags: AAML
24 Nov 2025

Rethinking Vision Transformer Depth via Structural Reparameterization
Chengwei Zhou, Vipin Chaudhary, Gourav Datta
Tags: ViT
24 Nov 2025

DeepCoT: Deep Continual Transformers for Real-Time Inference on Data Streams
Ginés Carreto Picón, Peng Yuan Zhou, Qi Zhang, Alexandros Iosifidis
Tags: AI4TS
21 Nov 2025

Joint Semantic-Channel Coding and Modulation for Token Communications
Jingkai Ying, Zhijin Qin, Yulong Feng, Liejun Wang, Xiaoming Tao
19 Nov 2025

Attention Via Convolutional Nearest Neighbors
Mingi Kang, Jeová Farias Sales Rocha Neto
18 Nov 2025

QUILL: An Algorithm-Architecture Co-Design for Cache-Local Deformable Attention
Hyunwoo Oh, Hanning Chen, Sanggeon Yun, Yang Ni, Wenjun Huang, Tamoghno Das, Suyeon Jang, Mohsen Imani
Tags: VLM
17 Nov 2025

TIMERIPPLE: Accelerating vDiTs by Understanding the Spatio-Temporal Correlations in Latent Space
Wenxuan Miao, Yulin Sun, Aiyue Chen, Jing Lin, Yiwu Yao, Yiming Gan, Jieru Zhao, Jingwen Leng, Mingyi Guo, Yu Feng
15 Nov 2025

KVSwap: Disk-aware KV Cache Offloading for Long-Context On-device Inference
H. Zhang, Chunwei Xia, Zheng Wang
Tags: SyDa
14 Nov 2025

Galactification: painting galaxies onto dark matter only simulations using a transformer-based model
Shivam Pandey, Christopher C. Lovell, Chirag Modi, Benjamin Dan Wandelt
Tags: 3DGS
11 Nov 2025

Learning to Focus: Focal Attention for Selective and Scalable Transformers
Dhananjay Ram, Wei Xia, Stefano Soatto
10 Nov 2025

CG-TTRL: Context-Guided Test-Time Reinforcement Learning for On-Device Large Language Models
Peyman Hosseini, Ondrej Bohdal, Taha Ceritli, Ignacio Castro, Matthew Purver, Mete Ozay, Umberto Michieli
Tags: OffRL
09 Nov 2025

How Particle-System Random Batch Methods Enhance Graph Transformer: Memory Efficiency and Parallel Computing Strategy
Hanwen Liu, Yixuan Ma, Shi Jin, Yuguang Wang
08 Nov 2025

Lethe: Layer- and Time-Adaptive KV Cache Pruning for Reasoning-Intensive LLM Serving
Hui Zeng, Daming Zhao, Pengfei Yang, WenXuan Hou, Tianyang Zheng, Hui Li, Weiye Ji, Jidong Zhai
08 Nov 2025

BudgetMem: Learning Selective Memory Policies for Cost-Efficient Long-Context Processing in Language Models
Chandra Vamsi Krishna Alla, Harish Naidu Gaddam, Manohar Kommi
Tags: RALM
07 Nov 2025

Attention and Compression is all you need for Controllably Efficient Language Models
Jatin Prakash, N. Jethani, Rajesh Ranganath
Tags: MQ, VLM
07 Nov 2025

Neural Beamforming with Doppler-Aware Sparse Attention for High Mobility Environments
Cemil Vahapoglu, Timothy J. O'Shea, Wan Liu, S. Ulukus
05 Nov 2025

AILA--First Experiments with Localist Language Models
Joachim Diederich
05 Nov 2025

SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators
Jonathan Li, Nasim Farahini, Evgenii Iuliugin, Magnus Vesterlund, Christian Haggstrom, ..., Mingran Wang, Qinghua Li, Bo Li, Urmish Thakker, R. Prabhakar
Tags: VLM
05 Nov 2025

SALS: Sparse Attention in Latent Space for KV cache Compression
Junlin Mu, Hantao Huang, J. Zhang, Minghui Yu, Tao Wang, Yidong Li
28 Oct 2025

Large language model-based task planning for service robots: A review
Shaohan Bian, Ying Zhang, Guohui Tian, Zhiqiang Miao, Edmond Q. Wu, Simon X. Yang, C. Hua
Tags: LLMAG, LM&Ro
27 Oct 2025

Transformers from Compressed Representations
Juan Carlos León Alcázar, Mattia Soldan, Mohammad Saatialsoruji, Alejandro Pardo, Hani Itani, Juan C. Pérez, Bernard Ghanem
26 Oct 2025

Gradual Forgetting: Logarithmic Compression for Extending Transformer Context Windows
Billy Dickson, Zoran Tiganj
Tags: CLL
25 Oct 2025

Stateful KV Cache Management for LLMs: Balancing Space, Time, Accuracy, and Positional Fidelity
Pratik Poudel
Tags: KELM
23 Oct 2025

Alleviating Forgetfulness of Linear Attention by Hybrid Sparse Attention and Contextualized Learnable Token Eviction
Mutian He, Philip N. Garner
Tags: CLL
23 Oct 2025

GPTFace: Generative Pre-training of Facial-Linguistic Transformer by Span Masking and Weakly Correlated Text-image Data
Yudong Li, Hao Li, Xianxu Hou, Linlin Shen
21 Oct 2025

Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models
Jiaqi Leng, Xiang Hu, Junxiong Wang, Jianguo Li, Wei Wu, Yucheng Lu
20 Oct 2025

Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads
Zhoutong Wu, Y. Zhang, Yiming Dong, Chenheng Zhang, Cong Fang, Kun Yuan, Zhouchen Lin
19 Oct 2025

FrugalPrompt: Reducing Contextual Overhead in Large Language Models via Token Attribution
Syed Rifat Raiyan, Md Farhan Ishmam, Abdullah Al Imran, Mohammad Ali Moni
18 Oct 2025

Stability of Transformers under Layer Normalization
Kelvin Kan, Xingjian Li, Benjamin J. Zhang, Tuhin Sahai, Stanley Osher, Krishna Kumar, Markos A. Katsoulakis
10 Oct 2025

DELTA: Dynamic Layer-Aware Token Attention for Efficient Long-Context Reasoning
Hossein Entezari Zarch, Lei Gao, Chaoyi Jiang, Murali Annavarm
Tags: LRM
10 Oct 2025

Learning What to Remember: Adaptive Probabilistic Memory Retention for Memory-Efficient Language Models
S M Rafiuddin, Muntaha Nujat Khan
Tags: RALM, KELM
09 Oct 2025

Artificial Hippocampus Networks for Efficient Long-Context Modeling
Yunhao Fang, Weihao Yu, Shu Zhong, Qinghao Ye, Xuehan Xiong, Lai Wei
08 Oct 2025

Vectorized FlashAttention with Low-cost Exponential Computation in RISC-V Vector Processors
Vasileios Titopoulos, K. Alexandridis, G. Dimitrakopoulos
08 Oct 2025

The End of Transformers? On Challenging Attention and the Rise of Sub-Quadratic Architectures
Alexander Fichtl, Jeremias Bohn, Josefin Kelber, Edoardo Mosca, Georg Groh
06 Oct 2025

Stratum: System-Hardware Co-Design with Tiered Monolithic 3D-Stackable DRAM for Efficient MoE Serving
Yue Pan, Zihan Xia, Po-Kai Hsu, Lanxiang Hu, Hyungyo Kim, ..., Minxuan Zhou, Nam Sung Kim, Shimeng Yu, Tajana Rosing, Mingu Kang
Tags: MoE
06 Oct 2025

Emergent Coordination in Multi-Agent Language Models
Christoph Riedl
Tags: LLMAG
05 Oct 2025

Towards Sampling Data Structures for Tensor Products in Turnstile Streams
Zhao Song, Shenghao Xie, Samson Zhou
04 Oct 2025

Accelerating Attention with Basis Decomposition
Jialin Zhao
02 Oct 2025

Input-Aware Sparse Attention for Real-Time Co-Speech Video Generation
Beijia Lu, Ziyi Chen, Jing Xiao, Jun-Yan Zhu
Tags: DiffM, VGen
02 Oct 2025

SAGE-Music: Low-Latency Symbolic Music Generation via Attribute-Specialized Key-Value Head Sharing
Jiaye Tan, Haonan Luo, Linfeng Song, Shuaiqi Chen, Yishan Lyu, ..., Haoran Zhang, Jiaming Bai, Haoran Cheng, Q. Vera Liao, Hao-Wen Dong
01 Oct 2025

TASP: Topology-aware Sequence Parallelism
Y. Wang, Ke Hong, Xiuhong Li, Yuanchao Xu, Wenxun Wang, Guohao Dai, Y. Wang
30 Sep 2025

HilbertA: Hilbert Attention for Image Generation with Diffusion Models
Shaoyi Zheng, Wenbo Lu, Yuxuan Xia, Haomin Liu, Shengjie Wang
30 Sep 2025

InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
Weilin Zhao, Z. Zhou, Zhou Su, Chaojun Xiao, Yuxuan Li, ..., Ruoyao Xiao, Yuxiang Huang, Ao Sun, Xu Han, Zhiyuan Liu
Tags: VLM
29 Sep 2025

FlashOmni: A Unified Sparse Attention Engine for Diffusion Transformers
Liang Qiao, Yue Dai, Y. Huang, Hongyu Kan, Jun Shi, Hong An
29 Sep 2025

Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents
Yaorui Shi, Yuxin Chen, Siyuan Wang, Sihang Li, Hengxing Cai, Qi Gu, Xiang-Bin Wang, An Zhang
Tags: LLMAG, KELM, RALM, OffRL, CLL, LRM
27 Sep 2025