Generating Long Sequences with Sparse Transformers

23 April 2019

arXiv: 1904.10509

Papers citing "Generating Long Sequences with Sparse Transformers"

50 / 1,283 papers shown

Towards Robust Knowledge Tracing Models via k-Sparse Attention

Zitao Liu

Xiangyu Zhao

207

44

0

24 Jul 2024

Evaluating Long Range Dependency Handling in Code Generation LLMs

Yannick Assogba

254

1

0

23 Jul 2024

Mamba meets crack segmentation

253

5

0

22 Jul 2024

MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training

266

5

0

22 Jul 2024

RazorAttention: Efficient KV Cache Compression Through Retrieval Heads

243

58

0

22 Jul 2024

Recent Advances in Generative AI and Large Language Models: Current Status, Challenges, and Perspectives

Danda B. Rawat

493

84

0

20 Jul 2024

DeepGate3: Towards Scalable Circuit Representation Learning

310

27

0

15 Jul 2024

Exploring the Potentials and Challenges of Deep Generative Models in Product Design Conception

Phillip Mueller

396

5

0

15 Jul 2024

MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts

...

Hui Chen

380

7

0

13 Jul 2024

Beyond KV Caching: Shared Attention for Efficient LLMs

Danilo Vasconcellos Vargas

214

9

0

13 Jul 2024

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

Ganesh Bikshandi

517

324

0

11 Jul 2024

HDT: Hierarchical Document Transformer

233

3

0

11 Jul 2024

How Well Can a Long Sequence Model Model Long Sequences? Comparing Architechtural Inductive Biases on Long-Context Abilities

456

7

0

11 Jul 2024

Mobius: A High Efficient Spatial-Temporal Parallel Training Paradigm for Text-to-Video Generation Task

Jinchao Zhang

192

1

0

09 Jul 2024

How Effective are State Space Models for Machine Translation?

Marcos Vinícius Treviso

André F. T. Martins

204

3

0

07 Jul 2024

The Mysterious Case of Neuron 1512: Injectable Realignment Architectures Reveal Internal Characteristics of Meta's Llama 2 Model

218

0

0

04 Jul 2024

Let the Code LLM Edit Itself When You Edit the Code

Jingjing Xu

Zongzhang Zhang

276

3

0

03 Jul 2024

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Chengruidong Zhang

...

348

229

0

02 Jul 2024

Neurocache: Efficient Vector Retrieval for Long-range Language Modeling

210

0

0

02 Jul 2024

Efficient Sparse Attention needs Adaptive Token Release

Zihao Li

227

7

0

02 Jul 2024

LPViT: Low-Power Semi-structured Pruning for Vision Transformers

Zhe Wang

Min Wu

Xiaoli Li

Weisi Lin

664

18

0

02 Jul 2024

Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs

Enshu Liu

Matthew B. Blaschko

Huazhong Yang

Yu Wang

242

22

0

01 Jul 2024

InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management

186

182

0

28 Jun 2024

Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads

Ali Khaleghi Rahimian

Manish Kumar Govind

Dominick Reilly

Christian Kummerle

237

1

0

27 Jun 2024

From Efficient Multimodal Models to World Models: A Survey

Haoran Wang

Yan Wang

306

14

0

27 Jun 2024

Temporally Multi-Scale Sparse Self-Attention for Physical Activity Data Imputation

Colin Samplawski

Benjamin M. Marlin

169

4

0

27 Jun 2024

Few-Shot Medical Image Segmentation with High-Fidelity Prototypes

Mao Ye

Jianwei Zhang

336

0

0

26 Jun 2024

Pamba: Enhancing Global Interaction in Point Clouds via State Space Model

178

0

0

25 Jun 2024

Long Context Transfer from Language to Vision

Bo Li

Guangtao Zeng

Chunyuan Li

Ziwei Liu

318

349

0

24 Jun 2024

Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers

Zilong Zheng

233

51

0

24 Jun 2024

Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation

Yuchen Yang

Jun Xu

237

3

0

24 Jun 2024

SimSMoE: Solving Representational Collapse via Similarity Measure

281

3

0

22 Jun 2024

Mixture of Attention Spans: Optimizing LLM Inference Efficiency with Heterogeneous Sliding-Window Lengths

...

363

41

0

21 Jun 2024

In Tree Structure Should Sentence Be Generated

113

0

0

20 Jun 2024

A Primal-Dual Framework for Transformers and Neural Networks

Tan M. Nguyen

Tam Nguyen

Andrea L. Bertozzi

Richard G. Baraniuk

Stanley J. Osher

198

16

0

19 Jun 2024

In-Context Former: Lightning-fast Compressing Context for Large Language Model

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Xiangfeng Wang

Enhong Chen

211

9

0

19 Jun 2024

SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention

343

27

0

17 Jun 2024

Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens

Heming Xia

Weikang Wang

Tianyu Liu

Zhifang Sui

150

2

0

16 Jun 2024

Hierarchical Compression of Text-Rich Graphs via Large Language Models

Christos Faloutsos

George Karypis

Yizhou Sun

319

3

0

13 Jun 2024

Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences

Zicheng Liu

Stan Z. Li

268

10

0

12 Jun 2024

QuickLLaMA: Query-aware Inference Acceleration for Large Language Models

Jingyao Li

Zhenguo Li

202

4

0

11 Jun 2024

Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling

Yadong Lu

Weizhu Chen

388

115

0

11 Jun 2024

Recurrent Context Compression: Efficiently Expanding the Context Window of LLM

223

3

0

10 Jun 2024

What Can We Learn from State Space Models for Machine Learning on Graphs?

Pan Li

248

11

0

09 Jun 2024

SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context Large Language Models

Hengyu Zhang

213

6

0

09 Jun 2024

LoCoCo: Dropping In Convolutions for Long Context Compression

International Conference on Machine Learning (ICML), 2024

Zhangyang Wang

192

15

0

08 Jun 2024

FILS: Self-Supervised Video Feature Prediction In Semantic Language Space

Andrew Gilbert

333

2

0

05 Jun 2024

Training of Physical Neural Networks

Logan G. Wright

Peter L. McMahon

...

323

69

0

05 Jun 2024

Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning

Alex Jinpeng Wang

Mike Zheng Shou

284

10

0

04 Jun 2024

Loki: Low-Rank Keys for Efficient Sparse Attention

Prajwal Singhania

Siddharth Singh

263

47

0

04 Jun 2024

Page 7 of 26