ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1904.10509
  4. Cited By
Generating Long Sequences with Sparse Transformers

Generating Long Sequences with Sparse Transformers

23 April 2019
R. Child
Scott Gray
Alec Radford
Ilya Sutskever
ArXiv (abs)PDFHTML

Papers citing "Generating Long Sequences with Sparse Transformers"

50 / 1,283 papers shown
Towards Robust Knowledge Tracing Models via k-Sparse Attention
Towards Robust Knowledge Tracing Models via k-Sparse Attention
Shuyan Huang
Zitao Liu
Xiangyu Zhao
Weiqing Luo
Jian Weng
AI4Ed
207
44
0
24 Jul 2024
Evaluating Long Range Dependency Handling in Code Generation LLMs
Evaluating Long Range Dependency Handling in Code Generation LLMs
Yannick Assogba
Donghao Ren
254
1
0
23 Jul 2024
Mamba meets crack segmentation
Mamba meets crack segmentation
Zhili He
Yuhao Wang
Mamba
253
5
0
22 Jul 2024
MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long
  Sequences Training
MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training
Cheng Luo
Jiawei Zhao
Zhuoming Chen
Beidi Chen
A. Anandkumar
266
5
0
22 Jul 2024
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
Hanlin Tang
Yang Lin
Aiyue Chen
Qingsen Han
Shikuan Hong
Jing Lin
Gongyi Wang
MQ
243
58
0
22 Jul 2024
Recent Advances in Generative AI and Large Language Models: Current
  Status, Challenges, and Perspectives
Recent Advances in Generative AI and Large Language Models: Current Status, Challenges, and Perspectives
D. Hagos
Rick Battle
Danda B. Rawat
LM&MAOffRL
493
84
0
20 Jul 2024
DeepGate3: Towards Scalable Circuit Representation Learning
DeepGate3: Towards Scalable Circuit Representation Learning
Zhengyuan Shi
Ziyang Zheng
Sadaf Khan
Jianyuan Zhong
Min Li
Qiang Xu
GNNAI4CE
310
27
0
15 Jul 2024
Exploring the Potentials and Challenges of Deep Generative Models in Product Design Conception
Exploring the Potentials and Challenges of Deep Generative Models in Product Design Conception
Phillip Mueller
Lars Mikelsons
AI4CE
396
5
0
15 Jul 2024
MaskMoE: Boosting Token-Level Learning via Routing Mask in
  Mixture-of-Experts
MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts
Zhenpeng Su
Zijia Lin
Xue Bai
Xing Wu
Yizhe Xiong
...
Guangyuan Ma
Hui Chen
Guiguang Ding
Wei Zhou
Songlin Hu
MoE
380
7
0
13 Jul 2024
Beyond KV Caching: Shared Attention for Efficient LLMs
Beyond KV Caching: Shared Attention for Efficient LLMs
Bingli Liao
Danilo Vasconcellos Vargas
214
9
0
13 Jul 2024
FlashAttention-3: Fast and Accurate Attention with Asynchrony and
  Low-precision
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Jay Shah
Ganesh Bikshandi
Ying Zhang
Vijay Thakkar
Pradeep Ramani
Tri Dao
517
324
0
11 Jul 2024
HDT: Hierarchical Document Transformer
HDT: Hierarchical Document Transformer
Haoyu He
Markus Flicke
Jan Buchmann
Iryna Gurevych
Andreas Geiger
233
3
0
11 Jul 2024
How Well Can a Long Sequence Model Model Long Sequences? Comparing Architechtural Inductive Biases on Long-Context Abilities
How Well Can a Long Sequence Model Model Long Sequences? Comparing Architechtural Inductive Biases on Long-Context Abilities
Jerry Huang
456
7
0
11 Jul 2024
Mobius: A High Efficient Spatial-Temporal Parallel Training Paradigm for
  Text-to-Video Generation Task
Mobius: A High Efficient Spatial-Temporal Parallel Training Paradigm for Text-to-Video Generation Task
Yiran Yang
Jinchao Zhang
Ying Deng
Jie Zhou
DiffM
192
1
0
09 Jul 2024
How Effective are State Space Models for Machine Translation?
How Effective are State Space Models for Machine Translation?
Hugo Pitorro
Mustafa Hajij
Marcos Vinícius Treviso
André F. T. Martins
Mamba
204
3
0
07 Jul 2024
The Mysterious Case of Neuron 1512: Injectable Realignment Architectures
  Reveal Internal Characteristics of Meta's Llama 2 Model
The Mysterious Case of Neuron 1512: Injectable Realignment Architectures Reveal Internal Characteristics of Meta's Llama 2 Model
Brenden Smith
Dallin Baker
Clayton Chase
Myles Barney
Kaden Parker
Makenna Allred
Peter Hu
Alex Evans
Nancy Fulda
218
0
0
04 Jul 2024
Let the Code LLM Edit Itself When You Edit the Code
Let the Code LLM Edit Itself When You Edit the Code
Zhenyu He
Jun Zhang
Shengjie Luo
Jingjing Xu
Zongzhang Zhang
Di He
KELM
276
3
0
03 Jul 2024
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via
  Dynamic Sparse Attention
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Huiqiang Jiang
Yucheng Li
Chengruidong Zhang
Qianhui Wu
Xufang Luo
...
Amir H. Abdi
Dongsheng Li
Chin-Yew Lin
Yuqing Yang
L. Qiu
348
229
0
02 Jul 2024
Neurocache: Efficient Vector Retrieval for Long-range Language Modeling
Neurocache: Efficient Vector Retrieval for Long-range Language Modeling
Ali Safaya
Deniz Yuret
210
0
0
02 Jul 2024
Efficient Sparse Attention needs Adaptive Token Release
Efficient Sparse Attention needs Adaptive Token Release
Chaoran Zhang
Lixin Zou
Dan Luo
Min Tang
Xiangyang Luo
Zihao Li
Chenliang Li
227
7
0
02 Jul 2024
LPViT: Low-Power Semi-structured Pruning for Vision Transformers
LPViT: Low-Power Semi-structured Pruning for Vision Transformers
Kaixin Xu
Zhe Wang
Chunyun Chen
Xue Geng
Jie Lin
Xulei Yang
Ruibing Jin
Min Wu
Xiaoli Li
Weisi Lin
ViTVLM
664
18
0
02 Jul 2024
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models:
  Enhancing Performance and Reducing Inference Costs
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
Enshu Liu
Junyi Zhu
Zinan Lin
Xuefei Ning
Matthew B. Blaschko
Shengen Yan
Guohao Dai
Huazhong Yang
Yu Wang
MoE
242
22
0
01 Jul 2024
InfiniGen: Efficient Generative Inference of Large Language Models with
  Dynamic KV Cache Management
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management
Wonbeom Lee
Jungi Lee
Junghwan Seo
Jaewoong Sim
RALM
186
182
0
28 Jun 2024
Fibottention: Inceptive Visual Representation Learning with Diverse
  Attention Across Heads
Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads
Ali Khaleghi Rahimian
Manish Kumar Govind
Subhajit Maity
Dominick Reilly
Christian Kummerle
Srijan Das
A. Dutta
237
1
0
27 Jun 2024
From Efficient Multimodal Models to World Models: A Survey
From Efficient Multimodal Models to World Models: A Survey
Xinji Mai
Zeng Tao
Junxiong Lin
Haoran Wang
Yang Chang
Yanlan Kang
Yan Wang
Wenqiang Zhang
306
14
0
27 Jun 2024
Temporally Multi-Scale Sparse Self-Attention for Physical Activity Data
  Imputation
Temporally Multi-Scale Sparse Self-Attention for Physical Activity Data Imputation
Hui Wei
Maxwell A. Xu
Colin Samplawski
James M. Rehg
Santosh Kumar
Benjamin M. Marlin
169
4
0
27 Jun 2024
Few-Shot Medical Image Segmentation with High-Fidelity Prototypes
Few-Shot Medical Image Segmentation with High-Fidelity Prototypes
Song Tang
Shaxu Yan
Xiaozhi Qi
Jianxin Gao
Mao Ye
Jianwei Zhang
Xiatian Zhu
336
0
0
26 Jun 2024
Pamba: Enhancing Global Interaction in Point Clouds via State Space Model
Pamba: Enhancing Global Interaction in Point Clouds via State Space Model
Hao Sun
Yubo Ai
Jiahao Lu
Chuxin Wang
Jiacheng Deng
Hanzhi Chang
Yanzhe Liang
Wenfei Yang
Shifeng Zhang
Tianzhu Zhang
Mamba
178
0
0
25 Jun 2024
Long Context Transfer from Language to Vision
Long Context Transfer from Language to Vision
Peiyuan Zhang
Kaichen Zhang
Bo Li
Guangtao Zeng
Jingkang Yang
Yuanhan Zhang
Ziyue Wang
Haoran Tan
Chunyuan Li
Ziwei Liu
VLM
318
349
0
24 Jun 2024
Sparser is Faster and Less is More: Efficient Sparse Attention for
  Long-Range Transformers
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
Chao Lou
Zixia Jia
Zilong Zheng
Kewei Tu
ODL
233
51
0
24 Jun 2024
Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing
  Backpropagation
Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation
Yuchen Yang
Yingdong Shi
Cheems Wang
Xiantong Zhen
Yuxuan Shi
Jun Xu
237
3
0
24 Jun 2024
SimSMoE: Solving Representational Collapse via Similarity Measure
SimSMoE: Solving Representational Collapse via Similarity Measure
Giang Do
Hung Le
T. Tran
MoE
281
3
0
22 Jun 2024
Mixture of Attention Spans: Optimizing LLM Inference Efficiency with Heterogeneous Sliding-Window Lengths
Mixture of Attention Spans: Optimizing LLM Inference Efficiency with Heterogeneous Sliding-Window Lengths
Tianyu Fu
Haofeng Huang
Xuefei Ning
Genghan Zhang
Boju Chen
...
Shiyao Li
Shengen Yan
Guohao Dai
Huazhong Yang
Yu Wang
MQ
363
41
0
21 Jun 2024
In Tree Structure Should Sentence Be Generated
In Tree Structure Should Sentence Be Generated
Yaguang Li
Xin Chen
113
0
0
20 Jun 2024
A Primal-Dual Framework for Transformers and Neural Networks
A Primal-Dual Framework for Transformers and Neural Networks
Tan M. Nguyen
Tam Nguyen
Nhat Ho
Andrea L. Bertozzi
Richard G. Baraniuk
Stanley J. Osher
ViT
198
16
0
19 Jun 2024
In-Context Former: Lightning-fast Compressing Context for Large Language
  Model
In-Context Former: Lightning-fast Compressing Context for Large Language ModelConference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Xiangfeng Wang
Zaiyi Chen
Zheyong Xie
Tong Xu
Yongyi He
Enhong Chen
211
9
0
19 Jun 2024
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
Qianchao Zhu
Jiangfei Duan
Chang Chen
Siran Liu
Xiuhong Li
Xin Lv
Xiao Chuanfu
Huanqi Cao
Chao Yang
343
27
0
17 Jun 2024
Taking a Deep Breath: Enhancing Language Modeling of Large Language
  Models with Sentinel Tokens
Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens
Weiyao Luo
Suncong Zheng
Heming Xia
Weikang Wang
Yan Lei
Tianyu Liu
Shuang Chen
Zhifang Sui
150
2
0
16 Jun 2024
Hierarchical Compression of Text-Rich Graphs via Large Language Models
Hierarchical Compression of Text-Rich Graphs via Large Language Models
Shichang Zhang
Da Zheng
Jiani Zhang
Qi Zhu
Xiang Song
Soji Adeshina
Christos Faloutsos
George Karypis
Yizhou Sun
VLM
319
3
0
13 Jun 2024
Short-Long Convolutions Help Hardware-Efficient Linear Attention to
  Focus on Long Sequences
Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences
Zicheng Liu
Siyuan Li
Li Wang
Zedong Wang
Yunfan Liu
Stan Z. Li
268
10
0
12 Jun 2024
QuickLLaMA: Query-aware Inference Acceleration for Large Language Models
QuickLLaMA: Query-aware Inference Acceleration for Large Language Models
Jingyao Li
Han Shi
Xin Jiang
Zhenguo Li
Hong Xu
Jiaya Jia
LRM
202
4
0
11 Jun 2024
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren
Yang Liu
Yadong Lu
Haoran Pan
Chen Liang
Weizhu Chen
Mamba
388
115
0
11 Jun 2024
Recurrent Context Compression: Efficiently Expanding the Context Window
  of LLM
Recurrent Context Compression: Efficiently Expanding the Context Window of LLM
Chensen Huang
Guibo Zhu
Xuepeng Wang
Yifei Luo
Guojing Ge
Haoran Chen
Dong Yi
Jinqiao Wang
223
3
0
10 Jun 2024
What Can We Learn from State Space Models for Machine Learning on
  Graphs?
What Can We Learn from State Space Models for Machine Learning on Graphs?
Yinan Huang
Siqi Miao
Pan Li
248
11
0
09 Jun 2024
SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context
  Large Language Models
SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context Large Language Models
Hengyu Zhang
RALM
213
6
0
09 Jun 2024
LoCoCo: Dropping In Convolutions for Long Context Compression
LoCoCo: Dropping In Convolutions for Long Context CompressionInternational Conference on Machine Learning (ICML), 2024
Ruisi Cai
Yuandong Tian
Zhangyang Wang
Beidi Chen
192
15
0
08 Jun 2024
FILS: Self-Supervised Video Feature Prediction In Semantic Language
  Space
FILS: Self-Supervised Video Feature Prediction In Semantic Language Space
Mona Ahmadian
Frank Guerin
Andrew Gilbert
333
2
0
05 Jun 2024
Training of Physical Neural Networks
Training of Physical Neural Networks
Ali Momeni
Babak Rahmani
B. Scellier
Logan G. Wright
Peter L. McMahon
...
Julie Grollier
Andrea J. Liu
D. Psaltis
Andrea Alù
Romain Fleury
PINNAI4CE
323
69
0
05 Jun 2024
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal
  Learning
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
Alex Jinpeng Wang
Linjie Li
Yiqi Lin
Min Li
Lijuan Wang
Mike Zheng Shou
VLM
284
10
0
04 Jun 2024
Loki: Low-Rank Keys for Efficient Sparse Attention
Loki: Low-Rank Keys for Efficient Sparse Attention
Prajwal Singhania
Siddharth Singh
Shwai He
Soheil Feizi
A. Bhatele
263
47
0
04 Jun 2024
Previous
123...678...242526
Next
Page 7 of 26
Pageof 26