1904.10509
Cited By
Generating Long Sequences with Sparse Transformers
23 April 2019
R. Child
Scott Gray
Alec Radford
Ilya Sutskever
Papers citing "Generating Long Sequences with Sparse Transformers"
50 / 1,282 papers shown
Efficient Pretraining Length Scaling
Bohong Wu
Shen Yan
Sijun Zhang
Jianqiao Lu
Yutao Zeng
Ya Wang
Xun Zhou
1.1K
1
0
21 Apr 2025
CacheFormer: High Attention-Based Segment Caching
Applied Informatics (AI), 2025
Sushant Singh
A. Mahmood
221
1
0
18 Apr 2025
AttentionDrop: A Novel Regularization Method for Transformer Models
Mirza Samad Ahmed Baig
Syeda Anshrah Gillani
Abdul Akbar Khan
Shahid Munir Shah
Muhammad Omer Khan
244
0
0
16 Apr 2025
Analysis of Attention in Video Diffusion Transformers
Yuxin Wen
Jim Wu
Ajay Jain
Tom Goldstein
Ashwinee Panda
278
8
0
14 Apr 2025
Local Temporal Feature Enhanced Transformer with ROI-rank Based Masking for Diagnosis of ADHD
Byunggun Kim
Younghun Kwon
MedIm
55
0
0
12 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Jianchao Tan
MGen
VGen
564
3
0
01 Apr 2025
SQuat: Subspace-orthogonal KV Cache Quantization
Hao Wang
Ligong Han
Kai Xu
Akash Srivastava
MQ
381
2
0
31 Mar 2025
TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper
Roland Fernandez
P. Smolensky
Jianfeng Gao
547
1
0
29 Mar 2025
DiTFastAttnV2: Head-wise Attention Compression for Multi-Modality Diffusion Transformers
Hao Zhang
R. Su
Zhihang Yuan
Pengtao Chen
Mingzhu Shen
Yibo Fan
Shengen Yan
Guohao Dai
Yu Wang
303
9
0
28 Mar 2025
Exploring the Roles of Large Language Models in Reshaping Transportation Systems: A Survey, Framework, and Roadmap
Tong Nie
Jian Sun
Wei Ma
564
23
0
27 Mar 2025
DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding
Keyan Chen
Chenyang Liu
Bowen Chen
Wenyuan Li
Zhengxia Zou
Zhenwei Shi
297
15
0
20 Mar 2025
XAttention: Block Sparse Attention with Antidiagonal Scoring
Ruyi Xu
Guangxuan Xiao
Haofeng Huang
Junxian Guo
Enze Xie
336
55
0
20 Mar 2025
Intra-neuronal attention within language models: Relationships between activation and semantics
Michael Pichat
William Pogrund
Paloma Pichat
Armanouche Gasparian
Samuel Demarchi
Corbet Alois Georgeon
Michael Veillet-Guillem
MILM
256
0
0
17 Mar 2025
CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences
International Conference on Learning Representations (ICLR), 2025
Ziran Qin
Yuchen Cao
Mingbao Lin
Wen Hu
Shixuan Fan
Ke Cheng
Weiyao Lin
Jianguo Li
281
26
0
16 Mar 2025
Ensemble Learning for Large Language Models in Text and Code Generation: A Survey
Mari Ashiga
Wei Jie
Fan Wu
Vardan K. Voskanyan
Fateme Dinmohammadi
P. Brookes
Jingzhi Gong
Zheng Wang
332
8
0
13 Mar 2025
Learning to Inference Adaptively for Multimodal Large Language Models
Zhuoyan Xu
Khoi Duc Nguyen
Preeti Mukherjee
Saurabh Bagchi
Somali Chaterji
Yingyu Liang
Yin Li
LRM
431
4
0
13 Mar 2025
Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Emily Xiao
Chin-Jou Li
Yilin Zhang
Graham Neubig
Amanda Bertsch
BDL
332
2
0
11 Mar 2025
TokenButler: Token Importance is Predictable
Yash Akhauri
Ahmed F. AbouElhamayed
Yifei Gao
Chi-chih Chang
Nilesh Jain
Mohamed S. Abdelfattah
196
3
0
10 Mar 2025
eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference
Suraiya Tairin
Shohaib Mahmud
Haiying Shen
Anand Iyer
MoE
854
4
0
10 Mar 2025
Multimodal Emotion Recognition and Sentiment Analysis in Multi-Party Conversation Contexts
Aref Farhadipour
Hossein Ranjbar
Masoumeh Chapariniya
Teodora Vukovic
Sarah Ebling
Volker Dellwo
286
2
0
09 Mar 2025
Spectral Informed Mamba for Robust Point Cloud Processing
Computer Vision and Pattern Recognition (CVPR), 2025
Ali Bahri
Moslem Yazdanpanah
Mehrdad Noori
Sahar Dastani
Milad Cheraghalikhani
David Osowiechi
G. A. V. Hakim
Farzad Beizaee
Ismail ben Ayed
Christian Desrosiers
Mamba
3DPC
325
6
0
06 Mar 2025
SED2AM: Solving Multi-Trip Time-Dependent Vehicle Routing Problem using Deep Reinforcement Learning
ACM Transactions on Knowledge Discovery from Data (TKDD), 2025
Arash Mozhdehi
Longji Xu
Sun Sun
Xin Eric Wang
AI4TS
407
0
0
06 Mar 2025
L²M: Mutual Information Scaling Law for Long-Context Language Modeling
Zhuo Chen
Oriol Mayné i Comas
Zhuotao Jin
Di Luo
Marin Soljacic
311
5
0
06 Mar 2025
Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer
Yujiao Yang
Jing Lian
Linhui Li
MoE
361
0
0
04 Mar 2025
Boltzmann Attention Sampling for Image Analysis with Small Objects
Computer Vision and Pattern Recognition (CVPR), 2025
Theodore Zhao
Sid Kiblawi
Naoto Usuyama
Ho Hin Lee
Sam Preston
Hoifung Poon
Mu-Hsin Wei
MedIm
444
1
0
04 Mar 2025
Attention Condensation via Sparsity Induced Regularized Training
Eli Sason
Darya Frolova
Boris Nazarov
Felix Goldberd
1.0K
0
0
03 Mar 2025
SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures
Computer Vision and Pattern Recognition (CVPR), 2025
Hui Liu
Chen Jia
Fan Shi
Xu Cheng
Shengyong Chen
Mamba
476
17
0
03 Mar 2025
Prior-Fitted Networks Scale to Larger Datasets When Treated as Weak Learners
International Conference on Artificial Intelligence and Statistics (AISTATS), 2025
Yuxin Wang
Botian Jiang
Yiran Guo
Quan Gan
David Wipf
Qi Zhang
Xipeng Qiu
AI4CE
215
4
0
03 Mar 2025
Training-free and Adaptive Sparse Attention for Efficient Long Video Generation
Yifei Xia
Suhan Ling
Fangcheng Fu
Yijiao Wang
Huixia Li
Xuefeng Xiao
Tengjiao Wang
VGen
372
30
0
28 Feb 2025
Reasoning is Periodicity? Improving Large Language Models Through Effective Periodicity Modeling
Yihong Dong
Ge Li
Xue Jiang
Yongding Tao
Kechi Zhang
...
Huanyu Liu
Jiazheng Ding
Jia Li
Jinliang Deng
Hong Mei
AI4TS
562
2
0
28 Feb 2025
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
International Conference on Learning Representations (ICLR), 2025
Xunhao Lai
Jianqiao Lu
Yao Luo
Yiyuan Ma
Xun Zhou
303
49
0
28 Feb 2025
Transformers with Joint Tokens and Local-Global Attention for Efficient Human Pose Estimation
K. A. Kinfu
René Vidal
ViT
278
0
0
28 Feb 2025
Beyond Worst-Case Dimensionality Reduction for Sparse Vectors
International Conference on Learning Representations (ICLR), 2025
Sandeep Silwal
David P. Woodruff
Qiuyi Zhang
260
0
0
27 Feb 2025
Sliding Window Attention Training for Efficient Large Language Models
Zichuan Fu
Wentao Song
Longji Xu
X. Wu
Yefeng Zheng
Yingying Zhang
Derong Xu
Xuetao Wei
Tong Xu
Xiangyu Zhao
472
8
0
26 Feb 2025
Self-Adjust Softmax
Chuanyang Zheng
Yihang Gao
Guoxuan Chen
Han Shi
Jing Xiong
Xiaozhe Ren
Chao Huang
Xin Jiang
Zhiyu Li
Yu Li
296
3
0
25 Feb 2025
The Role of Sparsity for Length Generalization in Transformers
Noah Golowich
Samy Jelassi
David Brandfonbrener
Sham Kakade
Eran Malach
237
6
0
24 Feb 2025
Protein Large Language Models: A Comprehensive Survey
Yijia Xiao
Wanjia Zhao
Junkai Zhang
Yiqiao Jin
Han Zhang
...
Xiao Luo
Yu Zhang
James Zou
Yizhou Sun
Wei Wang
LM&MA
AI4CE
426
22
0
21 Feb 2025
RhythmFormer: Extracting Patterned rPPG Signals based on Periodic Sparse Attention
Pattern Recognition (Pattern Recogn.), 2024
Bochao Zou
Zizheng Guo
Jiansheng Chen
Junbao Zhuo
Weiran Huang
Huimin Ma
ViT
AI4TS
354
1
0
21 Feb 2025
Compression Barriers for Autoregressive Transformers
Themistoklis Haris
Krzysztof Onak
170
1
0
21 Feb 2025
Neural Attention Search
Difan Deng
Marius Lindauer
543
0
0
18 Feb 2025
Continuous Diffusion Model for Language Modeling
Jaehyeong Jo
Sung Ju Hwang
213
3
0
17 Feb 2025
Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization
Bowen Pang
Kai Li
Ruifeng She
Feifan Wang
OffRL
277
2
0
14 Feb 2025
A Survey on Mamba Architecture for Vision Applications
Fady Ibrahim
Guangjun Liu
Guanghui Wang
Mamba
432
9
0
11 Feb 2025
LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Sumin An
Junyoung Sung
Wonpyo Park
Chanjun Park
Paul Hongsuck Seo
618
0
0
10 Feb 2025
Context-Aware Hierarchical Merging for Long Document Summarization
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Litu Ou
Mirella Lapata
MoMe
1.1K
3
0
03 Feb 2025
Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques
IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2025
Nathaniel Tomczak
Sanmukh Kuppannagari
608
1
0
31 Jan 2025
ZETA: Leveraging Z-order Curves for Efficient Top-k Attention
International Conference on Learning Representations (ICLR), 2025
Qiuhao Zeng
Jerry Huang
Peng Lu
Gezheng Xu
Boxing Chen
Charles Ling
Boyu Wang
570
5
0
24 Jan 2025
Parallel Sequence Modeling via Generalized Spatial Propagation Network
Computer Vision and Pattern Recognition (CVPR), 2025
Hongjun Wang
Wonmin Byeon
Jiarui Xu
Liang Feng
Ka Chun Cheung
Xiaolong Wang
Kai Han
Jan Kautz
Sifei Liu
837
3
0
21 Jan 2025
Episodic Memories Generation and Evaluation Benchmark for Large Language Models
International Conference on Learning Representations (ICLR), 2025
Alexis Huet
Zied Ben-Houidi
Dario Rossi
LLMAG
221
7
0
21 Jan 2025
ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models
International Conference on Computational Linguistics (COLING), 2024
Thibaut Thonet
Jos Rozen
Laurent Besacier
RALM
468
7
0
20 Jan 2025