1904.10509
Generating Long Sequences with Sparse Transformers
23 April 2019
R. Child
Scott Gray
Alec Radford
Ilya Sutskever
Papers citing
"Generating Long Sequences with Sparse Transformers"
50 / 1,282 papers shown
PDE-Transformer: A Continuous Dynamical Systems Approach to Sequence Modeling
Yukun Zhang
Xueqing Zhou
AI4CE
157
0
0
27 Sep 2025
ECHO: Toward Contextual Seq2Seq Paradigms in Large EEG Models
Chenyu Liu
Yuqiu Deng
T. Liu
J. Zhou
Xinliang Zhou
Ziyu Jia
Y. Ding
VLM
91
0
0
26 Sep 2025
Achilles' Heel of Mamba: Essential difficulties of the Mamba architecture demonstrated by synthetic data
Jiahao Huo
Pengxiao Lin
Zhiwei Wang
Zhi-Qin John Xu
Mamba
169
0
0
22 Sep 2025
Towards Interpretable and Efficient Attention: Compressing All by Contracting a Few
Qishuai Wen
Zhiyuan Huang
Chun-Guang Li
MQ
380
0
0
21 Sep 2025
Attention Schema-based Attention Control (ASAC): A Cognitive-Inspired Approach for Attention Management in Transformers
Krati Saxena
Federico Jurado Ruiz
Guido Manzi
Dianbo Liu
Alex Lamb
192
0
0
19 Sep 2025
Local Mechanisms of Compositional Generalization in Conditional Diffusion
Arwen Bradley
DiffM
CoGe
244
1
0
19 Sep 2025
Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems
Saeed Amizadeh
Sara Abdali
Yinheng Li
K. Koishida
175
0
0
18 Sep 2025
The Few-shot Dilemma: Over-prompting Large Language Models
Yongjian Tang
Doruk Tuncel
Christian Koerner
Thomas Runkler
232
4
0
16 Sep 2025
FastMTP: Accelerating LLM Inference with Enhanced Multi-Token Prediction
Yuxuan Cai
Xiaozhuan Liang
X. Wang
Jin Ma
Haijin Liang
Jinwen Luo
Xinyu Zuo
Lisheng Duan
Yuyang Yin
Xi Chen
161
1
0
16 Sep 2025
A Comprehensive Review of Reinforcement Learning for Autonomous Driving in the CARLA Simulator
Elahe Delavari
Feeza Khan Khanzada
Jaerock Kwon
145
3
0
10 Sep 2025
Customizing the Inductive Biases of Softmax Attention using Structured Matrices
Yilun Kuang
Noah Amsel
Sanae Lotfi
Shikai Qiu
Andres Potapczynski
Andrew Gordon Wilson
119
0
0
09 Sep 2025
Faster VGGT with Block-Sparse Global Attention
Chung-Shien Brian Wang
Christian Schmidt
Jens Piekenbrinck
Bastian Leibe
ViT
116
8
0
08 Sep 2025
Rethinking the long-range dependency in Mamba/SSM and transformer models
Cong Ma
Kayvan Najarian
Mamba
150
1
0
04 Sep 2025
Differentiable Entropy Regularization: A Complexity-Aware Approach for Neural Optimization
Ibne Farabi Shihab
Sanjeda Akter
Anuj Sharma
AAML
81
0
0
03 Sep 2025
DrDiff: Dynamic Routing Diffusion with Hierarchical Attention for Breaking the Efficiency-Quality Trade-off
Jusheng Zhang
Yijia Fan
Kaitong Cai
Zimeng Huang
Xiaofei Sun
Jian Wang
Chengpei Tang
Keze Wang
DiffM
153
27
0
02 Sep 2025
REFRAG: Rethinking RAG based Decoding
Xiaoqiang Lin
Aritra Ghosh
Bryan Kian Hsiang Low
Anshumali Shrivastava
Vijai Mohan
LLMAG
226
1
0
01 Sep 2025
DTRNet: Dynamic Token Routing Network to Reduce Quadratic Costs in Transformers
Aman Sharma
Saeed Najafi
Parsa Farinneya
Benyamin Jamialahmadi
Marzieh S. Tahaei
Yuhe Fan
Mehdi Rezagholizadeh
Boxing Chen
A. Jafari
86
1
0
31 Aug 2025
Spiking Decision Transformers: Local Plasticity, Phase-Coding, and Dendritic Routing for Low-Power Sequence Control
Vishal Pandey
Debasmita Biswas
65
0
0
29 Aug 2025
ATM-GAD: Adaptive Temporal Motif Graph Anomaly Detection for Financial Transaction Networks
Zeyue Zhang
Lin Song
Erkang Bao
Xiaoling Lv
Xinyue Wang
AI4TS
MLAU
AIFin
168
1
0
28 Aug 2025
Interpretable by AI Mother Tongue: Native Symbolic Reasoning in Neural Models
Hung Ming Liu
LRM
76
0
0
26 Aug 2025
Limitations of Normalization in Attention Mechanism
Timur Mudarisov
Mikhail Burtsev
Tatiana Petrova
Radu State
95
2
0
25 Aug 2025
Exploring Scaling Laws of CTR Model for Online Performance Improvement
ACM Conference on Recommender Systems (RecSys), 2025
Weijiang Lai
Beihong Jin
Jiongyan Zhang
Yiyuan Zheng
Jian Dong
Jia Cheng
Jun Lei
Xingxing Wang
LRM
180
2
0
21 Aug 2025
Compact Attention: Exploiting Structured Spatio-Temporal Sparsity for Fast Video Generation
Qirui Li
Guangcong Zheng
Qi Zhao
Jie Li
Bin Dong
Jing Lin
Xi Li
VGen
152
2
0
18 Aug 2025
Pre-trained Transformer-models using chronic invasive electrophysiology for symptom decoding without patient-individual training
Timon Merk
Saeed Salehi
Richard M. Koehler
Qiming Cui
Maria Olaru
...
Nicole R. Provenza
Simon Little
Reza Abbasi-Asl
Phil A. Starr
Wolf-Julian Neumann
AI4CE
107
0
0
13 Aug 2025
P/D-Device: Disaggregated Large Language Model between Cloud and Devices
Yibo Jin
Yixu Xu
Yue-ting Chen
C. Wang
Tao Wang
...
Zhe Wang
Hefei Guo
Hongjie Liu
Wei Lu
Zhengyong Zhang
217
1
0
12 Aug 2025
gpt-oss-120b & gpt-oss-20b Model Card
OpenAI
Sandhini Agarwal
Lama Ahmad
Jason Ai
Sam Altman
...
D. Sculley
Harshit Sikchi
Kendal Simon
K. Singhal
Yang Song
LRM
VLM
131
268
0
08 Aug 2025
Generalizing Scaling Laws for Dense and Sparse Large Language Models
Md Arafat Hossain
Xingfu Wu
V. Taylor
Ali Jannesari
183
0
0
08 Aug 2025
Deformable Attention Graph Representation Learning for Histopathology Whole Slide Image Analysis
Mingxi Fu
Xitong Ling
Yuxuan Chen
Jiawen Li
Fanglei Fu
Huaitian Yuan
Tian Guan
Yonghong He
Lianghui Zhu
85
0
0
07 Aug 2025
GFocal: A Global-Focal Neural Operator for Solving PDEs on Arbitrary Geometries
Fangzhi Fei
Jiaxin Hu
Qiaofeng Li
Zhenyu Liu
AI4CE
209
2
0
06 Aug 2025
Trainable Dynamic Mask Sparse Attention
Jingze Shi
Yifan Wu
Yiran Peng
Liangdong Wang
Guang Liu
Yuyu Luo
351
3
0
04 Aug 2025
Pointer: Linear-Complexity Long-Range Modeling without Pre-training
Zixi Li
LLMSV
103
0
0
04 Aug 2025
Hebbian Memory-Augmented Recurrent Networks: Engram Neurons in Deep Learning
Daniel Szelogowski
98
1
0
29 Jul 2025
MemShare: Memory Efficient Inference for Large Reasoning Models through KV Cache Reuse
Kaiwen Chen
Xin Tan
Minchen Yu
Hong Xu
LRM
VLM
239
1
0
29 Jul 2025
TriangleMix: Accelerating Prefilling via Decoding-time Contribution Sparsity
Zhiyuan He
Xicheng Zhang
Chengruidong Zhang
Huiqiang Jiang
Yuqing Yang
Lili Qiu
170
0
0
29 Jul 2025
Onboard Hyperspectral Super-Resolution with Deep Pushbroom Neural Network
Remote Sensing (RS), 2025
Davide Piccinini
D. Valsesia
E. Magli
SupR
424
1
0
28 Jul 2025
EcoTransformer: Attention without Multiplication
Xin Gao
Xingming Xu
Shirin Amiraslani
Hong Xu
112
1
0
27 Jul 2025
SAMUeL: Efficient Vocal-Conditioned Music Generation via Soft Alignment Attention and Latent Diffusion
Hei Shing Cheung
Boya Zhang
Jonathan H. Chan
DiffM
195
0
0
26 Jul 2025
Modality Agnostic Efficient Long Range Encoder
T. Parag
Ahmed Elgammal
158
0
0
25 Jul 2025
Efficient Attention Mechanisms for Large Language Models: A Survey
Yutao Sun
Zhenyu Li
Yike Zhang
Tengyu Pan
Bowen Dong
Yuyi Guo
Jianyong Wang
245
10
0
25 Jul 2025
Iwin Transformer: Hierarchical Vision Transformer using Interleaved Windows
Simin Huo
Ning Li
ViT
243
0
0
24 Jul 2025
Not All Features Deserve Attention: Graph-Guided Dependency Learning for Tabular Data Generation with Language Models
Zheyu Zhang
Shuo Yang
Bardh Prenkaj
Gjergji Kasneci
LMTD
256
6
0
24 Jul 2025
Custom Algorithm-based Fault Tolerance for Attention Layers in Transformers
Vasileios Titopoulos
K. Alexandridis
G. Dimitrakopoulos
94
0
0
22 Jul 2025
Artifacts and Attention Sinks: Structured Approximations for Efficient Vision Transformers
Andrew Lu
Wentinn Liao
Liuhui Wang
Huzheng Yang
Jianbo Shi
147
1
0
21 Jul 2025
SAS: Simulated Attention Score
Chuanyang Zheng
J. Sun
Yihang Gao
Yuehao Wang
Peihao Wang
...
Atlas Wang
Mac Schwager
Anderson Schneider
Xiaodong Liu
Jianfeng Gao
AI4TS
243
2
0
10 Jul 2025
ETT: Expanding the Long Context Understanding Capability of LLMs at Test-Time
Kiarash Zahirnia
Zahra Golpayegani
Walid Ahmed
Yang Liu
260
0
0
08 Jul 2025
All in One: Visual-Description-Guided Unified Point Cloud Segmentation
Zongyan Han
Mohamed El Amine Boudjoghra
Jiahua Dong
Jinhong Wang
Rao Muhammad Anwer
222
1
0
07 Jul 2025
BLaST: High Performance Inference and Pretraining using BLock Sparse Transformers
Patrik Okanovic
Sameer Deshmukh
Grzegorz Kwaśniewski
Yi Zhu
Haruto Fujii
...
Maciej Besta
Kentaro Katayama
Takumi Honda
Yusuke Nagasaka
Torsten Hoefler
203
0
0
03 Jul 2025
A unified framework for establishing the universal approximation of transformer-type architectures
Jingpu Cheng
T. Lin
Zuowei Shen
Qianxiao Li
155
0
0
30 Jun 2025
RATTENTION: Towards the Minimal Sliding Window Size in Local-Global Attention Models
Bailin Wang
Chang Lan
Chong-Jun Wang
Ruoming Pang
257
2
0
18 Jun 2025
StoryBench: A Dynamic Benchmark for Evaluating Long-Term Memory with Multi Turns
Luanbo Wan
Weizhi Ma
LLMAG
KELM
233
2
0
16 Jun 2025