arXiv:1904.10509
Cited By
Generating Long Sequences with Sparse Transformers
23 April 2019
R. Child, Scott Gray, Alec Radford, Ilya Sutskever
Papers citing "Generating Long Sequences with Sparse Transformers"
50 / 1,283 papers shown
ChakmaNMT: Machine Translation for a Low-Resource and Endangered Language via Transliteration
Aunabil Chakma, Aditya Chakma, Soham Khisa, Chumui Tripura, Masum Hasan, Rifat Shahriyar
14 Oct 2024

Fine-grained Attention I/O Complexity: Comprehensive Analysis for Backward Passes
Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou
12 Oct 2024

Token Pruning using a Lightweight Background Aware Vision Transformer
Sudhakar Sah, Ravish Kumar, Honnesh Rohmetra, Ehsan Saboori
12 Oct 2024

DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention
Asian Conference on Computer Vision (ACCV), 2024
Nguyen Huu Bao Long, Chenyu Zhang, Yuzhi Shi, Tsubasa Hirakawa, Takayoshi Yamashita, Tohgoroh Matsui, H. Fujiyoshi
11 Oct 2024

InAttention: Linear Context Scaling for Transformers
Joseph Eisner
09 Oct 2024
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
International Conference on Learning Representations (ICLR), 2024
Mutian He, Philip N. Garner
09 Oct 2024

Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
International Conference on Learning Representations (ICLR), 2024
Zhihao He, Hang Yu, Zi Gong, Shizhan Liu, Jia-Nan Li, Weiyao Lin
09 Oct 2024

Accelerating Error Correction Code Transformers
Matan Levy, Yoni Choukroun, Lior Wolf
08 Oct 2024

LevAttention: Time, Space, and Streaming Efficient Algorithm for Heavy Attentions
International Conference on Learning Representations (ICLR), 2024
R. Kannan, Chiranjib Bhattacharyya, Praneeth Kacham, David P. Woodruff
07 Oct 2024

TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
International Conference on Learning Representations (ICLR), 2024
Lijie Yang, Zhihao Zhang, Zhuofu Chen, Zikun Li, Zhihao Jia
07 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li, Jiaming Xu, Shan Huang, Yonghua Chen, Wen Li, ..., Jiayi Pan, Li Ding, Hao Zhou, Yu Wang, Guohao Dai
06 Oct 2024

System 2 Reasoning Capabilities Are Nigh
Scott C. Lowe
04 Oct 2024

S7: Selective and Simplified State Space Layers for Sequence Modeling
Taylan Soydan, Nikola Zubić, Nico Messikommer, Siddhartha Mishra, Davide Scaramuzza
04 Oct 2024

Exploring the Limitations of Mamba in COPY and CoT Reasoning
Ruifeng Ren, Zhicong Li, Yong Liu
04 Oct 2024

Graph-tree Fusion Model with Bidirectional Information Propagation for Long Document Classification
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Sudipta Singha Roy, Xindi Wang, Robert E. Mercer, Frank Rudzicz
03 Oct 2024
Selective Attention Improves Transformer
International Conference on Learning Representations (ICLR), 2024
Yaniv Leviathan, Matan Kalman, Yossi Matias
03 Oct 2024

A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts
International Conference on Learning Representations (ICLR), 2024
Suyu Ge, Xihui Lin, Yunan Zhang, Jiawei Han, Hao Peng
02 Oct 2024

Attention layers provably solve single-location regression
International Conference on Learning Representations (ICLR), 2024
Pierre Marion, Raphael Berthier, Gérard Biau, Claire Boyer
02 Oct 2024

GLMHA: A Guided Low-rank Multi-Head Self-Attention for Efficient Image Restoration and Spectral Reconstruction
Zaid Ilyas, Naveed Akhtar, David Suter, Syed Zulqarnain Gilani
01 Oct 2024

Cottention: Linear Transformers With Cosine Attention
Gabriel Mongaras, Trevor Dohm, Eric C. Larson
27 Sep 2024
Generative AI-driven forecasting of oil production
Yash Gandhi, Kexin Zheng, Birendra Jha, K. Nomura, A. Nakano, P. Vashishta, R. Kalia
24 Sep 2024

MonoFormer: One Transformer for Both Diffusion and Autoregression
Chuyang Zhao, Yuxing Song, Wenhao Wang, Haocheng Feng, Errui Ding, Yifan Sun, Xinyan Xiao, Jingdong Wang
24 Sep 2024

Efficiently Dispatching Flash Attention For Partially Filled Attention Masks
Agniv Sharma, Jonas Geiping
23 Sep 2024

MambaFoley: Foley Sound Generation using Selective State-Space Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Marco Furio Colombo, Francesca Ronchini, Luca Comanducci, Fabio Antonacci
13 Sep 2024

Expanding Expressivity in Transformer Models with MöbiusAttention
Anna-Maria Halacheva, M. Nayyeri, Steffen Staab
08 Sep 2024
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
Jinghan Yao, Sam Ade Jacobs, Masahiro Tanaka, Olatunji Ruwase, Hari Subramoni, D. Panda
30 Aug 2024

HLogformer: A Hierarchical Transformer for Representing Log Data
Zhichao Hou, Mina Ghashami, Mikhail Kuznetsov, MohamadAli Torkamani
29 Aug 2024

Autoregressive model path dependence near Ising criticality
Yi Hong Teoh, R. Melko
28 Aug 2024

Squid: Long Context as a New Modality for Energy-Efficient On-Device Language Models
Wei Chen, Zhiyuan Li, Shuo Xin, Yihao Wang
28 Aug 2024

Legilimens: Practical and Unified Content Moderation for Large Language Model Services
Conference on Computer and Communications Security (CCS), 2024
Jialin Wu, Jiangyi Deng, Shengyuan Pang, Yanjiao Chen, Jiayang Xu, Xinfeng Li, Wei Dong
28 Aug 2024

Reconstructing physiological signals from fMRI across the adult lifespan
Shiyu Wang, Ziyuan Xu, Laurent M. Lochard, Yamin Li, Jiawen Fan, Jianfei Chen, Yuankai Huo, Mara Mather, Roza G. Bayrak, Catie Chang
26 Aug 2024
Mixed Sparsity Training: Achieving 4× FLOP Reduction for Transformer Pretraining
Pihe Hu, Shaolong Li, Longbo Huang
21 Aug 2024

Macformer: Transformer with Random Maclaurin Feature Attention
Yuhan Guo, Lizhong Ding, Ye Yuan, Guoren Wang
21 Aug 2024

ELASTIC: Efficient Linear Attention for Sequential Interest Compression
Jiaxin Deng, Shiyao Wang, Song Lu, Yinfeng Li, Xinchen Luo, Yuanjun Liu, Peixing Xu, Guorui Zhou
18 Aug 2024

Increasing transformer token length with a Maximum Entropy Principle Method
R. I. Cukier
17 Aug 2024

Ex3: Automatic Novel Writing by Extracting, Excelsior and Expanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Lei Huang, Jiaming Guo, Guanhua He, Xishan Zhang, Rui Zhang, Shaohui Peng, Shaoli Liu, Tianshi Chen
16 Aug 2024
Snuffy: Efficient Whole Slide Image Classifier
European Conference on Computer Vision (ECCV), 2024
Hossein Jafarinia, Alireza Alipanah, Danial Hamdi, Saeed Razavi, Nahal Mirzaie, M. Rohban
15 Aug 2024

Nonlocal Attention Operator: Materializing Hidden Knowledge Towards Interpretable Physics Discovery
Neural Information Processing Systems (NeurIPS), 2024
Yue Yu, Ning Liu, Fei Lu, Tian Gao, S. Jafarzadeh, Stewart Silling
14 Aug 2024

Post-Training Sparse Attention with Double Sparsity
Shuo Yang, Ying Sheng, Joseph E. Gonzalez, Ion Stoica, Lianmin Zheng
11 Aug 2024

Sampling Foundational Transformer: A Theoretical Perspective
Viet Anh Nguyen, Minh Lenhat, Khoa Nguyen, Duong Duc Hieu, Dao Huu Hung, Truong-Son Hy
11 Aug 2024

Prompt and Prejudice
Lorenzo Berlincioni, Luca Cultrera, Federico Becattini, Marco Bertini
07 Aug 2024
NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yilong Chen, Guoxia Wang, Junyuan Shang, Shiyao Cui, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun, Dianhai Yu, Hua Wu
07 Aug 2024

Recent Advances in Multi-Choice Machine Reading Comprehension: A Survey on Methods and Datasets
Shima Foolad, Kourosh Kiani, R. Rastgoo
04 Aug 2024

LDFaceNet: Latent Diffusion-based Network for High-Fidelity Deepfake Generation
International Conference on Pattern Recognition (ICPR), 2024
Dwij Mehta, Aditya Mehta, Pratik Narang
04 Aug 2024

DeMansia: Mamba Never Forgets Any Tokens
Ricky Fang
04 Aug 2024

What comes after transformers? -- A selective survey connecting ideas in deep learning
Johannes Schneider
01 Aug 2024

A2SF: Accumulative Attention Scoring with Forgetting Factor for Token Pruning in Transformer Decoder
Hyun Rae Jo, Dong Kun Shin
30 Jul 2024
FlexAttention for Efficient High-Resolution Vision-Language Models
European Conference on Computer Vision (ECCV), 2024
Junyan Li, Delin Chen, Tianle Cai, Peihao Chen, Yining Hong, Zhenfang Chen, Yikang Shen, Chuang Gan
29 Jul 2024

Practical and Reproducible Symbolic Music Generation by Large Language Models with Structural Embeddings
Seungyeon Rhyu, Kichang Yang, Sungjun Cho, Jaehyeon Kim, Kyogu Lee, Moontae Lee
29 Jul 2024

Efficient LLM Training and Serving with Heterogeneous Context Sharding among Attention Heads
Xihui Lin, Yunan Zhang, Suyu Ge, Barun Patra, Vishrav Chaudhary, Hao Peng, Xia Song
25 Jul 2024