arXiv:1904.10509
Cited By
Generating Long Sequences with Sparse Transformers
23 April 2019
R. Child, Scott Gray, Alec Radford, Ilya Sutskever
Papers citing "Generating Long Sequences with Sparse Transformers"
50 / 1,283 papers shown
ChakmaNMT: Machine Translation for a Low-Resource and Endangered Language via Transliteration
Aunabil Chakma, Aditya Chakma, Soham Khisa, Chumui Tripura, Masum Hasan, Rifat Shahriyar
14 Oct 2024

Fine-grained Attention I/O Complexity: Comprehensive Analysis for Backward Passes
Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou
12 Oct 2024

Token Pruning using a Lightweight Background Aware Vision Transformer
Sudhakar Sah, Ravish Kumar, Honnesh Rohmetra, Ehsan Saboori
12 Oct 2024

DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention
Asian Conference on Computer Vision (ACCV), 2024
Nguyen Huu Bao Long, Chenyu Zhang, Yuzhi Shi, Tsubasa Hirakawa, Takayoshi Yamashita, Tohgoroh Matsui, H. Fujiyoshi
11 Oct 2024

InAttention: Linear Context Scaling for Transformers
Joseph Eisner
09 Oct 2024
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
International Conference on Learning Representations (ICLR), 2024
Mutian He, Philip N. Garner
09 Oct 2024

Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
International Conference on Learning Representations (ICLR), 2024
Zhihao He, Hang Yu, Zi Gong, Shizhan Liu, Jia-Nan Li, Weiyao Lin
09 Oct 2024

Accelerating Error Correction Code Transformers
Matan Levy, Yoni Choukroun, Lior Wolf
08 Oct 2024

LevAttention: Time, Space, and Streaming Efficient Algorithm for Heavy Attentions
International Conference on Learning Representations (ICLR), 2024
R. Kannan, Chiranjib Bhattacharyya, Praneeth Kacham, David P. Woodruff
07 Oct 2024

TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
International Conference on Learning Representations (ICLR), 2024
Lijie Yang, Zhihao Zhang, Zhuofu Chen, Zikun Li, Zhihao Jia
07 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li, Jiaming Xu, Shan Huang, Yonghua Chen, Wen Li, ..., Jiayi Pan, Li Ding, Hao Zhou, Yu Wang, Guohao Dai
06 Oct 2024

System 2 Reasoning Capabilities Are Nigh
Scott C. Lowe
04 Oct 2024

S7: Selective and Simplified State Space Layers for Sequence Modeling
Taylan Soydan, Nikola Zubić, Nico Messikommer, Siddhartha Mishra, Davide Scaramuzza
04 Oct 2024

Exploring the Limitations of Mamba in COPY and CoT Reasoning
Ruifeng Ren, Zhicong Li, Yong Liu
04 Oct 2024

Graph-tree Fusion Model with Bidirectional Information Propagation for Long Document Classification
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Sudipta Singha Roy, Xindi Wang, Robert E. Mercer, Frank Rudzicz
03 Oct 2024
Selective Attention Improves Transformer
International Conference on Learning Representations (ICLR), 2024
Yaniv Leviathan, Matan Kalman, Yossi Matias
03 Oct 2024

A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts
International Conference on Learning Representations (ICLR), 2024
Suyu Ge, Xihui Lin, Yunan Zhang, Jiawei Han, Hao Peng
02 Oct 2024

Attention layers provably solve single-location regression
International Conference on Learning Representations (ICLR), 2024
Pierre Marion, Raphael Berthier, Gérard Biau, Claire Boyer
02 Oct 2024

GLMHA: A Guided Low-rank Multi-Head Self-Attention for Efficient Image Restoration and Spectral Reconstruction
Zaid Ilyas, Naveed Akhtar, David Suter, Syed Zulqarnain Gilani
01 Oct 2024

Cottention: Linear Transformers With Cosine Attention
Gabriel Mongaras, Trevor Dohm, Eric C. Larson
27 Sep 2024
Generative AI-driven forecasting of oil production
Yash Gandhi, Kexin Zheng, Birendra Jha, K. Nomura, A. Nakano, P. Vashishta, R. Kalia
24 Sep 2024

MonoFormer: One Transformer for Both Diffusion and Autoregression
Chuyang Zhao, Yuxing Song, Wenhao Wang, Haocheng Feng, Errui Ding, Yifan Sun, Xinyan Xiao, Jingdong Wang
24 Sep 2024

Efficiently Dispatching Flash Attention For Partially Filled Attention Masks
Agniv Sharma, Jonas Geiping
23 Sep 2024

MambaFoley: Foley Sound Generation using Selective State-Space Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Marco Furio Colombo, Francesca Ronchini, Luca Comanducci, Fabio Antonacci
13 Sep 2024

Expanding Expressivity in Transformer Models with MöbiusAttention
Anna-Maria Halacheva, M. Nayyeri, Steffen Staab
08 Sep 2024
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
Jinghan Yao, Sam Ade Jacobs, Masahiro Tanaka, Olatunji Ruwase, Hari Subramoni, D. Panda
30 Aug 2024

HLogformer: A Hierarchical Transformer for Representing Log Data
Zhichao Hou, Mina Ghashami, Mikhail Kuznetsov, MohamadAli Torkamani
29 Aug 2024

Autoregressive model path dependence near Ising criticality
Yi Hong Teoh, R. Melko
28 Aug 2024

Squid: Long Context as a New Modality for Energy-Efficient On-Device Language Models
Wei Chen, Zhiyuan Li, Shuo Xin, Yihao Wang
28 Aug 2024

Legilimens: Practical and Unified Content Moderation for Large Language Model Services
Conference on Computer and Communications Security (CCS), 2024
Jialin Wu, Jiangyi Deng, Shengyuan Pang, Yanjiao Chen, Jiayang Xu, Xinfeng Li, Wei Dong
28 Aug 2024

Reconstructing physiological signals from fMRI across the adult lifespan
Shiyu Wang, Ziyuan Xu, Laurent M. Lochard, Yamin Li, Jiawen Fan, Jianfei Chen, Yuankai Huo, Mara Mather, Roza G. Bayrak, Catie Chang
26 Aug 2024
Mixed Sparsity Training: Achieving 4× FLOP Reduction for Transformer Pretraining
Pihe Hu, Shaolong Li, Longbo Huang
21 Aug 2024

Macformer: Transformer with Random Maclaurin Feature Attention
Yuhan Guo, Lizhong Ding, Ye Yuan, Guoren Wang
21 Aug 2024

ELASTIC: Efficient Linear Attention for Sequential Interest Compression
Jiaxin Deng, Shiyao Wang, Song Lu, Yinfeng Li, Xinchen Luo, Yuanjun Liu, Peixing Xu, Guorui Zhou
18 Aug 2024

Increasing transformer token length with a Maximum Entropy Principle Method
R. I. Cukier
17 Aug 2024

Ex3: Automatic Novel Writing by Extracting, Excelsior and Expanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Lei Huang, Jiaming Guo, Guanhua He, Xishan Zhang, Rui Zhang, Shaohui Peng, Shaoli Liu, Tianshi Chen
16 Aug 2024
Snuffy: Efficient Whole Slide Image Classifier
European Conference on Computer Vision (ECCV), 2024
Hossein Jafarinia, Alireza Alipanah, Danial Hamdi, Saeed Razavi, Nahal Mirzaie, M. Rohban
15 Aug 2024

Nonlocal Attention Operator: Materializing Hidden Knowledge Towards Interpretable Physics Discovery
Neural Information Processing Systems (NeurIPS), 2024
Yue Yu, Ning Liu, Fei Lu, Tian Gao, S. Jafarzadeh, Stewart Silling
14 Aug 2024

Post-Training Sparse Attention with Double Sparsity
Shuo Yang, Ying Sheng, Joseph E. Gonzalez, Ion Stoica, Lianmin Zheng
11 Aug 2024

Sampling Foundational Transformer: A Theoretical Perspective
Viet Anh Nguyen, Minh Lenhat, Khoa Nguyen, Duong Duc Hieu, Dao Huu Hung, Truong-Son Hy
11 Aug 2024

Prompt and Prejudice
Lorenzo Berlincioni, Luca Cultrera, Federico Becattini, Marco Bertini
07 Aug 2024
NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yilong Chen, Guoxia Wang, Junyuan Shang, Shiyao Cui, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun, Dianhai Yu, Hua Wu
07 Aug 2024

Recent Advances in Multi-Choice Machine Reading Comprehension: A Survey on Methods and Datasets
Shima Foolad, Kourosh Kiani, R. Rastgoo
04 Aug 2024

LDFaceNet: Latent Diffusion-based Network for High-Fidelity Deepfake Generation
International Conference on Pattern Recognition (ICPR), 2024
Dwij Mehta, Aditya Mehta, Pratik Narang
04 Aug 2024

DeMansia: Mamba Never Forgets Any Tokens
Ricky Fang
04 Aug 2024

What comes after transformers? -- A selective survey connecting ideas in deep learning
Johannes Schneider
01 Aug 2024

A2SF: Accumulative Attention Scoring with Forgetting Factor for Token Pruning in Transformer Decoder
Hyun Rae Jo, Dong Kun Shin
30 Jul 2024
FlexAttention for Efficient High-Resolution Vision-Language Models
European Conference on Computer Vision (ECCV), 2024
Junyan Li, Delin Chen, Tianle Cai, Peihao Chen, Yining Hong, Zhenfang Chen, Yikang Shen, Chuang Gan
29 Jul 2024

Practical and Reproducible Symbolic Music Generation by Large Language Models with Structural Embeddings
Seungyeon Rhyu, Kichang Yang, Sungjun Cho, Jaehyeon Kim, Kyogu Lee, Moontae Lee
29 Jul 2024

Efficient LLM Training and Serving with Heterogeneous Context Sharding among Attention Heads
Xihui Lin, Yunan Zhang, Suyu Ge, Barun Patra, Vishrav Chaudhary, Hao Peng, Xia Song
25 Jul 2024