RETVec: Resilient and Efficient Text Vectorizer. Neural Information Processing Systems (NeurIPS), 2023.
Symbolic Discovery of Optimization Algorithms. Neural Information Processing Systems (NeurIPS), 2023.
Efficient Attention via Control Variates. International Conference on Learning Representations (ICLR), 2023.
Efficient Movie Scene Detection using State-Space Transformers. Computer Vision and Pattern Recognition (CVPR), 2023.
Cramming: Training a Language Model on a Single GPU in One Day. International Conference on Machine Learning (ICML), 2023.
Pretraining Without Attention. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Meta-Learning Fast Weight Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Deciphering RNA Secondary Structure Prediction: A Probabilistic K-Rook Matching Perspective. International Conference on Machine Learning (ICML), 2024.
DBA: Efficient Transformer with Dynamic Bilinear Low-Rank Attention. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022.
How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
MogaNet: Multi-order Gated Aggregation Network. International Conference on Learning Representations (ICLR), 2024.
The Devil in Linear Transformer. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Decoupling Features in Hierarchical Propagation for Video Object Segmentation. Zongxin Yang, Yi Yang. Neural Information Processing Systems (NeurIPS), 2022.
CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling. International Conference on Machine Learning (ICML), 2023.
AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation. Annual Meeting of the Association for Computational Linguistics (ACL), 2023.
Mega: Moving Average Equipped Gated Attention. International Conference on Learning Representations (ICLR), 2023.
Stateful Memory-Augmented Transformers for Efficient Dialogue Modeling. Findings, 2022.
QSAN: A Near-term Achievable Quantum Self-Attention Network. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022.
Long Range Language Modeling via Gated State Spaces. International Conference on Learning Representations (ICLR), 2023.
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. Neural Information Processing Systems (NeurIPS), 2022.
Simple Baselines for Image Restoration. European Conference on Computer Vision (ECCV), 2022.
Block-Recurrent Transformers. Neural Information Processing Systems (NeurIPS), 2022.