Transformer Quality in Linear Time
International Conference on Machine Learning (ICML), 2022
21 February 2022
Weizhe Hua
Zihang Dai
Hanxiao Liu
Quoc V. Le
arXiv: 2202.10447

Papers citing "Transformer Quality in Linear Time"

29 / 129 papers shown
RETVec: Resilient and Efficient Text Vectorizer
Neural Information Processing Systems (NeurIPS), 2023
Elie Bursztein
Marina Zhang
Owen Vallis
Xinyu Jia
Alexey Kurakin
VLM
152
6
0
18 Feb 2023
Symbolic Discovery of Optimization Algorithms
Neural Information Processing Systems (NeurIPS), 2023
Xiangning Chen
Chen Liang
Da Huang
Esteban Real
Kaiyuan Wang
...
Xuanyi Dong
Thang Luong
Cho-Jui Hsieh
Yifeng Lu
Quoc V. Le
795
517
0
13 Feb 2023
Efficient Attention via Control Variates
International Conference on Learning Representations (ICLR), 2023
Lin Zheng
Jianbo Yuan
Chong-Jun Wang
Lingpeng Kong
286
22
0
09 Feb 2023
Efficient Movie Scene Detection using State-Space Transformers
Computer Vision and Pattern Recognition (CVPR), 2022
Md. Mohaiminul Islam
Mahmudul Hasan
Kishan Athrey
Tony Braskich
Gedas Bertasius
ViT
246
69
0
29 Dec 2022
Cramming: Training a Language Model on a Single GPU in One Day
International Conference on Machine Learning (ICML), 2022
Jonas Geiping
Tom Goldstein
MoE
270
103
0
28 Dec 2022
Towards Neural Variational Monte Carlo That Scales Linearly with System Size
Or Sharir
G. Chan
Anima Anandkumar
170
6
0
21 Dec 2022
Pretraining Without Attention
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Junxiong Wang
J. Yan
Albert Gu
Alexander M. Rush
236
56
0
20 Dec 2022
Efficient Long Sequence Modeling via State Space Augmented Transformer
Simiao Zuo
Xiaodong Liu
Jian Jiao
Denis Xavier Charles
Eren Manavoglu
Tuo Zhao
Jianfeng Gao
327
37
0
15 Dec 2022
Meta-Learning Fast Weight Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Kevin Clark
Kelvin Guu
Ming-Wei Chang
Panupong Pasupat
Geoffrey E. Hinton
Mohammad Norouzi
KELM
196
15
0
05 Dec 2022
Deciphering RNA Secondary Structure Prediction: A Probabilistic K-Rook Matching Perspective
International Conference on Machine Learning (ICML), 2022
Cheng Tan
Zhangyang Gao
Hanqun Cao
Xingran Chen
Ge Wang
Lirong Wu
Jun Xia
Jiangbin Zheng
Stan Z. Li
260
2
0
02 Dec 2022
Protein Language Models and Structure Prediction: Connection and Progression
Bozhen Hu
Jun Xia
Jiangbin Zheng
Cheng Tan
Yufei Huang
Yongjie Xu
Stan Z. Li
210
45
0
30 Nov 2022
DBA: Efficient Transformer with Dynamic Bilinear Low-Rank Attention
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Bosheng Qin
Juncheng Li
Siliang Tang
Yueting Zhuang
178
4
0
24 Nov 2022
How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Michael Hassid
Hao Peng
Daniel Rotem
Jungo Kasai
Ivan Montero
Noah A. Smith
Roy Schwartz
244
31
0
07 Nov 2022
MogaNet: Multi-order Gated Aggregation Network
International Conference on Learning Representations (ICLR), 2022
Siyuan Li
Zedong Wang
Zicheng Liu
Cheng Tan
Haitao Lin
Di Wu
Zhiyuan Chen
Jiangbin Zheng
Stan Z. Li
285
125
0
07 Nov 2022
The Devil in Linear Transformer
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zhen Qin
Xiaodong Han
Weixuan Sun
Dongxu Li
Lingpeng Kong
Nick Barnes
Yiran Zhong
210
96
0
19 Oct 2022
Decoupling Features in Hierarchical Propagation for Video Object Segmentation
Neural Information Processing Systems (NeurIPS), 2022
Zongxin Yang
Yi Yang
VOS
320
198
0
18 Oct 2022
CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
International Conference on Machine Learning (ICML), 2022
Jinchao Zhang
Shuyang Jiang
Jiangtao Feng
Lin Zheng
Dianbo Sui
3DV
606
9
0
14 Oct 2022
AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Ganesh Jawahar
Subhabrata Mukherjee
Xiaodong Liu
Young Jin Kim
Muhammad Abdul-Mageed
L. Lakshmanan
Ahmed Hassan Awadallah
Sébastien Bubeck
Jianfeng Gao
MoE
188
11
0
14 Oct 2022
Multi-scale Attention Network for Single Image Super-Resolution
Yan Wang
Yusen Li
Gang Wang
Xiaoguang Liu
SupR
308
104
0
28 Sep 2022
Mega: Moving Average Equipped Gated Attention
International Conference on Learning Representations (ICLR), 2022
Xuezhe Ma
Chunting Zhou
Xiang Kong
Junxian He
Liangke Gui
Graham Neubig
Jonathan May
Luke Zettlemoyer
339
219
0
21 Sep 2022
Stateful Memory-Augmented Transformers for Efficient Dialogue Modeling
Findings (Findings), 2022
Qingyang Wu
Zhou Yu
RALM
152
4
0
15 Sep 2022
QSAN: A Near-term Achievable Quantum Self-Attention Network
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Jinjing Shi
Ren-Xin Zhao
Wenxuan Wang
Shenmin Zhang
Xuelong Li
440
35
0
14 Jul 2022
Long Range Language Modeling via Gated State Spaces
International Conference on Learning Representations (ICLR), 2022
Harsh Mehta
Ankit Gupta
Ashok Cutkosky
Behnam Neyshabur
Mamba
541
333
0
27 Jun 2022
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Neural Information Processing Systems (NeurIPS), 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
848
3,391
0
27 May 2022
Spatial-Temporal Interactive Dynamic Graph Convolution Network for Traffic Forecasting
Aoyun Liu
Yaying Zhang
GNN
AI4TS
284
59
0
18 May 2022
Supplementary Material: Implementation and Experiments for GAU-based Model
Zhenjie Liu
128
0
0
12 May 2022
Simple Baselines for Image Restoration
European Conference on Computer Vision (ECCV), 2022
Liangyu Chen
Xiaojie Chu
Xinming Zhang
Jian Sun
932
1,252
0
10 Apr 2022
Block-Recurrent Transformers
Neural Information Processing Systems (NeurIPS), 2022
DeLesley S. Hutchins
Imanol Schlag
Yuhuai Wu
Ethan Dyer
Behnam Neyshabur
449
131
0
11 Mar 2022
On Learning the Transformer Kernel
Sankalan Pal Chowdhury
Adamos Solomou
Kumar Avinava Dubey
Mrinmaya Sachan
ViT
337
17
0
15 Oct 2021