GLU Variants Improve Transformer
Noam M. Shazeer. arXiv:2002.05202, 12 February 2020.

Papers citing "GLU Variants Improve Transformer" (47 of 647 papers shown):
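Since the listing itself carries no technical detail, a minimal sketch of the SwiGLU feed-forward variant proposed in the cited paper may help: FFN_SwiGLU(x) = (Swish(xW) ⊗ xV) W2, where ⊗ is elementwise multiplication. The dimensions and random weights below are illustrative placeholders, not values from the paper.

```python
import numpy as np

def swish(x):
    # Swish/SiLU activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def ffn_swiglu(x, W, V, W2):
    # SwiGLU feed-forward layer: (Swish(xW) * xV) W2.
    # The linear branch xV gates the activated branch Swish(xW)
    # elementwise, replacing the single activation of a plain FFN.
    return (swish(x @ W) * (x @ V)) @ W2

# Toy dimensions; weights are random placeholders for illustration.
rng = np.random.default_rng(0)
d_model, d_ff = 8, 16
x = rng.standard_normal((4, d_model))    # 4 token positions
W = rng.standard_normal((d_model, d_ff))
V = rng.standard_normal((d_model, d_ff))
W2 = rng.standard_normal((d_ff, d_model))
out = ffn_swiglu(x, W, V, W2)            # shape (4, d_model)
```

The GEGLU and ReGLU variants from the same paper differ only in swapping Swish for GELU or ReLU in the gated branch.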
Long Range Language Modeling via Gated State Spaces
Harsh Mehta, Ankit Gupta, Ashok Cutkosky, Behnam Neyshabur. [Mamba] 26/231/0. 27 Jun 2022.

On the Parameterization and Initialization of Diagonal State Space Models
Albert Gu, Ankit Gupta, Karan Goel, Christopher Ré. 14/296/0. 23 Jun 2022.

MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Linxi Fan, Guanzhi Wang, Yunfan Jiang, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu, Anima Anandkumar. [LM&Ro] 42/347/0. 17 Jun 2022.
Rank Diminishing in Deep Neural Networks
Ruili Feng, Kecheng Zheng, Yukun Huang, Deli Zhao, Michael I. Jordan, Zhengjun Zha. 18/28/0. 13 Jun 2022.

Sparse Mixers: Combining MoE and Mixing to build a more efficient BERT
James Lee-Thorp, Joshua Ainslie. [MoE] 30/11/0. 24 May 2022.

BanglaNLG and BanglaT5: Benchmarks and Resources for Evaluating Low-Resource Natural Language Generation in Bangla
Abhik Bhattacharjee, Tahmid Hasan, Wasi Uddin Ahmad, Rifat Shahriyar. [AIMat, LM&MA] 18/28/0. 23 May 2022.

Life after BERT: What do Other Muppets Understand about Language?
Vladislav Lialin, Kevin Zhao, Namrata Shivagunde, Anna Rumshisky. 34/6/0. 21 May 2022.
A Generalist Agent
Scott E. Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, ..., Yutian Chen, R. Hadsell, Oriol Vinyals, Mahyar Bordbar, Nando de Freitas. [LM&Ro, LLMAG, AI4CE] 49/783/0. 12 May 2022.

Supplementary Material: Implementation and Experiments for GAU-based Model
Zhenjie Liu. 9/0/0. 12 May 2022.

UL2: Unifying Language Learning Paradigms
Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Jason W. Wei, ..., Tal Schuster, H. Zheng, Denny Zhou, N. Houlsby, Donald Metzler. [AI4CE] 55/293/0. 10 May 2022.
Boosting Adversarial Transferability of MLP-Mixer
Haoran Lyu, Yajie Wang, Yu-an Tan, Huipeng Zhou, Yuhang Zhao, Quan-xin Zhang. [AAML] 14/1/0. 26 Apr 2022.

What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
Thomas Wang, Adam Roberts, Daniel Hesslow, Teven Le Scao, Hyung Won Chung, Iz Beltagy, Julien Launay, Colin Raffel. 23/167/0. 12 Apr 2022.

Simple Baselines for Image Restoration
Liangyu Chen, Xiaojie Chu, X. Zhang, Jian-jun Sun. 48/831/0. 10 Apr 2022.

ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-wise Semantic Alignment and Generation
Jianan Wang, Guansong Lu, Hang Xu, Zhenguo Li, Chunjing Xu, Yanwei Fu. 17/17/0. 09 Apr 2022.
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, ..., Kathy Meier-Hellstern, Douglas Eck, J. Dean, Slav Petrov, Noah Fiedel. [PILM, LRM] 80/5,983/0. 05 Apr 2022.

Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
Mor Geva, Avi Caciularu, Ke Wang, Yoav Goldberg. [KELM] 34/333/0. 28 Mar 2022.

Error Correction Code Transformer
Yoni Choukroun, Lior Wolf. 19/47/0. 27 Mar 2022.

Geometry-Aware Supertagging with Heterogeneous Dynamic Convolutions
Konstantinos Kogkalidis, M. Moortgat. 6/9/0. 23 Mar 2022.

IT5: Text-to-text Pretraining for Italian Language Understanding and Generation
Gabriele Sarti, Malvina Nissim. [AILaw] 8/42/0. 07 Mar 2022.
TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation
R. Liu, Kailun Yang, Alina Roitberg, Jiaming Zhang, Kunyu Peng, Huayao Liu, Yaonan Wang, Rainer Stiefelhagen. [ViT] 34/36/0. 27 Feb 2022.

Transformer Quality in Linear Time
Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le. 71/222/0. 21 Feb 2022.

ST-MoE: Designing Stable and Transferable Sparse Expert Models
Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, J. Dean, Noam M. Shazeer, W. Fedus. [MoE] 24/181/0. 17 Feb 2022.

VRT: A Video Restoration Transformer
Jingyun Liang, Jiezhang Cao, Yuchen Fan, K. Zhang, Rakesh Ranjan, Yawei Li, Radu Timofte, Luc Van Gool. [ViT] 26/251/0. 28 Jan 2022.
LaMDA: Language Models for Dialog Applications
R. Thoppilan, Daniel De Freitas, Jamie Hall, Noam M. Shazeer, Apoorv Kulshreshtha, ..., Blaise Aguera-Arcas, Claire Cui, M. Croak, Ed H. Chi, Quoc Le. [ALM] 13/1,557/0. 20 Jan 2022.

GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, ..., Kun Zhang, Quoc V. Le, Yonghui Wu, Z. Chen, Claire Cui. [ALM, MoE] 34/763/0. 13 Dec 2021.

ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
V. Aribandi, Yi Tay, Tal Schuster, J. Rao, H. Zheng, ..., Jianmo Ni, Jai Gupta, Kai Hui, Sebastian Ruder, Donald Metzler. [MoE] 10/213/0. 22 Nov 2021.
A Multi-attribute Controllable Generative Model for Histopathology Image Synthesis
Jiarong Ye, Yuan Xue, Peter Liu, R. Zaino, K. Cheng, Xiaolei Huang. [MedIm] 20/7/0. 10 Nov 2021.

Geometric Transformer for End-to-End Molecule Properties Prediction
Yoni Choukroun, Lior Wolf. [AI4CE, ViT] 17/16/0. 26 Oct 2021.

NormFormer: Improved Transformer Pretraining with Extra Normalization
Sam Shleifer, Jason Weston, Myle Ott. [AI4CE] 28/74/0. 18 Oct 2021.

The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization
Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber. [AI4CE] 17/55/0. 14 Oct 2021.

Primer: Searching for Efficient Transformers for Language Modeling
David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam M. Shazeer, Quoc V. Le. [VLM] 83/152/0. 17 Sep 2021.
SANSformers: Self-Supervised Forecasting in Electronic Health Records with Attention-Free Models
Yogesh Kumar, Alexander Ilin, H. Salo, S. Kulathinal, M. Leinonen, Pekka Marttinen. [AI4TS, MedIm] 12/0/0. 31 Aug 2021.

Sequence-to-Sequence Piano Transcription with Transformers
Curtis Hawthorne, Ian Simon, Rigel Swavely, Ethan Manilow, Jesse Engel. 30/81/0. 19 Jul 2021.

MedGPT: Medical Concept Prediction from Clinical Narratives
Z. Kraljevic, Anthony Shek, D. Bean, R. Bendayan, J. Teo, Richard J. B. Dobson. [LM&MA, AI4TS, MedIm] 16/38/0. 07 Jul 2021.

Charformer: Fast Character Transformers via Gradient-based Subword Tokenization
Yi Tay, Vinh Q. Tran, Sebastian Ruder, Jai Gupta, Hyung Won Chung, Dara Bahri, Zhen Qin, Simon Baumgartner, Cong Yu, Donald Metzler. 43/152/0. 23 Jun 2021.
Revisiting Deep Learning Models for Tabular Data
Yu. V. Gorishniy, Ivan Rubachev, Valentin Khrulkov, Artem Babenko. [LMTD] 19/694/0. 22 Jun 2021.

Distributed Deep Learning in Open Collaborations
Michael Diskin, Alexey Bukhtiyarov, Max Ryabinin, Lucile Saulnier, Quentin Lhoest, ..., Denis Mazur, Ilia Kobelev, Yacine Jernite, Thomas Wolf, Gennady Pekhimenko. [FedML] 33/54/0. 18 Jun 2021.

Memory-efficient Transformers via Top-k Attention
Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonathan Berant. [MQ] 32/51/0. 13 Jun 2021.
A Survey of Transformers
Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu. [ViT] 27/1,084/0. 08 Jun 2021.

Pay Attention to MLPs
Hanxiao Liu, Zihang Dai, David R. So, Quoc V. Le. [AI4CE] 17/651/0. 17 May 2021.

The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester, Rami Al-Rfou, Noah Constant. [VPVLM] 280/3,843/0. 18 Apr 2021.

Do Transformer Modifications Transfer Across Implementations and Applications?
Sharan Narang, Hyung Won Chung, Yi Tay, W. Fedus, Thibault Févry, ..., Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel. 17/126/0. 23 Feb 2021.
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
W. Fedus, Barret Zoph, Noam M. Shazeer. [MoE] 11/2,067/0. 11 Jan 2021.

mT5: A massively multilingual pre-trained text-to-text transformer
Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel. 10/2,439/0. 22 Oct 2020.

Rewiring the Transformer with Depth-Wise LSTMs
Hongfei Xu, Yang Song, Qiuhui Liu, Josef van Genabith, Deyi Xiong. 21/6/0. 13 Jul 2020.

How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Adam Roberts, Colin Raffel, Noam M. Shazeer. [KELM] 6/861/0. 10 Feb 2020.

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. [ELM] 294/6,950/0. 20 Apr 2018.