2002.05202
GLU Variants Improve Transformer
12 February 2020
Noam M. Shazeer
Papers citing "GLU Variants Improve Transformer"
50 / 904 papers shown
T-former: An Efficient Transformer for Image Inpainting
ACM Multimedia (ACM MM), 2022
Ye Deng
Siqi Hui
Sanping Zhou
Deyu Meng
Jinjun Wang
ViT
215
49
0
12 May 2023
ChatGPT-Like Large-Scale Foundation Models for Prognostics and Health Management: A Survey and Roadmaps
Reliability Engineering & System Safety (Reliab. Eng. Syst. Saf.), 2023
Yanfang Li
Huan Wang
Muxia Sun
LM&MA
AI4TS
AI4CE
380
91
0
10 May 2023
XTab: Cross-table Pretraining for Tabular Transformers
International Conference on Machine Learning (ICML), 2023
Bingzhao Zhu
Xingjian Shi
Nick Erickson
Mu Li
George Karypis
Mahsa Shoaran
LMTD
278
98
0
10 May 2023
Toeplitz Neural Network for Sequence Modeling
International Conference on Learning Representations (ICLR), 2023
Zhen Qin
Xiaodong Han
Weixuan Sun
Bowen He
Dong Li
Dongxu Li
Yuchao Dai
Lingpeng Kong
Yiran Zhong
AI4TS
ViT
163
49
0
08 May 2023
A technical note on bilinear layers for interpretability
Lee D. Sharkey
FAtt
71
10
0
05 May 2023
A Theory on Adam Instability in Large-Scale Machine Learning
Igor Molybog
Peter Albert
Moya Chen
Zach DeVito
David Esiobu
...
Puxin Xu
Yuchen Zhang
Melanie Kambadur
Stephen Roller
Susan Zhang
AI4CE
189
46
0
19 Apr 2023
The MiniPile Challenge for Data-Efficient Language Models
Jean Kaddour
MoE
ALM
320
63
0
17 Apr 2023
Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca
Yiming Cui
Ziqing Yang
Xin Yao
ALM
286
389
0
17 Apr 2023
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
...
Edouard Grave
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
1.1K
6,043
0
14 Apr 2023
Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference
Neural Information Processing Systems (NeurIPS), 2023
Tao Lei
Junwen Bai
Siddhartha Brahma
Joshua Ainslie
Kenton Lee
...
Vincent Zhao
Yuexin Wu
Yue Liu
Yu Zhang
Ming-Wei Chang
BDL
AI4CE
223
80
0
11 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
296
51
0
07 Apr 2023
Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster
Nolan Dey
Gurpreet Gosal
Zhiming Chen
Chen
Hemant Khachane
William Marshall
Ribhu Pathria
Marvin Tom
Joel Hestness
MoE
LRM
295
122
0
06 Apr 2023
Effective Theory of Transformers at Initialization
Emily Dinan
Sho Yaida
Susan Zhang
164
19
0
04 Apr 2023
Masked Autoencoders as Image Processors
Huiyu Duan
Wei Shen
Xiongkuo Min
Danyang Tu
Long Teng
Jia Wang
Guangtao Zhai
ViT
136
13
0
30 Mar 2023
Unlocking the Potential of ChatGPT: A Comprehensive Exploration of its Applications, Advantages, Limitations, and Future Directions in Natural Language Processing
Walid Hariri
AI4MH
LM&MA
897
119
0
27 Mar 2023
The Battle of Information Representations: Comparing Sentiment and Semantic Features for Forecasting Market Trends
International Joint Conference on the Analysis of Images, Social Networks and Texts (AISNT), 2023
A.S. Zaichenko
A. Kazakov
Elizaveta Kovtun
S. Budennyy
AIFin
109
2
0
24 Mar 2023
EVA-02: A Visual Representation for Neon Genesis
Image and Vision Computing (IVC), 2023
Yuxin Fang
Quan-Sen Sun
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
ViT
CLIP
399
409
0
20 Mar 2023
Trained on 100 million words and still in shape: BERT meets British National Corpus
Findings (Findings), 2023
David Samuel
Andrey Kutuzov
Lilja Øvrelid
Erik Velldal
343
42
0
17 Mar 2023
A Generative Model for Digital Camera Noise Synthesis
M. Song
Yang Zhang
T. Aydin
Elham Amin Mansour
DisneyResearchStudios
VLM
253
6
0
16 Mar 2023
RTMPose: Real-Time Multi-Person Pose Estimation based on MMPose
Tao Jiang
Peng Lu
Li Zhang
Ning Ma
Rui Han
Chengqi Lyu
Yining Li
Kai-xiang Chen
3DH
383
302
0
13 Mar 2023
AutoMatch: A Large-scale Audio Beat Matching Benchmark for Boosting Deep Learning Assistant Video Editing
Sen Pei
Jingya Yu
Qi Chen
Wozhou He
134
3
0
03 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
6.1K
17,759
0
27 Feb 2023
SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks
Rui-Jie Zhu
Qihang Zhao
Guoqi Li
Nhan Duy Truong
BDL
VLM
440
115
0
27 Feb 2023
Language-Driven Representation Learning for Robotics
Siddharth Karamcheti
Suraj Nair
Annie S. Chen
Thomas Kollar
Chelsea Finn
Dorsa Sadigh
Abigail Z. Jacobs
LM&Ro
SSL
280
189
0
24 Feb 2023
MossFormer: Pushing the Performance Limit of Monaural Speech Separation using Gated Single-Head Transformer with Convolution-Augmented Joint Self-Attentions
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Shengkui Zhao
Bin Ma
213
74
0
23 Feb 2023
Entity-Level Text-Guided Image Manipulation
Yikai Wang
Jianan Wang
Guansong Lu
Hang Xu
Zhenguo Li
Wei Zhang
Yanwei Fu
VGen
134
3
0
22 Feb 2023
Chain of Hindsight Aligns Language Models with Feedback
International Conference on Learning Representations (ICLR), 2023
Hao Liu
Carmelo Sferrazza
Pieter Abbeel
ALM
802
149
0
06 Feb 2023
Molecular Geometry-aware Transformer for accurate 3D Atomic System modeling
Zheng Yuan
Yaoyun Zhang
Chuanqi Tan
Wei Wang
Feiran Huang
Songfang Huang
AI4CE
ViT
148
7
0
02 Feb 2023
Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps
International Conference on Learning Representations (ICLR), 2023
Goro Kobayashi
Tatsuki Kuribayashi
Sho Yokoi
Kentaro Inui
463
25
0
01 Feb 2023
Composer's Assistant: An Interactive Transformer for Multi-Track MIDI Infilling
International Society for Music Information Retrieval Conference (ISMIR), 2023
Martin E. Malandro
221
10
0
29 Jan 2023
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
International Conference on Machine Learning (ICML), 2023
Max Ryabinin
Tim Dettmers
Michael Diskin
Alexander Borzunov
MoE
356
55
0
27 Jan 2023
Human-Timescale Adaptation in an Open-Ended Task Space
International Conference on Machine Learning (ICML), 2023
Adaptive Agent Team
Jakob Bauer
Kate Baumli
Satinder Baveja
Feryal M. P. Behbahani
...
Jakub Sygnowski
K. Tuyls
Sarah York
Alexander Zacherl
Lei Zhang
LM&Ro
OffRL
AI4CE
LRM
326
147
0
18 Jan 2023
ExcelFormer: A neural network surpassing GBDTs on tabular data
Jintai Chen
Jiahuan Yan
Qiyuan Chen
Benlin Liu
Jian Wu
Jimeng Sun
LMTD
350
36
0
07 Jan 2023
On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective
Ying Wen
Bo Liu
M. Zhou
Shufang Hou
Zhe Cao
Chenyang Le
Jingxiao Chen
Zheng Tian
Weinan Zhang
Jun Wang
AI4CE
221
12
0
24 Dec 2022
Pretraining Without Attention
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Junxiong Wang
J. Yan
Albert Gu
Alexander M. Rush
229
56
0
20 Dec 2022
Latent Diffusion for Language Generation
Neural Information Processing Systems (NeurIPS), 2022
Justin Lovelace
Varsha Kishore
Chao-gang Wan
Eliot Shekhtman
Kilian Q. Weinberger
DiffM
255
111
0
19 Dec 2022
Dense Feature Memory Augmented Transformers for COVID-19 Vaccination Search Classification
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Jai Gupta
Yi Tay
C. Kamath
Vinh Q. Tran
Donald Metzler
S. Bavadekar
Mimi Sun
E. Gabrilovich
MedIm
131
0
0
16 Dec 2022
ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yekun Chai
Shuohuan Wang
Chao Pang
Yu Sun
Hao Tian
Hua Wu
234
42
0
13 Dec 2022
LMEC: Learnable Multiplicative Absolute Position Embedding Based Conformer for Speech Recognition
Yuguang Yang
Yu Pan
Jingjing Yin
Heng Lu
251
4
0
05 Dec 2022
Efficient Frequency Domain-based Transformers for High-Quality Image Deblurring
Computer Vision and Pattern Recognition (CVPR), 2022
Lingshun Kong
Jiangxin Dong
Mingqiang Li
J. Ge
Jin-shan Pan
ViT
184
274
0
22 Nov 2022
MINTIME: Multi-Identity Size-Invariant Video Deepfake Detection
IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2022
D. Coccomini
Giorgos Kordopatis-Zilos
Giuseppe Amato
R. Caldelli
Fabrizio Falchi
Symeon Papadopoulos
Claudio Gennaro
239
27
0
20 Nov 2022
AutoTemplate: A Simple Recipe for Lexically Constrained Text Generation
International Conference on Natural Language Generation (INLG), 2022
Hayate Iso
183
8
0
15 Nov 2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop:
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
841
2,755
0
09 Nov 2022
MogaNet: Multi-order Gated Aggregation Network
International Conference on Learning Representations (ICLR), 2022
Siyuan Li
Zedong Wang
Zicheng Liu
Cheng Tan
Haitao Lin
Di Wu
Zhiyuan Chen
Jiangbin Zheng
Stan Z. Li
285
120
0
07 Nov 2022
A Long-term Dependent and Trustworthy Approach to Reactor Accident Prognosis based on Temporal Fusion Transformer
Chengyuan Li
Zihan Qiu
Yugao Ma
Mei Li
97
1
0
28 Oct 2022
What Language Model to Train if You Have One Million GPU Hours?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Teven Le Scao
Thomas Wang
Daniel Hesslow
Lucile Saulnier
Stas Bekman
...
Lintang Sutawika
Jaesung Tae
Zheng-Xin Yong
Julien Launay
Iz Beltagy
MoE
AI4CE
573
120
0
27 Oct 2022
Towards Better Few-Shot and Finetuning Performance with Forgetful Causal Language Models
Hao Liu
Xinyang Geng
Lisa Lee
Igor Mordatch
Sergey Levine
Sharan Narang
Pieter Abbeel
KELM
CLL
253
3
0
24 Oct 2022
The Devil in Linear Transformer
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zhen Qin
Xiaodong Han
Weixuan Sun
Dongxu Li
Lingpeng Kong
Nick Barnes
Yiran Zhong
210
96
0
19 Oct 2022
VIMA: General Robot Manipulation with Multimodal Prompts
Yunfan Jiang
Agrim Gupta
Zichen Zhang
Guanzhi Wang
Yongqiang Dou
Yanjun Chen
Li Fei-Fei
Anima Anandkumar
Yuke Zhu
Linxi Fan
LM&Ro
383
475
0
06 Oct 2022
Do ever larger octopi still amplify reporting biases? Evidence from judgments of typical colour
Fangyu Liu
Julian Martin Eisenschlos
Jeremy R. Cole
Nigel Collier
191
5
0
26 Sep 2022