GLU Variants Improve Transformer

12 February 2020
Noam M. Shazeer
arXiv:2002.05202

Papers citing "GLU Variants Improve Transformer"

50 / 904 papers shown
T-former: An Efficient Transformer for Image Inpainting
ACM Multimedia (ACM MM), 2022
Ye Deng
Siqi Hui
Sanping Zhou
Deyu Meng
Jinjun Wang
12 May 2023
ChatGPT-Like Large-Scale Foundation Models for Prognostics and Health Management: A Survey and Roadmaps
Reliability Engineering & System Safety (Reliab. Eng. Syst. Saf.), 2023
Yanfang Li
Huan Wang
Muxia Sun
10 May 2023
XTab: Cross-table Pretraining for Tabular Transformers
International Conference on Machine Learning (ICML), 2023
Bingzhao Zhu
Xingjian Shi
Nick Erickson
Mu Li
George Karypis
Mahsa Shoaran
10 May 2023
Toeplitz Neural Network for Sequence Modeling
International Conference on Learning Representations (ICLR), 2023
Zhen Qin
Xiaodong Han
Weixuan Sun
Bowen He
Dong Li
Dongxu Li
Yuchao Dai
Lingpeng Kong
Yiran Zhong
08 May 2023
A technical note on bilinear layers for interpretability
Lee D. Sharkey
05 May 2023
A Theory on Adam Instability in Large-Scale Machine Learning
Igor Molybog
Peter Albert
Moya Chen
Zach DeVito
David Esiobu
...
Puxin Xu
Yuchen Zhang
Melanie Kambadur
Stephen Roller
Susan Zhang
19 Apr 2023
The MiniPile Challenge for Data-Efficient Language Models
Jean Kaddour
17 Apr 2023
Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca
Yiming Cui
Ziqing Yang
Xin Yao
17 Apr 2023
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
...
Edouard Grave
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
14 Apr 2023
Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference
Neural Information Processing Systems (NeurIPS), 2023
Tao Lei
Junwen Bai
Siddhartha Brahma
Joshua Ainslie
Kenton Lee
...
Vincent Zhao
Yuexin Wu
Yue Liu
Yu Zhang
Ming-Wei Chang
11 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
07 Apr 2023
Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster
Nolan Dey
Gurpreet Gosal
Zhiming Chen
Chen
Hemant Khachane
William Marshall
Ribhu Pathria
Marvin Tom
Joel Hestness
06 Apr 2023
Effective Theory of Transformers at Initialization
Emily Dinan
Sho Yaida
Susan Zhang
04 Apr 2023
Masked Autoencoders as Image Processors
Huiyu Duan
Wei Shen
Xiongkuo Min
Danyang Tu
Long Teng
Jia Wang
Guangtao Zhai
30 Mar 2023
Unlocking the Potential of ChatGPT: A Comprehensive Exploration of its Applications, Advantages, Limitations, and Future Directions in Natural Language Processing
Walid Hariri
27 Mar 2023
The Battle of Information Representations: Comparing Sentiment and Semantic Features for Forecasting Market Trends
International Joint Conference on the Analysis of Images, Social Networks and Texts (AISNT), 2023
A.S. Zaichenko
A. Kazakov
Elizaveta Kovtun
S. Budennyy
24 Mar 2023
EVA-02: A Visual Representation for Neon Genesis
Image and Vision Computing (IVC), 2023
Yuxin Fang
Quan-Sen Sun
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
20 Mar 2023
Trained on 100 million words and still in shape: BERT meets British National Corpus
Findings (Findings), 2023
David Samuel
Andrey Kutuzov
Lilja Øvrelid
Erik Velldal
17 Mar 2023
A Generative Model for Digital Camera Noise Synthesis
M. Song
Yang Zhang
T. Aydin
Elham Amin Mansour
DisneyResearchStudios
16 Mar 2023
RTMPose: Real-Time Multi-Person Pose Estimation based on MMPose
Tao Jiang
Peng Lu
Li Zhang
Ning Ma
Rui Han
Chengqi Lyu
Yining Li
Kai-xiang Chen
13 Mar 2023
AutoMatch: A Large-scale Audio Beat Matching Benchmark for Boosting Deep Learning Assistant Video Editing
Sen Pei
Jingya Yu
Qi Chen
Wozhou He
03 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
27 Feb 2023
SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks
Rui-Jie Zhu
Qihang Zhao
Guoqi Li
Nhan Duy Truong
27 Feb 2023
Language-Driven Representation Learning for Robotics
Siddharth Karamcheti
Suraj Nair
Annie S. Chen
Thomas Kollar
Chelsea Finn
Dorsa Sadigh
Abigail Z. Jacobs
24 Feb 2023
MossFormer: Pushing the Performance Limit of Monaural Speech Separation using Gated Single-Head Transformer with Convolution-Augmented Joint Self-Attentions
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Shengkui Zhao
Bin Ma
23 Feb 2023
Entity-Level Text-Guided Image Manipulation
Yikai Wang
Jianan Wang
Guansong Lu
Hang Xu
Zhenguo Li
Wei Zhang
Yanwei Fu
22 Feb 2023
Chain of Hindsight Aligns Language Models with Feedback
International Conference on Learning Representations (ICLR), 2023
Hao Liu
Carmelo Sferrazza
Pieter Abbeel
06 Feb 2023
Molecular Geometry-aware Transformer for accurate 3D Atomic System modeling
Zheng Yuan
Yaoyun Zhang
Chuanqi Tan
Wei Wang
Feiran Huang
Songfang Huang
02 Feb 2023
Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps
International Conference on Learning Representations (ICLR), 2023
Goro Kobayashi
Tatsuki Kuribayashi
Sho Yokoi
Kentaro Inui
01 Feb 2023
Composer's Assistant: An Interactive Transformer for Multi-Track MIDI Infilling
International Society for Music Information Retrieval Conference (ISMIR), 2023
Martin E. Malandro
29 Jan 2023
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
International Conference on Machine Learning (ICML), 2023
Max Ryabinin
Tim Dettmers
Michael Diskin
Alexander Borzunov
27 Jan 2023
Human-Timescale Adaptation in an Open-Ended Task Space
International Conference on Machine Learning (ICML), 2023
Adaptive Agent Team
Jakob Bauer
Kate Baumli
Satinder Baveja
Feryal M. P. Behbahani
...
Jakub Sygnowski
K. Tuyls
Sarah York
Alexander Zacherl
Lei Zhang
18 Jan 2023
ExcelFormer: A neural network surpassing GBDTs on tabular data
Jintai Chen
Jiahuan Yan
Qiyuan Chen
Benlin Liu
Jian Wu
Jimeng Sun
07 Jan 2023
On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective
Ying Wen
Bo Liu
M. Zhou
Shufang Hou
Zhe Cao
Chenyang Le
Jingxiao Chen
Zheng Tian
Weinan Zhang
Jun Wang
24 Dec 2022
Pretraining Without Attention
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Junxiong Wang
J. Yan
Albert Gu
Alexander M. Rush
20 Dec 2022
Latent Diffusion for Language Generation
Neural Information Processing Systems (NeurIPS), 2022
Justin Lovelace
Varsha Kishore
Chao-gang Wan
Eliot Shekhtman
Kilian Q. Weinberger
19 Dec 2022
Dense Feature Memory Augmented Transformers for COVID-19 Vaccination Search Classification
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Jai Gupta
Yi Tay
C. Kamath
Vinh Q. Tran
Donald Metzler
S. Bavadekar
Mimi Sun
E. Gabrilovich
16 Dec 2022
ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yekun Chai
Shuohuan Wang
Chao Pang
Yu Sun
Hao Tian
Hua Wu
13 Dec 2022
LMEC: Learnable Multiplicative Absolute Position Embedding Based Conformer for Speech Recognition
Yuguang Yang
Yu Pan
Jingjing Yin
Heng Lu
05 Dec 2022
Efficient Frequency Domain-based Transformers for High-Quality Image Deblurring
Computer Vision and Pattern Recognition (CVPR), 2022
Lingshun Kong
Jiangxin Dong
Mingqiang Li
J. Ge
Jin-shan Pan
22 Nov 2022
MINTIME: Multi-Identity Size-Invariant Video Deepfake Detection
IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2022
D. Coccomini
Giorgos Kordopatis-Zilos
Giuseppe Amato
R. Caldelli
Fabrizio Falchi
Symeon Papadopoulos
Claudio Gennaro
20 Nov 2022
AutoTemplate: A Simple Recipe for Lexically Constrained Text Generation
International Conference on Natural Language Generation (INLG), 2022
Hayate Iso
15 Nov 2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
09 Nov 2022
MogaNet: Multi-order Gated Aggregation Network
International Conference on Learning Representations (ICLR), 2022
Siyuan Li
Zedong Wang
Zicheng Liu
Cheng Tan
Haitao Lin
Di Wu
Zhiyuan Chen
Jiangbin Zheng
Stan Z. Li
07 Nov 2022
A Long-term Dependent and Trustworthy Approach to Reactor Accident Prognosis based on Temporal Fusion Transformer
Chengyuan Li
Zihan Qiu
Yugao Ma
Mei Li
28 Oct 2022
What Language Model to Train if You Have One Million GPU Hours?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Teven Le Scao
Thomas Wang
Daniel Hesslow
Lucile Saulnier
Stas Bekman
...
Lintang Sutawika
Jaesung Tae
Zheng-Xin Yong
Julien Launay
Iz Beltagy
27 Oct 2022
Towards Better Few-Shot and Finetuning Performance with Forgetful Causal Language Models
Hao Liu
Xinyang Geng
Lisa Lee
Igor Mordatch
Sergey Levine
Sharan Narang
Pieter Abbeel
24 Oct 2022
The Devil in Linear Transformer
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zhen Qin
Xiaodong Han
Weixuan Sun
Dongxu Li
Lingpeng Kong
Nick Barnes
Yiran Zhong
19 Oct 2022
VIMA: General Robot Manipulation with Multimodal Prompts
Yunfan Jiang
Agrim Gupta
Zichen Zhang
Guanzhi Wang
Yongqiang Dou
Yanjun Chen
Li Fei-Fei
Anima Anandkumar
Yuke Zhu
Linxi Fan
06 Oct 2022
Do ever larger octopi still amplify reporting biases? Evidence from judgments of typical colour
Fangyu Liu
Julian Martin Eisenschlos
Jeremy R. Cole
Nigel Collier
26 Sep 2022