GLU Variants Improve Transformer

12 February 2020

Noam M. Shazeer

Papers citing "GLU Variants Improve Transformer"

50 / 904 papers shown

T-former: An Efficient Transformer for Image Inpainting

ACM Multimedia (ACM MM), 2022

215

12 May 2023

ChatGPT-Like Large-Scale Foundation Models for Prognostics and Health Management: A Survey and Roadmaps

Reliability Engineering & System Safety (Reliab. Eng. Syst. Saf.), 2023

380

10 May 2023

XTab: Cross-table Pretraining for Tabular Transformers

International Conference on Machine Learning (ICML), 2023

George Karypis

278

10 May 2023

Toeplitz Neural Network for Sequence Modeling

International Conference on Learning Representations (ICLR), 2023

Zhen Qin

Yuchao Dai

Lingpeng Kong

Yiran Zhong

AI4TS ViT

163

08 May 2023

A technical note on bilinear layers for interpretability

Lee D. Sharkey

FAtt

05 May 2023

A Theory on Adam Instability in Large-Scale Machine Learning

Igor Molybog

...

189

19 Apr 2023

The MiniPile Challenge for Data-Efficient Language Models

Jean Kaddour

MoE ALM

320

17 Apr 2023

Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca

286

389

17 Apr 2023

DINOv2: Learning Robust Visual Features without Supervision

Maxime Oquab

...

1.1K

6,043

14 Apr 2023

Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference

Neural Information Processing Systems (NeurIPS), 2023

Joshua Ainslie

...

223

11 Apr 2023

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review

Li Shen

Liang Ding

296

07 Apr 2023

Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster

295

122

06 Apr 2023

Effective Theory of Transformers at Initialization

Emily Dinan

Sho Yaida

Susan Zhang

164

04 Apr 2023

Masked Autoencoders as Image Processors

Xiongkuo Min

Guangtao Zhai

136

30 Mar 2023

Unlocking the Potential of ChatGPT: A Comprehensive Exploration of its Applications, Advantages, Limitations, and Future Directions in Natural Language Processing

Walid Hariri

AI4MH LM&MA

897

119

27 Mar 2023

The Battle of Information Representations: Comparing Sentiment and Semantic Features for Forecasting Market Trends

International Joint Conference on the Analysis of Images, Social Networks and Texts (AISNT), 2023

109

24 Mar 2023

EVA-02: A Visual Representation for Neon Genesis

Image and Vision Computing (IVC), 2023

399

409

20 Mar 2023

Trained on 100 million words and still in shape: BERT meets British National Corpus

Findings (Findings), 2023

343

17 Mar 2023

A Generative Model for Digital Camera Noise Synthesis

DisneyResearchStudios

VLM

253

16 Mar 2023

RTMPose: Real-Time Multi-Person Pose Estimation based on MMPose

Li Zhang

383

302

13 Mar 2023

AutoMatch: A Large-scale Audio Beat Matching Benchmark for Boosting Deep Learning Assistant Video Editing

134

03 Mar 2023

LLaMA: Open and Efficient Foundation Language Models

...

6.1K

17,759

27 Feb 2023

SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

Rui-Jie Zhu

440

115

27 Feb 2023

Language-Driven Representation Learning for Robotics

Dorsa Sadigh

280

189

24 Feb 2023

MossFormer: Pushing the Performance Limit of Monaural Speech Separation using Gated Single-Head Transformer with Convolution-Augmented Joint Self-Attentions

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Shengkui Zhao

Bin Ma

213

23 Feb 2023

Entity-Level Text-Guided Image Manipulation

Hang Xu

Wei Zhang

134

22 Feb 2023

Chain of Hindsight Aligns Language Models with Feedback

International Conference on Learning Representations (ICLR), 2023

Hao Liu

Carmelo Sferrazza

Pieter Abbeel

ALM

802

149

06 Feb 2023

Molecular Geometry-aware Transformer for accurate 3D Atomic System modeling

Chuanqi Tan

148

02 Feb 2023

Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps

International Conference on Learning Representations (ICLR), 2023

463

01 Feb 2023

Composer's Assistant: An Interactive Transformer for Multi-Track MIDI Infilling

International Society for Music Information Retrieval Conference (ISMIR), 2023

Martin E. Malandro

221

29 Jan 2023

SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient

International Conference on Machine Learning (ICML), 2023

Tim Dettmers

356

27 Jan 2023

Human-Timescale Adaptation in an Open-Ended Task Space

International Conference on Machine Learning (ICML), 2023

Feryal M. P. Behbahani

...

Lei Zhang

LM&Ro OffRL AI4CE LRM

326

147

18 Jan 2023

ExcelFormer: A neural network surpassing GBDTs on tabular data

Jintai Chen

350

07 Jan 2023

On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective

Jingxiao Chen

221

24 Dec 2022

Pretraining Without Attention

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

229

20 Dec 2022

Latent Diffusion for Language Generation

Neural Information Processing Systems (NeurIPS), 2022

255

111

19 Dec 2022

Dense Feature Memory Augmented Transformers for COVID-19 Vaccination Search Classification

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

131

16 Dec 2022

ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

234

13 Dec 2022

LMEC: Learnable Multiplicative Absolute Position Embedding Based Conformer for Speech Recognition

Yuguang Yang

Yu Pan

Jingjing Yin

Heng Lu

251

05 Dec 2022

Efficient Frequency Domain-based Transformers for High-Quality Image Deblurring

Computer Vision and Pattern Recognition (CVPR), 2022

184

274

22 Nov 2022

MINTIME: Multi-Identity Size-Invariant Video Deepfake Detection

IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2022

D. Coccomini

Giorgos Kordopatis-Zilos

239

20 Nov 2022

AutoTemplate: A Simple Recipe for Lexically Constrained Text Generation

International Conference on Natural Language Generation (INLG), 2022

Hayate Iso

183

15 Nov 2022

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Angela Fan

...

841

2,755

09 Nov 2022

MogaNet: Multi-order Gated Aggregation Network

International Conference on Learning Representations (ICLR), 2022

285

120

07 Nov 2022

A Long-term Dependent and Trustworthy Approach to Reactor Accident Prognosis based on Temporal Fusion Transformer

28 Oct 2022

What Language Model to Train if You Have One Million GPU Hours?

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

...

573

120

27 Oct 2022

Towards Better Few-Shot and Finetuning Performance with Forgetful Causal Language Models

Sharan Narang

Pieter Abbeel

KELM CLL

253

24 Oct 2022

The Devil in Linear Transformer

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Zhen Qin

Lingpeng Kong

210

19 Oct 2022

VIMA: General Robot Manipulation with Multimodal Prompts

Li Fei-Fei

Linxi Fan

383

475

06 Oct 2022

Do ever larger octopi still amplify reporting biases? Evidence from judgments of typical colour

Fangyu Liu

Julian Martin Eisenschlos

Jeremy R. Cole

Nigel Collier

191

26 Sep 2022