GLU Variants Improve Transformer

12 February 2020

Noam M. Shazeer


Papers citing "GLU Variants Improve Transformer"

50 / 904 papers shown

Rethinking Performance Gains in Image Dehazing Networks

169

23 Sep 2022

Automatic Label Sequence Generation for Prompting Sequence-to-sequence Models

International Conference on Computational Linguistics (COLING), 2022

Zichun Yu

Tianyu Gao

Zhengyan Zhang

Yankai Lin

Zhiyuan Liu

Maosong Sun

Jie Zhou

VLM LRM

115

20 Sep 2022

MUST-VQA: MUltilingual Scene-text VQA

Emanuele Vivoli

248

14 Sep 2022

Transformers with Learnable Activation Functions

Findings (Findings), 2022

274

30 Aug 2022

Multiple Instance Neuroimage Transformer

Qingyu Zhao

166

19 Aug 2022

MVSFormer: Multi-View Stereo by Learning Robust Image Features and Temperature-based Depth

Chenjie Cao

Xinlin Ren

Yanwei Fu

402

04 Aug 2022

giMLPs: Gate with Inhibition Mechanism in MLPs

163

01 Aug 2022

Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Sharan Narang

248

121

21 Jul 2022

Long Range Language Modeling via Gated State Spaces

International Conference on Learning Representations (ICLR), 2022

527

332

27 Jun 2022

On the Parameterization and Initialization of Diagonal State Space Models

Neural Information Processing Systems (NeurIPS), 2022

413

473

23 Jun 2022

MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge

Neural Information Processing Systems (NeurIPS), 2022

Linxi Fan

De-An Huang

498

495

17 Jun 2022

Rank Diminishing in Deep Neural Networks

Neural Information Processing Systems (NeurIPS), 2022

227

13 Jun 2022

Sparse Mixers: Combining MoE and Mixing to build a more efficient BERT

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

James Lee-Thorp

Joshua Ainslie

MoE

220

24 May 2022

BanglaNLG and BanglaT5: Benchmarks and Resources for Evaluating Low-Resource Natural Language Generation in Bangla

Findings (Findings), 2022

358

23 May 2022

Life after BERT: What do Other Muppets Understand about Language?

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

363

21 May 2022

Sergio Gomez Colmenarejo

...

444

976

12 May 2022

Supplementary Material: Implementation and Experiments for GAU-based Model

Zhenjie Liu

125

12 May 2022

UL2: Unifying Language Learning Paradigms

International Conference on Learning Representations (ICLR), 2022

...

566

359

10 May 2022

Boosting Adversarial Transferability of MLP-Mixer

180

26 Apr 2022

What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?

International Conference on Machine Learning (ICML), 2022

285

215

12 Apr 2022

Simple Baselines for Image Restoration

European Conference on Computer Vision (ECCV), 2022

916

1,241

10 Apr 2022

ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-wise Semantic Alignment and Generation

Computer Vision and Pattern Recognition (CVPR), 2022

Hang Xu

246

09 Apr 2022

PaLM: Scaling Language Modeling with Pathways

Journal of Machine Learning Research (JMLR), 2022

Sharan Narang

...

Kathy Meier-Hellstern

1.2K

7,457

05 Apr 2022

Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

621

462

28 Mar 2022

Error Correction Code Transformer

Neural Information Processing Systems (NeurIPS), 2022

Yoni Choukroun

Lior Wolf

214

27 Mar 2022

Geometry-Aware Supertagging with Heterogeneous Dynamic Convolutions

Konstantinos Kogkalidis

M. Moortgat

219

23 Mar 2022

IT5: Text-to-text Pretraining for Italian Language Understanding and Generation

International Conference on Language Resources and Evaluation (LREC), 2022

Gabriele Sarti

Malvina Nissim

AILaw

246

07 Mar 2022

TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation

Kailun Yang

273

27 Feb 2022

Transformer Quality in Linear Time

International Conference on Machine Learning (ICML), 2022

478

299

21 Feb 2022

ST-MoE: Designing Stable and Transferable Sparse Expert Models

422

298

17 Feb 2022

VRT: A Video Restoration Transformer

IEEE Transactions on Image Processing (IEEE TIP), 2022

Yuchen Fan

Radu Timofte

Luc Van Gool

ViT

363

339

28 Jan 2022

LaMDA: Language Models for Dialog Applications

...

379

1,784

20 Jan 2022

GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

...

694

1,056

13 Dec 2021

ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning

...

307

230

22 Nov 2021

A Multi-attribute Controllable Generative Model for Histopathology Image Synthesis

International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2021

116

10 Nov 2021

Geometric Transformer for End-to-End Molecule Properties Prediction

Yoni Choukroun

Lior Wolf

AI4CE ViT

247

26 Oct 2021

NormFormer: Improved Transformer Pretraining with Extra Normalization

Sam Shleifer

Jason Weston

Myle Ott

AI4CE

275

18 Oct 2021

The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization

409

14 Oct 2021

Primer: Searching for Efficient Transformers for Language Modeling

401

184

17 Sep 2021

SANSformers: Self-Supervised Forecasting in Electronic Health Records with Attention-Free Models

304

31 Aug 2021

Sequence-to-Sequence Piano Transcription with Transformers

International Society for Music Information Retrieval Conference (ISMIR), 2021

331

19 Jul 2021

MedGPT: Medical Concept Prediction from Clinical Narratives

202

07 Jul 2021

Charformer: Fast Character Transformers via Gradient-based Subword Tokenization

Zhen Qin

342

187

23 Jun 2021

Revisiting Deep Learning Models for Tabular Data

Neural Information Processing Systems (NeurIPS), 2021

523

1,069

22 Jun 2021

Distributed Deep Learning in Open Collaborations

Neural Information Processing Systems (NeurIPS), 2021

...

278

18 Jun 2021

Memory-efficient Transformers via Top-k

245

13 Jun 2021

A Survey of Transformers

AI Open (AO), 2021

Tianyang Lin

Yuxin Wang

Xiangyang Liu

Xipeng Qiu

ViT

441

1,380

08 Jun 2021

Pay Attention to MLPs

Neural Information Processing Systems (NeurIPS), 2021

574

796

17 May 2021

The Power of Scale for Parameter-Efficient Prompt Tuning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

1.4K

4,984

18 Apr 2021

Do Transformer Modifications Transfer Across Implementations and Applications?

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

Sharan Narang

...

215

134

23 Feb 2021