
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. Journal of Machine Learning Research (JMLR), 2021.
Rewiring the Transformer with Depth-Wise LSTMs. International Conference on Language Resources and Evaluation (LREC), 2020.
How Much Knowledge Can You Pack Into the Parameters of a Language Model? Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.