
GLU Variants Improve Transformer
Noam M. Shazeer
arXiv:2002.05202, 12 February 2020

Papers citing "GLU Variants Improve Transformer"

Showing 4 of 904 citing papers.
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Journal of Machine Learning Research (JMLR), 2021
W. Fedus, Barret Zoph, Noam M. Shazeer
MoE | 576 | 3,067 | 0 | 11 Jan 2021

mT5: A massively multilingual pre-trained text-to-text transformer
Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel
683 | 2,935 | 0 | 22 Oct 2020

Rewiring the Transformer with Depth-Wise LSTMs
International Conference on Language Resources and Evaluation (LREC), 2020
Hongfei Xu, Yang Song, Qiuhui Liu, Josef van Genabith, Deyi Xiong
214 | 7 | 0 | 13 Jul 2020

How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Adam Roberts, Colin Raffel, Noam M. Shazeer
KELM | 572 | 991 | 0 | 10 Feb 2020