
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. Journal of Machine Learning Research (JMLR), 2021.
Rewiring the Transformer with Depth-Wise LSTMs. International Conference on Language Resources and Evaluation (LREC), 2020.
How Much Knowledge Can You Pack Into the Parameters of a Language Model? Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.