Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

11 April 2018
Noam M. Shazeer
Mitchell Stern
    ODL

Papers citing "Adafactor: Adaptive Learning Rates with Sublinear Memory Cost"

50 / 799 papers shown
Exploring Dual Encoder Architectures for Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zhe Dong
Jianmo Ni
Daniel M. Bikel
Enrique Alfonseca
Yuanjin Wang
Chen Qu
I. Zitouni
169
26
0
14 Apr 2022
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
International Conference on Machine Learning (ICML), 2022
Thomas Wang
Adam Roberts
Daniel Hesslow
Teven Le Scao
Hyung Won Chung
Iz Beltagy
Julien Launay
Colin Raffel
294
214
0
12 Apr 2022
PaLM: Scaling Language Modeling with Pathways
Journal of Machine Learning Research (JMLR), 2022
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
1.2K
7,494
0
05 Apr 2022
LogicInference: A New Dataset for Teaching Logical Inference to seq2seq Models
Santiago Ontanon
Joshua Ainslie
Vaclav Cvicek
Zachary Kenneth Fisher
NAI
ReLM
LRM
285
16
0
28 Mar 2022
CICERO: A Dataset for Contextualized Commonsense Inference in Dialogues
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Deepanway Ghosal
Siqi Shen
Navonil Majumder
Amélie Reymond
Soujanya Poria
201
60
0
25 Mar 2022
Practical tradeoffs between memory, compute, and performance in learned optimizers
Luke Metz
C. Freeman
James Harrison
Niru Maheswaranathan
Jascha Narain Sohl-Dickstein
408
37
0
22 Mar 2022
Teaching language models to support answers with verified quotes
Jacob Menick
Maja Trebacz
Vladimir Mikulik
John Aslanides
Francis Song
...
Mia Glaese
Susannah Young
Lucy Campbell-Gillingham
G. Irving
Nat McAleese
ELM
RALM
529
305
0
21 Mar 2022
Sequence-to-Sequence Knowledge Graph Completion and Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Apoorv Saxena
Adrian Kochsiek
Rainer Gemulla
AIMat
291
165
0
19 Mar 2022
Towards Lithuanian grammatical error correction
Lukas Stankevičius
Mantas Lukoševičius
3DV
134
5
0
18 Mar 2022
Memorizing Transformers
International Conference on Learning Representations (ICLR), 2022
Yuhuai Wu
M. Rabe
DeLesley S. Hutchins
Christian Szegedy
RALM
260
211
0
16 Mar 2022
Hyperdecoders: Instance-specific decoders for multi-task NLP
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Michal Guerquin
Matthew E. Peters
AI4CE
359
23
0
15 Mar 2022
UniSAr: A Unified Structure-Aware Autoregressive Language Model for Text-to-SQL
International Journal of Machine Learning and Cybernetics (IJMLC), 2022
Longxu Dou
Yan Gao
Mingyang Pan
Dingzirui Wang
Wanxiang Che
Dechen Zhan
Jian-Guang Lou
229
33
0
15 Mar 2022
Multilingual Mix: Example Interpolation Improves Multilingual Neural Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yong Cheng
Ankur Bapna
Orhan Firat
Yuan Cao
Pidong Wang
Wolfgang Macherey
155
15
0
15 Mar 2022
Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models
Ning Ding
Yujia Qin
Guang Yang
Fu Wei
Zonghan Yang
...
Jianfei Chen
Yang Liu
Jie Tang
Juan Li
Maosong Sun
367
226
0
14 Mar 2022
SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Mathieu Ravaut
Shafiq Joty
Nancy F. Chen
MoE
202
113
0
13 Mar 2022
Block-Recurrent Transformers
Neural Information Processing Systems (NeurIPS), 2022
DeLesley S. Hutchins
Imanol Schlag
Yuhuai Wu
Ethan Dyer
Behnam Neyshabur
449
131
0
11 Mar 2022
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
International Conference on Machine Learning (ICML), 2022
Mitchell Wortsman
Gabriel Ilharco
S. Gadre
Rebecca Roelofs
Raphael Gontijo-Lopes
...
Hongseok Namkoong
Ali Farhadi
Y. Carmon
Simon Kornblith
Ludwig Schmidt
MoMe
728
1,298
1
10 Mar 2022
Spatial Commonsense Graph for Object Localisation in Partial Scenes
Computer Vision and Pattern Recognition (CVPR), 2022
Francesco Giuliari
Geri Skenderi
Marco Cristani
Yiming Wang
Alessio Del Bue
247
20
0
10 Mar 2022
IT5: Text-to-text Pretraining for Italian Language Understanding and Generation
International Conference on Language Resources and Evaluation (LREC), 2022
Gabriele Sarti
Malvina Nissim
AILaw
259
51
0
07 Mar 2022
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang
J. E. Hu
Igor Babuschkin
Szymon Sidor
Xiaodong Liu
David Farhi
Nick Ryder
J. Pachocki
Weizhu Chen
Jianfeng Gao
377
224
0
07 Mar 2022
Adaptive Gradient Methods with Local Guarantees
Zhou Lu
Wenhan Xia
Sanjeev Arora
Elad Hazan
ODL
446
12
0
02 Mar 2022
HyperPrompt: Prompt-based Task-Conditioning of Transformers
International Conference on Machine Learning (ICML), 2022
Yun He
H. Zheng
Yi Tay
Jai Gupta
Yu Du
...
Yaguang Li
Zhaoji Chen
Donald Metzler
Heng-Tze Cheng
Ed H. Chi
LRM
VLM
280
108
0
01 Mar 2022
Using natural language prompts for machine translation
Xavier Garcia
Orhan Firat
AI4CE
221
38
0
23 Feb 2022
A New Generation of Perspective API: Efficient Multilingual Character-level Transformers
Knowledge Discovery and Data Mining (KDD), 2022
Alyssa Lees
Vinh Q. Tran
Yi Tay
Jeffrey Scott Sorensen
Jai Gupta
Donald Metzler
Lucy Vasserman
232
258
0
22 Feb 2022
Mixture-of-Experts with Expert Choice Routing
Neural Information Processing Systems (NeurIPS), 2022
Yan-Quan Zhou
Tao Lei
Han-Chu Liu
Nan Du
Yanping Huang
Vincent Zhao
Andrew M. Dai
Zhifeng Chen
Quoc V. Le
James Laudon
MoE
619
568
0
18 Feb 2022
ST-MoE: Designing Stable and Transferable Sparse Expert Models
Barret Zoph
Irwan Bello
Sameer Kumar
Nan Du
Yanping Huang
J. Dean
Noam M. Shazeer
W. Fedus
MoE
423
301
0
17 Feb 2022
The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning
European Conference on Computer Vision (ECCV), 2022
Jack Hessel
Jena D. Hwang
Jinho Park
Rowan Zellers
Chandra Bhagavatula
Anna Rohrbach
Kate Saenko
Yejin Choi
ReLM
497
61
0
10 Feb 2022
Red Teaming Language Models with Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
449
865
0
07 Feb 2022
Data Scaling Laws in NMT: The Effect of Noise and Architecture
International Conference on Machine Learning (ICML), 2022
Yamini Bansal
Behrooz Ghorbani
Ankush Garg
Biao Zhang
M. Krikun
Colin Cherry
Behnam Neyshabur
Orhan Firat
238
61
0
04 Feb 2022
Robust Training of Neural Networks Using Scale Invariant Architectures
International Conference on Machine Learning (ICML), 2022
Zhiyuan Li
Srinadh Bhojanapalli
Manzil Zaheer
Sashank J. Reddi
Surinder Kumar
219
33
0
02 Feb 2022
Examining Scaling and Transfer of Language Model Architectures for Machine Translation
International Conference on Machine Learning (ICML), 2022
Biao Zhang
Behrooz Ghorbani
Ankur Bapna
Yong Cheng
Xavier Garcia
Jonathan Shen
Orhan Firat
278
29
0
01 Feb 2022
Correcting diacritics and typos with a ByT5 transformer model
Applied Sciences (Appl. Sci.), 2022
Lukas Stankevicius
M. Lukoševičius
J. Kapočiūtė-Dzikienė
Monika Briediene
Tomas Krilavičius
195
24
0
31 Jan 2022
A Stochastic Bundle Method for Interpolating Networks
Alasdair Paren
Leonard Berrada
Rudra P. K. Poudel
M. P. Kumar
199
6
0
29 Jan 2022
Cheating Automatic Short Answer Grading: On the Adversarial Usage of Adjectives and Adverbs
International Journal of Artificial Intelligence in Education (IJAIED), 2022
Anna Filighera
Sebastian Ochs
Tim Steuer
Thomas Tregel
AAML
125
18
0
20 Jan 2022
Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
Devansh Bisla
Jing Wang
A. Choromańska
332
45
0
20 Jan 2022
Towards the Next 1000 Languages in Multilingual Machine Translation: Exploring the Synergy Between Supervised and Self-Supervised Learning
Aditya Siddhant
Ankur Bapna
Orhan Firat
Yuan Cao
Mengzhao Chen
Isaac Caswell
Xavier Garcia
ELM
LRM
220
30
0
09 Jan 2022
Comparison of biomedical relationship extraction methods and models for knowledge graph creation
Journal of Web Semantics (Web Semantics), 2022
Nikola Milosevic
W. Thielemann
278
35
0
05 Jan 2022
Reframing Human-AI Collaboration for Generating Free-Text Explanations
Sarah Wiegreffe
Jack Hessel
Swabha Swayamdipta
Mark O. Riedl
Yejin Choi
257
170
0
16 Dec 2021
FRUIT: Faithfully Reflecting Updated Information in Text
Robert L Logan IV
Alexandre Passos
Sameer Singh
Ming-Wei Chang
KELM
278
45
0
16 Dec 2021
CONQRR: Conversational Query Rewriting for Retrieval with Reinforcement Learning
Zeqiu Wu
Yi Luan
Hannah Rashkin
David Reitter
Hannaneh Hajishirzi
Mari Ostendorf
Gaurav Singh Tomar
LRM
235
99
0
16 Dec 2021
Large Dual Encoders Are Generalizable Retrievers
Jianmo Ni
Chen Qu
Jing Lu
Zhuyun Dai
Gustavo Hernández Ábrego
...
Vincent Zhao
Yi Luan
Keith B. Hall
Ming-Wei Chang
Yinfei Yang
DML
625
566
0
15 Dec 2021
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
Nan Du
Yanping Huang
Andrew M. Dai
Simon Tong
Dmitry Lepikhin
...
Kun Zhang
Quoc V. Le
Yonghui Wu
Zhiwen Chen
Claire Cui
ALM
MoE
707
1,060
0
13 Dec 2021
Dependency Learning for Legal Judgment Prediction with a Unified Text-to-Text Transformer
Yunyun Huang
Xiaoyu Shen
Chuanyi Li
Jidong Ge
B. Luo
AILaw
191
24
0
13 Dec 2021
Extending AdamW by Leveraging Its Second Moment and Magnitude
Guoqiang Zhang
Niwa Kenta
W. Kleijn
178
3
0
09 Dec 2021
Towards Neural Functional Program Evaluation
Torsten Scholak
Jonathan Pilault
Joey Velez-Ginorio
NAI
ELM
76
0
0
09 Dec 2021
Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text
Christopher Clark
Jordi Salvador
Dustin Schwenk
Derrick Bonafilia
Mark Yatskar
...
Aaron Sarnat
Hannaneh Hajishirzi
Aniruddha Kembhavi
Oren Etzioni
Ali Farhadi
MLLM
141
7
0
01 Dec 2021
Less is More: Generating Grounded Navigation Instructions from Landmarks
Su Wang
Ceslee Montgomery
Jordi Orbay
Vighnesh Birodkar
Aleksandra Faust
Izzeddin Gur
Natasha Jaques
Austin Waters
Jason Baldridge
Peter Anderson
440
81
0
25 Nov 2021
Combined Scaling for Zero-shot Transfer Learning
Hieu H. Pham
Zihang Dai
Golnaz Ghiasi
Kenji Kawaguchi
Hanxiao Liu
...
Yi-Ting Chen
Minh-Thang Luong
Yonghui Wu
Mingxing Tan
Quoc V. Le
VLM
390
229
0
19 Nov 2021
LiT: Zero-Shot Transfer with Locked-image text Tuning
Computer Vision and Pattern Recognition (CVPR), 2021
Xiaohua Zhai
Tianlin Li
Basil Mustafa
Andreas Steiner
Daniel Keysers
Alexander Kolesnikov
Lucas Beyer
VLM
646
672
0
15 Nov 2021
Improving Large-scale Language Models and Resources for Filipino
International Conference on Language Resources and Evaluation (LREC), 2021
Jan Christian Blaise Cruz
C. Cheng
AI4CE
155
39
0
11 Nov 2021