Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

11 April 2018

Papers citing "Adafactor: Adaptive Learning Rates with Sublinear Memory Cost"


Exploring Dual Encoder Architectures for Question Answering

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Chen Qu

169

14 Apr 2022

What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?

International Conference on Machine Learning (ICML), 2022

294

214

12 Apr 2022

PaLM: Scaling Language Modeling with Pathways

Journal of Machine Learning Research (JMLR), 2022

Sharan Narang

...

Kathy Meier-Hellstern

1.2K

7,494

05 Apr 2022

LogicInference: A New Dataset for Teaching Logical Inference to seq2seq Models

Santiago Ontanon

Joshua Ainslie

Vaclav Cvicek

Zachary Kenneth Fisher

NAI ReLM LRM

285

28 Mar 2022

CICERO: A Dataset for Contextualized Commonsense Inference in Dialogues

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

201

25 Mar 2022

Practical tradeoffs between memory, compute, and performance in learned optimizers

Jascha Narain Sohl-Dickstein

408

22 Mar 2022

Teaching language models to support answers with verified quotes

...

Lucy Campbell-Gillingham

G. Irving

Nat McAleese

ELM RALM

529

305

21 Mar 2022

Sequence-to-Sequence Knowledge Graph Completion and Question Answering

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

291

165

19 Mar 2022

Towards Lithuanian grammatical error correction

Lukas Stankevičius

Mantas Lukoševičius

3DV

134

18 Mar 2022

Memorizing Transformers

International Conference on Learning Representations (ICLR), 2022

260

211

16 Mar 2022

Hyperdecoders: Instance-specific decoders for multi-task NLP

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Michal Guerquin

Matthew E. Peters

AI4CE

359

15 Mar 2022

UniSAr: A Unified Structure-Aware Autoregressive Language Model for Text-to-SQL

International Journal of Machine Learning and Cybernetics (IJMLC), 2022

229

15 Mar 2022

Multilingual Mix: Example Interpolation Improves Multilingual Neural Machine Translation

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

155

15 Mar 2022

Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models

...

Jianfei Chen

Yang Liu

Jie Tang

Juan Li

Maosong Sun

367

226

14 Mar 2022

SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Mathieu Ravaut

Shafiq Joty

Nancy F. Chen

MoE

202

113

13 Mar 2022

Block-Recurrent Transformers

Neural Information Processing Systems (NeurIPS), 2022

449

131

11 Mar 2022

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

International Conference on Machine Learning (ICML), 2022

Raphael Gontijo-Lopes

...

728

1,298

10 Mar 2022

Spatial Commonsense Graph for Object Localisation in Partial Scenes

Computer Vision and Pattern Recognition (CVPR), 2022

247

10 Mar 2022

IT5: Text-to-text Pretraining for Italian Language Understanding and Generation

International Conference on Language Resources and Evaluation (LREC), 2022

Gabriele Sarti

Malvina Nissim

AILaw

259

07 Mar 2022

Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

Xiaodong Liu

377

224

07 Mar 2022

Adaptive Gradient Methods with Local Guarantees

446

02 Mar 2022

HyperPrompt: Prompt-based Task-Conditioning of Transformers

International Conference on Machine Learning (ICML), 2022

...

280

108

01 Mar 2022

Using natural language prompts for machine translation

Xavier Garcia

Orhan Firat

AI4CE

221

23 Feb 2022

A New Generation of Perspective API: Efficient Multilingual Character-level Transformers

Knowledge Discovery and Data Mining (KDD), 2022

Alyssa Lees

Vinh Q. Tran

Yi Tay

Jeffrey Scott Sorensen

Jai Gupta

Donald Metzler

Lucy Vasserman

232

258

22 Feb 2022

Mixture-of-Experts with Expert Choice Routing

Neural Information Processing Systems (NeurIPS), 2022

619

568

18 Feb 2022

ST-MoE: Designing Stable and Transferable Sparse Expert Models

423

301

17 Feb 2022

The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning

European Conference on Computer Vision (ECCV), 2022

Yejin Choi

497

10 Feb 2022

Red Teaming Language Models with Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Saffron Huang

449

865

07 Feb 2022

Data Scaling Laws in NMT: The Effect of Noise and Architecture

International Conference on Machine Learning (ICML), 2022

Colin Cherry

238

04 Feb 2022

Robust Training of Neural Networks Using Scale Invariant Architectures

International Conference on Machine Learning (ICML), 2022

Srinadh Bhojanapalli

219

02 Feb 2022

Examining Scaling and Transfer of Language Model Architectures for Machine Translation

International Conference on Machine Learning (ICML), 2022

278

01 Feb 2022

Correcting diacritics and typos with a ByT5 transformer model

Applied Sciences (Appl. Sci.), 2022

Lukas Stankevičius

M. Lukoševičius

J. Kapočiūtė-Dzikienė

Monika Briediene

Tomas Krilavičius

195

31 Jan 2022

A Stochastic Bundle Method for Interpolating Networks

199

29 Jan 2022

Cheating Automatic Short Answer Grading: On the Adversarial Usage of Adjectives and Adverbs

International Journal of Artificial Intelligence in Education (IJAIED), 2022

125

20 Jan 2022

Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape

International Conference on Artificial Intelligence and Statistics (AISTATS), 2022

Devansh Bisla

Jing Wang

A. Choromańska

332

20 Jan 2022

Towards the Next 1000 Languages in Multilingual Machine Translation: Exploring the Synergy Between Supervised and Self-Supervised Learning

220

09 Jan 2022

Comparison of biomedical relationship extraction methods and models for knowledge graph creation

Journal of Web Semantics (Web Semantics), 2022

Nikola Milosevic

W. Thielemann

278

05 Jan 2022

Reframing Human-AI Collaboration for Generating Free-Text Explanations

Yejin Choi

257

170

16 Dec 2021

FRUIT: Faithfully Reflecting Updated Information in Text

278

16 Dec 2021

CONQRR: Conversational Query Rewriting for Retrieval with Reinforcement Learning

Hannah Rashkin

235

16 Dec 2021

Large Dual Encoders Are Generalizable Retrievers

Jianmo Ni

Chen Qu

Jing Lu

Zhuyun Dai

Gustavo Hernández Ábrego

...

625

566

15 Dec 2021

GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

...

707

1,060

13 Dec 2021

Dependency Learning for Legal Judgment Prediction with a Unified Text-to-Text Transformer

Xiaoyu Shen

191

13 Dec 2021

Extending AdamW by Leveraging Its Second Moment and Magnitude

Guoqiang Zhang

Niwa Kenta

W. Kleijn

178

09 Dec 2021

Towards Neural Functional Program Evaluation

09 Dec 2021

Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text

...

141

01 Dec 2021

Less is More: Generating Grounded Navigation Instructions from Landmarks

440

25 Nov 2021

Combined Scaling for Zero-shot Transfer Learning

...

Mingxing Tan

390

229

19 Nov 2021

LiT: Zero-Shot Transfer with Locked-image text Tuning

Computer Vision and Pattern Recognition (CVPR), 2021

646

672

15 Nov 2021

Improving Large-scale Language Models and Resources for Filipino

International Conference on Language Resources and Evaluation (LREC), 2021

Jan Christian Blaise Cruz

C. Cheng

AI4CE

155

11 Nov 2021