Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
11 April 2018
Noam M. Shazeer
Mitchell Stern
ODL
arXiv:1804.04235
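For context on the paper this listing covers, here is a minimal, hypothetical usage sketch of the Adafactor optimizer, assuming the JAX/Optax stack (`optax.adafactor`); the parameter shapes and loss below are illustrative only and are not taken from this page.

```python
# Hypothetical sketch: one training step with Adafactor via Optax
# (assumes the jax and optax packages are installed).
import jax
import jax.numpy as jnp
import optax

params = {"w": jnp.ones((512, 256))}          # toy weight matrix; Adafactor keeps factored second-moment stats
optimizer = optax.adafactor(learning_rate=1e-3)
opt_state = optimizer.init(params)

def loss_fn(p):
    return jnp.sum(p["w"] ** 2)               # placeholder loss for illustration

grads = jax.grad(loss_fn)(params)
updates, opt_state = optimizer.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)
```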

Papers citing "Adafactor: Adaptive Learning Rates with Sublinear Memory Cost"

Showing 50 of 799 citing papers
Generating abstractive summaries of Lithuanian news articles using a transformer model
International Conference on Information and Software Technologies (ICIST), 2021
Lukas Stankevicius
M. Lukoševičius
127
3
0
23 Apr 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
1.4K
4,984
0
18 Apr 2021
DEUX: An Attribute-Guided Framework for Sociable Recommendation Dialog Systems
Yu Li
Shirley Anugrah Hayati
Weiyan Shi
Zhou Yu
196
6
0
16 Apr 2021
Comparison of Grammatical Error Correction Using Back-Translation Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Aomi Koyama
Kengo Hotate
Masahiro Kaneko
Mamoru Komachi
112
12
0
16 Apr 2021
Planning with Learned Entity Prompts for Abstractive Summarization
Transactions of the Association for Computational Linguistics (TACL), 2021
Shashi Narayan
Yao-Min Zhao
Joshua Maynez
Gonçalo Simões
Vitaly Nikolaev
Ryan T. McDonald
LRM
278
130
0
15 Apr 2021
Pushing the Limits of Non-Autoregressive Speech Recognition
Interspeech (Interspeech), 2021
Edwin G. Ng
Chung-Cheng Chiu
Yu Zhang
William Chan
VLM
262
31
0
07 Apr 2021
SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network
William Chan
Daniel S. Park
Chris A. Lee
Yu Zhang
Quoc V. Le
Mohammad Norouzi
AI4TS
360
147
0
05 Apr 2021
Efficient Attentions for Long Document Summarization
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
L. Huang
Shuyang Cao
Nikolaus Nova Parulian
Heng Ji
Lu Wang
330
356
0
05 Apr 2021
Cryptonite: A Cryptic Crossword Benchmark for Extreme Ambiguity in Language
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Avia Efrat
Uri Shaham
D. Kilman
Omer Levy
ELM
262
19
0
01 Mar 2021
Do Transformer Modifications Transfer Across Implementations and Applications?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Sharan Narang
Hyung Won Chung
Yi Tay
W. Fedus
Thibault Févry
...
Wei Li
Nan Ding
Jake Marcus
Adam Roberts
Colin Raffel
215
134
0
23 Feb 2021
WebRED: Effective Pretraining And Finetuning For Relation Extraction On The Web
Róbert Ormándi
Mohammad Saleh
Erin Winter
Vinay Rao
103
11
0
18 Feb 2021
Civil Rephrases Of Toxic Texts With Self-Supervised Transformers
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2021
Leo Laugier
John Pavlopoulos
Jeffrey Scott Sorensen
Lucas Dixon
265
52
0
01 Feb 2021
Gravity Optimizer: a Kinematic Approach on Optimization in Deep Learning
Dariush Bahrami
Sadegh Pouriyan Zadeh
ODL
96
5
0
22 Jan 2021
Analyzing Commonsense Emergence in Few-shot Knowledge Models
Conference on Automated Knowledge Base Construction (AKBC), 2021
Jeff Da
Ronan Le Bras
Ximing Lu
Yejin Choi
Antoine Bosselut
AI4MHKELM
470
42
0
01 Jan 2021
Studying Strategically: Learning to Mask for Closed-book QA
Qinyuan Ye
Belinda Z. Li
Sinong Wang
Benjamin Bolte
Hao Ma
Anuj Kumar
Xiang Ren
Madian Khabsa
OffRL
265
12
0
31 Dec 2020
Promoting Graph Awareness in Linearized Graph-to-Text Generation
Findings (Findings), 2020
Alexander Miserlis Hoyle
Ana Marasović
Noah A. Smith
AI4CE
169
32
0
31 Dec 2020
AraGPT2: Pre-Trained Transformer for Arabic Language Generation
Workshop on Arabic Natural Language Processing (WANLP), 2020
Wissam Antoun
Fady Baly
Hazem M. Hajj
VLM
283
131
0
31 Dec 2020
Few-Shot Text Generation with Pattern-Exploiting Training
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Timo Schick
Hinrich Schütze
282
155
0
22 Dec 2020
Contrastive Learning with Adversarial Perturbations for Conditional Text Generation
International Conference on Learning Representations (ICLR), 2020
Seanie Lee
Dong Bok Lee
Sung Ju Hwang
545
117
0
14 Dec 2020
Collaborative Storytelling with Large-scale Neural Language Models
Motion in Games (MIG), 2020
Eric Nichols
Leo Gao
R. Gomez
177
54
0
20 Nov 2020
Whale: Efficient Giant Model Training over Heterogeneous GPUs
USENIX Annual Technical Conference (USENIX ATC), 2020
Chencan Wu
Le Jiang
Ang Wang
Wencong Xiao
Ziji Shi
...
Lan-yue Chen
Yong Li
Zhen Zheng
Xiaoyong Liu
Wei Lin
274
68
0
18 Nov 2020
Stochastic Optimization with Laggard Data Pipelines
Neural Information Processing Systems (NeurIPS), 2020
Naman Agarwal
Rohan Anil
Tomer Koren
Kunal Talwar
Cyril Zhang
86
12
0
26 Oct 2020
GO FIGURE: A Meta Evaluation of Factuality in Summarization
Findings (Findings), 2020
Saadia Gabriel
Asli Celikyilmaz
Rahul Jha
Yejin Choi
Jianfeng Gao
HILM
544
105
0
24 Oct 2020
Towards Zero-Shot Multilingual Synthetic Question and Answer Generation for Cross-Lingual Reading Comprehension
Siamak Shakeri
Noah Constant
Mihir Kale
Linting Xue
SyDa
357
29
0
22 Oct 2020
CUNI Systems for the Unsupervised and Very Low Resource Translation Task in WMT20
Ivana Kvapilíková
Tom Kocmi
Ondrej Bojar
95
5
0
22 Oct 2020
Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition
Yu Zhang
James Qin
Daniel S. Park
Wei Han
Chung-Cheng Chiu
Ruoming Pang
Quoc V. Le
Yonghui Wu
VLMSSL
577
327
0
20 Oct 2020
Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
William Merrill
Vivek Ramanujan
Yoav Goldberg
Roy Schwartz
Noah A. Smith
AI4CE
598
42
0
19 Oct 2020
Expectigrad: Fast Stochastic Optimization with Robust Convergence Properties
Brett Daley
Chris Amato
ODL
138
4
0
03 Oct 2020
Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves
Luke Metz
Niru Maheswaranathan
C. Freeman
Ben Poole
Jascha Narain Sohl-Dickstein
295
69
0
23 Sep 2020
Seq2Edits: Sequence Transduction Using Span-level Edit Operations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Felix Stahlberg
Shankar Kumar
BDL
196
95
0
23 Sep 2020
PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data
Diedre Carmo
Marcos Piau
Israel Campiotti
Rodrigo Nogueira
R. Lotufo
LM&MA
130
64
0
20 Aug 2020
Whitening and second order optimization both make information in the dataset unusable during training, and can reduce or prevent generalization
Neha S. Wadia
Daniel Duckworth
S. Schoenholz
Ethan Dyer
Jascha Narain Sohl-Dickstein
450
17
0
17 Aug 2020
Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning
Sai Praneeth Karimireddy
Martin Jaggi
Satyen Kale
M. Mohri
Sashank J. Reddi
Sebastian U. Stich
A. Suresh
FedML
548
236
0
08 Aug 2020
Data Weighted Training Strategies for Grammatical Error Correction
Transactions of the Association for Computational Linguistics (TACL), 2020
Jared Lichtarge
Chris Alberti
Shankar Kumar
237
50
0
07 Aug 2020
A Comparison of Optimization Algorithms for Deep Learning
International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI), 2020
Derya Soydaner
215
187
0
28 Jul 2020
Binary Search and First Order Gradient Based Method for Stochastic Optimization
V. Pandey
ODL
119
0
0
27 Jul 2020
Improving compute efficacy frontiers with SliceOut
Pascal Notin
Aidan Gomez
Joanna Yoo
Y. Gal
153
1
0
21 Jul 2020
HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections
Yi Tay
Zhe Zhao
Dara Bahri
Donald Metzler
Da-Cheng Juan
186
9
0
12 Jul 2020
Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers
Robin M. Schmidt
Frank Schneider
Philipp Hennig
ODL
804
186
0
03 Jul 2020
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin
HyoukJoong Lee
Yuanzhong Xu
Dehao Chen
Orhan Firat
Yanping Huang
M. Krikun
Noam M. Shazeer
Zhiwen Chen
MoE
393
1,635
0
30 Jun 2020
SEAL: Segment-wise Extractive-Abstractive Long-form Text Summarization
Yao-Min Zhao
Mohammad Saleh
Peter J. Liu
RALM
166
27
0
18 Jun 2020
Modeling Graph Structure via Relative Position for Text Generation from Knowledge Graphs
Martin Schmitt
Leonardo F. R. Ribeiro
Philipp Dufter
Iryna Gurevych
Hinrich Schütze
GNN
231
8
0
16 Jun 2020
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
AAAI Conference on Artificial Intelligence (AAAI), 2020
Z. Yao
A. Gholami
Sheng Shen
Mustafa Mustafa
Kurt Keutzer
Michael W. Mahoney
ODL
452
333
0
01 Jun 2020
WT5?! Training Text-to-Text Models to Explain their Predictions
Sharan Narang
Colin Raffel
Katherine Lee
Adam Roberts
Noah Fiedel
Karishma Malkan
209
213
0
30 Apr 2020
Recipes for building an open-domain chatbot
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2020
Stephen Roller
Emily Dinan
Naman Goyal
Da Ju
Mary Williamson
...
Myle Ott
Kurt Shuster
Eric Michael Smith
Y-Lan Boureau
Jason Weston
ALM
530
1,085
0
28 Apr 2020
Automatic Cross-Replica Sharding of Weight Update in Data-Parallel Training
Yuanzhong Xu
HyoukJoong Lee
Dehao Chen
Hongjun Choi
Blake A. Hechtman
Shibo Wang
201
50
0
28 Apr 2020
AdaX: Adaptive Gradient Descent with Exponential Long Term Memory
Wenjie Li
Zhaoyang Zhang
Xinjiang Wang
Ping Luo
ODL
209
29
0
21 Apr 2020
TuringAdvice: A Generative and Dynamic Evaluation of Language Use
Rowan Zellers
Ari Holtzman
Elizabeth Clark
Lianhui Qin
Ali Farhadi
Yejin Choi
ELMLRM
233
13
0
07 Apr 2020
Efficient Content-Based Sparse Attention with Routing Transformers
Transactions of the Association for Computational Linguistics (TACL), 2020
Aurko Roy
M. Saffar
Ashish Vaswani
David Grangier
MoE
968
686
0
12 Mar 2020
Talking-Heads Attention
Noam M. Shazeer
Zhenzhong Lan
Youlong Cheng
Nan Ding
L. Hou
268
92
0
05 Mar 2020