Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

11 April 2018

Papers citing "Adafactor: Adaptive Learning Rates with Sublinear Memory Cost"

50 / 799 papers shown

The Marginal Value of Momentum for Small Learning Rate SGD

International Conference on Learning Representations (ICLR), 2023

Tianhao Wang

234

27 Jul 2023

f-Divergence Minimization for Sequence-Level Knowledge Distillation

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

280

27 Jul 2023

Towards Generalist Biomedical AI

...

Yossi Matias

K. Singhal

Peter R. Florence

Alan Karthikesalingam

Vivek Natarajan

LM&MA MedIm AI4MH

279

410

26 Jul 2023

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models

Neural Information Processing Systems (NeurIPS), 2023

429

12 Jul 2023

RoPDA: Robust Prompt-based Data Augmentation for Low-Resource Named Entity Recognition

AAAI Conference on Artificial Intelligence (AAAI), 2023

Sihan Song

Jian Zhao

195

11 Jul 2023

Event Extraction as Question Generation and Answering

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

205

10 Jul 2023

Scaling In-Context Demonstrations with Structured Attention

Tianle Cai

Kaixuan Huang

Jason D. Lee

Mengdi Wang

LRM

166

05 Jul 2023

CAME: Confidence-guided Adaptive Memory Efficient Optimization

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Xin Jiang

Yang You

ODL

345

05 Jul 2023

Could Small Language Models Serve as Recommenders? Towards Data-centric Cold-start Recommendations

The Web Conference (WWW), 2023

Wenlin Yao

Ninghao Liu

293

29 Jun 2023

YouTube-ASL: A Large-Scale, Open-Domain American Sign Language-English Parallel Corpus

Neural Information Processing Systems (NeurIPS), 2023

277

27 Jun 2023

Is Pre-training Truly Better Than Meta-Learning?

276

24 Jun 2023

On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes

International Conference on Learning Representations (ICLR), 2023

Nino Vieillard

Olivier Bachem

319

183

23 Jun 2023

A Reference-less Quality Metric for Automatic Speech Recognition via Contrastive-Learning of a Multi-Language Model with Self-Supervision

K. Yuksel

Thiago Castro Ferreira

Ahmet Gunduz

Mohamed Al-Badrashiny

Golara Javadi

129

21 Jun 2023

NoRefER: a Referenceless Quality Metric for Automatic Speech Recognition via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning

Interspeech (Interspeech), 2023

K. Yuksel

Thiago Castro Ferreira

Golara Javadi

Mohamed El-Badrashiny

Ahmet Gunduz

152

21 Jun 2023

GLIMMER: generalized late-interaction memory reranker

Sumit Sanghai

Joshua Ainslie

232

17 Jun 2023

Conformal Language Modeling

International Conference on Learning Representations (ICLR), 2023

574

16 Jun 2023

Scaling Open-Vocabulary Object Detection

Neural Information Processing Systems (NeurIPS), 2023

423

315

16 Jun 2023

Understanding Optimization of Deep Learning via Jacobian Matrix and Lipschitz Constant

Xianbiao Qi

Jianan Wang

Lei Zhang

202

15 Jun 2023

Interleaving Pre-Trained Language Models and Large Language Models for Zero-Shot NL2SQL Generation

217

15 Jun 2023

AutoML in the Age of Large Language Models: Current Challenges, Future Opportunities and Risks

...

Daphne Theodorakopoulos

Tanja Tornede

Henning Wachsmuth

Marius Lindauer

325

13 Jun 2023

AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

...

140

11 Jun 2023

PoET: A generative model of protein families as sequences-of-sequences

Neural Information Processing Systems (NeurIPS), 2023

Timothy F. Truong

Tristan Bepler

SLR

211

09 Jun 2023

Leaping through tree space: continuous phylogenetic inference for rooted and unrooted trees

Genome Biology and Evolution (GBE), 2023

302

09 Jun 2023

Unbalanced Optimal Transport for Unbalanced Word Alignment

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Yuki Arase

Han Bao

Sho Yokoi

140

07 Jun 2023

Click: Controllable Text Generation with Sequence Likelihood Contrastive Learning

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

240

06 Jun 2023

LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Dongfu Jiang

Xiang Ren

Bill Yuchen Lin

ELM

445

487

05 Jun 2023

SamToNe: Improving Contrastive Loss for Dual Encoder Retrieval Models with Same Tower Negatives

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Fedor Moiseev

Gustavo Hernández Ábrego

192

05 Jun 2023

Harnessing large-language models to generate private synthetic text

290

02 Jun 2023

THiFLY Research at SemEval-2023 Task 7: A Multi-granularity System for CTR-based Textual Entailment and Evidence Retrieval

International Workshop on Semantic Evaluation (SemEval), 2023

Ji Wu

137

02 Jun 2023

From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces

Neural Information Processing Systems (NeurIPS), 2023

269

31 May 2023

Toward Understanding Why Adam Converges Faster Than SGD for Transformers

Yan Pan

Yuanzhi Li

293

31 May 2023

Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

...

Olivier Bachem

Olivier Pietquin

289

100

31 May 2023

Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training

European Conference on Artificial Intelligence (ECAI), 2023

Fan Yang

118

31 May 2023

Correcting Semantic Parses with Natural Language through Dynamic Schema Encoding

Parker Glenn

Parag Dakle

Preethi Raghavan

197

31 May 2023

Comparing and combining some popular NER approaches on Biomedical tasks

Workshop on Biomedical Natural Language Processing (BioNLP), 2023

Harsh Verma

S. Bergler

Narjes Tahaei

196

30 May 2023

Brainformers: Trading Simplicity for Efficiency

International Conference on Machine Learning (ICML), 2023

...

247

29 May 2023

Federated Learning for Semantic Parsing: Task Formulation, Evaluation Setup, New Algorithms

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Huan Sun

117

26 May 2023

Diable: Efficient Dialogue State Tracking as Operations on Tables

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

304

26 May 2023

Three Towers: Flexible Contrastive Learning with Pretrained Image Models

Neural Information Processing Systems (NeurIPS), 2023

212

26 May 2023

Learning to Imagine: Visually-Augmented Natural Language Generation

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

427

26 May 2023

Domain Aligned Prefix Averaging for Domain Generalization in Abstractive Summarization

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

239

26 May 2023

Incorporating Distributions of Discourse Structure for Long Document Abstractive Summarization

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Dongqi Pu

Yifa Wang

Vera Demberg

224

26 May 2023

Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer

Neural Information Processing Systems (NeurIPS), 2023

493

100

25 May 2023

SING: A Plug-and-Play DNN Learning Technique

162

25 May 2023

RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting

AAAI Conference on Artificial Intelligence (AAAI), 2023

273

25 May 2023

Lexinvariant Language Models

Neural Information Processing Systems (NeurIPS), 2023

176

24 May 2023

The Role of Output Vocabulary in T2T LMs for SPARQL Semantic Parsing

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

200

24 May 2023

A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

280

24 May 2023

Active Learning for Natural Language Generation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Yotam Perlitz

Ariel Gera

Michal Shmueli-Scheuer

D. Sheinwald

Noam Slonim

L. Ein-Dor

334

24 May 2023

Text encoders bottleneck compositionality in contrastive vision-language models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

273

24 May 2023