Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

11 April 2018

Papers citing "Adafactor: Adaptive Learning Rates with Sublinear Memory Cost"

50 / 799 papers shown

A Spectral Condition for Feature Learning

Greg Yang

James B. Simon

Jeremy Bernstein

337

26 Oct 2023

XFEVER: Exploring Fact Verification across Languages

Taiwan Conference on Computational Linguistics and Speech Processing (TCLSP), 2023

106

25 Oct 2023

Large Language Models are Visual Reasoning Coordinators

Neural Information Processing Systems (NeurIPS), 2023

Bo Li

Ziwei Liu

276

23 Oct 2023

Implicit meta-learning may lead language models to trust more reliable sources

International Conference on Machine Learning (ICML), 2023

Dmitrii Krasheninnikov

516

23 Oct 2023

Once Upon a Time in Graph: Relative-Time Pretraining for Complex Temporal Reasoning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Xin Li

197

23 Oct 2023

Benchmarking and Improving Text-to-SQL Generation under Ambiguity

316

20 Oct 2023

Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models

Shuohang Wang

Yang Liu

161

19 Oct 2023

Non-Intrusive Adaptation: Input-Centric Parameter-efficient Fine-Tuning for Versatile Multimodal Modeling

Yaqing Wang

Jialin Wu

...

182

18 Oct 2023

Grounded and Well-rounded: A Methodological Approach to the Study of Cross-modal and Cross-lingual Grounding

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Timothee Mickus

Elaine Zosa

Denis Paperno

167

18 Oct 2023

DemoSG: Demonstration-enhanced Schema-guided Generation for Low-resource Event Extraction

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Si Li

239

16 Oct 2023

AdaLomo: Low-memory Optimization with Adaptive Learning Rate

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Kai Lv

Hang Yan

Qipeng Guo

Haijun Lv

Xipeng Qiu

ODL

312

16 Oct 2023

DPZero: Private Fine-Tuning of Language Models without Backpropagation

450

14 Oct 2023

DistillSpec: Improving Speculative Decoding via Knowledge Distillation

International Conference on Learning Representations (ICLR), 2023

Sanjiv Kumar

266

123

12 Oct 2023

MatFormer: Nested Transformer for Elastic Inference

Neural Information Processing Systems (NeurIPS), 2023

Tim Dettmers

...

256

11 Oct 2023

QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources

275

11 Oct 2023

Guiding Language Model Math Reasoning with Planning Tokens

285

09 Oct 2023

Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

273

09 Oct 2023

Parameterizing Context: Unleashing the Power of Parameter-Efficient Fine-Tuning and In-Context Tuning for Continual Table Semantic Parsing

Neural Information Processing Systems (NeurIPS), 2023

226

07 Oct 2023

Module-wise Adaptive Distillation for Multimodality Foundation Models

Neural Information Processing Systems (NeurIPS), 2023

Ming-Hsuan Yang

190

06 Oct 2023

Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency

International Conference on Learning Representations (ICLR), 2023

Guillaume Lajoie

272

05 Oct 2023

Learning to Rewrite Prompts for Personalized Text Generation

The Web Conference (WWW), 2023

313

29 Sep 2023

Transformer-VQ: Linear-Time Transformers via Vector Quantization

International Conference on Learning Representations (ICLR), 2023

Albert Mohwald

250

28 Sep 2023

Small-scale proxies for large-scale Transformer training instabilities

International Conference on Learning Representations (ICLR), 2023

...

Jascha Narain Sohl-Dickstein

Kelvin Xu

Jaehoon Lee

Justin Gilmer

Simon Kornblith

319

135

25 Sep 2023

Massive End-to-end Models for Short Search Queries

...

174

22 Sep 2023

AMPLIFY:Attention-based Mixup for Performance Improvement and Label Smoothing in Transformer

PeerJ Computer Science (PeerJ Comput. Sci.), 2023

Leixin Yang

Yu Xiang

392

22 Sep 2023

A Family of Pretrained Transformer Language Models for Russian

International Conference on Language Resources and Evaluation (LREC), 2023

...

Alena Fenogenova

318

19 Sep 2023

Few-Shot Adaptation for Parsing Contextual Utterances with LLMs

International Joint Conference on Natural Language Processing (IJCNLP), 2023

Kevin Lin

Patrick Xia

Hao Fang

195

18 Sep 2023

Scaling Laws for Sparsely-Connected Foundation Models

International Conference on Learning Representations (ICLR), 2023

Dan Alistarh

296

15 Sep 2023

Reward Engineering for Generating Semi-structured Explanation

Findings (Findings), 2023

Jiuzhou Han

Wray Buntine

Ehsan Shareghi

LRM

155

15 Sep 2023

Self-Consistent Narrative Prompts on Abductive Natural Language Inference

International Joint Conference on Natural Language Processing (IJCNLP), 2023

Xin Liu

149

15 Sep 2023

USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

249

14 Sep 2023

Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish

International Joint Conference on Natural Language Processing (IJCNLP), 2023

Arda Uzunoğlu

Gözde Gül Sahin

220

13 Sep 2023

Statistical Rejection Sampling Improves Preference Optimization

International Conference on Learning Representations (ICLR), 2023

Tianqi Liu

Yao-Min Zhao

Rishabh Joshi

Misha Khalman

Mohammad Saleh

Peter J. Liu

Jialu Liu

319

318

13 Sep 2023

A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

244

12 Sep 2023

Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning

International Conference on Learning Representations (ICLR), 2023

253

138

11 Sep 2023

Epi-Curriculum: Episodic Curriculum Learning for Low-Resource Domain Adaptation in Neural Machine Translation

IEEE Transactions on Artificial Intelligence (IEEE TAI), 2023

310

06 Sep 2023

Memory Efficient Optimizers with 4-bit States

Neural Information Processing Systems (NeurIPS), 2023

Bingrui Li

Jianfei Chen

Jun Zhu

337

04 Sep 2023

RSDiff: Remote Sensing Image Generation from Text Using Diffusion Model

A. Sebaq

Mohamed ElHelw

DiffM

289

03 Sep 2023

Benchmarking the Generation of Fact Checking Explanations

Transactions of the Association for Computational Linguistics (TACL), 2023

Daniel Russo

Serra Sinem Tekiroğlu

Marco Guerini

159

29 Aug 2023

MEMORY-VQ: Compression for Tractable Internet-Scale Memory

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

Santiago Ontañón

Sumit Sanghai

Joshua Ainslie

RALM MQ

191

28 Aug 2023

Training and Meta-Evaluating Machine Translation Evaluation Metrics at the Paragraph Level

Conference on Machine Translation (WMT), 2023

310

25 Aug 2023

Towards an On-device Agent for Text Rewriting

195

22 Aug 2023

Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models

ACM Transactions on Software Engineering and Methodology (TOSEM), 2023

248

21 Aug 2023

TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition

Interspeech (Interspeech), 2023

192

21 Aug 2023

A Methodology for Generative Spelling Correction via Natural Spelling Errors Emulation across Multiple Domains and Languages

Findings (Findings), 2023

Nikita Martynov

Mark Baushenko

Anastasia Kozlova

Katerina Kolomeytseva

Aleksandr Abramov

Alena Fenogenova

201

18 Aug 2023

Teach LLMs to Personalize -- An Approach inspired by Writing Education

Cheng Li

Mingyang Zhang

Qiaozhu Mei

Yaqing Wang

Spurthi Amba Hombaiah

Yi Liang

Michael Bendersky

AI4Ed

236

15 Aug 2023

Robustness Over Time: Understanding Adversarial Examples' Effectiveness on Longitudinal Versions of Large Language Models

Michael Backes

249

15 Aug 2023

You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content

IEEE Symposium on Security and Privacy (IEEE S&P), 2023

171

10 Aug 2023

KITLM: Domain-Specific Knowledge InTegration into Language Models for Question Answering

ICON (ICON), 2023

114

07 Aug 2023

PromptSum: Parameter-Efficient Controllable Abstractive Summarization

Hailin Chen

Nancy Chen

175

06 Aug 2023