Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

11 April 2018

Papers citing "Adafactor: Adaptive Learning Rates with Sublinear Memory Cost"

Efficient Stagewise Pretraining via Progressive Subnetworks

Sanjiv Kumar

184

08 Feb 2024

Generalized Preference Optimization: A Unified Approach to Offline Alignment

Daniele Calandriello

Pierre Harvey Richemond

Michal Valko

Bernardo Avila-Pires

Bilal Piot

268

143

08 Feb 2024

InkSight: Offline-to-Online Handwriting Conversion by Teaching Vision-Language Models to Read and Write

334

08 Feb 2024

Direct Language Model Alignment from Online AI Feedback

Misha Khalman

...

Bilal Piot

259

211

07 Feb 2024

Flora: Low-Rank Adapters Are Secretly Gradient Compressors

International Conference on Machine Learning (ICML), 2024

Yongchang Hao

Yanshuai Cao

Lili Mou

294

05 Feb 2024

Fractal Patterns May Illuminate the Success of Next-Token Prediction

Ibrahim Alabdulmohsin

Vinh Q. Tran

Mostafa Dehghani

172

02 Feb 2024

SPECTRUM: Speaker-Enhanced Pre-Training for Long Dialogue Summarization

Dong Yu

218

31 Jan 2024

TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese

268

30 Jan 2024

Unlearning Traces the Influential Training Data of Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Masaru Isonuma

Ivan Titov

380

26 Jan 2024

HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Shi Feng

303

26 Jan 2024

LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents

Ahmed Masry

Amir Hajian

145

26 Jan 2024

TURNA: A Turkish Encoder-Decoder Language Model for Enhanced Understanding and Generation

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Gökçe Uludoğan

Zeynep Yirmibeşoğlu Balal

214

25 Jan 2024

SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection

Sanjiv Kumar

198

24 Jan 2024

Lumiere: A Space-Time Diffusion Model for Video Generation

ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia (SIGGRAPH Asia), 2024

Charles Herrmann

...

403

383

23 Jan 2024

WARM: On the Benefits of Weight Averaged Reward Models

International Conference on Machine Learning (ICML), 2024

Nino Vieillard

Olivier Bachem

356

130

22 Jan 2024

Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

161

18 Jan 2024

Large Language Models for Scientific Information Extraction: An Empirical Study for Virology

Mahsa Shamsabadi

Jennifer D'Souza

Sören Auer

309

18 Jan 2024

On the importance of Data Scale in Pretraining Arabic Language Models

139

15 Jan 2024

Scaling Laws for Forgetting When Fine-Tuning Large Language Models

Damjan Kalajdzievski

CLL

253

11 Jan 2024

Instruct-Imagen: Image Generation with Multi-modal Instruction

Computer Vision and Pattern Recognition (CVPR), 2024

...

248

03 Jan 2024

To Diverge or Not to Diverge: A Morphosyntactic Perspective on Machine Translation vs Human Translation

Transactions of the Association for Computational Linguistics (TACL), 2024

Jiaming Luo

Colin Cherry

George F. Foster

185

02 Jan 2024

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

282

271

28 Dec 2023

Parameter Efficient Tuning Allows Scalable Personalization of LLMs for Text Entry: A Case Study on Abbreviation Expansion

Katrin Tomanek

Shanqing Cai

Subhashini Venugopalan

21 Dec 2023

Decoupling SQL Query Hardness Parsing for Text-to-SQL

J. Yi

Guo Chen

269

11 Dec 2023

Order Matters in the Presence of Dataset Imbalance for Multilingual Learning

Justin Gilmer

280

11 Dec 2023

Domain Adaptation of a State of the Art Text-to-SQL Model: Lessons Learned and Challenges Found

197

09 Dec 2023

Magicoder: Empowering Code Generation with OSS-Instruct

International Conference on Machine Learning (ICML), 2023

308

196

04 Dec 2023

A Machine Learning Approach Towards SKILL Code Autocompletion

Enrique Dehaerne

Bappaditya Dey

Wannes Meert

195

04 Dec 2023

Using Large Language Models to Accelerate Communication for Users with Severe Motor Impairments

Shanqing Cai

Subhashini Venugopalan

...

249

03 Dec 2023

RLHF and IIA: Perverse Incentives

237

02 Dec 2023

Meta-learning Optimizers for Communication-Efficient Learning

Charles-Étienne Joseph

388

02 Dec 2023

The Efficiency Spectrum of Large Language Models: An Algorithmic Survey

Tianyi Chen

397

01 Dec 2023

A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA

Damjan Kalajdzievski

ALM

269

170

28 Nov 2023

Who is leading in AI? An analysis of industry AI research

Ben Cottier

T. Besiroglu

David Owen

317

24 Nov 2023

Locally Optimal Descent for Dynamic Stepsize Scheduling

International Conference on Artificial Intelligence and Statistics (AISTATS), 2023

258

23 Nov 2023

Diffusion Model Alignment Using Direct Preference Optimization

Computer Vision and Pattern Recognition (CVPR), 2023

449

516

21 Nov 2023

Joyful: Joint Modality Fusion and Graph Contrastive Learning for Multimodal Emotion Recognition

188

18 Nov 2023

Countering Misinformation via Emotional Response Generation

Daniel Russo

Shane P. Kaszefski-Yaschuk

Jacopo Staiano

Marco Guerini

OffRL

231

17 Nov 2023

A Computationally Efficient Sparsified Online Newton Method

Inderjit Dhillon

193

16 Nov 2023

Take One Step at a Time to Know Incremental Utility of Demonstration: An Analysis on Reranking for Few-Shot In-Context Learning

Kazuma Hashimoto

K. Raman

Michael Bendersky

371

16 Nov 2023

Efficient End-to-End Visual Document Understanding with Rationale Distillation

Robin Jia

152

16 Nov 2023

GistScore: Learning Better Representations for In-Context Example Selection with Gist Bottlenecks

231

16 Nov 2023

SiRA: Sparse Mixture of Low Rank Adaptation

Xinyi Wang

...

235

15 Nov 2023

Argumentation Element Annotation Modeling using XLNet

Christopher M. Ormerod

Amy Burkhardt

Mackenzie Young

Susan Lottridge

125

10 Nov 2023

SEMQA: Semi-Extractive Multi-Source Question Answering

249

08 Nov 2023

Making Harmful Behaviors Unlearnable for Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Xuanjing Huang

168

02 Nov 2023

Calibrated Seq2seq Models for Efficient and Generalizable Ultra-fine Entity Typing

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Yanlin Feng

Adithya Pratapa

David R. Mortensen

283

01 Nov 2023

De-Diffusion Makes Text a Strong Cross-Modal Interface

Computer Vision and Pattern Recognition (CVPR), 2023

Siyuan Qiao

273

01 Nov 2023

HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

315

01 Nov 2023

Continuous Training and Fine-tuning for Domain-Specific Language Models in Medical Question Answering

Zhen Guo

Yining Hua

LM&MA CLL ALM AI4MH

175

01 Nov 2023