Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

11 April 2018

Papers citing "Adafactor: Adaptive Learning Rates with Sublinear Memory Cost"

50 / 799 papers shown

Offline Regularised Reinforcement Learning for Large Language Models Alignment

Pierre Harvey Richemond

Daniele Calandriello

...

Rishabh Joshi

Bilal Piot

239

29 May 2024

4-bit Shampoo for Memory-Efficient Network Training

473

28 May 2024

VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections

264

28 May 2024

LoQT: Low Rank Adapters for Quantized Training

Vésteinn Snæbjarnarson

237

26 May 2024

AdaFisher: Adaptive Second Order Optimization via Fisher Information

630

26 May 2024

Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection

...

251

25 May 2024

Sparse maximal update parameterization: A holistic approach to sparse training dynamics

Nolan Dey

Shane Bergsma

Joel Hestness

257

24 May 2024

MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence

Dan Alistarh

204

24 May 2024

Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling

Neural Information Processing Systems (NeurIPS), 2024

Xingwu Sun

...

293

23 May 2024

No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models

Ibrahim Alabdulmohsin

VLM

286

22 May 2024

FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information

Dongseong Hwang

ODL

654

21 May 2024

Prompting-based Synthetic Data Generation for Few-Shot Question Answering

International Conference on Language Resources and Evaluation (LREC), 2024

225

15 May 2024

DEPTH: Discourse Education through Pre-Training Hierarchically

326

13 May 2024

Stochastic RAG: End-to-End Retrieval-Augmented Generation through Expected Utility Maximization

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2024

Hamed Zamani

Michael Bendersky

355

05 May 2024

Investigating Wit, Creativity, and Detectability of Large Language Models in Domain-Specific Writing Style Adaptation of Reddit's Showerthoughts

268

02 May 2024

RST-LoRA: A Discourse-Aware Low-Rank Adaptation for Long Document Abstractive Summarization

Dongqi Pu

Vera Demberg

364

01 May 2024

Empowering Large Language Models for Textual Data Augmentation

269

26 Apr 2024

Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment

250

18 Apr 2024

Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR

...

Tsendsuren Munkhdalai

Angad Chandorkar

Rohit Prabhavalkar

316

15 Apr 2024

Impact of Preference Noise on the Alignment Performance of Generative Language Models

Yang Gao

Dana Alon

Donald Metzler

352

15 Apr 2024

TransformerFAM: Feedback attention is working memory

422

14 Apr 2024

Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies

Zichao Li

Cihang Xie

E. D. Cubuk

CLIP

220

12 Apr 2024

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Tsendsuren Munkhdalai

Manaal Faruqui

Siddharth Gopal

LRM LLMAG CLL

309

170

10 Apr 2024

Neural Optimizer Equation, Decay Function, and Learning Rate Schedule Joint Evolution

Annual Conference on Genetic and Evolutionary Computation (GECCO), 2024

Brandon Morgan

Dean Frederick Hougen

ODL

261

10 Apr 2024

Privacy Preserving Prompt Engineering: A Survey

Kennedy Edemacu

Xintao Wu

386

09 Apr 2024

Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data

240

08 Apr 2024

Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization

Shuo Xie

Zhiyuan Li

OffRL

268

05 Apr 2024

Training LLMs over Neurally Compressed Text

Jascha Narain Sohl-Dickstein

Noah Constant

215

04 Apr 2024

MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

Siyuan Qiao

307

28 Mar 2024

Juru: Legal Brazilian Large Language Model from Reputable Sources

Roseval Malaquias Junior

245

26 Mar 2024

A Hybrid Approach To Aspect Based Sentiment Analysis Using Transfer Learning

142

25 Mar 2024

Understanding Emergent Abilities of Language Models from the Loss Perspective

Neural Information Processing Systems (NeurIPS), 2024

Yuxiao Dong

416

23 Mar 2024

Adapprox: Adaptive Approximation in Adam Optimization via Randomized Low-Rank Matrices

Stephan Ludger Kölker

Zhefeng Wang

Xiaoming Yuan

186

22 Mar 2024

Partitioned Neural Network Training via Synthetic Intermediate Labels

C. V. Karadag

Nezih Topaloglu

251

17 Mar 2024

PERL: Parameter Efficient Reinforcement Learning from Human Feedback

...

251

15 Mar 2024

Frozen Feature Augmentation for Few-Shot Image Classification

285

15 Mar 2024

Human Alignment of Large Language Models through Online Preference Optimisation

International Conference on Machine Learning (ICML), 2024

Daniele Calandriello

Mark Rowland

...

Rishabh Joshi

Bilal Piot

277

13 Mar 2024

Low-Resource Court Judgment Summarization for Common Law Systems

169

07 Mar 2024

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Yuandong Tian

418

339

06 Mar 2024

FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction

253

04 Mar 2024

Learning to Deliver: a Foundation Model for the Montreal Capacitated Vehicle Routing Problem

Samuel J. K. Chin

Matthias Winkenbach

Akash Srivastava

190

28 Feb 2024

When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method

289

233

27 Feb 2024

Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models

214

27 Feb 2024

Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer

690

23 Feb 2024

Can Language Models Act as Knowledge Bases at Scale?

257

22 Feb 2024

FLAME: Self-Supervised Low-Resource Taxonomy Expansion using Large Language Models

Sahil Mishra

Ujjwal Sudev

Tanmoy Chakraborty

131

21 Feb 2024

VideoPrism: A Foundational Visual Encoder for Video Understanding

...

392

20 Feb 2024

Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model

...

Niklas Muennighoff

246

328

12 Feb 2024

Low-Resource Counterspeech Generation for Indic Languages: The Case of Bengali and Hindi

Findings (Findings), 2024

157

11 Feb 2024

Efficient Stagewise Pretraining via Progressive Subnetworks

Sanjiv Kumar

184

08 Feb 2024