Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

11 April 2018

Papers citing "Adafactor: Adaptive Learning Rates with Sublinear Memory Cost"

50 / 799 papers shown

FiMMIA: scaling semantic perturbation-based membership inference across modalities

Anton A. Emelyanov

Sergei Kudriashov

Alena Fenogenova

02 Dec 2025

On the Difficulty of Token-Level Modeling of Dysfluency and Fluency Shaping Artifacts

18 Nov 2025

AdamNX: An Adam improvement algorithm based on a novel exponential decay mechanism for the second-order moment estimate

Meng Zhu

Quan Xiao

Weidong Min

17 Nov 2025

Weight-sparse transformers have interpretable circuits

17 Nov 2025

High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes

Aukosh Jagannath

Taj Jones-McCormick

Varnan Sarangian

06 Nov 2025

Encoder-Decoder or Decoder-Only? Revisiting Encoder-Decoder Large Language Model

30 Oct 2025

What Really Matters in Matrix-Whitening Optimizers?

Kevin Frans

Pieter Abbeel

Sergey Levine

28 Oct 2025

MetricX-25 and GemSpanEval: Google Translate Submissions to the WMT25 Evaluation Shared Task

28 Oct 2025

Large language model-based task planning for service robots: A review

27 Oct 2025

REVE: A Foundation Model for EEG -- Adapting to Any Setup with Large-Scale Pretraining on 25,000 Subjects

24 Oct 2025

Weight Decay may matter more than muP for Learning Rate Transfer in Practice

21 Oct 2025

MARS-M: When Variance Reduction Meets Matrices

Yifeng Liu

Angela Yuan

Q. Gu

20 Oct 2025

Noise-Adaptive Layerwise Learning Rates: Accelerating Geometry-Aware Optimization for Deep Neural Network Training

15 Oct 2025

LTR-ICD: A Learning-to-Rank Approach for Automatic ICD Coding

Mohammad Mansoori

Amira Soliman

Farzaneh Etminani

15 Oct 2025

Adam or Gauss-Newton? A Comparative Study In Terms of Basis Alignment and SGD Noise

15 Oct 2025

AdaPM: a Partial Momentum Algorithm for LLM Training

Yimu Zhang

Yuanshi Liu

Cong Fang

10 Oct 2025

Auto-Stega: An Agent-Driven System for Lifelong Strategy Evolution in LLM-Based Text Steganography

08 Oct 2025

Adaptive Memory Momentum via a Model-Based Framework for Deep Learning Optimization

Kristi Topollai

A. Choromańska

ODL

06 Oct 2025

QDeepGR4J: Quantile-based ensemble of deep learning and GR4J hybrid rainfall-runoff models for extreme flow prediction with uncertainty quantification

Arpit Kapoor

Rohitash Chandra

06 Oct 2025

Scalable In-context Ranking with Generative Models

06 Oct 2025

REG: A Regularization Optimizer for Robust Training Dynamics

04 Oct 2025

Conda: Column-Normalized Adam for Training Large Language Models Faster

29 Sep 2025

Predicting Training Re-evaluation Curves Enables Effective Data Curriculums for LLMs

Shane Bergsma

Nolan Dey

Joel Hestness

29 Sep 2025

Scaling with Collapse: Efficient and Predictable Training of LLM Families

29 Sep 2025

Knowledge distillation through geometry-aware representational alignment

27 Sep 2025

Effective Quantization of Muon Optimizer States

27 Sep 2025

CoDA: Coding LM via Diffusion Adaptation

...

27 Sep 2025

LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer

26 Sep 2025

Understanding SOAP from the Perspective of Gradient Whitening

26 Sep 2025

CR-Net: Scaling Parameter-Efficient Training with Cross-Layer Low-Rank Structure

23 Sep 2025

CorPipe at CRAC 2025: Evaluating Multilingual Encoders for Multilingual Coreference Resolution

Milan Straka

22 Sep 2025

Development of Deep Learning Optimizers: Approaches, Concepts, and Update Rules

Doğay Altınel

22 Sep 2025

Patent Language Model Pretraining with ModernBERT

Amirhossein Yousefiramandi

Ciaran Cooney

AILaw VLM

18 Sep 2025

You Are What You Train: Effects of Data Composition on Training Context-aware Machine Translation Models

17 Sep 2025

Fresh in memory: Training-order recency is linearly encoded in language model activations

Dmitrii Krasheninnikov

Richard E. Turner

David Krueger

MILM LLMSV

17 Sep 2025

Harnessing Optimization Dynamics for Curvature-Informed Model Merging

Pouria Mahdavinia

Hamed Mahdavi

Niloofar Mireshghallah

M. Mahdavi

MoMe

14 Sep 2025

Building High-Quality Datasets for Portuguese LLMs: From Common Crawl Snapshots to Industrial-Grade Corpora

Thales Sales Almeida

Rodrigo Nogueira

Hélio Pedrini

10 Sep 2025

X-SQL: Expert Schema Linking and Understanding of Text-to-SQL with Multi-LLMs

Dazhi Peng

07 Sep 2025

Filling the Gap for Uzbek: Creating Translation Resources for Southern Uzbek

Mukhammadsaid Mamasaidov

Azizullah Aral

Abror Shopulatov

Mironshoh Inomjonov

20 Aug 2025

Fisher-Orthogonal Projection Methods for Natural Gradient Descent with Large Batches

Yishun Lu

Wesley Armour

ODL

19 Aug 2025

MAVIS: Multi-Objective Alignment via Value-Guided Inference-Time Search

19 Aug 2025

Advancing Cross-lingual Aspect-Based Sentiment Analysis with LLMs and Constrained Decoding for Sequence-to-Sequence Models

International Conference on Agents and Artificial Intelligence (ICAART), 2025

Jakub Šmíd

P. Přibáň

Pavel Král

14 Aug 2025

Improving Generative Cross-lingual Aspect-Based Sentiment Analysis with Constrained Decoding

14 Aug 2025

Prompt-Based Approach for Czech Sentiment Analysis

Recent Advances in Natural Language Processing (RANLP), 2025

Jakub Šmíd

P. Přibáň

12 Aug 2025

Czech Dataset for Complex Aspect-Based Sentiment Analysis Tasks

International Conference on Language Resources and Evaluation (LREC), 2025

11 Aug 2025

Few-shot Cross-lingual Aspect-Based Sentiment Analysis with Sequence-to-Sequence Models

International Conference on Text, Speech and Dialogue (TSD), 2025

Jakub Šmíd

Pavel Přibáň

Pavel Král

11 Aug 2025

HCAttention: Extreme KV Cache Compression via Heterogeneous Attention Computing for LLMs

26 Jul 2025

DNT: a Deeply Normalized Transformer that can be trained by Momentum SGD

23 Jul 2025

Apple Intelligence Foundation Language Models: Tech Report 2025

Ethan Li

Anders Boesen Lindbo Larsen

...

17 Jul 2025

Inversion-DPO: Precise and Efficient Post-Training for Diffusion Models

14 Jul 2025