Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

11 April 2018

Noam M. Shazeer

arXiv: 1804.04235

Papers citing "Adafactor: Adaptive Learning Rates with Sublinear Memory Cost"

50 / 799 papers shown

CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation

398

4

0

18 Feb 2025

We Can't Understand AI Using our Existing Vocabulary

325

14

0

11 Feb 2025

What makes a good feedforward computational graph?

Alex Vitvitskyi

Petar Velickovic

367

6

0

10 Feb 2025

Memory-Efficient Fine-Tuning of Transformers via Token Selection

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025

Antoine Simoulin

427

6

0

31 Jan 2025

LiPO: Listwise Preference Optimization through Learning-to-Rank

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

...

Simon Baumgartner

605

85

0

28 Jan 2025

Celo: Training Versatile Learned Optimizers on a Compute Diet

Guillaume Lajoie

Eugene Belilovsky

990

0

0

22 Jan 2025

A Survey on Memory-Efficient Transformer-Based Model Training in AI for Science

Gongqingjian Jiang

375

0

0

21 Jan 2025

RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs

AAAI Conference on Artificial Intelligence (AAAI), 2024

Sushant Prakash

387

4

0

20 Jan 2025

Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision

International Conference on Learning Representations (ICLR), 2025

Cassidy Laidlaw

Jacob Steinhardt

240

3

0

14 Jan 2025

Wavelet Meets Adam: Compressing Gradients for Memory-Efficient Training

342

0

0

13 Jan 2025

SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training

International Conference on Learning Representations (ICLR), 2025

394

15

0

12 Jan 2025

Dialectal and Low-Resource Machine Translation for Aromanian

International Conference on Computational Linguistics (COLING), 2024

Alexandru-Iulius Jerpelea

Alina-Ştefania Rădoi

266

3

0

08 Jan 2025

Multi-task retriever fine-tuning for domain-specific and efficient RAG

Patrice Béchard

Orlando Marquez Ayala

269

0

0

08 Jan 2025

Reasoning-Enhanced Self-Training for Long-Form Personalized Text Generation

Michael Bendersky

277

20

0

07 Jan 2025

The interplay between domain specialization and model size

Roseval Malaquias Junior

Thales Sales Almeida

514

1

0

03 Jan 2025

AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning

International Conference on Learning Representations (ICLR), 2024

Yehonathan Refael

Jonathan Svirsky

Ofir Lindenbaum

300

10

0

31 Dec 2024

Grams: Gradient Descent with Adaptive Momentum Scaling

501

5

0

22 Dec 2024

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Benjamin Warner

Antoine Chaffin

Benjamin Clavié

Oskar Hallström

...

457

389

0

18 Dec 2024

No More Adam: Learning Rate Scaling at Initialization is All You Need

341

4

0

16 Dec 2024

Analyzing the Attention Heads for Pronoun Disambiguation in Context-aware Machine Translation Models

Yusuf Can Semerci

Gerasimos Spanakis

275

1

0

15 Dec 2024

SMMF: Square-Matricized Momentum Factorization for Memory-Efficient Optimization

AAAI Conference on Artificial Intelligence (AAAI), 2024

Kwangryeol Park

189

1

0

12 Dec 2024

Filling Memory Gaps: Enhancing Continual Semantic Parsing via SQL Syntax Variance-Guided LLMs without Real Data Replay

AAAI Conference on Artificial Intelligence (AAAI), 2024

240

4

0

10 Dec 2024

Visual Lexicon: Rich Image Features in Language Space

Computer Vision and Pattern Recognition (CVPR), 2024

Cordelia Schmid

208

7

0

09 Dec 2024

SceneDiffuser: Efficient and Controllable Driving Simulation Initialization and Rollout

Neural Information Processing Systems (NeurIPS), 2024

Christopher Davis

...

Dragomir Anguelov

281

39

0

05 Dec 2024

SimuScope: Realistic Endoscopic Synthetic Dataset Generation through Surgical Simulation and Diffusion Models

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024

Sabina Martyniak

Diego Dall'Alba

Michał Naskręt

Przemysław Korzeniowski

366

6

0

03 Dec 2024

Optimizing Domain-Specific Image Retrieval: A Benchmark of FAISS and Annoy with Fine-Tuned Features

MD Shaikh Rahman

Syed Maudud E Rabbi

Muhammad Mahbubur Rashid

251

4

0

02 Dec 2024

COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection

Computer Vision and Pattern Recognition (CVPR), 2024

Bo Yuan

419

6

0

26 Nov 2024

Cautious Optimizers: Improving Training with One Line of Code

711

21

0

25 Nov 2024

Beyond adaptive gradient: Fast-Controlled Minibatch Algorithm for large-scale optimization

Corrado Coppola

403

0

0

24 Nov 2024

FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training

Aleksandr Beznosikov

Samuel Horváth

304

4

0

12 Nov 2024

Adaptive Consensus Gradients Aggregation for Scaled Distributed Training

301

0

0

06 Nov 2024

Transfer Learning for Finetuning Large Language Models

Tobias Strangmann

Lennart Purucker

Katharina Eggensperger

227

4

0

02 Nov 2024

$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources

Apoorv Khandelwal

Stephen H. Bach

272

6

0

30 Oct 2024

NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks

225

4

0

28 Oct 2024

COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training

International Conference on Learning Representations (ICLR), 2024

494

18

0

25 Oct 2024

Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code

Jianshu Zhang

162

2

0

24 Oct 2024

A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs

Veeranjaneyulu Sadhanala

Afshin Rostamizadeh

Ayan Chakrabarti

Wittawat Jitkrittum

...

Rakesh Shivanna

Sashank J. Reddi

Sanjiv Kumar

465

10

0

24 Oct 2024

Scalable Influence and Fact Tracing for Large Language Model Pretraining

International Conference on Learning Representations (ICLR), 2024

Dheeraj Rajagopal

Tolga Bolukbasi

307

16

0

22 Oct 2024

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

Zhiqiang Hu

160

3

0

22 Oct 2024

MiniPLM: Knowledge Distillation for Pre-Training Language Models

International Conference on Learning Representations (ICLR), 2024

461

16

0

22 Oct 2024

LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics

International Conference on Learning Representations (ICLR), 2024

Ionut-Vlad Modoranu

457

21

0

21 Oct 2024

TIPS: Text-Image Pretraining with Spatial awareness

International Conference on Learning Representations (ICLR), 2024

Kevis-Kokitsi Maninis

...

Dan Gnanapragasam

Mojtaba Seyedhosseini

Andre Araujo

443

18

0

21 Oct 2024

VidPanos: Generative Panoramic Videos from Casual Panning Videos

ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia (SIGGRAPH Asia), 2024

Aleksander Holynski

Brian L. Curless

Michael Rubinstein

229

7

0

17 Oct 2024

Learning to Predict Usage Options of Product Reviews with LLM-Generated Labels

Frederic Sadrieh

Matthis Clausen

Konstantin Ketterer

Avetis Navasardyan

Tamara Czinczoll

103

1

0

16 Oct 2024

Model Balancing Helps Low-data Training and Fine-tuning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

226

9

0

16 Oct 2024

LexSumm and LexT5: Benchmarking and Modeling Legal Summarization Tasks in English

T. Y. S. S. Santosh

Cornelius Weiss

Matthias Grabmair

460

9

0

12 Oct 2024

Parameter-Efficient Fine-Tuning of Large Language Models using Semantic Knowledge Tuning

Scientific Reports (Sci Rep), 2024

Nusrat Jahan Prottasha

Md. Shohanur Islam Sobuj

Niloofar Yousefi

299

19

0

11 Oct 2024

CursorCore: Assist Programming through Aligning Anything

378

2

0

09 Oct 2024

Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning

Zedong Wang

Luyuan Zhang

Zicheng Liu

Yang Liu

Baigui Sun

Stan Z. Li

232

2

0

08 Oct 2024

A second-order-like optimizer with adaptive gradient scaling for deep learning

Edouard Pauwels

208

0

0

08 Oct 2024

1 2 3 4 5 6...14 15 16