Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

11 April 2018
Noam M. Shazeer
Mitchell Stern
    ODL

Papers citing "Adafactor: Adaptive Learning Rates with Sublinear Memory Cost"

50 / 799 papers shown
FiMMIA: scaling semantic perturbation-based membership inference across modalities
Anton A. Emelyanov
Sergei Kudriashov
Alena Fenogenova
142
0
0
02 Dec 2025
On the Difficulty of Token-Level Modeling of Dysfluency and Fluency Shaping Artifacts
Kashaf Gulzar
Dominik Wagner
Sebastian P. Bayerl
Florian Honig
Tobias Bocklet
Korbinian Riedhammer
63
0
0
18 Nov 2025
AdamNX: An Adam improvement algorithm based on a novel exponential decay mechanism for the second-order moment estimate
Meng Zhu
Quan Xiao
Weidong Min
266
0
0
17 Nov 2025
Weight-sparse transformers have interpretable circuits
Leo Gao
Achyuta Rajaram
Jacob Coxon
Soham V. Govande
Bowen Baker
Dan Mossing
MILM
227
5
0
17 Nov 2025
High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes
Aukosh Jagannath
Taj Jones-McCormick
Varnan Sarangian
126
1
0
06 Nov 2025
Encoder-Decoder or Decoder-Only? Revisiting Encoder-Decoder Large Language Model
Biao Zhang
Yong Cheng
Siamak Shakeri
Xinyi Wang
Min Ma
Orhan Firat
147
1
0
30 Oct 2025
What Really Matters in Matrix-Whitening Optimizers?
Kevin Frans
Pieter Abbeel
Sergey Levine
124
2
0
28 Oct 2025
MetricX-25 and GemSpanEval: Google Translate Submissions to the WMT25 Evaluation Shared Task
Juraj Juraska
Tobias Domhan
M. Finkelstein
Tetsuji Nakagawa
Geza Kovacs
Daniel Deutsch
Pidong Wang
Markus Freitag
116
3
0
28 Oct 2025
Large language model-based task planning for service robots: A review
Shaohan Bian
Ying Zhang
Guohui Tian
Zhiqiang Miao
Edmond Q. Wu
Simon X. Yang
C. Hua
LLMAG
LM&Ro
204
0
0
27 Oct 2025
REVE: A Foundation Model for EEG -- Adapting to Any Setup with Large-Scale Pretraining on 25,000 Subjects
Yassine El Ouahidi
Jonathan Lys
Philipp Tholke
Nicolas Farrugia
Bastien Pasdeloup
Vincent Gripon
Karim Jerbi
G. Lioi
AI4TS
VLM
123
1
0
24 Oct 2025
Weight Decay may matter more than muP for Learning Rate Transfer in Practice
Atli Kosson
Jeremy Welborn
Yang Liu
Martin Jaggi
Xi Chen
116
4
0
21 Oct 2025
MARS-M: When Variance Reduction Meets Matrices
Yifeng Liu
Angela Yuan
Q. Gu
224
1
0
20 Oct 2025
Noise-Adaptive Layerwise Learning Rates: Accelerating Geometry-Aware Optimization for Deep Neural Network Training
Jie Hao
Xiaochuan Gong
Jie Xu
Z. Wang
Mingrui Liu
AI4CE
152
0
0
15 Oct 2025
LTR-ICD: A Learning-to-Rank Approach for Automatic ICD Coding
Mohammad Mansoori
Amira Soliman
Farzaneh Etminani
82
0
0
15 Oct 2025
Adam or Gauss-Newton? A Comparative Study In Terms of Basis Alignment and SGD Noise
Bingbin Liu
Rachit Bansal
Depen Morwani
Nikhil Vyas
David Alvarez-Melis
Sham Kakade
156
2
0
15 Oct 2025
AdaPM: a Partial Momentum Algorithm for LLM Training
Yimu Zhang
Yuanshi Liu
Cong Fang
146
0
0
10 Oct 2025
Auto-Stega: An Agent-Driven System for Lifelong Strategy Evolution in LLM-Based Text Steganography
Jiuan Zhou
Yu Cheng
Yuan Xie
Z. Yin
106
3
0
08 Oct 2025
Adaptive Memory Momentum via a Model-Based Framework for Deep Learning Optimization
Kristi Topollai
A. Choromańska
ODL
327
0
0
06 Oct 2025
QDeepGR4J: Quantile-based ensemble of deep learning and GR4J hybrid rainfall-runoff models for extreme flow prediction with uncertainty quantification
Arpit Kapoor
Rohitash Chandra
119
2
0
06 Oct 2025
Scalable In-context Ranking with Generative Models
Nilesh Gupta
Chong You
Srinadh Bhojanapalli
Sanjiv Kumar
Inderjit Dhillon
Felix X. Yu
231
2
0
06 Oct 2025
REG: A Regularization Optimizer for Robust Training Dynamics
Zehua Liu
Han Wu
Xiaojin Fu
Shuqi Liu
Xiongwei Han
Tao Zhong
Mingxuan Yuan
108
0
0
04 Oct 2025
Conda: Column-Normalized Adam for Training Large Language Models Faster
Junjie Wang
Pan Zhou
Yiming Dong
Huan Li
Jia Li
Xun Zhou
Qicheng Lao
Cong Fang
Zhouchen Lin
AI4CE
241
0
0
29 Sep 2025
Predicting Training Re-evaluation Curves Enables Effective Data Curriculums for LLMs
Shane Bergsma
Nolan Dey
Joel Hestness
162
0
0
29 Sep 2025
Scaling with Collapse: Efficient and Predictable Training of LLM Families
Shane Bergsma
Bin Claire Zhang
Nolan Dey
Shaheer Muhammad
Gurpreet Gosal
Joel Hestness
136
2
0
29 Sep 2025
Knowledge distillation through geometry-aware representational alignment
Prajjwal Bhattarai
Mohammad Amjad
Dmytro Zhylko
Tuka Alhanai
176
0
0
27 Sep 2025
Effective Quantization of Muon Optimizer States
Aman Gupta
Rafael Celente
Abhishek Shivanna
D. T. Braithwaite
Gregory Dexter
Shao Tang
Hiroto Udagawa
Daniel Silva
R. Ramanath
S. Keerthi
MQ
139
0
0
27 Sep 2025
CoDA: Coding LM via Diffusion Adaptation
H. Chen
Shiyu Wang
Can Qin
B. Pang
Zuxin Liu
...
Shelby Heinecke
Silvio Savarese
Caiming Xiong
Huan Wang
Weiran Yao
DiffM
109
1
0
27 Sep 2025
LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer
Song Fei
Tian Ye
Lujia Wang
Lei Zhu
189
0
0
26 Sep 2025
Understanding SOAP from the Perspective of Gradient Whitening
Yanqing Lu
Letao Wang
Jinbo Liu
FAtt
158
0
0
26 Sep 2025
CR-Net: Scaling Parameter-Efficient Training with Cross-Layer Low-Rank Structure
Boao Kong
Junzhu Liang
Yuxi Liu
Renjia Deng
Kun Yuan
160
1
0
23 Sep 2025
CorPipe at CRAC 2025: Evaluating Multilingual Encoders for Multilingual Coreference Resolution
Milan Straka
182
0
0
22 Sep 2025
Development of Deep Learning Optimizers: Approaches, Concepts, and Update Rules
Doğay Altınel
134
0
0
22 Sep 2025
Patent Language Model Pretraining with ModernBERT
Amirhossein Yousefiramandi
Ciaran Cooney
AILaw
VLM
294
2
0
18 Sep 2025
You Are What You Train: Effects of Data Composition on Training Context-aware Machine Translation Models
Paweł Mąka
Yusuf Can Semerci
Jan Scholtes
Gerasimos Spanakis
89
0
0
17 Sep 2025
Fresh in memory: Training-order recency is linearly encoded in language model activations
Dmitrii Krasheninnikov
Richard E. Turner
David Krueger
MILM
LLMSV
155
0
0
17 Sep 2025
Harnessing Optimization Dynamics for Curvature-Informed Model Merging
Pouria Mahdavinia
Hamed Mahdavi
Niloofar Mireshghallah
M. Mahdavi
MoMe
180
1
0
14 Sep 2025
Building High-Quality Datasets for Portuguese LLMs: From Common Crawl Snapshots to Industrial-Grade Corpora
Thales Sales Almeida
Rodrigo Nogueira
Hélio Pedrini
157
4
0
10 Sep 2025
X-SQL: Expert Schema Linking and Understanding of Text-to-SQL with Multi-LLMs
Dazhi Peng
90
0
0
07 Sep 2025
Filling the Gap for Uzbek: Creating Translation Resources for Southern Uzbek
Mukhammadsaid Mamasaidov
Azizullah Aral
Abror Shopulatov
Mironshoh Inomjonov
80
1
0
20 Aug 2025
Fisher-Orthogonal Projection Methods for Natural Gradient Descent with Large Batches
Yishun Lu
Wesley Armour
ODL
361
1
0
19 Aug 2025
MAVIS: Multi-Objective Alignment via Value-Guided Inference-Time Search
Jeremy Carleton
Debajoy Mukherjee
Srinivas Shakkottai
D. Kalathil
209
1
0
19 Aug 2025
Advancing Cross-lingual Aspect-Based Sentiment Analysis with LLMs and Constrained Decoding for Sequence-to-Sequence Models
International Conference on Agents and Artificial Intelligence (ICAART), 2025
Jakub Šmíd
P. Pribán
Pavel Král
121
6
0
14 Aug 2025
Improving Generative Cross-lingual Aspect-Based Sentiment Analysis with Constrained Decoding
Jakub Šmíd
P. Pribán
Pavel Král
AI4CE
129
0
0
14 Aug 2025
Prompt-Based Approach for Czech Sentiment Analysis
Recent Advances in Natural Language Processing (RANLP), 2025
Jakub Šmíd
P. Pribán
115
5
0
12 Aug 2025
Czech Dataset for Complex Aspect-Based Sentiment Analysis Tasks
International Conference on Language Resources and Evaluation (LREC), 2025
Jakub Šmíd
P. Pribán
O. Pražák
Pavel Král
CoGe
156
5
0
11 Aug 2025
Few-shot Cross-lingual Aspect-Based Sentiment Analysis with Sequence-to-Sequence Models
International Conference on Text, Speech and Dialogue (TSD), 2025
Jakub Šmíd
Pavel Přibáň
Pavel Král
124
0
0
11 Aug 2025
HCAttention: Extreme KV Cache Compression via Heterogeneous Attention Computing for LLMs
Dongquan Yang
Yifan Yang
Xiaotian Yu
Xianbiao Qi
Rong Xiao
MQ
172
0
0
26 Jul 2025
DNT: a Deeply Normalized Transformer that can be trained by Momentum SGD
Xianbiao Qi
Marco Chen
Wenjie Xiao
Jiaquan Ye
Yelin He
Chun-Guang Li
Zhouchen Lin
OffRL
140
0
0
23 Jul 2025
Apple Intelligence Foundation Language Models: Tech Report 2025
Ethan Li
Anders Boesen Lindbo Larsen
Chen Zhang
Xiyou Zhou
Jun Qin
...
Josh Elman
Dong Yin
Yusuf Goren
J. Lai
Yiran Fei
170
6
0
17 Jul 2025
Inversion-DPO: Precise and Efficient Post-Training for Diffusion Models
Zejian Li
Yize Li
Chenye Meng
Zhongni Liu
Yang Ling
Shengyuan Zhang
Guang Yang
Changyuan Yang
Zhiyuan Yang
Lingyun Sun
369
5
0
14 Jul 2025