Adafactor: Adaptive Learning Rates with Sublinear Memory Cost


11 April 2018
Noam M. Shazeer
Mitchell Stern
    ODL

Papers citing "Adafactor: Adaptive Learning Rates with Sublinear Memory Cost"

50 / 799 papers shown
CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation
Minghao Fu
Guo-Hua Wang
Liangfu Cao
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
398
4
0
18 Feb 2025
We Can't Understand AI Using our Existing Vocabulary
John Hewitt
Robert Geirhos
Been Kim
325
14
0
11 Feb 2025
What makes a good feedforward computational graph?
Alex Vitvitskyi
J. G. Araújo
Marc Lackenby
Petar Velickovic
367
6
0
10 Feb 2025
Memory-Efficient Fine-Tuning of Transformers via Token Selection
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Antoine Simoulin
Namyong Park
Xiaoyi Liu
Grey Yang
427
6
0
31 Jan 2025
LiPO: Listwise Preference Optimization through Learning-to-Rank
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Tianqi Liu
Zhen Qin
Junru Wu
Jiaming Shen
Misha Khalman
...
Mohammad Saleh
Simon Baumgartner
Jialu Liu
Peter J. Liu
Xuanhui Wang
605
85
0
28 Jan 2025
Celo: Training Versatile Learned Optimizers on a Compute Diet
A. Moudgil
Boris Knyazev
Guillaume Lajoie
Eugene Belilovsky
990
0
0
22 Jan 2025
A Survey on Memory-Efficient Transformer-Based Model Training in AI for Science
Kaiyuan Tian
Linbo Qiao
Baihui Liu
Gongqingjian Jiang
Shanshan Li
Dongsheng Li
375
0
0
21 Jan 2025
RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs
AAAI Conference on Artificial Intelligence (AAAI), 2024
Jiaxing Wu
Lin Ning
Luyang Liu
Harrison Lee
Neo Wu
Chao Wang
Sushant Prakash
S. O’Banion
Bradley Green
Jun Xie
387
4
0
20 Jan 2025
Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision
International Conference on Learning Representations (ICLR), 2025
Yaowen Ye
Cassidy Laidlaw
Jacob Steinhardt
ALM
240
3
0
14 Jan 2025
Wavelet Meets Adam: Compressing Gradients for Memory-Efficient Training
Ziqing Wen
Ping Luo
Jun Wang
Xiaoge Deng
Jinping Zou
Kun Yuan
Tao Sun
Dongsheng Li
CLL
342
0
0
13 Jan 2025
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
International Conference on Learning Representations (ICLR), 2025
Tianjin Huang
Ziquan Zhu
Gaojie Jin
Lu Liu
Zinan Lin
Shiwei Liu
394
15
0
12 Jan 2025
Dialectal and Low-Resource Machine Translation for Aromanian
International Conference on Computational Linguistics (COLING), 2024
Alexandru-Iulius Jerpelea
Alina-Ştefania Rădoi
Sergiu Nisioi
266
3
0
08 Jan 2025
Multi-task retriever fine-tuning for domain-specific and efficient RAG
Patrice Béchard
Orlando Marquez Ayala
269
0
0
08 Jan 2025
Reasoning-Enhanced Self-Training for Long-Form Personalized Text Generation
Alireza Salemi
Cheng-rong Li
Mingyang Zhang
Qiaozhu Mei
Weize Kong
Tao Chen
Zhuowan Li
Michael Bendersky
Hamed Zamani
LRM RALM ReLM
277
20
0
07 Jan 2025
The interplay between domain specialization and model size
Roseval Malaquias Junior
Ramon Pires
Thales Sales Almeida
Kenzo Sakiyama
R. Romero
R. Nogueira
514
1
0
03 Jan 2025
AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning
International Conference on Learning Representations (ICLR), 2024
Yehonathan Refael
Jonathan Svirsky
Boris Shustin
Wasim Huleihel
Ofir Lindenbaum
300
10
0
31 Dec 2024
Grams: Gradient Descent with Adaptive Momentum Scaling
Yang Cao
Xiaoyu Li
Zhao Song
ODL
501
5
0
22 Dec 2024
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Benjamin Warner
Antoine Chaffin
Benjamin Clavié
Orion Weller
Oskar Hallström
...
Tom Aarsen
Nathan Cooper
Griffin Adams
Jeremy Howard
Iacopo Poli
457
389
0
18 Dec 2024
No More Adam: Learning Rate Scaling at Initialization is All You Need
Minghao Xu
Lichuan Xiang
Xu Cai
Hongkai Wen
341
4
0
16 Dec 2024
Analyzing the Attention Heads for Pronoun Disambiguation in Context-aware Machine Translation Models
Paweł Mąka
Yusuf Can Semerci
Jan Scholtes
Gerasimos Spanakis
275
1
0
15 Dec 2024
SMMF: Square-Matricized Momentum Factorization for Memory-Efficient Optimization
AAAI Conference on Artificial Intelligence (AAAI), 2024
Kwangryeol Park
Seulki Lee
189
1
0
12 Dec 2024
Filling Memory Gaps: Enhancing Continual Semantic Parsing via SQL Syntax Variance-Guided LLMs without Real Data Replay
AAAI Conference on Artificial Intelligence (AAAI), 2024
Ruiheng Liu
Jinyu Zhang
Yanqi Song
Yu Zhang
Bailong Yang
KELM CLL
240
4
0
10 Dec 2024
Visual Lexicon: Rich Image Features in Language Space
Computer Vision and Pattern Recognition (CVPR), 2024
Xudong Wang
Xingyi Zhou
Alireza Fathi
Trevor Darrell
Cordelia Schmid
VLM
208
7
0
09 Dec 2024
SceneDiffuser: Efficient and Controllable Driving Simulation Initialization and Rollout
Neural Information Processing Systems (NeurIPS), 2024
C. Jiang
Yijing Bai
Andre Cornman
Christopher Davis
Xiukun Huang
...
Carlos Fuertes
Chang Yuan
Mingxing Tan
Yin Zhou
Dragomir Anguelov
281
39
0
05 Dec 2024
SimuScope: Realistic Endoscopic Synthetic Dataset Generation through Surgical Simulation and Diffusion Models
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Sabina Martyniak
Joanna Kaleta
Diego Dall'Alba
Michał Naskręt
Szymon Płotka
Przemysław Korzeniowski
MedIm
366
6
0
03 Dec 2024
Optimizing Domain-Specific Image Retrieval: A Benchmark of FAISS and Annoy with Fine-Tuned Features
MD Shaikh Rahman
Syed Maudud E Rabbi
Muhammad Mahbubur Rashid
251
4
0
02 Dec 2024
COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection
Computer Vision and Pattern Recognition (CVPR), 2024
Jinqi Xiao
S. Sang
Tiancheng Zhi
Jing Liu
Qing Yan
Linjie Luo
Bo Yuan
VLM
419
6
0
26 Nov 2024
Cautious Optimizers: Improving Training with One Line of Code
Kaizhao Liang
Lizhang Chen
B. Liu
Qiang Liu
ODL
711
21
0
25 Nov 2024
Beyond adaptive gradient: Fast-Controlled Minibatch Algorithm for large-scale optimization
Corrado Coppola
Lorenzo Papa
Irene Amerini
L. Palagi
ODL
403
0
0
24 Nov 2024
FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training
Philip Zmushko
Aleksandr Beznosikov
Martin Takáč
Samuel Horváth
304
4
0
12 Nov 2024
Adaptive Consensus Gradients Aggregation for Scaled Distributed Training
Yoni Choukroun
Shlomi Azoulay
P. Kisilev
301
0
0
06 Nov 2024
Transfer Learning for Finetuning Large Language Models
Tobias Strangmann
Lennart Purucker
Jörg Franke
Ivo Rapant
Fabio Ferreira
Katharina Eggensperger
227
4
0
02 Nov 2024
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
Apoorv Khandelwal
Tian Yun
Nihal V. Nayak
Jack Merullo
Stephen H. Bach
Chen Sun
Ellie Pavlick
VLM AI4CE OnRL
272
6
0
30 Oct 2024
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
Yongchang Hao
Yanshuai Cao
Lili Mou
MQ
225
4
0
28 Oct 2024
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
International Conference on Learning Representations (ICLR), 2024
Haocheng Xi
Han Cai
Ligeng Zhu
Yaojie Lu
Kurt Keutzer
Jianfei Chen
Song Han
MQ
494
18
0
25 Oct 2024
Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code
Jipeng Zhang
Jianshu Zhang
Yuanzhe Li
Renjie Pi
Boyao Wang
Runtao Liu
Ziqiang Zheng
Tong Zhang
162
2
0
24 Oct 2024
A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
A. S. Rawat
Veeranjaneyulu Sadhanala
Afshin Rostamizadeh
Ayan Chakrabarti
Wittawat Jitkrittum
...
Rakesh Shivanna
Sashank J. Reddi
A. Menon
Rohan Anil
Sanjiv Kumar
465
10
0
24 Oct 2024
Scalable Influence and Fact Tracing for Large Language Model Pretraining
International Conference on Learning Representations (ICLR), 2024
Tyler A. Chang
Dheeraj Rajagopal
Tolga Bolukbasi
Lucas Dixon
Ian Tenney
TDI
307
16
0
22 Oct 2024
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
Zesen Cheng
Hang Zhang
Kehan Li
Sicong Leng
Zhiqiang Hu
Fei Wu
Deli Zhao
Xin Li
Lidong Bing
160
3
0
22 Oct 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models
International Conference on Learning Representations (ICLR), 2024
Yuxian Gu
Hao Zhou
Fandong Meng
Jie Zhou
Shiyu Huang
461
16
0
22 Oct 2024
LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics
International Conference on Learning Representations (ICLR), 2024
Thomas Robert
M. Safaryan
Ionut-Vlad Modoranu
Dan Alistarh
ODL
457
21
0
21 Oct 2024
TIPS: Text-Image Pretraining with Spatial awareness
International Conference on Learning Representations (ICLR), 2024
Kevis-Kokitsi Maninis
Kaifeng Chen
Soham Ghosh
Arjun Karpur
Koert Chen
...
Jan Dlabal
Dan Gnanapragasam
Mojtaba Seyedhosseini
Howard Zhou
Andre Araujo
VLM
443
18
0
21 Oct 2024
VidPanos: Generative Panoramic Videos from Casual Panning Videos
ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia (SIGGRAPH Asia), 2024
Jingwei Ma
Erika Lu
Roni Paiss
Shiran Zada
Aleksander Holynski
Tali Dekel
Brian L. Curless
Michael Rubinstein
Forrester Cole
VGen
229
7
0
17 Oct 2024
Learning to Predict Usage Options of Product Reviews with LLM-Generated Labels
Leo Kohlenberg
Leonard Horns
Frederic Sadrieh
Nils Kiele
Matthis Clausen
Konstantin Ketterer
Avetis Navasardyan
Tamara Czinczoll
Gerard de Melo
Ralf Herbrich
103
1
0
16 Oct 2024
Model Balancing Helps Low-data Training and Fine-tuning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zihang Liu
Yihan Hu
Tianyu Pang
Yefan Zhou
Pu Ren
Yaoqing Yang
226
9
0
16 Oct 2024
LexSumm and LexT5: Benchmarking and Modeling Legal Summarization Tasks in English
T. Y. S. S. Santosh
Cornelius Weiss
Matthias Grabmair
AILaw ELM
460
9
0
12 Oct 2024
Parameter-Efficient Fine-Tuning of Large Language Models using Semantic Knowledge Tuning
Scientific Reports (Sci Rep), 2024
Nusrat Jahan Prottasha
Asif Mahmud
Md. Shohanur Islam Sobuj
Prakash Bhat
Md. Kowsher
Niloofar Yousefi
O. Garibay
299
19
0
11 Oct 2024
CursorCore: Assist Programming through Aligning Anything
Hao Jiang
Qi Liu
Rui Li
Shengyu Ye
Shijin Wang
378
2
0
09 Oct 2024
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning
Siyuan Li
Juanxi Tian
Zedong Wang
Luyuan Zhang
Zicheng Liu
Weiyang Jin
Yang Liu
Baigui Sun
Stan Z. Li
232
2
0
08 Oct 2024
A second-order-like optimizer with adaptive gradient scaling for deep learning
Jérôme Bolte
Ryan Boustany
Edouard Pauwels
Andrei Purica
ODL
208
0
0
08 Oct 2024