1804.04235
Cited By
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
11 April 2018
Noam M. Shazeer
Mitchell Stern
ODL
Papers citing
"Adafactor: Adaptive Learning Rates with Sublinear Memory Cost"
50 / 799 papers shown
CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation
Minghao Fu
Guo-Hua Wang
Liangfu Cao
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
398
4
0
18 Feb 2025
We Can't Understand AI Using our Existing Vocabulary
John Hewitt
Robert Geirhos
Been Kim
325
14
0
11 Feb 2025
What makes a good feedforward computational graph?
Alex Vitvitskyi
J. G. Araújo
Marc Lackenby
Petar Velickovic
367
6
0
10 Feb 2025
Memory-Efficient Fine-Tuning of Transformers via Token Selection
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Antoine Simoulin
Namyong Park
Xiaoyi Liu
Grey Yang
427
6
0
31 Jan 2025
LiPO: Listwise Preference Optimization through Learning-to-Rank
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Tianqi Liu
Zhen Qin
Junru Wu
Jiaming Shen
Misha Khalman
...
Mohammad Saleh
Simon Baumgartner
Jialu Liu
Peter J. Liu
Xuanhui Wang
605
85
0
28 Jan 2025
Celo: Training Versatile Learned Optimizers on a Compute Diet
A. Moudgil
Boris Knyazev
Guillaume Lajoie
Eugene Belilovsky
990
0
0
22 Jan 2025
A Survey on Memory-Efficient Transformer-Based Model Training in AI for Science
Kaiyuan Tian
Linbo Qiao
Baihui Liu
Gongqingjian Jiang
Shanshan Li
Dongsheng Li
375
0
0
21 Jan 2025
RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs
AAAI Conference on Artificial Intelligence (AAAI), 2024
Jiaxing Wu
Lin Ning
Luyang Liu
Harrison Lee
Neo Wu
Chao Wang
Sushant Prakash
S. O’Banion
Bradley Green
Jun Xie
387
4
0
20 Jan 2025
Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision
International Conference on Learning Representations (ICLR), 2025
Yaowen Ye
Cassidy Laidlaw
Jacob Steinhardt
ALM
240
3
0
14 Jan 2025
Wavelet Meets Adam: Compressing Gradients for Memory-Efficient Training
Ziqing Wen
Ping Luo
Jun Wang
Xiaoge Deng
Jinping Zou
Kun Yuan
Tao Sun
Dongsheng Li
CLL
342
0
0
13 Jan 2025
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
International Conference on Learning Representations (ICLR), 2025
Tianjin Huang
Ziquan Zhu
Gaojie Jin
Lu Liu
Zinan Lin
Shiwei Liu
394
15
0
12 Jan 2025
Dialectal and Low-Resource Machine Translation for Aromanian
International Conference on Computational Linguistics (COLING), 2024
Alexandru-Iulius Jerpelea
Alina-Ştefania Rădoi
Sergiu Nisioi
266
3
0
08 Jan 2025
Multi-task retriever fine-tuning for domain-specific and efficient RAG
Patrice Béchard
Orlando Marquez Ayala
269
0
0
08 Jan 2025
Reasoning-Enhanced Self-Training for Long-Form Personalized Text Generation
Alireza Salemi
Cheng-rong Li
Mingyang Zhang
Qiaozhu Mei
Weize Kong
Tao Chen
Zhuowan Li
Michael Bendersky
Hamed Zamani
LRM
RALM
ReLM
277
20
0
07 Jan 2025
The interplay between domain specialization and model size
Roseval Malaquias Junior
Ramon Pires
Thales Sales Almeida
Kenzo Sakiyama
R. Romero
R. Nogueira
514
1
0
03 Jan 2025
AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning
International Conference on Learning Representations (ICLR), 2024
Yehonathan Refael
Jonathan Svirsky
Boris Shustin
Wasim Huleihel
Ofir Lindenbaum
300
10
0
31 Dec 2024
Grams: Gradient Descent with Adaptive Momentum Scaling
Yang Cao
Xiaoyu Li
Zhao Song
ODL
501
5
0
22 Dec 2024
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Benjamin Warner
Antoine Chaffin
Benjamin Clavié
Orion Weller
Oskar Hallström
...
Tom Aarsen
Nathan Cooper
Griffin Adams
Jeremy Howard
Iacopo Poli
457
389
0
18 Dec 2024
No More Adam: Learning Rate Scaling at Initialization is All You Need
Minghao Xu
Lichuan Xiang
Xu Cai
Hongkai Wen
341
4
0
16 Dec 2024
Analyzing the Attention Heads for Pronoun Disambiguation in Context-aware Machine Translation Models
Paweł Mąka
Yusuf Can Semerci
Jan Scholtes
Gerasimos Spanakis
275
1
0
15 Dec 2024
SMMF: Square-Matricized Momentum Factorization for Memory-Efficient Optimization
AAAI Conference on Artificial Intelligence (AAAI), 2024
Kwangryeol Park
Seulki Lee
189
1
0
12 Dec 2024
Filling Memory Gaps: Enhancing Continual Semantic Parsing via SQL Syntax Variance-Guided LLMs without Real Data Replay
AAAI Conference on Artificial Intelligence (AAAI), 2024
Ruiheng Liu
Jinyu Zhang
Yanqi Song
Yu Zhang
Bailong Yang
KELM
CLL
240
4
0
10 Dec 2024
Visual Lexicon: Rich Image Features in Language Space
Computer Vision and Pattern Recognition (CVPR), 2024
Xudong Wang
Xingyi Zhou
Alireza Fathi
Trevor Darrell
Cordelia Schmid
VLM
208
7
0
09 Dec 2024
SceneDiffuser: Efficient and Controllable Driving Simulation Initialization and Rollout
Neural Information Processing Systems (NeurIPS), 2024
C. Jiang
Yijing Bai
Andre Cornman
Christopher Davis
Xiukun Huang
...
Carlos Fuertes
Chang Yuan
Mingxing Tan
Yin Zhou
Dragomir Anguelov
281
39
0
05 Dec 2024
SimuScope: Realistic Endoscopic Synthetic Dataset Generation through Surgical Simulation and Diffusion Models
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Sabina Martyniak
Joanna Kaleta
Diego Dall'Alba
Michał Naskręt
Szymon Płotka
Przemysław Korzeniowski
MedIm
366
6
0
03 Dec 2024
Optimizing Domain-Specific Image Retrieval: A Benchmark of FAISS and Annoy with Fine-Tuned Features
MD Shaikh Rahman
Syed Maudud E Rabbi
Muhammad Mahbubur Rashid
251
4
0
02 Dec 2024
COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection
Computer Vision and Pattern Recognition (CVPR), 2024
Jinqi Xiao
S. Sang
Tiancheng Zhi
Jing Liu
Qing Yan
Linjie Luo
Bo Yuan
VLM
419
6
0
26 Nov 2024
Cautious Optimizers: Improving Training with One Line of Code
Kaizhao Liang
Lizhang Chen
B. Liu
Qiang Liu
ODL
711
21
0
25 Nov 2024
Beyond adaptive gradient: Fast-Controlled Minibatch Algorithm for large-scale optimization
Corrado Coppola
Lorenzo Papa
Irene Amerini
L. Palagi
ODL
403
0
0
24 Nov 2024
FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training
Philip Zmushko
Aleksandr Beznosikov
Martin Takáč
Samuel Horváth
304
4
0
12 Nov 2024
Adaptive Consensus Gradients Aggregation for Scaled Distributed Training
Yoni Choukroun
Shlomi Azoulay
P. Kisilev
301
0
0
06 Nov 2024
Transfer Learning for Finetuning Large Language Models
Tobias Strangmann
Lennart Purucker
Jörg Franke
Ivo Rapant
Fabio Ferreira
Katharina Eggensperger
227
4
0
02 Nov 2024
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
Apoorv Khandelwal
Tian Yun
Nihal V. Nayak
Jack Merullo
Stephen H. Bach
Chen Sun
Ellie Pavlick
VLM
AI4CE
OnRL
272
6
0
30 Oct 2024
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
Yongchang Hao
Yanshuai Cao
Lili Mou
MQ
225
4
0
28 Oct 2024
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
International Conference on Learning Representations (ICLR), 2024
Haocheng Xi
Han Cai
Ligeng Zhu
Yaojie Lu
Kurt Keutzer
Jianfei Chen
Song Han
MQ
494
18
0
25 Oct 2024
Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code
Jipeng Zhang
Jianshu Zhang
Yuanzhe Li
Renjie Pi
Boyao Wang
Runtao Liu
Ziqiang Zheng
Tong Zhang
162
2
0
24 Oct 2024
A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
A. S. Rawat
Veeranjaneyulu Sadhanala
Afshin Rostamizadeh
Ayan Chakrabarti
Wittawat Jitkrittum
...
Rakesh Shivanna
Sashank J. Reddi
A. Menon
Rohan Anil
Sanjiv Kumar
465
10
0
24 Oct 2024
Scalable Influence and Fact Tracing for Large Language Model Pretraining
International Conference on Learning Representations (ICLR), 2024
Tyler A. Chang
Dheeraj Rajagopal
Tolga Bolukbasi
Lucas Dixon
Ian Tenney
TDI
307
16
0
22 Oct 2024
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
Zesen Cheng
Hang Zhang
Kehan Li
Sicong Leng
Zhiqiang Hu
Fei Wu
Deli Zhao
Xin Li
Lidong Bing
160
3
0
22 Oct 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models
International Conference on Learning Representations (ICLR), 2024
Yuxian Gu
Hao Zhou
Fandong Meng
Jie Zhou
Shiyu Huang
461
16
0
22 Oct 2024
LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics
International Conference on Learning Representations (ICLR), 2024
Thomas Robert
M. Safaryan
Ionut-Vlad Modoranu
Dan Alistarh
ODL
457
21
0
21 Oct 2024
TIPS: Text-Image Pretraining with Spatial awareness
International Conference on Learning Representations (ICLR), 2024
Kevis-Kokitsi Maninis
Kaifeng Chen
Soham Ghosh
Arjun Karpur
Koert Chen
...
Jan Dlabal
Dan Gnanapragasam
Mojtaba Seyedhosseini
Howard Zhou
Andre Araujo
VLM
443
18
0
21 Oct 2024
VidPanos: Generative Panoramic Videos from Casual Panning Videos
ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia (SIGGRAPH Asia), 2024
Jingwei Ma
Erika Lu
Roni Paiss
Shiran Zada
Aleksander Holynski
Tali Dekel
Brian L. Curless
Michael Rubinstein
Forrester Cole
VGen
229
7
0
17 Oct 2024
Learning to Predict Usage Options of Product Reviews with LLM-Generated Labels
Leo Kohlenberg
Leonard Horns
Frederic Sadrieh
Nils Kiele
Matthis Clausen
Konstantin Ketterer
Avetis Navasardyan
Tamara Czinczoll
Gerard de Melo
Ralf Herbrich
103
1
0
16 Oct 2024
Model Balancing Helps Low-data Training and Fine-tuning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zihang Liu
Yihan Hu
Tianyu Pang
Yefan Zhou
Pu Ren
Yaoqing Yang
226
9
0
16 Oct 2024
LexSumm and LexT5: Benchmarking and Modeling Legal Summarization Tasks in English
T. Y. S. S. Santosh
Cornelius Weiss
Matthias Grabmair
AILaw
ELM
460
9
0
12 Oct 2024
Parameter-Efficient Fine-Tuning of Large Language Models using Semantic Knowledge Tuning
Scientific Reports (Sci Rep), 2024
Nusrat Jahan Prottasha
Asif Mahmud
Md. Shohanur Islam Sobuj
Prakash Bhat
Md. Kowsher
Niloofar Yousefi
O. Garibay
299
19
0
11 Oct 2024
CursorCore: Assist Programming through Aligning Anything
Hao Jiang
Qi Liu
Rui Li
Shengyu Ye
Shijin Wang
378
2
0
09 Oct 2024
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning
Siyuan Li
Juanxi Tian
Zedong Wang
Luyuan Zhang
Zicheng Liu
Weiyang Jin
Yang Liu
Baigui Sun
Stan Z. Li
232
2
0
08 Oct 2024
A second-order-like optimizer with adaptive gradient scaling for deep learning
Jérôme Bolte
Ryan Boustany
Edouard Pauwels
Andrei Purica
ODL
208
0
0
08 Oct 2024