ResearchTrend.AI

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

6 March 2024
Jiawei Zhao
Zhenyu Zhang
Beidi Chen
Zinan Lin
A. Anandkumar
Yuandong Tian
arXiv (abs) · PDF · HTML · HuggingFace (189 upvotes)

Papers citing "GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection"

50 / 219 papers shown
Greedy Low-Rank Gradient Compression for Distributed Learning with Convergence Guarantees
Chuyan Chen
Yutong He
Pengrui Li
Weichen Jia
Kun Yuan
625
4
0
11 Jul 2025
LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization
Xujia Wang
Yunjia Qi
Bin Xu
249
0
0
06 Jul 2025
Pay Attention to Small Weights
Chao Zhou
Tom Jacobs
Advait Gadhikar
R. Burkholz
220
0
0
26 Jun 2025
A Minimalist Optimizer Design for LLM Pretraining
Athanasios Glentis
Jiaxiang Li
Andi Han
Mingyi Hong
266
3
0
20 Jun 2025
A geometric framework for momentum-based optimizers for low-rank training
Steffen Schotthöfer
Timon Klein
J. Kusch
AI4CE
229
2
0
20 Jun 2025
Subspace-Boosted Model Merging
Ronald Skorobogat
Karsten Roth
Mariana-Iuliana Georgescu
MoMe
400
2
0
19 Jun 2025
SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
Samir Khaki
Xiuyu Li
Junxian Guo
Ligeng Zhu
Chenfeng Xu
Konstantinos N. Plataniotis
Amir Yazdanbakhsh
Kurt Keutzer
Song Han
Zhijian Liu
217
4
0
19 Jun 2025
Memory-Efficient Differentially Private Training with Gradient Random Projection
Alex Mulrooney
Devansh Gupta
James Flemings
Huanyu Zhang
Murali Annavaram
Meisam Razaviyayn
Xinwei Zhang
243
1
0
18 Jun 2025
Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality
Yuto Harada
Yusuke Yamauchi
Yusuke Oda
Yohei Oseki
Yusuke Miyao
Yu Takagi
ALM
258
5
0
17 Jun 2025
AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs
Di He
Ajay Jaiswal
Songjun Tu
Li Shen
Ganzhao Yuan
Shiwei Liu
L. Yin
386
1
0
17 Jun 2025
Dynamic Context-oriented Decomposition for Task-aware Low-rank Adaptation with Less Forgetting and Faster Convergence
Jianlong Wu
Sihao Liu
Chuan Rao
Bang An
Tiancheng Shen
Juil Sock
Ming-Hsuan Yang
Bernard Ghanem
262
4
0
16 Jun 2025
Prefix-Tuning+: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention
Haonan Wang
Brian K Chen
Siquan Li
Xinhe Liang
Hwee Kuan Lee
Kenji Kawaguchi
Tianyang Hu
211
0
0
16 Jun 2025
EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction
Hsi-Che Lin
Yu-Chu Yu
Kai-Po Chang
Y. Wang
277
0
0
13 Jun 2025
Purifying Shampoo: Investigating Shampoo's Heuristics by Decomposing its Preconditioner
Runa Eschenhagen
Aaron Defazio
Tsung-Hsien Lee
Richard Turner
Hao-Jun Michael Shi
283
4
0
04 Jun 2025
PoLAR: Polar-Decomposed Low-Rank Adapter Representation
Kai Lion
Liang Zhang
Bingcong Li
Niao He
256
3
0
03 Jun 2025
QKV Projections Require a Fraction of Their Memory
Malik Khalf
Yara Shamshoum
Nitzan Hodos
Yuval Sieradzki
Assaf Schuster
MQ, VLM
324
1
0
03 Jun 2025
MLorc: Momentum Low-rank Compression for Memory Efficient Large Language Model Adaptation
Wei Shen
Zhang Yaxiang
Minhui Huang
Mengfan Xu
Jiawei Zhang
Cong Shen
AI4CE
326
1
0
02 Jun 2025
Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism
Sameera Ramasinghe
Thalaiyasingam Ajanthan
Gil Avraham
Yan Zuo
Alexander Long
GNN
377
0
0
02 Jun 2025
Taming LLMs by Scaling Learning Rates with Gradient Grouping
Siyuan Li
Juanxi Tian
Zedong Wang
Xin Jin
Zicheng Liu
Wentao Zhang
Dan Xu
230
0
0
01 Jun 2025
Structured Gradient Guidance for Few-Shot Adaptation in Large Language Models
Hongye Zheng
Yichen Wang
Ray Pan
Guiran Liu
Binrong Zhu
Hanlu Zhang
142
9
0
31 May 2025
LPASS: Linear Probes as Stepping Stones for vulnerability detection using compressed LLMs
Journal of Information Security and Applications (JISA), 2025
Luis Ibanez-Lissen
Lorena Gonzalez-Manzano
José Maria De Fuentes
Nicolas Anciaux
140
2
0
30 May 2025
GradPower: Powering Gradients for Faster Language Model Pre-Training
Mingze Wang
Jinbo Wang
Jiaqi Zhang
Wei Wang
Peng Pei
Xunliang Cai
Weinan E
Lei Wu
231
0
0
30 May 2025
SUMO: Subspace-Aware Moment-Orthogonalization for Accelerating Memory-Efficient LLM Training
Yehonathan Refael
Guy Smorodinsky
Tom Tirer
Ofir Lindenbaum
180
5
0
30 May 2025
On the Convergence Analysis of Muon
Wei Shen
Ruichuan Huang
Minhui Huang
Cong Shen
Jiawei Zhang
303
0
0
29 May 2025
Two Is Better Than One: Rotations Scale LoRAs
Hongcan Guo
Guoshun Nan
Yuan Yang
Diyang Zhang
Haotian Li
...
Yuhan Ran
Xinye Cao
Sicong Leng
Xiaofeng Tao
Xudong Jiang
243
0
0
29 May 2025
Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution
Q. Xiao
Alan Ansell
Boqian Wu
Lu Yin
Mykola Pechenizkiy
Shiwei Liu
Decebal Constantin Mocanu
279
2
0
29 May 2025
MuLoCo: Muon is a practical inner optimizer for DiLoCo
Benjamin Thérien
Xiaolong Huang
Irina Rish
Eugene Belilovsky
MoE
170
6
0
29 May 2025
Highly Efficient and Effective LLMs with Multi-Boolean Architectures
Ba-Hien Tran
Van Minh Nguyen
MQ
373
0
0
28 May 2025
POQD: Performance-Oriented Query Decomposer for Multi-vector retrieval
Yaoyang Liu
Junlin Li
Yinjun Wu
Zhen Chen
329
1
0
25 May 2025
Efficient Data Selection at Scale via Influence Distillation
Mahdi Nikdan
Vincent Cohen-Addad
Dan Alistarh
Vahab Mirrokni
TDI
329
4
0
25 May 2025
KerZOO: Kernel Function Informed Zeroth-Order Optimization for Accurate and Accelerated LLM Fine-Tuning
Zhendong Mi
Qitao Tan
Xiaodong Yu
Zining Zhu
Geng Yuan
Shaoyi Huang
356
4
0
24 May 2025
PLUMAGE: Probabilistic Low rank Unbiased Min Variance Gradient Estimator for Efficient Large Model Training
Matan Haroush
Daniel Soudry
362
0
0
23 May 2025
FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of Large Language Models
Ionut-Vlad Modoranu
M. Safaryan
Erik Schultheis
Max Ryabinin
Artem Chumachenko
Dan Alistarh
321
0
0
23 May 2025
NeuroTrails: Training with Dynamic Sparse Heads as the Key to Effective Ensembling
Bram Grooten
Farid Hasanov
Chenxiang Zhang
Q. Xiao
Boqian Wu
...
Shiwei Liu
L. Yin
Elena Mocanu
Mykola Pechenizkiy
Decebal Constantin Mocanu
307
0
0
23 May 2025
AdamS: Momentum Itself Can Be A Normalizer for LLM Pretraining and Post-training
Huishuai Zhang
Bohan Wang
Luoxin Chen
ODL
482
2
0
22 May 2025
Understanding Differential Transformer Unchains Pretrained Self-Attentions
Chaerin Kong
Jiho Jang
Nojun Kwak
459
0
0
22 May 2025
Few-Shot Adversarial Low-Rank Fine-Tuning of Vision-Language Models
Sajjad Ghiasvand
Haniyeh Ehsani Oskouie
Mahnoosh Alizadeh
Ramtin Pedarsani
AAML, VLM
407
8
0
21 May 2025
Fine-tuning Quantized Neural Networks with Zeroth-order Optimization
Sifeng Shang
Jiayi Zhou
Chenyu Lin
Minxian Li
Kaiyang Zhou
MQ
353
1
0
19 May 2025
ZenFlow: Enabling Stall-Free Offloading Training via Asynchronous Updates
Tingfeng Lan
Yusen Wu
Bin Ma
Zhaoyuan Su
Rui Yang
Tekin Bicer
Masahiro Tanaka
Olatunji Ruwase
Dong Li
Yue Cheng
567
3
0
18 May 2025
AltLoRA: Towards Better Gradient Approximation in Low-Rank Adaptation with Alternating Projections
Xin Yu
Yujia Wang
Jinghui Chen
Lingzhou Xue
326
2
0
18 May 2025
Continuous Subspace Optimization for Continual Learning
Quan Cheng
Yuanyu Wan
Lingyu Wu
Chenping Hou
Lijun Zhang
CLL
488
1
0
17 May 2025
Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients
Yezhen Wang
Zhouhao Yang
Brian K Chen
Fanyi Pu
Yue Liu
Tianyu Gao
Kenji Kawaguchi
236
0
0
03 May 2025
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Cong Xu
Wenbin Liang
Mo Yu
Anan Liu
Jianchao Tan
Lizhuang Ma
Jiangming Wang
Jun Wang
Weinan Zhang
Wei Zhang
MQ
335
0
0
01 May 2025
Sparsity Outperforms Low-Rank Projections in Few-Shot Adaptation
Nairouz Mrabah
Nicolas Richet
Ismail Ben Ayed
Eric Granger
BDL, VLM
417
0
0
16 Apr 2025
STEP: Staged Parameter-Efficient Pre-training for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Kazuki Yano
Takumi Ito
Jun Suzuki
LRM
295
3
0
05 Apr 2025
Task-Aware Parameter-Efficient Fine-Tuning of Large Pre-Trained Models at the Edge
Senkang Hu
Yanan Ma
Yihang Tao
Zhengru Fang
Zihan Fang
Yiqin Deng
Sam Kwong
Yuguang Fang
261
2
0
29 Mar 2025
ASGO: Adaptive Structured Gradient Optimization
Kang An
Yuxing Liu
Boyao Wang
Shiqian Ma
Tong Zhang
ODL
430
26
0
26 Mar 2025
An Overview of Low-Rank Structures in the Training and Adaptation of Large Models
Laura Balzano
Tianjiao Ding
B. Haeffele
Soo Min Kwon
Qing Qu
Peng Wang
Liang Luo
Can Yaras
OffRL, AI4CE
239
3
0
25 Mar 2025
Striving for Simplicity: Simple Yet Effective Prior-Aware Pseudo-Labeling for Semi-Supervised Ultrasound Image Segmentation
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Yaxiong Chen
Yujie Wang
Zixuan Zheng
Jingliang Hu
Yilei Shi
Shengwu Xiong
Xiao Xiang Zhu
Lichao Mou
385
1
0
18 Mar 2025
ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Baohao Liao
Christian Herold
Seyyed Hadi Hashemi
Stefan Vasilev
Shahram Khadivi
Christof Monz
MQ
374
1
0
17 Mar 2025