2305.14342
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
23 May 2023
Hong Liu
Zhiyuan Li
David Leo Wright Hall
Percy Liang
Tengyu Ma
VLM
Papers citing
"Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training"
50 / 103 papers shown
Title
Towards Quantifying the Hessian Structure of Neural Networks
Zhaorui Dong
Yushun Zhang
Z. Luo
Jianfeng Yao
Ruoyu Sun
26
0
0
05 May 2025
AlphaGrad: Non-Linear Gradient Normalization Optimizer
Soham Sane
ODL
48
0
0
22 Apr 2025
Second-order Optimization of Gaussian Splats with Importance Sampling
Hamza Pehlivan
Andrea Boscolo Camiletto
Lin Geng Foo
Marc Habermann
Christian Theobalt
3DGS
25
0
0
17 Apr 2025
Understanding Machine Unlearning Through the Lens of Mode Connectivity
Jiali Cheng
Hadi Amiri
MU
117
0
0
08 Apr 2025
SNRAware: Improved Deep Learning MRI Denoising with SNR Unit Training and G-factor Map Augmentation
H. Xue
Sarah M. Hooper
Iain Pierce
R. Davies
John Stairs
...
C. Manisty
James C. Moon
T. Treibel
Peter Kellman
Michael S. Hansen
MedIm
44
0
0
23 Mar 2025
Semi-Decision-Focused Learning with Deep Ensembles: A Practical Framework for Robust Portfolio Optimization
Juhyeong Kim
70
0
0
16 Mar 2025
Structured Preconditioners in Adaptive Optimization: A Unified Analysis
Shuo Xie
Tianhao Wang
Sashank J. Reddi
Sanjiv Kumar
Zhiyuan Li
45
1
0
13 Mar 2025
The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
Jinbo Wang
Mingze Wang
Zhanpeng Zhou
Junchi Yan
Weinan E
Lei Wu
78
1
0
26 Feb 2025
SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
Dahun Shin
Dongyeop Lee
Jinseok Chung
Namhoon Lee
ODL
AAML
159
0
0
25 Feb 2025
COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs
Liming Liu
Zhenghao Xu
Zixuan Zhang
Hao Kang
Zichong Li
Chen Liang
Weizhu Chen
T. Zhao
111
1
0
24 Feb 2025
Stacking as Accelerated Gradient Descent
Naman Agarwal
Pranjal Awasthi
Satyen Kale
Eric Zhao
ODL
65
2
0
20 Feb 2025
Spectral-factorized Positive-definite Curvature Learning for NN Training
Wu Lin
Felix Dangel
Runa Eschenhagen
Juhan Bae
Richard E. Turner
Roger B. Grosse
45
0
0
10 Feb 2025
Graph Neural Preconditioners for Iterative Solutions of Sparse Linear Systems
Jie Chen
AI4CE
59
2
0
28 Jan 2025
Physics of Skill Learning
Ziming Liu
Yizhou Liu
Eric J. Michaud
Jeff Gore
Max Tegmark
46
1
0
21 Jan 2025
FOCUS: First Order Concentrated Updating Scheme
Yizhou Liu
Ziming Liu
Jeff Gore
ODL
108
1
0
21 Jan 2025
A Hessian-informed hyperparameter optimization for differential learning rate
Shiyun Xu
Zhiqi Bu
Yiliang Zhang
Ian J. Barnett
39
1
0
12 Jan 2025
Temporal Context Consistency Above All: Enhancing Long-Term Anticipation by Learning and Enforcing Temporal Constraints
Alberto Maté
Mariella Dimiccoli
AI4TS
26
0
0
27 Dec 2024
Distributed Sign Momentum with Local Steps for Training Transformers
Shuhua Yu
Ding Zhou
Cong Xie
An Xu
Zhi-Li Zhang
Xin Liu
S. Kar
64
0
0
26 Nov 2024
Signformer is all you need: Towards Edge AI for Sign Language
Eta Yang
SLR
82
0
0
19 Nov 2024
Sketched Adaptive Federated Deep Learning: A Sharp Convergence Analysis
Zhijie Chen
Qiaobo Li
A. Banerjee
FedML
30
0
0
11 Nov 2024
Adaptive Consensus Gradients Aggregation for Scaled Distributed Training
Yoni Choukroun
Shlomi Azoulay
P. Kisilev
24
0
0
06 Nov 2024
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Jim Zhao
Sidak Pal Singh
Aurélien Lucchi
AI4CE
39
0
0
04 Nov 2024
Data movement limits to frontier model training
Ege Erdil
David Schneider-Joseph
31
0
0
02 Nov 2024
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
Jinghan Jia
Jiancheng Liu
Yihua Zhang
Parikshit Ram
Nathalie Baracaldo
Sijia Liu
MU
35
2
0
23 Oct 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models
Yuxian Gu
Hao Zhou
Fandong Meng
Jie Zhou
Minlie Huang
65
5
0
22 Oct 2024
A Scientific Machine Learning Approach for Predicting and Forecasting Battery Degradation in Electric Vehicles
Sharv Murgai
Hrishikesh Bhagwat
Raj Abhijit Dandekar
Rajat Dandekar
Sreedath Panat
13
0
0
18 Oct 2024
Second-Order Min-Max Optimization with Lazy Hessians
Lesi Chen
Chengchang Liu
Jingzhao Zhang
41
1
0
12 Oct 2024
Scalable and Resource-Efficient Second-Order Federated Learning via Over-the-Air Aggregation
Abdulmomen Ghalkha
Chaouki Ben Issaid
Mehdi Bennis
24
0
0
10 Oct 2024
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning
Siyuan Li
Juanxi Tian
Zedong Wang
Luyuan Zhang
Zicheng Liu
Weiyang Jin
Yang Liu
Baigui Sun
Stan Z. Li
29
0
0
08 Oct 2024
A second-order-like optimizer with adaptive gradient scaling for deep learning
Jérôme Bolte
Ryan Boustany
Edouard Pauwels
Andrei Purica
ODL
30
0
0
08 Oct 2024
SOAP: Improving and Stabilizing Shampoo using Adam
Nikhil Vyas
Depen Morwani
Rosie Zhao
Itai Shapira
David Brandfonbrener
Lucas Janson
Sham Kakade
61
23
0
17 Sep 2024
A framework for measuring the training efficiency of a neural architecture
Eduardo Cueto-Mendoza
John D. Kelleher
38
0
0
12 Sep 2024
The AdEMAMix Optimizer: Better, Faster, Older
Matteo Pagliardini
Pierre Ablin
David Grangier
ODL
28
8
0
05 Sep 2024
Second-Order Forward-Mode Automatic Differentiation for Optimization
Adam D. Cobb
Atılım Güneş Baydin
Barak A. Pearlmutter
Susmit Jha
ODL
31
1
0
19 Aug 2024
Knowledge Mechanisms in Large Language Models: A Survey and Perspective
Meng Wang
Yunzhi Yao
Ziwen Xu
Shuofei Qiao
Shumin Deng
...
Yong-jia Jiang
Pengjun Xie
Fei Huang
Huajun Chen
Ningyu Zhang
47
28
0
22 Jul 2024
Questionable practices in machine learning
Gavin Leech
Juan J. Vazquez
Misha Yagudin
Niclas Kupper
Laurence Aitchison
45
3
0
17 Jul 2024
Exploring Quantization for Efficient Pre-Training of Transformer Language Models
Kamran Chitsaz
Quentin Fournier
Gonçalo Mordido
Sarath Chandar
MQ
44
3
0
16 Jul 2024
Deconstructing What Makes a Good Optimizer for Language Models
Rosie Zhao
Depen Morwani
David Brandfonbrener
Nikhil Vyas
Sham Kakade
42
17
0
10 Jul 2024
Stepping on the Edge: Curvature Aware Learning Rate Tuners
Vincent Roulet
Atish Agarwala
Jean-Bastien Grill
Grzegorz Swirszcz
Mathieu Blondel
Fabian Pedregosa
34
1
0
08 Jul 2024
Memory³: Language Modeling with Explicit Memory
Hongkang Yang
Zehao Lin
Wenjin Wang
Hao Wu
Zhiyu Li
...
Yu Yu
Kai Chen
Feiyu Xiong
Linpeng Tang
Weinan E
48
11
0
01 Jul 2024
A New Perspective on Shampoo's Preconditioner
Depen Morwani
Itai Shapira
Nikhil Vyas
Eran Malach
Sham Kakade
Lucas Janson
27
7
0
25 Jun 2024
Adam-mini: Use Fewer Learning Rates To Gain More
Yushun Zhang
Congliang Chen
Ziniu Li
Tian Ding
Chenwei Wu
Yinyu Ye
Zhi-Quan Luo
Ruoyu Sun
36
34
0
24 Jun 2024
Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers
Xiuying Wei
Skander Moalla
Razvan Pascanu
Çağlar Gülçehre
22
0
0
24 Jun 2024
Large Batch Analysis for Adagrad Under Anisotropic Smoothness
Yuxing Liu
Rui Pan
Tong Zhang
21
4
0
21 Jun 2024
H-Fac: Memory-Efficient Optimization with Factorized Hamiltonian Descent
Son Nguyen
Lizhang Chen
Bo Liu
Qiang Liu
20
3
0
14 Jun 2024
Fed-Sophia: A Communication-Efficient Second-Order Federated Learning Algorithm
Ahmed Elbakary
Chaouki Ben Issaid
Mohammad Shehab
Karim G. Seddik
Tamer A. ElBatt
Mehdi Bennis
24
2
0
10 Jun 2024
Improving Generalization and Convergence by Enhancing Implicit Regularization
Mingze Wang
Haotian He
Jinbo Wang
Zilin Wang
Guanhua Huang
Feiyu Xiong
Zhiyu Li
Weinan E
Lei Wu
37
6
0
31 May 2024
4-bit Shampoo for Memory-Efficient Network Training
Sike Wang
Jia Li
Pan Zhou
Hua Huang
MQ
31
5
0
28 May 2024
Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment
Jiaxiang Li
Siliang Zeng
Hoi-To Wai
Chenliang Li
Alfredo García
Mingyi Hong
57
15
0
28 May 2024
AdaFisher: Adaptive Second Order Optimization via Fisher Information
Damien Martins Gomes
Yanlei Zhang
Eugene Belilovsky
Guy Wolf
Mahdi S. Hosseini
ODL
74
2
0
26 May 2024