ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.14342
  4. Cited By
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model
  Pre-training

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training

23 May 2023
Hong Liu
Zhiyuan Li
David Leo Wright Hall
Percy Liang
Tengyu Ma
    VLM
ArXivPDFHTML

Papers citing "Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training"

50 / 103 papers shown
Title
Towards Quantifying the Hessian Structure of Neural Networks
Towards Quantifying the Hessian Structure of Neural Networks
Zhaorui Dong
Yushun Zhang
Z. Luo
Jianfeng Yao
Ruoyu Sun
26
0
0
05 May 2025
AlphaGrad: Non-Linear Gradient Normalization Optimizer
AlphaGrad: Non-Linear Gradient Normalization Optimizer
Soham Sane
ODL
48
0
0
22 Apr 2025
Second-order Optimization of Gaussian Splats with Importance Sampling
Second-order Optimization of Gaussian Splats with Importance Sampling
Hamza Pehlivan
Andrea Boscolo Camiletto
Lin Geng Foo
Marc Habermann
Christian Theobalt
3DGS
25
0
0
17 Apr 2025
Understanding Machine Unlearning Through the Lens of Mode Connectivity
Understanding Machine Unlearning Through the Lens of Mode Connectivity
Jiali Cheng
Hadi Amiri
MU
117
0
0
08 Apr 2025
SNRAware: Improved Deep Learning MRI Denoising with SNR Unit Training and G-factor Map Augmentation
SNRAware: Improved Deep Learning MRI Denoising with SNR Unit Training and G-factor Map Augmentation
H. Xue
Sarah M. Hooper
Iain Pierce
R. Davies
John Stairs
...
C. Manisty
James C. Moon
T. Treibel
Peter Kellman
Michael S. Hansen
MedIm
44
0
0
23 Mar 2025
Semi-Decision-Focused Learning with Deep Ensembles: A Practical Framework for Robust Portfolio Optimization
Semi-Decision-Focused Learning with Deep Ensembles: A Practical Framework for Robust Portfolio Optimization
Juhyeong Kim
70
0
0
16 Mar 2025
Structured Preconditioners in Adaptive Optimization: A Unified Analysis
Shuo Xie
Tianhao Wang
Sashank J. Reddi
Sanjiv Kumar
Zhiyuan Li
45
1
0
13 Mar 2025
The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
Jinbo Wang
Mingze Wang
Zhanpeng Zhou
Junchi Yan
Weinan E
Lei Wu
78
1
0
26 Feb 2025
SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
Dahun Shin
Dongyeop Lee
Jinseok Chung
Namhoon Lee
ODL
AAML
159
0
0
25 Feb 2025
COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs
COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs
Liming Liu
Zhenghao Xu
Zixuan Zhang
Hao Kang
Zichong Li
Chen Liang
Weizhu Chen
T. Zhao
111
1
0
24 Feb 2025
Stacking as Accelerated Gradient Descent
Stacking as Accelerated Gradient Descent
Naman Agarwal
Pranjal Awasthi
Satyen Kale
Eric Zhao
ODL
65
2
0
20 Feb 2025
Spectral-factorized Positive-definite Curvature Learning for NN Training
Spectral-factorized Positive-definite Curvature Learning for NN Training
Wu Lin
Felix Dangel
Runa Eschenhagen
Juhan Bae
Richard E. Turner
Roger B. Grosse
45
0
0
10 Feb 2025
Graph Neural Preconditioners for Iterative Solutions of Sparse Linear Systems
Graph Neural Preconditioners for Iterative Solutions of Sparse Linear Systems
Jie Chen
AI4CE
59
2
0
28 Jan 2025
Physics of Skill Learning
Physics of Skill Learning
Ziming Liu
Yizhou Liu
Eric J. Michaud
Jeff Gore
Max Tegmark
46
1
0
21 Jan 2025
FOCUS: First Order Concentrated Updating Scheme
FOCUS: First Order Concentrated Updating Scheme
Yizhou Liu
Ziming Liu
Jeff Gore
ODL
108
1
0
21 Jan 2025
A Hessian-informed hyperparameter optimization for differential learning rate
A Hessian-informed hyperparameter optimization for differential learning rate
Shiyun Xu
Zhiqi Bu
Yiliang Zhang
Ian J. Barnett
39
1
0
12 Jan 2025
Temporal Context Consistency Above All: Enhancing Long-Term Anticipation
  by Learning and Enforcing Temporal Constraints
Temporal Context Consistency Above All: Enhancing Long-Term Anticipation by Learning and Enforcing Temporal Constraints
Alberto Maté
Mariella Dimiccoli
AI4TS
26
0
0
27 Dec 2024
Distributed Sign Momentum with Local Steps for Training Transformers
Distributed Sign Momentum with Local Steps for Training Transformers
Shuhua Yu
Ding Zhou
Cong Xie
An Xu
Zhi-Li Zhang
Xin Liu
S. Kar
64
0
0
26 Nov 2024
Signformer is all you need: Towards Edge AI for Sign Language
Signformer is all you need: Towards Edge AI for Sign Language
Eta Yang
SLR
82
0
0
19 Nov 2024
Sketched Adaptive Federated Deep Learning: A Sharp Convergence Analysis
Sketched Adaptive Federated Deep Learning: A Sharp Convergence Analysis
Zhijie Chen
Qiaobo Li
A. Banerjee
FedML
30
0
0
11 Nov 2024
Adaptive Consensus Gradients Aggregation for Scaled Distributed Training
Adaptive Consensus Gradients Aggregation for Scaled Distributed Training
Yoni Choukroun
Shlomi Azoulay
P. Kisilev
24
0
0
06 Nov 2024
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Jim Zhao
Sidak Pal Singh
Aurélien Lucchi
AI4CE
39
0
0
04 Nov 2024
Data movement limits to frontier model training
Data movement limits to frontier model training
Ege Erdil
David Schneider-Joseph
31
0
0
02 Nov 2024
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
Jinghan Jia
Jiancheng Liu
Yihua Zhang
Parikshit Ram
Nathalie Baracaldo
Sijia Liu
MU
35
2
0
23 Oct 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models
MiniPLM: Knowledge Distillation for Pre-Training Language Models
Yuxian Gu
Hao Zhou
Fandong Meng
Jie Zhou
Minlie Huang
65
5
0
22 Oct 2024
A Scientific Machine Learning Approach for Predicting and Forecasting
  Battery Degradation in Electric Vehicles
A Scientific Machine Learning Approach for Predicting and Forecasting Battery Degradation in Electric Vehicles
Sharv Murgai
Hrishikesh Bhagwat
Raj Abhijit Dandekar
Rajat Dandekar
Sreedath Panat
13
0
0
18 Oct 2024
Second-Order Min-Max Optimization with Lazy Hessians
Second-Order Min-Max Optimization with Lazy Hessians
Lesi Chen
Chengchang Liu
Jingzhao Zhang
41
1
0
12 Oct 2024
Scalable and Resource-Efficient Second-Order Federated Learning via Over-the-Air Aggregation
Scalable and Resource-Efficient Second-Order Federated Learning via Over-the-Air Aggregation
Abdulmomen Ghalkha
Chaouki Ben Issaid
Mehdi Bennis
24
0
0
10 Oct 2024
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation
  Learning
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning
Siyuan Li
Juanxi Tian
Zedong Wang
Luyuan Zhang
Zicheng Liu
Weiyang Jin
Yang Liu
Baigui Sun
Stan Z. Li
29
0
0
08 Oct 2024
A second-order-like optimizer with adaptive gradient scaling for deep
  learning
A second-order-like optimizer with adaptive gradient scaling for deep learning
Jérôme Bolte
Ryan Boustany
Edouard Pauwels
Andrei Purica
ODL
30
0
0
08 Oct 2024
SOAP: Improving and Stabilizing Shampoo using Adam
SOAP: Improving and Stabilizing Shampoo using Adam
Nikhil Vyas
Depen Morwani
Rosie Zhao
Itai Shapira
David Brandfonbrener
Lucas Janson
Sham Kakade
Sham Kakade
61
23
0
17 Sep 2024
A framework for measuring the training efficiency of a neural
  architecture
A framework for measuring the training efficiency of a neural architecture
Eduardo Cueto-Mendoza
John D. Kelleher
38
0
0
12 Sep 2024
The AdEMAMix Optimizer: Better, Faster, Older
The AdEMAMix Optimizer: Better, Faster, Older
Matteo Pagliardini
Pierre Ablin
David Grangier
ODL
28
8
0
05 Sep 2024
Second-Order Forward-Mode Automatic Differentiation for Optimization
Second-Order Forward-Mode Automatic Differentiation for Optimization
Adam D. Cobb
Atılım Güneş Baydin
Barak A. Pearlmutter
Susmit Jha
ODL
31
1
0
19 Aug 2024
Knowledge Mechanisms in Large Language Models: A Survey and Perspective
Knowledge Mechanisms in Large Language Models: A Survey and Perspective
Meng Wang
Yunzhi Yao
Ziwen Xu
Shuofei Qiao
Shumin Deng
...
Yong-jia Jiang
Pengjun Xie
Fei Huang
Huajun Chen
Ningyu Zhang
47
28
0
22 Jul 2024
Questionable practices in machine learning
Questionable practices in machine learning
Gavin Leech
Juan J. Vazquez
Misha Yagudin
Niclas Kupper
Laurence Aitchison
45
3
0
17 Jul 2024
Exploring Quantization for Efficient Pre-Training of Transformer
  Language Models
Exploring Quantization for Efficient Pre-Training of Transformer Language Models
Kamran Chitsaz
Quentin Fournier
Gonccalo Mordido
Sarath Chandar
MQ
44
3
0
16 Jul 2024
Deconstructing What Makes a Good Optimizer for Language Models
Deconstructing What Makes a Good Optimizer for Language Models
Rosie Zhao
Depen Morwani
David Brandfonbrener
Nikhil Vyas
Sham Kakade
42
17
0
10 Jul 2024
Stepping on the Edge: Curvature Aware Learning Rate Tuners
Stepping on the Edge: Curvature Aware Learning Rate Tuners
Vincent Roulet
Atish Agarwala
Jean-Bastien Grill
Grzegorz Swirszcz
Mathieu Blondel
Fabian Pedregosa
34
1
0
08 Jul 2024
$\text{Memory}^3$: Language Modeling with Explicit Memory
Memory3\text{Memory}^3Memory3: Language Modeling with Explicit Memory
Hongkang Yang
Zehao Lin
Wenjin Wang
Hao Wu
Zhiyu Li
...
Yu Yu
Kai Chen
Feiyu Xiong
Linpeng Tang
Weinan E
48
11
0
01 Jul 2024
A New Perspective on Shampoo's Preconditioner
A New Perspective on Shampoo's Preconditioner
Depen Morwani
Itai Shapira
Nikhil Vyas
Eran Malach
Sham Kakade
Lucas Janson
27
7
0
25 Jun 2024
Adam-mini: Use Fewer Learning Rates To Gain More
Adam-mini: Use Fewer Learning Rates To Gain More
Yushun Zhang
Congliang Chen
Ziniu Li
Tian Ding
Chenwei Wu
Yinyu Ye
Zhi-Quan Luo
Ruoyu Sun
36
34
0
24 Jun 2024
Building on Efficient Foundations: Effectively Training LLMs with
  Structured Feedforward Layers
Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers
Xiuying Wei
Skander Moalla
Razvan Pascanu
Çağlar Gülçehre
22
0
0
24 Jun 2024
Large Batch Analysis for Adagrad Under Anisotropic Smoothness
Large Batch Analysis for Adagrad Under Anisotropic Smoothness
Yuxing Liu
Rui Pan
Tong Zhang
21
4
0
21 Jun 2024
H-Fac: Memory-Efficient Optimization with Factorized Hamiltonian Descent
H-Fac: Memory-Efficient Optimization with Factorized Hamiltonian Descent
Son Nguyen
Lizhang Chen
Bo Liu
Qiang Liu
20
3
0
14 Jun 2024
Fed-Sophia: A Communication-Efficient Second-Order Federated Learning
  Algorithm
Fed-Sophia: A Communication-Efficient Second-Order Federated Learning Algorithm
Ahmed Elbakary
Chaouki Ben Issaid
Mohammad Shehab
Karim G. Seddik
Tamer A. ElBatt
Mehdi Bennis
24
2
0
10 Jun 2024
Improving Generalization and Convergence by Enhancing Implicit
  Regularization
Improving Generalization and Convergence by Enhancing Implicit Regularization
Mingze Wang
Haotian He
Jinbo Wang
Zilin Wang
Guanhua Huang
Feiyu Xiong
Zhiyu Li
E. Weinan
Lei Wu
37
6
0
31 May 2024
4-bit Shampoo for Memory-Efficient Network Training
4-bit Shampoo for Memory-Efficient Network Training
Sike Wang
Jia Li
Pan Zhou
Hua Huang
MQ
31
5
0
28 May 2024
Getting More Juice Out of the SFT Data: Reward Learning from Human
  Demonstration Improves SFT for LLM Alignment
Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment
Jiaxiang Li
Siliang Zeng
Hoi-To Wai
Chenliang Li
Alfredo García
Mingyi Hong
57
15
0
28 May 2024
AdaFisher: Adaptive Second Order Optimization via Fisher Information
AdaFisher: Adaptive Second Order Optimization via Fisher Information
Damien Martins Gomes
Yanlei Zhang
Eugene Belilovsky
Guy Wolf
Mahdi S. Hosseini
ODL
74
2
0
26 May 2024
123
Next