Optimization Methods for Large-Scale Machine Learning

15 June 2016
Léon Bottou
Frank E. Curtis
J. Nocedal

Papers citing "Optimization Methods for Large-Scale Machine Learning"

50 / 1,490 papers shown
Approximate Agreement Algorithms for Byzantine Collaborative Learning
ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 2025
Tijana Milentijević
Mélanie Cambus
Darya Melnyk
Stefan Schmid
FedML
442
2
0
02 Apr 2025
Guided Model Merging for Hybrid Data Learning: Leveraging Centralized Data to Refine Decentralized Models
Junyi Zhu
Ruicong Yao
Taha Ceritli
Savas Ozkan
Matthew B. Blaschko
Eunchung Noh
Jeongwon Min
Cho Jung Min
Mete Ozay
FedML
504
0
0
26 Mar 2025
A Flexible Fairness Framework with Surrogate Loss Reweighting for Addressing Sociodemographic Disparities
Wen Xu
Elham Dolatabadi
FaML
287
1
0
21 Mar 2025
Learning Energy-Based Models by Self-normalising the Likelihood
Hugo Senetaire
Paul Jeha
Pierre-Alexandre Mattei
J. Frellsen
330
1
0
10 Mar 2025
Decision-Dependent Stochastic Optimization: The Role of Distribution Dynamics
Zhiyu He
S. Bolognani
Florian Dorfler
Michael Muehlebach
258
5
0
10 Mar 2025
FUSE: First-Order and Second-Order Unified SynthEsis in Stochastic Optimization
Conference on Algebraic Informatics (AI), 2025
Zhanhong Jiang
Md Zahid Hasan
Aditya Balu
Joshua R. Waite
Genyi Huang
Soumik Sarkar
218
0
0
06 Mar 2025
SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
Dahun Shin
Dongyeop Lee
Jinseok Chung
Namhoon Lee
ODL
AAML
1.3K
2
0
25 Feb 2025
Theory-guided Pseudo-spectral Full Waveform Inversion via Deep Neural Networks
Christopher Zerafa
Pauline Galea
Cristiana Sebu
376
0
0
24 Feb 2025
Convergence of Shallow ReLU Networks on Weakly Interacting Data
Léo Dana
Francis R. Bach
Loucas Pillaud-Vivien
MLT
289
2
0
24 Feb 2025
Verification and Validation for Trustworthy Scientific Machine Learning
John D. Jakeman
Lorena A. Barba
J. Martins
Thomas O'Leary-Roseberry
AI4CE
466
2
0
21 Feb 2025
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
International Conference on Learning Representations (ICLR), 2025
Shane Bergsma
Nolan Dey
Gurpreet Gosal
Gavia Gray
Daria Soboleva
Joel Hestness
349
22
0
21 Feb 2025
Preconditioned Inexact Stochastic ADMM for Deep Model
Shenglong Zhou
Ouya Wang
Ziyan Luo
Yongxu Zhu
Geoffrey Ye Li
447
1
0
15 Feb 2025
Analog In-memory Training on General Non-ideal Resistive Elements: The Impact of Response Functions
Zhaoxian Wu
Quan Xian
Tayfun Gokmen
Omobayode Fagbohungbe
Tianyi Chen
494
1
0
10 Feb 2025
Comparison of CNN-based deep learning architectures for unsteady CFD acceleration on small datasets
Nuclear Engineering and Technology (NET), 2025
Sangam Khanal
Shilaj Baral
Joongoo Jeon
AI4CE
561
4
0
06 Feb 2025
E-3SFC: Communication-Efficient Federated Learning with Double-way Features Synthesizing
IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2025
Yuhao Zhou
Yuxin Tian
Mingjia Shi
Yuanxi Li
Yanan Sun
Qing Ye
Jiancheng Lv
233
2
0
05 Feb 2025
Estimating Multi-chirp Parameters using Curvature-guided Langevin Monte Carlo
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Sattwik Basu
Debottam Dutta
Yu-Lin Wei
Romit Roy Choudhury
206
0
0
30 Jan 2025
PBM-VFL: Vertical Federated Learning with Feature and Sample Privacy
Linh Tran
Timothy Castiglia
Stacy Patterson
Ana Milanova
FedML
360
1
0
23 Jan 2025
Celo: Training Versatile Learned Optimizers on a Compute Diet
A. Moudgil
Boris Knyazev
Guillaume Lajoie
Eugene Belilovsky
1.0K
0
0
22 Jan 2025
Preconditioned Sharpness-Aware Minimization: Unifying Analysis and a Novel Learning Algorithm
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Yilang Zhang
Bingcong Li
G. Giannakis
AAML
223
0
0
11 Jan 2025
Revisiting LocalSGD and SCAFFOLD: Improved Rates and Missing Analysis
International Conference on Artificial Intelligence and Statistics (AISTATS), 2025
Ruichen Luo
Sebastian U Stich
Samuel Horváth
Martin Takáč
530
2
0
08 Jan 2025
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau
Weijian Li
Chenwei Xu
Han Liu
Mladen Kolar
1.2K
0
0
30 Dec 2024
MARINA-P: Superior Performance in Non-smooth Federated Optimization with Adaptive Stepsizes
Igor Sokolov
Peter Richtárik
318
1
0
22 Dec 2024
Sharpness-Aware Minimization with Adaptive Regularization for Training Deep Neural Networks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Jinping Zou
Xiaoge Deng
Tao Sun
337
1
0
22 Dec 2024
Causal Invariance Learning via Efficient Nonconvex Optimization
Zhenyu Wang
Yifan Hu
Peter Buhlmann
Zijian Guo
471
3
0
16 Dec 2024
Towards Understanding the Role of Sharpness-Aware Minimization Algorithms for Out-of-Distribution Generalization
Samuel Schapiro
Han Zhao
389
1
0
06 Dec 2024
Conformal Symplectic Optimization for Stable Reinforcement Learning
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024
Yao Lyu
Xiangteng Zhang
Shengbo Eben Li
Jingliang Duan
Letian Tao
Qing Xu
Lei He
Keqiang Li
390
3
0
03 Dec 2024
Curvature in the Looking-Glass: Optimal Methods to Exploit Curvature of Expectation in the Loss Landscape
Jed A. Duersch
Tommie A. Catanach
Alexander Safonov
Jeremy Wendt
349
0
0
25 Nov 2024
Beyond adaptive gradient: Fast-Controlled Minibatch Algorithm for large-scale optimization
Corrado Coppola
Lorenzo Papa
Irene Amerini
L. Palagi
ODL
405
0
0
24 Nov 2024
A Potential Game Perspective in Federated Learning
Kang Liu
Ziqi Wang
Enrique Zuazua
FedML
314
2
0
18 Nov 2024
Towards Accurate and Efficient Sub-8-Bit Integer Training
Wenjin Guo
Donglai Liu
Weiying Xie
Yunsong Li
Xuefei Ning
Zihan Meng
Shulin Zeng
Jie Lei
Zhenman Fang
Yu Wang
MQ
234
1
0
17 Nov 2024
Convergence Rate Analysis of LION
Yiming Dong
Huan Li
Zhouchen Lin
282
6
0
12 Nov 2024
Effectively Leveraging Momentum Terms in Stochastic Line Search Frameworks for Fast Optimization of Finite-Sum Problems
Matteo Lapucci
Davide Pucci
ODL
237
0
0
11 Nov 2024
Provably Faster Algorithms for Bilevel Optimization via Without-Replacement Sampling
Neural Information Processing Systems (NeurIPS), 2024
Junyi Li
Heng Huang
257
1
0
07 Nov 2024
SPGD: Steepest Perturbed Gradient Descent Optimization
Amir M. Vahedi
Horea T. Ilies
275
2
0
07 Nov 2024
Adaptive Consensus Gradients Aggregation for Scaled Distributed Training
Yoni Choukroun
Shlomi Azoulay
P. Kisilev
305
0
0
06 Nov 2024
Forecasting Outside the Box: Application-Driven Optimal Pointwise Forecasts for Stochastic Optimization
Tito Homem-de-Mello
Juan Valencia
Felipe Lagos
Guido Lagos
331
1
0
05 Nov 2024
Rethinking Weight Decay for Robust Fine-Tuning of Foundation Models
Neural Information Processing Systems (NeurIPS), 2024
Junjiao Tian
Chengyue Huang
Z. Kira
190
3
0
03 Nov 2024
Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers
Neural Information Processing Systems (NeurIPS), 2024
Gavia Gray
Aman Tiwari
Shane Bergsma
Joel Hestness
360
2
0
01 Nov 2024
Hierarchical mixtures of Unigram models for short text clustering: The role of Beta-Liouville priors
Annals of Operations Research (Ann. Oper. Res.), 2024
Massimo Bilancia
Samuele Magro
300
0
0
29 Oct 2024
Neuro-symbolic Learning Yielding Logical Constraints
Neural Information Processing Systems (NeurIPS), 2024
Zenan Li
Yunpeng Huang
Zhaoyu Li
Xingtai Lv
Jingwei Xu
Taolue Chen
Xiaoxing Ma
Jian Lu
NAI
236
13
0
28 Oct 2024
On the Convergence Theory of Pipeline Gradient-based Analog In-memory Training
Zhaoxian Wu
Quan-Wu Xiao
Tayfun Gokmen
H. Tsai
Kaoutar El Maghraoui
Tianyi Chen
285
2
0
19 Oct 2024
Implicit Regularization of Sharpness-Aware Minimization for Scale-Invariant Problems
Neural Information Processing Systems (NeurIPS), 2024
Bingcong Li
Liang Zhang
Niao He
296
9
0
18 Oct 2024
Single-Timescale Multi-Sequence Stochastic Approximation Without Fixed Point Smoothness: Theories and Applications
IEEE Transactions on Signal Processing (IEEE TSP), 2024
Yue Huang
Zhaoxian Wu
Shiqian Ma
Qing Ling
321
2
0
17 Oct 2024
Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees
Aleksandar Armacki
Shuhua Yu
Pranay Sharma
Gauri Joshi
Dragana Bajović
D. Jakovetić
S. Kar
415
2
0
17 Oct 2024
From Gradient Clipping to Normalization for Heavy Tailed SGD
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Florian Hübler
Ilyas Fatkhullin
Niao He
441
29
0
17 Oct 2024
Stability and Sharper Risk Bounds with Convergence Rate $\tilde{O}(1/n^2)$
Bowei Zhu
Shaojie Li
Mingyang Yi
Yong Liu
287
1
0
13 Oct 2024
Distribution-Aware Mean Estimation under User-level Local Differential Privacy
Corentin Pla
Hugo Richard
Maxime Vono
FedML
189
0
0
12 Oct 2024
Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering
Joris Postmus
Steven Abreu
LLMSV
766
12
0
09 Oct 2024
Extended convexity and smoothness and their applications in deep learning
Binchuan Qi
Wei Gong
Li Li
436
0
0
08 Oct 2024
Aiding Global Convergence in Federated Learning via Local Perturbation and Mutual Similarity Information
Emanuel Buttaci
Giuseppe Carlo Calafiore
FedML
238
0
0
07 Oct 2024