Optimization Methods for Large-Scale Machine Learning

15 June 2016
Léon Bottou
Frank E. Curtis
J. Nocedal
arXiv:1606.04838

Papers citing "Optimization Methods for Large-Scale Machine Learning"

50 / 1,490 papers shown
A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent
Mingze Wang
Lei Wu
435
3
0
01 Oct 2023
Robust Stochastic Optimization via Gradient Quantile Clipping
Ibrahim Merad
Stéphane Gaïffas
201
3
0
29 Sep 2023
High Throughput Training of Deep Surrogates from Large Ensemble Runs
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2023
Lucas Meyer
M. Schouler
R. Caulk
Alejandro Ribés
Bruno Raffin
AI4CE
179
7
0
28 Sep 2023
Enhancing Sharpness-Aware Optimization Through Variance Suppression
Neural Information Processing Systems (NeurIPS), 2023
Bingcong Li
G. Giannakis
AAML
453
34
0
27 Sep 2023
Revisiting LARS for Large Batch Training Generalization of Neural Networks
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2023
K. Do
Duong Nguyen
Hoa Nguyen
Long Tran-Thanh
Nguyen-Hoang Tran
Quoc-Viet Pham
AI4CE ODL
354
6
0
25 Sep 2023
Robust Distributed Learning: Tight Error Bounds and Breakdown Point under Data Heterogeneity
Neural Information Processing Systems (NeurIPS), 2023
Youssef Allouah
R. Guerraoui
Nirupam Gupta
Rafael Pinot
Geovani Rizk
OOD
289
25
0
24 Sep 2023
A Novel Gradient Methodology with Economical Objective Function Evaluations for Data Science Applications
Christian Varner
Vivak Patel
363
2
0
19 Sep 2023
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Hao-Jun Michael Shi
Tsung-Hsien Lee
Shintaro Iwasaki
Jose Gallego-Posada
Zhijing Li
Kaushik Rangadurai
Dheevatsa Mudigere
Michael Rabbat
ODL
258
45
0
12 Sep 2023
Derivation of Coordinate Descent Algorithms from Optimal Control Theory
I. Michael Ross
59
2
0
07 Sep 2023
Backward error analysis and the qualitative behaviour of stochastic optimization algorithms: Application to stochastic coordinate descent
Journal of Computational Dynamics (J. Comput. Dyn.), 2023
Stefano Di Giovacchino
D. Higham
K. Zygalakis
179
2
0
05 Sep 2023
Majorization-Minimization for sparse SVMs
A. Benfenati
Émilie Chouzenoux
Giorgia Franchini
Salla Latva-Aijo
Dominik Narnhofer
J. Pesquet
S. J. Scott
M. Yousefi
139
1
0
31 Aug 2023
Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023
Tianchi Cai
Shenliao Bao
Jiyan Jiang
Shiji Zhou
Wenpeng Zhang
Lihong Gu
Jinjie Gu
Guannan Zhang
OffRL
176
3
0
25 Aug 2023
SGMM: Stochastic Approximation to Generalized Method of Moments
Xiaohong Chen
S. Lee
Yuan Liao
M. Seo
Youngki Shin
Myunghyun Song
169
7
0
25 Aug 2023
We Don't Need No Adam, All We Need Is EVE: On The Variance of Dual Learning Rate And Beyond
A. Khadangi
ODL
238
1
0
21 Aug 2023
Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Xiaoge Deng
Li Shen
Shengwei Li
Tao Sun
Dongsheng Li
Dacheng Tao
351
3
0
18 Aug 2023
Max-affine regression via first-order methods
SIAM Journal on Mathematics of Data Science (SIMODS), 2023
Seonho Kim
Kiryung Lee
154
3
0
15 Aug 2023
Quantile Optimization via Multiple Timescale Local Search for Black-box Functions
Operational Research (OR), 2023
Jiaqiao Hu
Meichen Song
Michael Fu
56
13
0
15 Aug 2023
Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction
Neural Information Processing Systems (NeurIPS), 2023
Xiao-Yan Jiang
Sebastian U. Stich
243
30
0
11 Aug 2023
Almost-sure convergence of iterates and multipliers in stochastic sequential quadratic optimization
Journal of Optimization Theory and Applications (JOTA), 2023
Frank E. Curtis
Xin Jiang
Qi Wang
191
8
0
07 Aug 2023
Eva: A General Vectorized Approximation Framework for Second-order Optimization
Lin Zhang
Shaoshuai Shi
Yue Liu
221
1
0
04 Aug 2023
Hierarchical Federated Learning in Wireless Networks: Pruning Tackles Bandwidth Scarcity and System Heterogeneity
IEEE Transactions on Wireless Communications (IEEE TWC), 2023
Md Ferdous Pervej
Richeng Jin
H. Dai
357
23
0
03 Aug 2023
From continuous-time formulations to discretization schemes: tensor trains and robust regression for BSDEs and parabolic PDEs
Journal of Machine Learning Research (JMLR), 2023
Lorenz Richter
Leon Sallandt
Nikolas Nusken
195
8
0
28 Jul 2023
The Marginal Value of Momentum for Small Learning Rate SGD
International Conference on Learning Representations (ICLR), 2023
Runzhe Wang
Sadhika Malladi
Tianhao Wang
Kaifeng Lyu
Zhiyuan Li
ODL
242
10
0
27 Jul 2023
High Probability Analysis for Non-Convex Stochastic Optimization with Clipping
European Conference on Artificial Intelligence (ECAI), 2023
Shaojie Li
Yong Liu
220
5
0
25 Jul 2023
Federated Distributionally Robust Optimization with Non-Convex Objectives: Algorithm and Analysis
IEEE Transactions on Mobile Computing (IEEE TMC), 2023
Yang Jiao
Kai Yang
Dongjin Song
351
4
0
25 Jul 2023
Convergence of Adam for Non-convex Objectives: Relaxed Hyperparameters and Non-ergodic Case
Machine-mediated learning (ML), 2023
Meixuan He
Yuqing Liang
Jinlan Liu
Dongpo Xu
235
14
0
20 Jul 2023
Enhancing Supervised Learning with Contrastive Markings in Neural Machine Translation Training
European Association for Machine Translation Conferences/Workshops (EAMT), 2023
Nathaniel Berger
M. Exel
Matthias Huck
Stefan Riezler
238
2
0
17 Jul 2023
Decentralized Local Updates with Dual-Slow Estimation and Momentum-based Variance-Reduction for Non-Convex Optimization
European Conference on Artificial Intelligence (ECAI), 2023
Kangyang Luo
Kunkun Zhang
Sheng Zhang
Xiang Li
Ming Gao
127
2
0
17 Jul 2023
Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality
Ziyang Wei
Wanrong Zhu
Wei Biao Wu
353
6
0
13 Jul 2023
Transgressing the boundaries: towards a rigorous understanding of deep learning and its (non-)robustness
C. Hartmann
Lorenz Richter
AAML
206
2
0
05 Jul 2023
TablEye: Seeing small Tables through the Lens of Images
Seungeun Lee
Sang-Chul Lee
LMTD
244
2
0
04 Jul 2023
Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Peng Mi
Li Shen
Tianhe Ren
Weihao Ye
Tianshuo Xu
Xiaoshuai Sun
Tongliang Liu
Rongrong Ji
Dacheng Tao
AAML
256
3
0
30 Jun 2023
Training Deep Surrogate Models with Large Scale Online Learning
International Conference on Machine Learning (ICML), 2023
Lucas Meyer
M. Schouler
R. Caulk
Alejandro Ribés
Bruno Raffin
3DGS AI4CE
181
8
0
28 Jun 2023
G-TRACER: Expected Sharpness Optimization
John R. Williams
Stephen J. Roberts
148
0
0
24 Jun 2023
Efficient preconditioned stochastic gradient descent for estimation in latent variable models
International Conference on Machine Learning (ICML), 2023
C. Baey
Maud Delattre
E. Kuhn
Jean-Benoist Léger
Sarah Lemler
148
6
0
22 Jun 2023
Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models
Neural Information Processing Systems (NeurIPS), 2023
Leonardo Galli
Holger Rauhut
Mark Schmidt
215
17
0
22 Jun 2023
Empirical Risk Minimization with Shuffled SGD: A Primal-Dual Perspective and Improved Bounds
Xu Cai
Cheuk Yin Lin
Jelena Diakonikolas
FedML
250
6
0
21 Jun 2023
MimiC: Combating Client Dropouts in Federated Learning by Mimicking Central Updates
IEEE Transactions on Mobile Computing (IEEE TMC), 2023
Yuchang Sun
Yuyi Mao
Jinchao Zhang
FedML
263
23
0
21 Jun 2023
Adaptive Federated Learning with Auto-Tuned Clients
International Conference on Learning Representations (ICLR), 2023
Junhyung Lyle Kim
Taha Toghani
César A. Uribe
Anastasios Kyrillidis
FedML
557
14
0
19 Jun 2023
Bootstrapped Representations in Reinforcement Learning
International Conference on Machine Learning (ICML), 2023
Charline Le Lan
Stephen Tu
Mark Rowland
Anna Harutyunyan
Rishabh Agarwal
Marc G. Bellemare
Will Dabney
OffRL OOD SSL
254
12
0
16 Jun 2023
Schema-learning and rebinding as mechanisms of in-context learning and emergence
Neural Information Processing Systems (NeurIPS), 2023
Siva K. Swaminathan
Antoine Dedieu
Rajkumar Vasudeva Raju
Murray Shanahan
Miguel Lazaro-Gredilla
Dileep George
223
22
0
16 Jun 2023
Understanding Optimization of Deep Learning via Jacobian Matrix and Lipschitz Constant
Xianbiao Qi
Jianan Wang
Lei Zhang
212
0
0
15 Jun 2023
Robustly Learning a Single Neuron via Sharpness
International Conference on Machine Learning (ICML), 2023
Puqian Wang
Nikos Zarifis
Ilias Diakonikolas
Jelena Diakonikolas
188
13
0
13 Jun 2023
GQFedWAvg: Optimization-Based Quantized Federated Learning in General Edge Computing Systems
IEEE Transactions on Wireless Communications (IEEE TWC), 2023
Yangchen Li
Ying Cui
Vincent K. N. Lau
FedML
253
4
0
13 Jun 2023
Analysis of the Relative Entropy Asymmetry in the Regularization of Empirical Risk Minimization
International Symposium on Information Theory (ISIT), 2023
Francisco Daunas
I. Esnaola
S. Perlaza
H. Vincent Poor
248
23
0
12 Jun 2023
Straggler-Resilient Decentralized Learning via Adaptive Asynchronous Updates
ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), 2023
Efstathia Soufleri
Gang Yan
Maroun Touma
Jian Li
260
7
0
11 Jun 2023
Improving Accelerated Federated Learning with Compression and Importance Sampling
Michal Grudzień
Grigory Malinovsky
Peter Richtárik
FedML
280
11
0
05 Jun 2023
Integrated Sensing, Computation, and Communication for UAV-assisted Federated Edge Learning
IEEE Transactions on Wireless Communications (IEEE TWC), 2023
Yao Tang
Guangxu Zhu
Wei Xu
M. H. Cheung
T. Lok
Shuguang Cui
171
17
0
05 Jun 2023
Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
International Conference on Machine Learning (ICML), 2023
Tongtian Zhu
Fengxiang He
Kaixuan Chen
Weilong Dai
Dacheng Tao
663
19
0
05 Jun 2023
Toward Understanding Why Adam Converges Faster Than SGD for Transformers
Yan Pan
Yuanzhi Li
304
54
0
31 May 2023
Page 8 of 30