Optimization Methods for Large-Scale Machine Learning

15 June 2016
Léon Bottou
Frank E. Curtis
J. Nocedal

Papers citing "Optimization Methods for Large-Scale Machine Learning"

50 / 1,490 papers shown
Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD
Sanghamitra Dutta
Gauri Joshi
Soumyadip Ghosh
Parijat Dube
P. Nagpurkar
388
203
0
03 Mar 2018
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
ACM Computing Surveys (CSUR), 2018
Tal Ben-Nun
Torsten Hoefler
GNN
318
772
0
26 Feb 2018
GPU Accelerated Sub-Sampled Newton's Method
Sudhir B. Kylasa
Farbod Roosta-Khorasani
Michael W. Mahoney
A. Grama
ODL
160
8
0
26 Feb 2018
Complex-valued Neural Networks with Non-parametric Activation Functions
Simone Scardapane
S. Van Vaerenbergh
Amir Hussain
A. Uncini
184
92
0
22 Feb 2018
Spurious Valleys in Two-layer Neural Network Optimization Landscapes
Luca Venturi
Afonso S. Bandeira
Joan Bruna
337
75
0
18 Feb 2018
Convergence of Online Mirror Descent
Yunwen Lei
Ding-Xuan Zhou
140
23
0
18 Feb 2018
Stochastic quasi-Newton with adaptive step lengths for large-scale problems
A. Wills
Thomas B. Schon
131
9
0
12 Feb 2018
SGD and Hogwild! Convergence Without the Bounded Gradients Assumption
Lam M. Nguyen
Phuong Ha Nguyen
Marten van Dijk
Peter Richtárik
K. Scheinberg
Martin Takáč
274
241
0
11 Feb 2018
Estimating Heterogeneous Consumer Preferences for Restaurants and Travel Time Using Mobile Location Data
Susan Athey
David M. Blei
Rob Donnelly
Francisco J. R. Ruiz
Tobias Schmidt
166
68
0
22 Jan 2018
Optimal Convergence for Distributed Learning with Stochastic Gradient Methods and Spectral Algorithms
Junhong Lin
Volkan Cevher
171
35
0
22 Jan 2018
Rover Descent: Learning to optimize by learning to navigate on prototypical loss surfaces
Louis Faury
Flavian Vasile
150
2
0
22 Jan 2018
When Does Stochastic Gradient Algorithm Work Well?
Lam M. Nguyen
Nam H. Nguyen
Dzung Phan
Jayant Kalagnanam
K. Scheinberg
139
15
0
18 Jan 2018
MXNET-MPI: Embedding MPI parallelism in Parameter Server Task Model for scaling Deep Learning
Amith R. Mamidala
Georgios Kollias
C. Ward
F. Artico
142
21
0
11 Jan 2018
Gradient-based Optimization for Regression in the Functional Tensor-Train Format
Alex A. Gorodetsky
J. Jakeman
244
37
0
03 Jan 2018
A Stochastic Trust Region Algorithm Based on Careful Step Normalization
INFORMS Journal on Optimization (JIO), 2017
Frank E. Curtis
K. Scheinberg
R. Shi
196
52
0
29 Dec 2017
Geometrical Insights for Implicit Generative Modeling
Léon Bottou
Martín Arjovsky
David Lopez-Paz
Maxime Oquab
225
50
0
21 Dec 2017
Snake: a Stochastic Proximal Gradient Algorithm for Regularized Problems over Large Graphs
IEEE Transactions on Automatic Control (TAC), 2017
Adil Salim
Pascal Bianchi
W. Hachem
162
17
0
19 Dec 2017
Parallel Complexity of Forward and Backward Propagation
Maxim Naumov
172
8
0
18 Dec 2017
The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning
Siyuan Ma
Raef Bassily
M. Belkin
312
313
0
18 Dec 2017
Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks
Shankar Krishnan
Ying Xiao
Rif A. Saurous
ODL
163
20
0
08 Dec 2017
AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks
Aditya Devarakonda
Maxim Naumov
M. Garland
ODL
289
149
0
06 Dec 2017
A two-dimensional decomposition approach for matrix completion through gossip
Mukul Bhutani
Bamdev Mishra
99
0
0
21 Nov 2017
Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks
Ziming Zhang
M. Brand
107
77
0
20 Nov 2017
BPGrad: Towards Global Optimality in Deep Learning via Branch and Pruning
Ziming Zhang
Yuanwei Wu
Guanghui Wang
ODL
193
28
0
19 Nov 2017
Accelerated Method for Stochastic Composition Optimization with Nonsmooth Regularization
Zhouyuan Huo
Bin Gu
Ji Liu
Heng-Chiao Huang
221
53
0
10 Nov 2017
SHOPPER: A Probabilistic Model of Consumer Choice with Substitutes and Complements
Francisco J. R. Ruiz
Susan Athey
David M. Blei
616
97
0
09 Nov 2017
Analysis of Biased Stochastic Gradient Descent Using Sequential Semidefinite Programs
Bin Hu
Peter M. Seiler
Laurent Lessard
294
41
0
03 Nov 2017
Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith
Pieter-Jan Kindermans
Chris Ying
Quoc V. Le
ODL
680
1,080
0
01 Nov 2017
Adaptive Sampling Strategies for Stochastic Optimization
Raghu Bollapragada
R. Byrd
J. Nocedal
111
128
0
30 Oct 2017
On the role of synaptic stochasticity in training low-precision neural networks
Physical Review Letters (PRL), 2017
Carlo Baldassi
Federica Gerace
H. Kappen
Carlo Lucibello
Luca Saglietti
Enzo Tartaglione
R. Zecchina
198
23
0
26 Oct 2017
Avoiding Communication in Proximal Methods for Convex Optimization Problems
Saeed Soori
Aditya Devarakonda
J. Demmel
Mert Gurbuzbalaban
M. Dehnavi
139
7
0
24 Oct 2017
Smart "Predict, then Optimize"
Management Science (MS), 2017
Adam N. Elmachtoub
Paul Grigas
480
735
0
22 Oct 2017
Convergence diagnostics for stochastic gradient descent with constant step size
Jerry Chee
Panos Toulis
191
14
0
17 Oct 2017
AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition
Chun Yang
Xu-Cheng Yin
Zejun Li
Jianwei Wu
Chunchao Guo
Hongfa Wang
Lei Xiao
155
10
0
10 Oct 2017
SGD for robot motion? The effectiveness of stochastic optimization on a new benchmark for biped locomotion tasks
Martim Brandao
K. Hashimoto
A. Takanishi
91
6
0
09 Oct 2017
Training Feedforward Neural Networks with Standard Logistic Activations is Feasible
Emanuele Sansone
F. D. De Natale
106
4
0
03 Oct 2017
How regularization affects the critical points in linear networks
Amirhossein Taghvaei
Jin-Won Kim
P. Mehta
151
13
0
27 Sep 2017
On Principal Components Regression, Random Projections, and Column Subsampling
M. Slawski
171
22
0
23 Sep 2017
Feedforward and Recurrent Neural Networks Backward Propagation and Hessian in Matrix Form
Maxim Naumov
95
9
0
16 Sep 2017
ClickBAIT: Click-based Accelerated Incremental Training of Convolutional Neural Networks
Ervin Teng
João Diogo Falcão
Bob Iannucci
136
15
0
15 Sep 2017
The Impact of Local Geometry and Batch Size on Stochastic Gradient Descent for Nonconvex Problems
V. Patel
MLT
110
8
0
14 Sep 2017
Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study
Peng Xu
Farbod Roosta-Khorasani
Michael W. Mahoney
ODL
203
156
0
25 Aug 2017
Newton-Type Methods for Non-Convex Optimization Under Inexact Hessian Information
Peng Xu
Farbod Roosta-Khorasani
Michael W. Mahoney
580
220
0
23 Aug 2017
Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
L. Smith
Nicholay Topin
AI4CE
429
526
0
23 Aug 2017
Regularizing and Optimizing LSTM Language Models
International Conference on Learning Representations (ICLR), 2017
Stephen Merity
N. Keskar
R. Socher
344
1,147
0
07 Aug 2017
On the convergence properties of a $K$-step averaging stochastic gradient descent algorithm for nonconvex optimization
Fan Zhou
Guojing Cong
414
243
0
03 Aug 2017
A Robust Multi-Batch L-BFGS Method for Machine Learning
A. Berahas
Martin Takáč
AAML
ODL
238
47
0
26 Jul 2017
Warped Riemannian metrics for location-scale models
Salem Said
Lionel Bombrun
Y. Berthoumieu
202
14
0
22 Jul 2017
Stochastic, Distributed and Federated Optimization for Machine Learning
Jakub Konecný
FedML
195
38
0
04 Jul 2017
Optimization Methods for Supervised Machine Learning: From Linear Models to Deep Learning
Frank E. Curtis
K. Scheinberg
194
48
0
30 Jun 2017