1606.04838
Cited By
Optimization Methods for Large-Scale Machine Learning
15 June 2016
Léon Bottou
Frank E. Curtis
J. Nocedal
Papers citing
"Optimization Methods for Large-Scale Machine Learning"
50 / 1,407 papers shown
Title
Decoupled Parallel Backpropagation with Convergence Guarantee
Zhouyuan Huo
Bin Gu
Qian Yang
Heng-Chiao Huang
15
97
0
27 Apr 2018
Revisiting Small Batch Training for Deep Neural Networks
Dominic Masters
Carlo Luschi
ODL
37
659
0
20 Apr 2018
Constant Step Size Stochastic Gradient Descent for Probabilistic Modeling
Dmitry Babichev
Francis R. Bach
25
9
0
16 Apr 2018
E-commerce Anomaly Detection: A Bayesian Semi-Supervised Tensor Decomposition Approach using Natural Gradients
Anil R. Yelundur
Srinivasan H. Sengamedu
Bamdev Mishra
13
1
0
11 Apr 2018
Sequence Training of DNN Acoustic Models With Natural Gradient
Adnan Haider
P. Woodland
26
7
0
06 Apr 2018
Probabilistic Contraction Analysis of Iterated Random Operators
Abhishek Gupta
Rahul Jain
Peter Glynn
6
9
0
04 Apr 2018
A Constant Step Stochastic Douglas-Rachford Algorithm with Application to Non Separable Regularizations
Adil Salim
Pascal Bianchi
W. Hachem
15
2
0
03 Apr 2018
Training Tips for the Transformer Model
Martin Popel
Ondrej Bojar
12
306
0
01 Apr 2018
A Common Framework for Natural Gradient and Taylor based Optimisation using Manifold Theory
Adnan Haider
8
2
0
26 Mar 2018
Lower error bounds for the stochastic gradient descent optimization algorithm: Sharp convergence rates for slowly and fast decaying learning rates
Arnulf Jentzen
Philippe von Wurstemberger
75
31
0
22 Mar 2018
Group Normalization
Yuxin Wu
Kaiming He
45
3,596
0
22 Mar 2018
Efficient FPGA Implementation of Conjugate Gradient Methods for Laplacian System using HLS
Sahithi Rampalli
N. Sehgal
Ishita Bindlish
Tanya Tyagi
Pawan Kumar
13
4
0
10 Mar 2018
A Stochastic Semismooth Newton Method for Nonsmooth Nonconvex Optimization
Andre Milzarek
X. Xiao
Shicong Cen
Zaiwen Wen
M. Ulbrich
15
36
0
09 Mar 2018
WNGrad: Learn the Learning Rate in Gradient Descent
Xiaoxia Wu
Rachel A. Ward
Léon Bottou
22
86
0
07 Mar 2018
Energy-entropy competition and the effectiveness of stochastic gradient descent in machine learning
Yao Zhang
Andrew M. Saxe
Madhu S. Advani
A. Lee
18
59
0
05 Mar 2018
DAGs with NO TEARS: Continuous Optimization for Structure Learning
Xun Zheng
Bryon Aragam
Pradeep Ravikumar
Eric Xing
NoLa
CML
OffRL
19
914
0
04 Mar 2018
Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD
Sanghamitra Dutta
Gauri Joshi
Soumyadip Ghosh
Parijat Dube
P. Nagpurkar
31
193
0
03 Mar 2018
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Tal Ben-Nun
Torsten Hoefler
GNN
33
702
0
26 Feb 2018
GPU Accelerated Sub-Sampled Newton's Method
Sudhir B. Kylasa
Farbod Roosta-Khorasani
Michael W. Mahoney
A. Grama
ODL
26
8
0
26 Feb 2018
Complex-valued Neural Networks with Non-parametric Activation Functions
Simone Scardapane
S. Van Vaerenbergh
Amir Hussain
A. Uncini
23
81
0
22 Feb 2018
Spurious Valleys in Two-layer Neural Network Optimization Landscapes
Luca Venturi
Afonso S. Bandeira
Joan Bruna
32
74
0
18 Feb 2018
Convergence of Online Mirror Descent
Yunwen Lei
Ding-Xuan Zhou
23
20
0
18 Feb 2018
Stochastic quasi-Newton with adaptive step lengths for large-scale problems
A. Wills
Thomas B. Schon
24
9
0
12 Feb 2018
SGD and Hogwild! Convergence Without the Bounded Gradients Assumption
Lam M. Nguyen
Phuong Ha Nguyen
Marten van Dijk
Peter Richtárik
K. Scheinberg
Martin Takáč
24
226
0
11 Feb 2018
Estimating Heterogeneous Consumer Preferences for Restaurants and Travel Time Using Mobile Location Data
Susan Athey
David M. Blei
Rob Donnelly
Francisco J. R. Ruiz
Tobias Schmidt
14
66
0
22 Jan 2018
Optimal Convergence for Distributed Learning with Stochastic Gradient Methods and Spectral Algorithms
Junhong Lin
V. Cevher
14
34
0
22 Jan 2018
Rover Descent: Learning to optimize by learning to navigate on prototypical loss surfaces
Louis Faury
Flavian Vasile
15
2
0
22 Jan 2018
When Does Stochastic Gradient Algorithm Work Well?
Lam M. Nguyen
Nam H. Nguyen
Dzung Phan
Jayant Kalagnanam
K. Scheinberg
24
15
0
18 Jan 2018
MXNET-MPI: Embedding MPI parallelism in Parameter Server Task Model for scaling Deep Learning
Amith R. Mamidala
Georgios Kollias
C. Ward
F. Artico
23
20
0
11 Jan 2018
Gradient-based Optimization for Regression in the Functional Tensor-Train Format
Alex A. Gorodetsky
J. Jakeman
19
32
0
03 Jan 2018
A Stochastic Trust Region Algorithm Based on Careful Step Normalization
Frank E. Curtis
K. Scheinberg
R. Shi
27
45
0
29 Dec 2017
Geometrical Insights for Implicit Generative Modeling
Léon Bottou
Martín Arjovsky
David Lopez-Paz
Maxime Oquab
32
49
0
21 Dec 2017
Snake: a Stochastic Proximal Gradient Algorithm for Regularized Problems over Large Graphs
Adil Salim
Pascal Bianchi
W. Hachem
25
17
0
19 Dec 2017
Parallel Complexity of Forward and Backward Propagation
Maxim Naumov
16
8
0
18 Dec 2017
The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning
Siyuan Ma
Raef Bassily
M. Belkin
24
287
0
18 Dec 2017
Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks
Shankar Krishnan
Ying Xiao
Rif A. Saurous
ODL
8
19
0
08 Dec 2017
AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks
Aditya Devarakonda
Maxim Naumov
M. Garland
ODL
24
136
0
06 Dec 2017
A two-dimensional decomposition approach for matrix completion through gossip
Mukul Bhutani
Bamdev Mishra
16
0
0
21 Nov 2017
Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks
Ziming Zhang
M. Brand
26
70
0
20 Nov 2017
BPGrad: Towards Global Optimality in Deep Learning via Branch and Pruning
Ziming Zhang
Yuanwei Wu
Guanghui Wang
ODL
32
28
0
19 Nov 2017
Accelerated Method for Stochastic Composition Optimization with Nonsmooth Regularization
Zhouyuan Huo
Bin Gu
Ji Liu
Heng-Chiao Huang
29
50
0
10 Nov 2017
SHOPPER: A Probabilistic Model of Consumer Choice with Substitutes and Complements
Francisco J. R. Ruiz
Susan Athey
David M. Blei
24
85
0
09 Nov 2017
Analysis of Biased Stochastic Gradient Descent Using Sequential Semidefinite Programs
Bin Hu
Peter M. Seiler
Laurent Lessard
18
39
0
03 Nov 2017
Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith
Pieter-Jan Kindermans
Chris Ying
Quoc V. Le
ODL
13
978
0
01 Nov 2017
Adaptive Sampling Strategies for Stochastic Optimization
Raghu Bollapragada
R. Byrd
J. Nocedal
11
115
0
30 Oct 2017
On the role of synaptic stochasticity in training low-precision neural networks
Carlo Baldassi
Federica Gerace
H. Kappen
C. Lucibello
Luca Saglietti
Enzo Tartaglione
R. Zecchina
9
23
0
26 Oct 2017
Avoiding Communication in Proximal Methods for Convex Optimization Problems
Saeed Soori
Aditya Devarakonda
J. Demmel
Mert Gurbuzbalaban
M. Dehnavi
27
7
0
24 Oct 2017
Smart "Predict, then Optimize"
Adam N. Elmachtoub
Paul Grigas
22
578
0
22 Oct 2017
Convergence diagnostics for stochastic gradient descent with constant step size
Jerry Chee
Panos Toulis
8
11
0
17 Oct 2017
AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition
Chun Yang
Xu-Cheng Yin
Zejun Li
Jianwei Wu
Chunchao Guo
Hongfa Wang
Lei Xiao
24
10
0
10 Oct 2017