Optimization Methods for Large-Scale Machine Learning
Léon Bottou, Frank E. Curtis, Jorge Nocedal
arXiv:1606.04838, 15 June 2016

Papers citing "Optimization Methods for Large-Scale Machine Learning"

Showing 50 of 1,407 citing papers, most recent first. ResearchTrend.AI community tags are shown in brackets.
• Decoupled Parallel Backpropagation with Convergence Guarantee. Zhouyuan Huo, Bin Gu, Qian Yang, Heng-Chiao Huang. 27 Apr 2018.
• Revisiting Small Batch Training for Deep Neural Networks. Dominic Masters, Carlo Luschi. 20 Apr 2018. [ODL]
• Constant Step Size Stochastic Gradient Descent for Probabilistic Modeling. Dmitry Babichev, Francis R. Bach. 16 Apr 2018.
• E-commerce Anomaly Detection: A Bayesian Semi-Supervised Tensor Decomposition Approach using Natural Gradients. Anil R. Yelundur, Srinivasan H. Sengamedu, Bamdev Mishra. 11 Apr 2018.
• Sequence Training of DNN Acoustic Models With Natural Gradient. Adnan Haider, P. Woodland. 06 Apr 2018.
• Probabilistic Contraction Analysis of Iterated Random Operators. Abhishek Gupta, Rahul Jain, Peter Glynn. 04 Apr 2018.
• A Constant Step Stochastic Douglas-Rachford Algorithm with Application to Non Separable Regularizations. Adil Salim, Pascal Bianchi, W. Hachem. 03 Apr 2018.
• Training Tips for the Transformer Model. Martin Popel, Ondrej Bojar. 01 Apr 2018.
• A Common Framework for Natural Gradient and Taylor based Optimisation using Manifold Theory. Adnan Haider. 26 Mar 2018.
• Lower error bounds for the stochastic gradient descent optimization algorithm: Sharp convergence rates for slowly and fast decaying learning rates. Arnulf Jentzen, Philippe von Wurstemberger. 22 Mar 2018.
• Group Normalization. Yuxin Wu, Kaiming He. 22 Mar 2018.
• Efficient FPGA Implementation of Conjugate Gradient Methods for Laplacian System using HLS. Sahithi Rampalli, N. Sehgal, Ishita Bindlish, Tanya Tyagi, Pawan Kumar. 10 Mar 2018.
• A Stochastic Semismooth Newton Method for Nonsmooth Nonconvex Optimization. Andre Milzarek, X. Xiao, Shicong Cen, Zaiwen Wen, M. Ulbrich. 09 Mar 2018.
• WNGrad: Learn the Learning Rate in Gradient Descent. Xiaoxia Wu, Rachel A. Ward, Léon Bottou. 07 Mar 2018.
• Energy-entropy competition and the effectiveness of stochastic gradient descent in machine learning. Yao Zhang, Andrew M. Saxe, Madhu S. Advani, A. Lee. 05 Mar 2018.
• DAGs with NO TEARS: Continuous Optimization for Structure Learning. Xun Zheng, Bryon Aragam, Pradeep Ravikumar, Eric Xing. 04 Mar 2018. [NoLa, CML, OffRL]
• Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD. Sanghamitra Dutta, Gauri Joshi, Soumyadip Ghosh, Parijat Dube, P. Nagpurkar. 03 Mar 2018.
• Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis. Tal Ben-Nun, Torsten Hoefler. 26 Feb 2018. [GNN]
• GPU Accelerated Sub-Sampled Newton's Method. Sudhir B. Kylasa, Farbod Roosta-Khorasani, Michael W. Mahoney, A. Grama. 26 Feb 2018. [ODL]
• Complex-valued Neural Networks with Non-parametric Activation Functions. Simone Scardapane, S. Van Vaerenbergh, Amir Hussain, A. Uncini. 22 Feb 2018.
• Spurious Valleys in Two-layer Neural Network Optimization Landscapes. Luca Venturi, Afonso S. Bandeira, Joan Bruna. 18 Feb 2018.
• Convergence of Online Mirror Descent. Yunwen Lei, Ding-Xuan Zhou. 18 Feb 2018.
• Stochastic quasi-Newton with adaptive step lengths for large-scale problems. A. Wills, Thomas B. Schon. 12 Feb 2018.
• SGD and Hogwild! Convergence Without the Bounded Gradients Assumption. Lam M. Nguyen, Phuong Ha Nguyen, Marten van Dijk, Peter Richtárik, K. Scheinberg, Martin Takáč. 11 Feb 2018.
• Estimating Heterogeneous Consumer Preferences for Restaurants and Travel Time Using Mobile Location Data. Susan Athey, David M. Blei, Rob Donnelly, Francisco J. R. Ruiz, Tobias Schmidt. 22 Jan 2018.
• Optimal Convergence for Distributed Learning with Stochastic Gradient Methods and Spectral Algorithms. Junhong Lin, V. Cevher. 22 Jan 2018.
• Rover Descent: Learning to optimize by learning to navigate on prototypical loss surfaces. Louis Faury, Flavian Vasile. 22 Jan 2018.
• When Does Stochastic Gradient Algorithm Work Well? Lam M. Nguyen, Nam H. Nguyen, Dzung Phan, Jayant Kalagnanam, K. Scheinberg. 18 Jan 2018.
• MXNET-MPI: Embedding MPI parallelism in Parameter Server Task Model for scaling Deep Learning. Amith R. Mamidala, Georgios Kollias, C. Ward, F. Artico. 11 Jan 2018.
• Gradient-based Optimization for Regression in the Functional Tensor-Train Format. Alex A. Gorodetsky, J. Jakeman. 03 Jan 2018.
• A Stochastic Trust Region Algorithm Based on Careful Step Normalization. Frank E. Curtis, K. Scheinberg, R. Shi. 29 Dec 2017.
• Geometrical Insights for Implicit Generative Modeling. Léon Bottou, Martín Arjovsky, David Lopez-Paz, Maxime Oquab. 21 Dec 2017.
• Snake: a Stochastic Proximal Gradient Algorithm for Regularized Problems over Large Graphs. Adil Salim, Pascal Bianchi, W. Hachem. 19 Dec 2017.
• Parallel Complexity of Forward and Backward Propagation. Maxim Naumov. 18 Dec 2017.
• The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning. Siyuan Ma, Raef Bassily, M. Belkin. 18 Dec 2017.
• Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks. Shankar Krishnan, Ying Xiao, Rif A. Saurous. 08 Dec 2017. [ODL]
• AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks. Aditya Devarakonda, Maxim Naumov, M. Garland. 06 Dec 2017. [ODL]
• A two-dimensional decomposition approach for matrix completion through gossip. Mukul Bhutani, Bamdev Mishra. 21 Nov 2017.
• Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks. Ziming Zhang, M. Brand. 20 Nov 2017.
• BPGrad: Towards Global Optimality in Deep Learning via Branch and Pruning. Ziming Zhang, Yuanwei Wu, Guanghui Wang. 19 Nov 2017. [ODL]
• Accelerated Method for Stochastic Composition Optimization with Nonsmooth Regularization. Zhouyuan Huo, Bin Gu, Ji Liu, Heng-Chiao Huang. 10 Nov 2017.
• SHOPPER: A Probabilistic Model of Consumer Choice with Substitutes and Complements. Francisco J. R. Ruiz, Susan Athey, David M. Blei. 09 Nov 2017.
• Analysis of Biased Stochastic Gradient Descent Using Sequential Semidefinite Programs. Bin Hu, Peter M. Seiler, Laurent Lessard. 03 Nov 2017.
• Don't Decay the Learning Rate, Increase the Batch Size. Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le. 01 Nov 2017. [ODL]
• Adaptive Sampling Strategies for Stochastic Optimization. Raghu Bollapragada, R. Byrd, J. Nocedal. 30 Oct 2017.
• On the role of synaptic stochasticity in training low-precision neural networks. Carlo Baldassi, Federica Gerace, H. Kappen, C. Lucibello, Luca Saglietti, Enzo Tartaglione, R. Zecchina. 26 Oct 2017.
• Avoiding Communication in Proximal Methods for Convex Optimization Problems. Saeed Soori, Aditya Devarakonda, J. Demmel, Mert Gurbuzbalaban, M. Dehnavi. 24 Oct 2017.
• Smart "Predict, then Optimize". Adam N. Elmachtoub, Paul Grigas. 22 Oct 2017.
• Convergence diagnostics for stochastic gradient descent with constant step size. Jerry Chee, Panos Toulis. 17 Oct 2017.
• AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition. Chun Yang, Xu-Cheng Yin, Zejun Li, Jianwei Wu, Chunchao Guo, Hongfa Wang, Lei Xiao. 10 Oct 2017.