Stochastic Compositional Gradient Descent: Algorithms for Minimizing Compositions of Expected-Value Functions

14 November 2014
Mengdi Wang
Ethan X. Fang
Han Liu
arXiv:1411.3803
Abstract

Classical stochastic gradient methods are well suited for minimizing expected-value objective functions. However, they do not apply to the minimization of a nonlinear function of expected values, i.e., a composition of two expected-value functions: problems of the form $\min_x \mathbf{E}_v\big[f_v\big(\mathbf{E}_w[g_w(x)]\big)\big]$. In order to solve this stochastic composition problem, we propose a class of stochastic compositional gradient descent (SCGD) algorithms that can be viewed as stochastic versions of quasi-gradient methods. The SCGD algorithms update the solution based on noisy sample gradients of $f_v, g_w$ and use an auxiliary variable to track the unknown quantity $\mathbf{E}_w[g_w(x)]$. We prove that the SCGD iterates converge almost surely to an optimal solution for convex optimization problems, as long as such a solution exists. The convergence involves the interplay of two iterations with different time scales. For nonsmooth convex problems, the SCGD achieve a convergence rate of $O(k^{-1/4})$ in the general case and $O(k^{-2/3})$ in the strongly convex case, after taking $k$ samples. For smooth convex problems, the SCGD can be accelerated to converge at a rate of $O(k^{-2/7})$ in the general case and $O(k^{-4/5})$ in the strongly convex case. For nonconvex problems, we prove that any limit point generated by SCGD is a stationary point, for which we also provide a convergence rate analysis. The stochastic setting in which one wants to optimize compositions of expected-value functions is very common in practice, and the proposed SCGD methods find wide applications in learning, estimation, dynamic programming, etc.
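As a reading aid (not part of the paper), below is a minimal NumPy sketch of the basic two-time-scale SCGD update described in the abstract: an auxiliary variable y tracks the inner expectation E_w[g_w(x)] via a fast running average, while x takes quasi-gradient steps on a slow time scale. The oracle interface (sample_g, sample_g_jac, sample_f_grad), step-size schedules, and the toy problem are illustrative assumptions, not the authors' implementation.

import numpy as np

def scgd(x0, sample_g, sample_g_jac, sample_f_grad,
         num_iters=10000, alpha0=0.1, beta0=0.5):
    """Basic SCGD sketch (hypothetical interface, not the authors' code).

    Minimizes  min_x E_v[ f_v( E_w[ g_w(x) ] ) ]  with two coupled iterations:
      y_{k+1} = (1 - beta_k) * y_k + beta_k * g_{w_k}(x_k)        # tracks E_w[g_w(x)]
      x_{k+1} = x_k - alpha_k * J_{g_{w_k}}(x_k)^T grad f_{v_k}(y_{k+1})
    """
    x = np.asarray(x0, dtype=float)
    y = sample_g(x)                        # initialize the auxiliary tracker
    for k in range(1, num_iters + 1):
        # Step sizes decaying at different rates (illustrative schedule).
        alpha_k = alpha0 / k ** 0.75
        beta_k = min(1.0, beta0 / k ** 0.5)

        # Fast time scale: running estimate of the inner expectation E_w[g_w(x)].
        y = (1.0 - beta_k) * y + beta_k * sample_g(x)

        # Slow time scale: quasi-gradient step via the chain rule, using y in place
        # of the unknown E_w[g_w(x)].
        x = x - alpha_k * sample_g_jac(x).T @ sample_f_grad(y)
    return x

# Toy usage (assumed example): g_w(x) = x + noise_w and f_v(y) = ||y - c_v||^2
# with noisy c_v, so the composition's minimizer is x* = E[c_v].
rng = np.random.default_rng(0)
c = np.array([1.0, -2.0])
x_star = scgd(
    x0=np.zeros(2),
    sample_g=lambda x: x + 0.1 * rng.standard_normal(2),
    sample_g_jac=lambda x: np.eye(2),
    sample_f_grad=lambda y: 2.0 * (y - (c + 0.1 * rng.standard_normal(2))),
)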
