A Global Stochastic Optimization Particle Filter Algorithm

Abstract

We introduce a new algorithm to learn on the fly the parameter value $\theta_\star:=\mathrm{argmax}_{\theta\in\Theta}\mathbb{E}[\log f_\theta(Y_0)]$ from a sequence $(Y_t)_{t\geq 1}$ of independent copies of $Y_0$, with $\{f_\theta,\,\theta\in\Theta\subseteq\mathbb{R}^d\}$ a parametric model. The main idea of the proposed approach is to define a sequence $(\tilde{\pi}_t)_{t\geq 1}$ of probability distributions on $\Theta$ which (i) is shown to concentrate on $\theta_\star$ as $t\rightarrow\infty$ and (ii) can be estimated in an online fashion by means of a standard particle filter (PF) algorithm. The sequence $(\tilde{\pi}_t)_{t\geq 1}$ depends on a learning rate $h_t\rightarrow 0$: the slower $h_t$ converges to zero, the greater the ability of the PF approximation $\tilde{\pi}_t^N$ of $\tilde{\pi}_t$ to escape from a local optimum of the objective function, but the slower the rate at which $\tilde{\pi}_t$ concentrates on $\theta_\star$. To reconcile the ability to escape from a local optimum with fast convergence towards $\theta_\star$, we exploit the acceleration property of averaging, well known in the stochastic gradient descent literature, and propose $\bar{\theta}_t^N:=t^{-1}\sum_{s=1}^t \int_{\Theta}\theta\,\tilde{\pi}_s^N(\mathrm{d}\theta)$ as the estimator of $\theta_\star$. Our numerical experiments suggest that $\bar{\theta}_t^N$ converges to $\theta_\star$ at the optimal $t^{-1/2}$ rate in challenging models, and in situations where $\tilde{\pi}_t^N$ concentrates on this parameter value at a slower rate. We illustrate the practical usefulness of the proposed optimization algorithm for online parameter learning and for computing the maximum likelihood estimator.
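The abstract does not spell out the PF update, so the following is only a minimal sketch under stated assumptions: particles on $\Theta$ are reweighted by a tempered likelihood $f_\theta(Y_t)^{h_t}$, resampled, and perturbed by a small Gaussian jitter, with the running average of the particle means playing the role of $\bar{\theta}_t^N$. The function name `averaged_pf_optimizer`, the schedules `h` and `jitter`, and the update rule itself are hypothetical illustrations, not the authors' algorithm.

```python
import numpy as np

def averaged_pf_optimizer(ys, log_f, theta0, h=lambda t: t ** -0.6,
                          jitter=lambda t: t ** -0.6, rng=None):
    """Hypothetical sketch of a PF-based optimizer with averaging.

    ys     : iterable of i.i.d. observations Y_1, Y_2, ...
    log_f  : log_f(thetas, y) -> array of log f_theta(y), vectorized over an (N, d) array of thetas
    theta0 : (N, d) array of initial particles on Theta
    h      : learning-rate schedule h_t -> 0 (assumed tempering exponent)
    jitter : scale of the Gaussian move step (assumed schedule)
    Returns the running averages (theta_bar_t)_{t >= 1}.
    """
    rng = np.random.default_rng() if rng is None else rng
    particles = np.array(theta0, dtype=float)          # N particles on Theta
    N, d = particles.shape
    running_sum = np.zeros(d)
    averages = []
    for t, y in enumerate(ys, start=1):
        # Reweight by a tempered likelihood f_theta(y)^{h_t} (assumed update rule).
        logw = h(t) * log_f(particles, y)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Multinomial resampling, then a small Gaussian jitter so the
        # particle cloud keeps exploring Theta and can leave local optima.
        idx = rng.choice(N, size=N, p=w)
        particles = particles[idx] + jitter(t) * rng.standard_normal((N, d))
        # Particle estimate of the mean of pi_t, and its time average theta_bar_t.
        running_sum += particles.mean(axis=0)
        averages.append(running_sum / t)
    return np.array(averages)
```

The averaging step is the point of the sketch: even if the particle cloud itself concentrates slowly (because $h_t$ decays slowly to preserve exploration), the time-averaged means can converge faster, mirroring Polyak-Ruppert averaging in stochastic gradient descent.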
