
Penalized Overdamped and Underdamped Langevin Monte Carlo Algorithms for Constrained Sampling

Abstract

We consider the constrained sampling problem where the goal is to sample from a target distribution $\pi(x)\propto e^{-f(x)}$ when $x$ is constrained to lie on a convex body $\mathcal{C}$. Motivated by penalty methods from continuous optimization, we propose penalized Langevin Dynamics (PLD) and penalized underdamped Langevin Monte Carlo (PULMC) methods that convert the constrained sampling problem into an unconstrained one by introducing a penalty function for constraint violations. When $f$ is smooth and gradients are available, we obtain an $\tilde{\mathcal{O}}(d/\varepsilon^{10})$ iteration complexity for PLD to sample the target up to an $\varepsilon$-error, where the error is measured in the TV distance and $\tilde{\mathcal{O}}(\cdot)$ hides logarithmic factors. For PULMC, we improve this to $\tilde{\mathcal{O}}(\sqrt{d}/\varepsilon^{7})$ when the Hessian of $f$ is Lipschitz and the boundary of $\mathcal{C}$ is sufficiently smooth. To our knowledge, these are the first convergence results for underdamped Langevin Monte Carlo methods in the constrained sampling setting that handle non-convex $f$, and they provide the best dimension dependency among existing methods with deterministic gradients. If unbiased stochastic estimates of the gradient of $f$ are available, we propose PSGLD and PSGULMC methods that can handle stochastic gradients and are scalable to large datasets without requiring Metropolis-Hastings correction steps. For PSGLD and PSGULMC, when $f$ is strongly convex and smooth, we obtain $\tilde{\mathcal{O}}(d/\varepsilon^{18})$ and $\tilde{\mathcal{O}}(d\sqrt{d}/\varepsilon^{39})$ iteration complexities in the $\mathcal{W}_2$ distance. When $f$ is smooth and can be non-convex, we provide finite-time performance bounds and iteration complexity results. Finally, we illustrate the performance of our methods on Bayesian LASSO regression and Bayesian constrained deep learning problems.
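The penalty idea described in the abstract can be illustrated with a minimal sketch: run the standard overdamped Langevin update on the penalized potential $f(x) + \delta\, S(x)$, where $S$ penalizes distance to $\mathcal{C}$. The sketch below is an assumption-laden illustration, not the paper's exact algorithm: it takes a quadratic penalty $S(x) = \tfrac{1}{2}\,\mathrm{dist}(x,\mathcal{C})^2$, whose gradient is $x - \mathrm{proj}_{\mathcal{C}}(x)$, and the function name `penalized_ld` and its arguments are hypothetical.

```python
import numpy as np

def penalized_ld(grad_f, project, delta, step, n_iters, x0, rng=None):
    """Sketch of penalized (overdamped) Langevin dynamics.

    Penalized potential: f(x) + delta * dist(x, C)^2 / 2, whose
    penalty gradient is delta * (x - proj_C(x)); `project` is the
    Euclidean projection onto the convex body C.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    for _ in range(n_iters):
        grad_pen = delta * (x - project(x))  # zero inside C
        drift = grad_f(x) + grad_pen
        # Euler-Maruyama step with Gaussian noise of variance 2*step
        x = x - step * drift + np.sqrt(2.0 * step) * rng.standard_normal(d)
    return x
```

For example, to sample a standard Gaussian restricted (approximately) to the unit ball, one would pass `grad_f = lambda x: x` and `project = lambda x: x / max(1.0, np.linalg.norm(x))`; larger `delta` concentrates the samples closer to $\mathcal{C}$ at the cost of a stiffer drift.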
