
Wasserstein Distributionally Robust Nonparametric Regression

Main text: 63 pages
Bibliography: 3 pages
Abstract

Wasserstein distributionally robust optimization (WDRO) strengthens statistical learning under model uncertainty by minimizing the local worst-case risk over a prescribed ambiguity set. Although WDRO has been extensively studied in parametric settings, its theoretical properties in nonparametric frameworks remain underexplored. This paper investigates WDRO for nonparametric regression. We first establish a structural distinction based on the order $k$ of the Wasserstein distance, showing that $k=1$ induces Lipschitz-type regularization, whereas $k>1$ corresponds to gradient-norm regularization. To address model misspecification, we analyze the excess local worst-case risk, deriving non-asymptotic error bounds for estimators constructed from norm-constrained feedforward neural networks. This analysis is supported by new covering-number and approximation bounds that simultaneously control both the function and its gradient. The proposed estimator achieves a convergence rate of $n^{-2\beta/(d+2\beta)}$ up to logarithmic factors, where $\beta$ depends on the target's smoothness and the network parameters. This rate is shown to be minimax optimal under conditions commonly satisfied in high-dimensional settings. Moreover, the bounds on the excess local worst-case risk imply guarantees on the excess natural risk, ensuring robustness against any distribution within the ambiguity set. We demonstrate the framework's generality across regression and classification problems. Simulation studies and an application to the MNIST dataset further illustrate the estimator's robustness.
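For orientation, the sketch below writes out the local worst-case risk the abstract refers to and the standard first-order heuristic behind the $k=1$ versus $k>1$ distinction. The notation ($P_n$ for the empirical distribution, $W_k$ for the order-$k$ Wasserstein distance, $\delta$ for the ambiguity radius, $\mathcal{F}$ for the network class) is assumed here and may differ from the paper's; the expansion is the well-known WDRO duality heuristic, not a result quoted from this paper.

\documentclass{article}
\usepackage{amsmath}
\DeclareMathOperator*{\argmin}{arg\,min}
\begin{document}
% Local worst-case risk over a Wasserstein ball of radius \delta
% centered at the empirical distribution P_n (notation assumed):
\[
  \mathcal{R}_\delta(f)
    = \sup_{Q \,:\, W_k(Q, P_n) \le \delta}
      \mathbb{E}_{(X,Y)\sim Q}\bigl[\ell\bigl(f(X), Y\bigr)\bigr],
  \qquad
  \hat f \in \argmin_{f \in \mathcal{F}} \mathcal{R}_\delta(f),
\]
% where F is a class of norm-constrained feedforward networks.
% Standard first-order heuristic for small \delta (Holder conjugate
% k' with 1/k + 1/k' = 1), matching the abstract's structural split:
\[
  \mathcal{R}_\delta(f) \approx
  \begin{cases}
    \mathbb{E}_{P_n}\bigl[\ell(f(X),Y)\bigr]
      + \delta \,\mathrm{Lip}(\ell \circ f), & k = 1
      \quad \text{(Lipschitz-type penalty)},\\[4pt]
    \mathbb{E}_{P_n}\bigl[\ell(f(X),Y)\bigr]
      + \delta \,\Bigl(\mathbb{E}_{P_n}\bigl\|\nabla \ell(f(X),Y)\bigr\|_*^{k'}\Bigr)^{1/k'}, & k > 1
      \quad \text{(gradient-norm penalty)}.
  \end{cases}
\]
\end{document}

Viewed this way, the paper's requirement of covering-number and approximation bounds that control both the function and its gradient is natural: for $k>1$ the effective penalty involves the gradient of the loss, so the estimation theory must track gradients as well as function values.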
