Conditional Mean and Variance Estimation via \textit{k}-NN Algorithm with Automated Variable Selection

2 February 2024

Marcos Matabuena

J. Vidal

Oscar Hernan Madrid Padilla

J. Onnela

Main:23 Pages

4 Figures

Bibliography:6 Pages

26 Tables

Appendix:19 Pages

Abstract

We introduce a novel \textit{k}-nearest neighbor (\textit{k}-NN) regression method for joint estimation of the conditional mean and variance. The proposed algorithm preserves the computational efficiency and manifold-learning capabilities of classical non-parametric \textit{k}-NN models, while integrating a data-driven variable selection step that improves empirical performance. By accurately estimating both the conditional mean and the conditional variance regression functions, the method effectively reconstructs the conditional distribution and density functions for multiple families of scale-and-localization generative models. We show that our estimator can achieve fast convergence rates, and we derive practical rules for selecting the smoothing parameter $k$ that enhance the precision of the algorithm in finite-sample regimes. Extensive simulations in low-, moderate-, and high-dimensional covariate spaces, together with a real-world biomedical application, demonstrate that the proposed method can consistently outperform the conventional \textit{k}-NN regression algorithm while producing more interpretable model output.
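
To make the idea of joint conditional mean and variance estimation concrete, the following is a minimal sketch of a plain \textit{k}-NN plug-in estimator: for each query point, it averages the responses of the $k$ nearest training points and takes the average squared deviation of those same responses around that local mean. This is an illustration under stated assumptions, not the authors' algorithm: it uses Euclidean distance on all covariates, omits the variable selection step and the rules for choosing $k$ described in the abstract, and the function name `knn_mean_variance` is hypothetical.

```python
import numpy as np


def knn_mean_variance(X_train, y_train, X_query, k):
    """Plain k-NN plug-in estimates of E[Y | X = x] and Var[Y | X = x].

    Hypothetical helper for illustration only. Assumes Euclidean distance
    on all covariates and 1 <= k <= number of training points.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)

    # Squared Euclidean distances, shape (n_query, n_train).
    diffs = X_query[:, None, :] - X_train[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=2)

    # Indices of the k nearest training points for each query point
    # (the order within the k neighbors does not matter here).
    neighbors = np.argpartition(sq_dist, k - 1, axis=1)[:, :k]
    y_neighbors = y_train[neighbors]

    # Local mean, then local average squared deviation around that mean.
    mean = y_neighbors.mean(axis=1)
    variance = ((y_neighbors - mean[:, None]) ** 2).mean(axis=1)
    return mean, variance


# Toy data: two well-separated clusters with different response spread.
X_train = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_train = np.array([1.0, 2.0, 3.0, 10.0, 20.0, 30.0])
X_query = np.array([[1.0], [11.0]])

mean_hat, var_hat = knn_mean_variance(X_train, y_train, X_query, k=3)
print(mean_hat, var_hat)
```

In this toy example the second query point sits in the cluster with the more dispersed responses, so its estimated variance is larger. Given both estimates, a location-scale model could in principle be used to approximate the conditional distribution, which is the role the abstract assigns to the two regression functions.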
