Kernel Density Estimation with Linked Boundary Conditions

Kernel density estimation on a finite interval poses an outstanding challenge because of the well-recognized bias at the boundaries of the interval. Motivated by an application in cancer research, we consider a boundary constraint linking the values of the unknown target density function at the boundaries. For this application, ignoring the boundary condition results in inaccurate molecular kinetics, which can compromise the interpretation of results when studying cytostatic drugs for cancer treatment. We provide a kernel density estimator (KDE) that successfully incorporates this linked boundary condition, leading to a non-self-adjoint diffusion process and an expansion in non-separable generalized eigenfunctions of the spatial differential operator. The solution is rigorously analyzed through the unified transform (or Fokas method), giving rise to an integral representation in the complex plane. Our analysis confirms that the new KDE possesses many desirable properties, such as consistency and asymptotically negligible bias at the boundaries. These properties include an increased rate of approximation, as measured by the AMISE. We apply our method to the motivating example in biology and provide numerical experiments with synthetic data, including comparisons with state-of-the-art KDEs. Results suggest that the new method is fast and accurate, and compares favourably with existing methods (which currently cannot handle linked boundary constraints). Furthermore, we demonstrate how to build statistical estimators of the boundary conditions satisfied by the target function, without a priori knowledge, in tandem with the corresponding kernel. Finally, our analysis can be extended to more general boundary conditions that may be encountered in applications.
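The boundary bias that motivates this work can be seen in a small numerical sketch. The code below is not the estimator proposed here; it only applies a standard Gaussian KDE, with no boundary correction, to uniform samples on [0, 1]. The function name `gaussian_kde`, the sample size, and the bandwidth `h` are illustrative assumptions.

```python
import numpy as np

def gaussian_kde(x_eval, samples, h):
    """Standard Gaussian KDE with bandwidth h and no boundary correction."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    # Pairwise scaled distances between evaluation points and samples.
    z = (x_eval[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(samples) * h * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=20_000)  # true density equals 1 on [0, 1]
h = 0.05                                      # hypothetical bandwidth, chosen by hand

est_boundary, est_interior = gaussian_kde([0.0, 0.5], samples, h)
print(f"estimate at x = 0.0: {est_boundary:.3f}")
print(f"estimate at x = 0.5: {est_interior:.3f}")
```

At the endpoint roughly half of the kernel mass falls outside the interval, so the uncorrected estimate there is close to one half of the true value, while the interior estimate stays close to 1.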