
Combinatorial optimization of the coefficient of determination

Marc Harary
Main: 7 pages
3 figures
Bibliography: 1 page
2 tables
Abstract

Robust correlation analysis is among the most critical challenges in statistics. Herein, we develop an efficient algorithm for selecting the $k$-subset of $n$ points in the plane with the highest coefficient of determination $\left( R^2 \right)$. Drawing from combinatorial geometry, we propose a method called the \textit{quadratic sweep} that consists of two steps: (i) projectively lifting the data points into $\mathbb{R}^5$ and then (ii) iterating over each linearly separable $k$-subset. Its basis is that the optimal set of outliers is separable from its complement in $\mathbb{R}^2$ by a conic section, which, in $\mathbb{R}^5$, can be found by a topological sweep in $\Theta \left( n^5 \log n \right)$ time. Although key proofs of quadratic separability remain underway, we develop strong mathematical intuitions for our conjectures, then experimentally demonstrate our method's optimality over several million trials up to $n=30$ without error. Implementations in Julia and fully seeded, reproducible experiments are available at https://github.com/marc-harary/QuadraticSweep.
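
To make the problem statement concrete, the following is a minimal Python sketch of an exhaustive baseline, not the quadratic sweep itself and not the author's Julia code. It enumerates every $k$-subset and keeps the one with the highest $R^2$, which is the kind of reference answer a faster method can be checked against on small inputs. The function names (`r_squared`, `lift`, `conic_value`, `best_subset_brute_force`) are hypothetical, and the monomial map $(x, y) \mapsto (x, y, x^2, xy, y^2)$ is an assumed form of the lift into $\mathbb{R}^5$: under it, a conic in the plane becomes a hyperplane, so separation by a conic becomes linear separation.

```python
from itertools import combinations


def r_squared(points):
    """R^2 of the simple linear regression of y on x (squared Pearson correlation)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate subset; returning 0 is a convention assumed here
    return sxy * sxy / (sxx * syy)


def lift(point):
    """Assumed lift of a planar point into R^5 via the non-constant quadratic monomials."""
    x, y = point
    return (x, y, x * x, x * y, y * y)


def conic_value(weights, offset, point):
    """Evaluate a conic at a point as an affine function of its lift.

    The sign of the result says which side of the conic the point lies on.
    """
    return sum(w * z for w, z in zip(weights, lift(point))) + offset


def best_subset_brute_force(points, k):
    """Exhaustively find the k-subset (as sorted indices) with the highest R^2."""
    best_indices = max(
        combinations(range(len(points)), k),
        key=lambda idx: r_squared([points[i] for i in idx]),
    )
    return best_indices, r_squared([points[i] for i in best_indices])


if __name__ == "__main__":
    # Four collinear points plus one outlier at index 4.
    data = [(0, 0), (1, 1), (2, 2), (3, 3), (1, 5)]
    print(best_subset_brute_force(data, 4))

    # Unit circle x^2 + y^2 - 1 = 0 written as a hyperplane in the lifted space.
    circle_weights, circle_offset = (0, 0, 1, 0, 1), -1
    print(conic_value(circle_weights, circle_offset, (0, 0)))  # inside: negative
    print(conic_value(circle_weights, circle_offset, (2, 0)))  # outside: positive
```

The exhaustive search visits all $\binom{n}{k}$ subsets, so it is only practical for small $n$; that cost is what the $\Theta \left( n^5 \log n \right)$ sweep described in the abstract is meant to avoid.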
