Comparisons Are All You Need for Optimizing Smooth Functions

When optimizing machine learning models, there are various scenarios where gradient computations are challenging or even infeasible. Furthermore, in reinforcement learning (RL), preference-based RL, which only compares between pairs of options, has wide applications, including reinforcement learning with human feedback in large language models. In this paper, we systematically study the optimization of a smooth function $f\colon\mathbb{R}^{n}\to\mathbb{R}$, assuming only an oracle that compares function values at two points and tells which is larger. When $f$ is convex, we give two algorithms using $\tilde{O}(n/\epsilon)$ and $\tilde{O}(n^{2})$ comparison queries, respectively, to find an $\epsilon$-optimal solution. When $f$ is nonconvex, our algorithm uses $\tilde{O}(n/\epsilon^{2})$ comparison queries to find an $\epsilon$-approximate stationary point. All these results match the best-known zeroth-order algorithms with function evaluation queries in their dependence on $n$, suggesting that \emph{comparisons are all you need for optimizing smooth functions using derivative-free methods}. In addition, we also give an algorithm for escaping saddle points and reaching an $\epsilon$-second-order stationary point of a nonconvex $f$, using $\tilde{O}(n^{1.5}/\epsilon^{2.5})$ comparison queries.
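To make the comparison-oracle setting concrete, here is a minimal sketch (not the paper's algorithm) of a derivative-free direct-search method that touches $f$ only through a 1-bit comparator: at each step it compares the current point against two candidates along a random direction and keeps the best, shrinking the step when neither candidate improves. The function names (`comparison_oracle`, `comparison_descent`) and all step-size parameters are illustrative assumptions.

```python
import math
import random


def comparison_oracle(f):
    """Wrap f into a 1-bit oracle: returns True iff f(x) <= f(y).

    The optimizer below never sees function values, only these bits,
    mirroring the comparison-query model described in the abstract.
    """
    return lambda x, y: f(x) <= f(y)


def comparison_descent(cmp, x0, step=0.5, shrink=0.5, iters=500, seed=0):
    """Toy comparison-based direct search (illustrative, not the paper's method).

    At each iteration, draw a random unit direction u, form x + step*u and
    x - step*u, and keep whichever of the three points the oracle prefers.
    If neither candidate improves, halve the step size.
    """
    rng = random.Random(seed)
    x = list(x0)
    n = len(x)
    for _ in range(iters):
        # Random unit direction u ~ uniform on the sphere.
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in u))
        u = [v / norm for v in u]
        cand_plus = [xi + step * ui for xi, ui in zip(x, u)]
        cand_minus = [xi - step * ui for xi, ui in zip(x, u)]
        # Select the best of the three points using comparisons only.
        best = x
        if cmp(cand_plus, best):
            best = cand_plus
        if cmp(cand_minus, best):
            best = cand_minus
        if best is x:
            step *= shrink  # no improvement along u: refine the scale
        else:
            x = best
    return x
```

For example, minimizing the convex quadratic $f(x) = \|x\|^2$ from the starting point $(3, -2)$ drives the function value close to zero using only comparison bits, never a function value or gradient.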