Estimating the Density Ratio between Distributions with High Discrepancy using Multinomial Logistic Regression

Abstract

Functions of the density ratio $p/q$ are widely used in machine learning to quantify the discrepancy between two distributions $p$ and $q$. For high-dimensional distributions, binary classification-based density ratio estimators have shown great promise. However, when the densities are well separated, estimating the density ratio with a binary classifier is challenging. In this work, we show that state-of-the-art density ratio estimators perform poorly in well-separated cases and demonstrate that this is due to distribution shifts between training and evaluation time. We present an alternative method that leverages multi-class classification for density ratio estimation and does not suffer from distribution shift issues. The method uses a set of auxiliary densities $\{m_k\}_{k=1}^K$ and trains a multi-class logistic regression to classify the samples from $p$, $q$, and $\{m_k\}_{k=1}^K$ into $K+2$ classes. We show that if these auxiliary densities are constructed such that they overlap with $p$ and $q$, then a multi-class logistic regression allows for estimating $\log p/q$ on the domain of any of the $K+2$ distributions and resolves the distribution shift problems of the current state-of-the-art methods. We compare our method to state-of-the-art density ratio estimators on both synthetic and real datasets and demonstrate its superior performance on the tasks of density ratio estimation, mutual information estimation, and representation learning. Code: https://www.blackswhan.com/mdre/
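
To make the idea concrete, the following is a minimal sketch of density ratio estimation via multi-class classification. It is not the authors' implementation. It assumes a toy 1-D problem with Gaussian $p$ and $q$, a single hand-picked auxiliary density ($K = 1$), quadratic features, and plain gradient descent; the names `raw_features`, `design`, and `log_ratio` are hypothetical. With equal sample sizes per class, the softmax normaliser and the class priors cancel, so the difference between the logits of the $p$ class and the $q$ class estimates $\log p/q$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # samples per class (equal sizes, so class priors cancel in the ratio)

# Toy 1-D setup; every distribution below is an illustrative assumption.
x_p = rng.normal(-2.0, 1.0, n)   # samples from p
x_q = rng.normal(2.0, 1.0, n)    # samples from q, well separated from p
x_m = rng.normal(0.0, 2.0, n)    # samples from one auxiliary density m_1 overlapping both

x = np.concatenate([x_p, x_q, x_m])
y = np.repeat(np.arange(3), n)   # class labels: 0 = p, 1 = q, 2 = m_1

def raw_features(x):
    # Quadratic features suffice to represent log-ratios of Gaussians.
    x = np.asarray(x, dtype=float)
    return np.stack([x, x ** 2], axis=1)

# Standardise features so that plain gradient descent is well conditioned.
mu = raw_features(x).mean(axis=0)
sd = raw_features(x).std(axis=0)

def design(x):
    z = (raw_features(x) - mu) / sd
    return np.hstack([np.ones((z.shape[0], 1)), z])  # bias column first

def log_softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

X = design(x)
Y = np.eye(3)[y]                 # one-hot targets
W = np.zeros((X.shape[1], 3))    # one weight column per class

# Full-batch gradient descent on the multinomial cross-entropy loss.
for _ in range(3000):
    probs = np.exp(log_softmax(X @ W))
    W -= 0.5 * (X.T @ (probs - Y)) / len(y)

def log_ratio(x):
    # Estimated log p(x)/q(x): logit of class p minus logit of class q.
    logits = design(np.atleast_1d(x)) @ W
    return logits[:, 0] - logits[:, 1]

print(log_ratio([-1.0, 0.0, 1.0]))
```

For this toy setup the true log ratio is $\log p(x)/q(x) = -4x$, which gives a reference to compare the printed estimates against. The auxiliary class is what supplies training samples in the region between $p$ and $q$, where a binary classifier trained only on the two well-separated sample sets would see very little data.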
