Learning from Conditional Distributions via Dual Kernel Embeddings
In many machine learning problems, such as policy evaluation in reinforcement learning and learning with invariance, each data point x is itself a conditional distribution p(z|x), and we want to learn a function f that links these conditional distributions to target values y. The learning problem becomes very challenging when we have only limited samples, or in the extreme case only one sample, from each conditional distribution p(z|x). Commonly used approaches either assume that z is independent of x, or require an overwhelmingly large number of samples from each conditional distribution. To address these challenges, we propose a novel approach which reformulates the original problem as a min-max optimization problem. In this new view, we only need to deal with the kernel embedding of the joint distribution p(z,x), which is easy to estimate. Furthermore, we design an efficient learning algorithm based on mirror descent stochastic approximation, and establish the sample complexity of learning from conditional distributions. Finally, numerical experiments on both synthetic and real data show that our method can significantly improve over the previous state of the art.
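To make the min-max idea concrete, here is a minimal, purely illustrative sketch (not the paper's actual algorithm or objective): stochastic mirror descent with a Euclidean mirror map, i.e. stochastic gradient descent-ascent, applied to a toy strongly-convex-strongly-concave saddle-point problem. The problem instance, noise model, and all variable names are assumptions made for illustration; noisy gradients stand in for the single-sample stochastic estimates that arise when learning from conditional distributions.

```python
# Toy saddle-point problem:  min_w max_v  w^T A v + 0.5||w||^2 - 0.5||v||^2,
# whose unique saddle point is (w, v) = (0, 0).
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = 0.5 * rng.standard_normal((d, d))   # coupling between primal and dual variables
w = rng.standard_normal(d)              # primal iterate (minimization)
v = rng.standard_normal(d)              # dual iterate (maximization)
w /= np.linalg.norm(w)                  # start on the unit sphere
v /= np.linalg.norm(v)

for t in range(5000):
    eta = 1.0 / (10.0 + t)              # decaying step size, as in stochastic approximation
    noise_w = 0.1 * rng.standard_normal(d)   # simulated sampling noise in the gradients
    noise_v = 0.1 * rng.standard_normal(d)
    g_w = A @ v + w + noise_w           # stochastic gradient in w (descend)
    g_v = A.T @ w - v + noise_v         # stochastic gradient in v (ascend)
    w = w - eta * g_w
    v = v + eta * g_v

# Both iterates shrink toward the saddle point at the origin.
print(np.linalg.norm(w), np.linalg.norm(v))
```

Because the saddle operator here is strongly monotone, the decaying step size drives both iterates to the saddle point despite the gradient noise; this is the mechanism a mirror-descent-type stochastic approximation scheme exploits, with the Euclidean geometry replaced by a problem-adapted mirror map in general.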