In this paper, we propose a new method for identifying the features shared by two data sets or point clouds and for measuring the distance between them. Our approach jointly factorizes two data matrices into non-negative matrices and derives a similarity measure that quantifies how well the shared basis approximates the data. We also propose a point cloud distance measure built upon this method and the learned factorization. Our method reveals structural differences in both image and text data. Potential applications include classification, detection of plagiarism or other manipulation, data denoising, and transfer learning.
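To make the idea of a shared non-negative basis concrete, the following is a minimal sketch and not the paper's algorithm. It assumes that the two data matrices have the same number of rows (features), that the joint factorization can be approximated by running standard multiplicative-update NMF on the column-wise concatenation of the two matrices, and that a relative Frobenius reconstruction error per data set is an acceptable proxy for how well the shared basis fits. The function names `nmf` and `shared_basis_errors`, the rank `k`, and the iteration count are all illustrative choices that do not appear in the source.

```python
import numpy as np


def nmf(X, k, iters=500, seed=0, eps=1e-10):
    """Plain multiplicative-update NMF (Frobenius loss): X ~= W @ H, with W, H >= 0.

    Illustrative stand-in for a joint factorization; not the paper's solver.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    W = rng.random((d, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        # Standard multiplicative updates; eps guards against division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def shared_basis_errors(X1, X2, k, **kwargs):
    """Fit one basis W to both data sets and report each set's relative error.

    X1 has shape (d, n1) and X2 has shape (d, n2): columns are data points.
    Assumption: low errors on both sets suggest the sets share features.
    """
    n1 = X1.shape[1]
    W, H = nmf(np.hstack([X1, X2]), k, **kwargs)
    H1, H2 = H[:, :n1], H[:, n1:]
    err1 = np.linalg.norm(X1 - W @ H1) / np.linalg.norm(X1)
    err2 = np.linalg.norm(X2 - W @ H2) / np.linalg.norm(X2)
    return err1, err2
```

Under these assumptions, two data sets generated from the same non-negative basis should both be reconstructed well by a shared basis of matching rank, while data sets built from unrelated bases should leave a larger residual on at least one side. How the paper turns such a fit into its similarity and point cloud distance measures is not specified in this abstract.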