Joint NMF for Identification of Shared Features in Datasets and a Dataset Distance Measure
Annual Conference on Information Sciences and Systems (CISS), 2022
Abstract
In this paper, we derive a new method for determining shared features of datasets by employing joint non-negative matrix factorization and analyzing the resulting factorizations. Our approach uses the joint factorization of two dataset matrices into non-negative matrices to derive a similarity measure that determines how well a shared basis approximates each dataset. We also propose a dataset distance measure built upon this method and the learned factorization. Our method successfully identifies differences in structure in both image and text datasets. Potential applications include classification, detecting plagiarism or other manipulation, and learning relationships between datasets.
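As a rough illustration of the idea of a shared basis, the sketch below factors two non-negative data matrices jointly as [X1 | X2] ≈ W [H1 | H2] and then reports how well the shared basis W reconstructs each dataset. It is a minimal sketch under stated assumptions, not the paper's algorithm: the abstract does not specify the optimization scheme or the exact similarity and distance measures, so the standard multiplicative updates for the Frobenius objective and a plain relative reconstruction error are used as stand-ins. The function names `joint_nmf` and `relative_error`, the rank, and the synthetic data are all hypothetical.

```python
import numpy as np


def joint_nmf(X1, X2, rank, n_iter=200, seed=0, eps=1e-10):
    """Jointly factor [X1 | X2] ~= W [H1 | H2] with non-negative factors.

    Assumption: columns are samples, rows are features shared by both
    datasets. Uses standard multiplicative updates for the Frobenius
    objective; this is a stand-in, not the paper's stated procedure.
    """
    rng = np.random.default_rng(seed)
    X = np.hstack([X1, X2])  # stack samples of both datasets side by side
    n_features, n_samples = X.shape
    W = rng.random((n_features, rank)) + eps  # shared basis
    H = rng.random((rank, n_samples)) + eps   # coefficients for all samples
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    n1 = X1.shape[1]
    return W, H[:, :n1], H[:, n1:]


def relative_error(X, W, H):
    """Relative Frobenius reconstruction error of X under basis W."""
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)


# Synthetic example: two datasets generated from the same non-negative basis.
rng = np.random.default_rng(1)
A = rng.random((20, 3))
X1 = A @ rng.random((3, 30))
X2 = A @ rng.random((3, 40))

W, H1, H2 = joint_nmf(X1, X2, rank=3)
err1 = relative_error(X1, W, H1)
err2 = relative_error(X2, W, H2)
print(f"relative error on dataset 1: {err1:.4f}")
print(f"relative error on dataset 2: {err2:.4f}")
```

In this illustrative setup, comparably small errors on both datasets suggest that a common basis describes them well, while a large gap between the two errors would hint at structure present in one dataset but not the other.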
