Interpretable Topic Extraction and Word Embedding Learning using row-stochastic DEDICOM

International Cross-Domain Conference on Machine Learning and Knowledge Extraction (CD-MAKE), 2020
Main: 12 pages
Bibliography: 2 pages
Appendix: 6 pages
8 figures
8 tables
Abstract

The DEDICOM algorithm provides a uniquely interpretable matrix factorization method for symmetric and asymmetric square matrices. We apply a new row-stochastic variation of DEDICOM to the pointwise mutual information matrices of text corpora to identify latent topic clusters within the vocabulary and simultaneously learn interpretable word embeddings. We introduce a method to efficiently train a constrained DEDICOM algorithm and qualitatively evaluate its topic modeling and word embedding performance.
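To make the abstract concrete, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes the standard DEDICOM form S ≈ A R Aᵀ, where S is an n×n pointwise mutual information matrix, A is an n×k loading matrix, and R is a k×k affinity matrix. It further assumes that row-stochasticity is enforced with a row-wise softmax over an unconstrained parameter matrix and that undefined PMI entries are set to zero. All function names (`pmi_matrix`, `row_softmax`, `dedicom_reconstruct`, `reconstruction_loss`) are hypothetical.

```python
import numpy as np


def pmi_matrix(counts):
    """Pointwise mutual information of a square word co-occurrence count matrix."""
    p_ij = counts / counts.sum()
    p_i = p_ij.sum(axis=1, keepdims=True)  # marginal over rows
    p_j = p_ij.sum(axis=0, keepdims=True)  # marginal over columns
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ij / (p_i * p_j))
    # Assumption: entries that are undefined (zero counts) are set to 0.
    pmi[~np.isfinite(pmi)] = 0.0
    return pmi


def row_softmax(M):
    """Map an unconstrained matrix to a row-stochastic one (rows sum to 1)."""
    z = M - M.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def dedicom_reconstruct(M, R):
    """Return the row-stochastic loadings A and the reconstruction A R A^T."""
    A = row_softmax(M)
    return A, A @ R @ A.T


def reconstruction_loss(S, S_hat):
    """Squared Frobenius norm of the residual, the quantity a trainer would minimize."""
    return float(np.sum((S - S_hat) ** 2))


# Toy usage: 3 words, k = 2 latent topics, random (untrained) parameters.
counts = np.array([[0.0, 2.0, 1.0],
                   [2.0, 0.0, 3.0],
                   [1.0, 3.0, 0.0]])
S = pmi_matrix(counts)
rng = np.random.default_rng(0)
A, S_hat = dedicom_reconstruct(rng.normal(size=(3, 2)), rng.normal(size=(2, 2)))
loss = reconstruction_loss(S, S_hat)
```

Under these assumptions, each row of `A` can be read as a probability distribution over the k topics for one word, which is what makes the loadings usable both as topic assignments and as word embeddings. Minimizing `loss` over `M` and `R` with any gradient-based optimizer would complete the sketch.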