
Multiscale Analysis of Count Data through Topic Alignment

Abstract

Topic modeling is a popular method used to describe biological count data. With topic models, the user must specify the number of topics $K$. Since there is no definitive way to choose $K$ and since a true value might not exist, we develop techniques to study the relationships across models with different $K$. This can show how many topics are consistently present across different models, if a topic is only transiently present, or if a topic splits in two when $K$ increases. This strategy gives more insight into the process generating the data than choosing a single value of $K$ would. We design a visual representation of these cross-model relationships, which we call a topic alignment, and present three diagnostics based on it. We show the effectiveness of these tools for interpreting the topics on simulated and real data, and we release an accompanying R package, \href{https://lasy.github.io/alto}{\texttt{alto}}.
