
Understanding Higher-Order Correlations Among Semantic Components in Embeddings

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Main: 5 Pages
12 Figures
Bibliography: 2 Pages
9 Tables
Appendix: 10 Pages
Abstract

Independent Component Analysis (ICA) offers interpretable semantic components of embeddings. While ICA theory assumes that embeddings can be linearly decomposed into independent components, real-world data often do not satisfy this assumption. Consequently, non-independencies remain between the estimated components, which ICA cannot eliminate. We quantified these non-independencies using higher-order correlations and demonstrated that a large higher-order correlation between two components indicates a strong semantic association between them, with many words sharing common meanings with both components. The entire structure of non-independencies was visualized using a maximum spanning tree of semantic components. These findings provide deeper insights into embeddings through ICA.
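As a rough illustration of the idea in the abstract, the sketch below measures dependence that survives between linearly uncorrelated components and then links the components with a maximum spanning tree. The abstract does not give the exact definition of higher-order correlation, so this sketch assumes one plausible choice: the Pearson correlation between squared, standardized components. The synthetic data, the function names `higher_order_correlation` and `maximum_spanning_tree`, and the use of Prim's algorithm are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def higher_order_correlation(S):
    """Correlation between squared, standardized components.

    S has shape (n_samples, n_components). This definition is an
    assumption for illustration; the paper may define it differently.
    """
    Z = (S - S.mean(axis=0)) / S.std(axis=0)
    H = np.corrcoef(Z ** 2, rowvar=False)
    np.fill_diagonal(H, 0.0)  # ignore self-correlation
    return H


def maximum_spanning_tree(W):
    """Prim's algorithm on a dense symmetric weight matrix W.

    Returns a list of (parent, child, weight) edges.
    """
    n = W.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best_w = W[0].copy()               # best known weight into the tree
    best_from = np.zeros(n, dtype=int)  # tree node giving that weight
    edges = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, -np.inf, best_w)
        j = int(np.argmax(candidates))
        edges.append((int(best_from[j]), j, float(W[best_from[j], j])))
        in_tree[j] = True
        update = (~in_tree) & (W[j] > best_w)
        best_w[update] = W[j][update]
        best_from[update] = j
    return edges


# Synthetic stand-in for estimated ICA components (hypothetical data).
rng = np.random.default_rng(0)
n = 20000
# Components 0 and 1 share a random scale: uncorrelated but dependent.
v = rng.choice([0.1, 1.9], size=n)
s0 = np.sqrt(v) * rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)
s1 = np.sqrt(v) * rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)
# Components 2 and 3 are independent of everything else.
s2 = rng.laplace(scale=1 / np.sqrt(2), size=n)
s3 = rng.laplace(scale=1 / np.sqrt(2), size=n)
S = np.column_stack([s0, s1, s2, s3])

linear = np.corrcoef(S, rowvar=False)  # ordinary correlation
H = higher_order_correlation(S)        # dependence left after decorrelation
tree = maximum_spanning_tree(H)
strongest = max(tree, key=lambda e: e[2])
```

In this toy setup the ordinary correlation between components 0 and 1 is close to zero, while their higher-order correlation is clearly positive, so the strongest edge of the tree connects them. With real embeddings, `S` would be replaced by the component matrix estimated by an ICA implementation.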
