131

Business Entity Entropy

Main:23 Pages
14 Figures
Bibliography:1 Pages
2 Tables
Abstract

Organizations generate vast amounts of interconnected content across various platforms. While language models enable sophisticated reasoning for use in business applications, retrieving and contextualizing information from organizational memory remains challenging. We explore this challenge through the lens of entropy, proposing a measure of entity entropy to quantify the distribution of an entity's knowledge across documents as well as a novel generative model inspired by diffusion models in order to provide an explanation for observed behaviours. Empirical analysis on a large-scale enterprise corpus reveals heavy-tailed entropy distributions, a correlation between entity size and entropy, and category-specific entropy patterns. These findings suggest that not all entities are equally retrievable, motivating the need for entity-centric retrieval or pre-processing strategies for a subset of, but not all, entities. We discuss practical implications and theoretical models to guide the design of more efficient knowledge retrieval systems.

View on arXiv
Comments on this paper