
IDF revisited: A simple new derivation within the Robertson-Spärck Jones probabilistic model

Lillian Lee
Abstract

There have been a number of prior attempts to theoretically justify the effectiveness of the inverse document frequency (IDF). Those that take as their starting point Robertson and Spärck Jones's probabilistic model are based on strong or complex assumptions. We show that a more intuitively plausible assumption suffices. Moreover, the new assumption, while conceptually very simple, provides a solution to an estimation problem that had been deemed intractable by Robertson and Walker (1997).
