A corpus-based toy model for DisCoCat

We construct an abstract categorical model for DisCoCat starting from a generic corpus annotated with constituent structure trees. Concretely, we will work with context-free grammars à la Chomsky, but Combinatory Categorial Grammar (CCG) and dependency grammars could also be used. We begin by dividing the words in the corpus according to three semantic functions: (i) object words, directly modelled in the semantic space; (ii) modifier words, acting on individual object words; (iii) interaction words, connecting the meanings of distinct object words. We then consider the compact closed symmetric monoidal category of semimodules over an involutive commutative semiring, and we model object words as vectors in a free semimodule constructed from the corpus. Based on the grammatical structure annotating the corpus, we use Frobenius algebras to model modifier words as unary operators on this semimodule, and interaction words as binary operators on it. We discuss some possible future extensions and improvements of this model.
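As a rough illustration of the three word classes, the following Python sketch works over the semiring of natural numbers (with trivial involution) and a hypothetical three-word basis. The names `BASIS`, `vector`, `merge`, `modifier` and `interaction` are assumptions made for this example, and lifting a vector to an operator through the Frobenius multiplication is one common choice, not necessarily the construction used in the paper.

```python
# Toy sketch: object words as vectors in a free semimodule over the
# natural numbers, with modifier and interaction words built from the
# Frobenius multiplication on a fixed basis. All names are hypothetical.

BASIS = ("dog", "cat", "ball")  # assumed basis of object words


def vector(**coeffs):
    """Build a vector as a dict mapping each basis word to a coefficient."""
    unknown = set(coeffs) - set(BASIS)
    if unknown:
        raise ValueError(f"not in basis: {sorted(unknown)}")
    return {b: coeffs.get(b, 0) for b in BASIS}


def add(u, v):
    """Semimodule addition, computed coordinate by coordinate."""
    return {b: u[b] + v[b] for b in BASIS}


def merge(u, v):
    """Frobenius multiplication: e_i (x) e_j maps to e_i if i == j, else 0.

    Extended bilinearly, this is the coordinate-wise product.
    """
    return {b: u[b] * v[b] for b in BASIS}


def modifier(m):
    """Lift a vector m to a unary operator v -> merge(m, v)."""
    return lambda v: merge(m, v)


def interaction(w):
    """Lift a vector w to a binary operator (u, v) -> merge(merge(u, w), v)."""
    return lambda u, v: merge(merge(u, w), v)


furry = modifier(vector(dog=2, cat=3))
chases = interaction(vector(dog=1, cat=1, ball=1))

furry_thing = furry(vector(dog=1, ball=4))
chase_result = chases(vector(dog=2, cat=1), vector(dog=3, ball=5))
```

Here `furry_thing` keeps only the coordinates on which the modifier and the object word overlap, and `chase_result` does the same for both arguments of the interaction word. Swapping the natural numbers for another commutative semiring would only change the `+` and `*` used in `add` and `merge`.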