
A corpus-based toy model for DisCoCat

Abstract

We construct an abstract categorical model for DisCoCat starting from a generic corpus annotated with constituent structure trees. Concretely, we will work with context-free grammars à la Chomsky, but Combinatory Categorial Grammar (CCG) and dependency grammars could also be used. We begin by dividing words in the corpus according to three semantic functions: (i) object words, directly modelled in the semantic space; (ii) modifier words, acting on individual object words; (iii) interaction words, connecting the meaning of distinct object words. We then consider the compact closed symmetric monoidal category of $R$-semimodules over an involutive commutative semiring $R$, and we model object words as vectors in a free $R$-semimodule $\mathcal{H}$, constructed from the corpus. Based on the grammatical structure annotating the corpus, we use Frobenius algebras to model modifier words as unary operators on $\mathcal{H}$, and interaction words as binary operators on $\mathcal{H}$. We discuss some possible future extensions and improvements of this model.
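To make the three word roles concrete, here is a small illustrative sketch. It is not the paper's construction. Every specific choice below is an assumption made for illustration:

* The basis of $\mathcal{H}$ is the set of sentence ids of a hypothetical three-sentence corpus (`CORPUS`).
* Object and modifier words are occurrence-count vectors over that basis.
* The Frobenius algebra is the standard "copying" one on the basis, with $\mu(e_i \otimes e_j) = \delta_{ij} e_i$ and counit $\epsilon(e_i) = 1$.
* An interaction word (here a verb) acts as a binary operator in a "copy-subject" style, $(s, o) \mapsto \mu(s \otimes V o)$, where $V$ is a hypothetical diagonal element of $\mathcal{H} \otimes \mathcal{H}$.

The semiring $R$ is a parameter of the sketch. Two instances are shown: the Booleans and the natural numbers, both with trivial involution.

```python
"""Toy sketch of word meanings in a free R-semimodule H with a Frobenius algebra.

Hypothetical choices (not taken from the paper): H has one basis vector per
sentence of a tiny made-up corpus, and the Frobenius algebra is the standard
copying one on that basis: mu(e_i (x) e_j) = delta_ij e_i, counit(e_i) = 1.
"""
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    zero: object
    one: object
    add: Callable
    mul: Callable
    inv: Callable  # involution; trivial (identity) for both semirings below


BOOL = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b, lambda a: a)
NAT = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b, lambda a: a)

# Hypothetical corpus; sentence ids index the basis of H.
CORPUS = {
    0: ["dogs", "chase", "cats"],
    1: ["black", "cats", "sleep"],
    2: ["black", "dogs", "chase", "cats"],
}


def word_vector(R, word):
    """Word -> finitely supported vector {sentence_id: occurrence count in R}."""
    out = {}
    for sid, tokens in CORPUS.items():
        x = R.zero
        for t in tokens:
            if t == word:
                x = R.add(x, R.one)
        if x != R.zero:
            out[sid] = x
    return out


def mu(R, u, v):
    """Frobenius multiplication H (x) H -> H: pointwise product on shared basis vectors."""
    out = {}
    for b in u.keys() & v.keys():
        x = R.mul(u[b], v[b])
        if x != R.zero:
            out[b] = x
    return out


def inner(R, u, v):
    """Counit after mu, with the involution on the first argument: sum_b inv(u_b) v_b."""
    total = R.zero
    for x in mu(R, {b: R.inv(a) for b, a in u.items()}, v).values():
        total = R.add(total, x)
    return total


def modifier(R, m):
    """Unary operator H -> H for a modifier word with vector m: v |-> mu(m (x) v)."""
    return lambda v: mu(R, m, v)


def verb_matrix(R, verb):
    """Hypothetical element of H (x) H: diagonal, sum_s count(verb, s) e_s (x) e_s."""
    return {(b, b): x for b, x in word_vector(R, verb).items()}


def apply_matrix(R, V, o):
    """Linear map H -> H encoded by V: (V o)_i = sum_j V[i, j] o_j."""
    out = {}
    for (i, j), x in V.items():
        if j in o:
            out[i] = R.add(out.get(i, R.zero), R.mul(x, o[j]))
    return {i: x for i, x in out.items() if x != R.zero}


def interaction(R, V):
    """Binary operator H (x) H -> H, copy-subject style: (s, o) |-> mu(s (x) V o)."""
    return lambda s, o: mu(R, s, apply_matrix(R, V, o))


black_cats = modifier(NAT, word_vector(NAT, "black"))(word_vector(NAT, "cats"))
dogs_chase_cats = interaction(NAT, verb_matrix(NAT, "chase"))(
    word_vector(NAT, "dogs"), word_vector(NAT, "cats")
)
print(black_cats)        # {1: 1, 2: 1}
print(dogs_chase_cats)   # {0: 1, 2: 1}
```

The semiring choice changes what the same calls compute. Over the Booleans they intersect sets of sentence ids, giving a relational reading. Over the natural numbers they multiply occurrence counts, giving a quantitative reading.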

View on arXiv