A Mathematical Model for Linguistic Universals

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019
Abstract

Inspired by chemical kinetics and neurobiology, we propose a mathematical theory for pattern recurrence in text documents, applicable to a wide variety of languages. We present a Markov model at the discourse level for Steven Pinker's "mentalese", or chains of mental states that transcend spoken and written forms. Such (potentially) universal temporal structures of textual patterns lead us to a language-independent semantic representation, or a translationally invariant word embedding, thereby forming the common ground for both comprehensibility within a given language and translatability between different languages. Applying our model to documents of moderate length, without relying on external knowledge bases, we reconcile Noam Chomsky's "poverty of stimulus" paradox with statistical learning of natural languages.
