Word Embeddings Inherently Recover the Conceptual Organization of the Human Mind

Machine learning is a means to uncover deep patterns from rich sources of data. Here, we find that machine learning can recover the conceptual organization of the human mind when applied to the natural language use of millions of people. Utilizing text from billions of webpages, we recover most of the concepts contained in English, Dutch, and Japanese, as represented in large scale Word Association networks. Our results justify machine learning as a means to probe the human mind, at a depth and scale that has been unattainable using self-report and observational methods. Beyond direct psychological applications, our methods may prove useful for projects concerned with defining, assessing, relating, or uncovering concepts in any scientific field.
View on arXiv