
CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning

Abstract

Recently, large-scale pre-trained language models have demonstrated impressive performance on several commonsense benchmark datasets. However, building machines with commonsense to compose realistically plausible sentences remains challenging. In this paper, we present a constrained text generation task, CommonGen, associated with a benchmark dataset, to explicitly test machines for the ability of generative commonsense reasoning. Given a set of common concepts (e.g., {dog, frisbee, catch, throw}), the task is to generate a coherent sentence describing an everyday scenario using these concepts (e.g., "a man throws a frisbee and his dog catches it"). CommonGen is challenging because it inherently requires 1) relational reasoning using background commonsense knowledge, and 2) compositional generalization ability to work on unseen concept combinations. Our dataset, constructed through a combination of crowdsourcing and existing caption corpora, consists of 30k concept-sets and 50k sentences. Experiments show that there is a large gap between state-of-the-art text generation models (e.g., T5) and human performance (30.6% vs. 63.5% on the SPICE metric). The models struggle at the task, often generating grammatically sound yet realistically implausible sentences, which points to interesting directions for future research.
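
To make the input/output format of the task concrete, the following is a minimal sketch of a concept-coverage check using the example from the abstract. The function name `concept_coverage` and the prefix-matching rule are illustrative assumptions, not the benchmark's evaluation procedure (the abstract reports SPICE); prefix matching is only a crude stand-in for lemmatization.

```python
import re


def concept_coverage(concepts, sentence):
    """Return the fraction of concepts matched by some token in the sentence.

    Assumption (hypothetical rule, not from the paper): a token matches a
    concept if the token starts with the concept string, e.g. "throws"
    matches "throw". A real pipeline would use a proper lemmatizer.
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    covered = {c for c in concepts if any(t.startswith(c) for t in tokens)}
    return len(covered) / len(concepts)


# Concept set and reference sentence taken from the abstract's example.
concepts = {"dog", "frisbee", "catch", "throw"}
sentence = "a man throws a frisbee and his dog catches it"
print(concept_coverage(concepts, sentence))
```

A check like this only confirms that the concepts appear in the output. It says nothing about whether the sentence describes a plausible everyday scenario, which is the harder part of the task.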
