Generics are puzzling. Can language models find the missing piece?

International Conference on Computational Linguistics (COLING), 2024
Main: 8 pages · 7 figures · 11 tables · Bibliography: 4 pages · Appendix: 6 pages
Abstract

Generic sentences express generalisations about the world without explicit quantification. Although generics are central to everyday communication, building a precise semantic framework has proven difficult, in part because speakers use generics to generalise properties with widely different statistical prevalence. In this work, we study the implicit quantification and context-sensitivity of generics by leveraging language models as models of language. We create ConGen, a dataset of 2873 naturally occurring generic and quantified sentences in context, and define p-acceptability, a metric based on surprisal that is sensitive to quantification. Our experiments show generics are more context-sensitive than determiner quantifiers and about 20% of naturally occurring generics we analyze express weak generalisations. We also explore how human biases in stereotypes can be observed in language models.
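The p-acceptability metric described in the abstract builds on surprisal, the negative log-probability a language model assigns to a token given its preceding context. As a minimal illustration (the toy probabilities, example sentences, and function names below are assumptions for exposition, not the paper's implementation), per-token surprisals can be summed to score how expected a sentence is under a model:

```python
import math

def surprisal(prob: float) -> float:
    """Surprisal in bits: -log2 of the model's probability for a token."""
    return -math.log2(prob)

def sentence_surprisal(token_probs) -> float:
    """Total surprisal of a sentence, summing per-token surprisals.
    Lower totals mean the model finds the sentence more expected."""
    return sum(surprisal(p) for p in token_probs)

# Hypothetical conditional probabilities a model might assign to each
# token of a plausible generic vs. an odd one (values are made up).
plausible = [0.25, 0.5]      # e.g. "birds", "fly"
odd       = [0.25, 0.03125]  # e.g. "birds", "swim"

print(sentence_surprisal(plausible))  # 2 + 1 = 3.0 bits
print(sentence_surprisal(odd))        # 2 + 5 = 7.0 bits
```

In practice the token probabilities would come from a pretrained language model conditioned on the sentence's context, which is what makes a surprisal-based score sensitive to context and to quantification.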
