Frequency matters: Modeling irregular morphological patterns in Spanish with Transformers

Over the past decade, various studies have addressed how speakers solve the so-called `The Paradigm Cell Filling Problem' (PCFP) \citep{ackerman2009parts} across different languages. The PCFP addresses a fundamental question in morphological processing: how do speakers accurately generate inflected forms of words when presented with incomplete paradigms? This problem is particularly salient when modeling complex inflectional systems. We focus on Spanish verbal paradigms, where certain verbs follow an irregular L-shaped pattern, where the first-person singular present indicative stem matches the stem used throughout the present subjunctive mood. We formulate the problem as a morphological reinflection task. Specifically, we investigate the role of input frequency in the acquisition of regular versus irregular L-shaped patterns in transformer models. By systematically manipulating the input distributions and analyzing model behavior, we reveal four key findings: 1) Models perform better on L-shaped verbs compared to regular verbs, especially in uneven frequency conditions; 2) Robust primacy effects are observed, but no consistent recency effects; 3) Memorization becomes more prominent as the proportion of L-shaped verbs increases; 4) There is a tendency to regularize L-shaped verbs when their consonant alternation pairs are rare or absent in the training data.
View on arXiv@article{ramarao2025_2410.21013, title={ Frequency matters: Modeling irregular morphological patterns in Spanish with Transformers }, author={ Akhilesh Kakolu Ramarao and Kevin Tang and Dinah Baer-Henney }, journal={arXiv preprint arXiv:2410.21013}, year={ 2025 } }