ResearchTrend.AI
Towards Developmentally Plausible Rewards: Communicative Success as a Learning Signal for Interactive Language Models

9 May 2025
Lennart Stöpler
Rufat Asadli
Mitja Nikolaus
Ryan Cotterell
Alex Warstadt
Abstract

We propose a method for training language models in an interactive setting inspired by child language acquisition. In our setting, a speaker attempts to communicate some information to a listener in a single-turn dialogue and receives a reward if communicative success is achieved. Unlike earlier related work using image-caption data for interactive reference games, we operationalize communicative success in a more abstract language-only question-answering setting. First, we present a feasibility study demonstrating that our reward provides an indirect signal about grammaticality. Second, we conduct experiments using reinforcement learning to fine-tune language models. We observe that cognitively plausible constraints on the communication channel lead to interpretable changes in speaker behavior. However, we do not yet see improvements on linguistic evaluations from our training regime. We outline potential modifications to the task design and training configuration that could better position future work to use our methodology to observe the benefits of interaction on language learning in computational cognitive models.
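The single-turn speaker-listener interaction described in the abstract can be sketched as a minimal reward loop. This is an illustrative sketch only, not the authors' implementation: the `speaker` and `listener` callables, the exact-match success criterion, and the function names are all assumptions made for the example.

```python
def communicative_success_reward(target_answer: str, listener_answer: str) -> float:
    """Binary reward: 1.0 if the listener recovers the target answer.

    Exact string match (case-insensitive) is a placeholder criterion;
    the paper's actual operationalization may differ.
    """
    return 1.0 if listener_answer.strip().lower() == target_answer.strip().lower() else 0.0


def play_round(speaker, listener, question: str, target_answer: str) -> float:
    """One single-turn dialogue: speaker sends a message, listener answers.

    `speaker` and `listener` stand in for language models; here they are
    arbitrary callables so the reward logic can be shown in isolation.
    """
    # The speaker sees the question and the information to convey.
    message = speaker(question, target_answer)
    # The listener sees only the question and the speaker's message.
    listener_answer = listener(question, message)
    # The reward would then drive reinforcement-learning updates to the speaker.
    return communicative_success_reward(target_answer, listener_answer)
```

With toy stand-ins for the two agents, a successful round yields a reward of 1.0 and a failed one 0.0, which is the scalar signal a policy-gradient fine-tuning step would consume.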

@article{stöpler2025_2505.05970,
  title={Towards Developmentally Plausible Rewards: Communicative Success as a Learning Signal for Interactive Language Models},
  author={Lennart Stöpler and Rufat Asadli and Mitja Nikolaus and Ryan Cotterell and Alex Warstadt},
  journal={arXiv preprint arXiv:2505.05970},
  year={2025}
}