Capturing Individual Human Preferences with Reward Features

21 March 2025
André Barreto
Vincent Dumoulin
Yiran Mao
Nicolas Perez-Nieves
Bobak Shahriari
Yann Dauphin
Doina Precup
Hugo Larochelle
Abstract

Reinforcement learning from human feedback usually models preferences using a reward model that does not distinguish between people. We argue that this is unlikely to be a good design choice in contexts with high potential for disagreement, like in the training of large language models. We propose a method to specialise a reward model to a person or group of people. Our approach builds on the observation that individual preferences can be captured as a linear combination of a set of general reward features. We show how to learn such features and subsequently use them to quickly adapt the reward model to a specific individual, even if their preferences are not reflected in the training data. We present experiments with large language models comparing the proposed architecture with a non-adaptive reward model and also adaptive counterparts, including models that do in-context personalisation. Depending on how much disagreement there is in the training data, our model either significantly outperforms the baselines or matches their performance with a simpler architecture and more stable training.
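The paper's abstract describes rewards as per-user linear combinations of shared, learned reward features. The sketch below is an illustration of that general idea, not the authors' implementation: a shared feature network phi(x), one weight vector per training user, a Bradley-Terry pairwise loss, and adaptation to a new user by fitting only a fresh weight vector while phi stays frozen. All names (FeatureRewardModel, adapt_to_new_user), layer sizes, and training details are assumptions for the example.

# Minimal sketch (not the authors' code) of a reward model whose per-user
# rewards are linear combinations of shared, learned reward features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureRewardModel(nn.Module):
    """Shared feature extractor phi(x) plus one weight vector per user.

    The reward for user u on input x is the dot product w_u . phi(x).
    Hypothetical architecture; dimensions are placeholders.
    """

    def __init__(self, input_dim: int, num_features: int, num_users: int):
        super().__init__()
        self.phi = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_features),
        )
        # One weight vector per user seen during training.
        self.user_weights = nn.Embedding(num_users, num_features)

    def forward(self, x: torch.Tensor, user_ids: torch.Tensor) -> torch.Tensor:
        features = self.phi(x)                   # (batch, num_features)
        weights = self.user_weights(user_ids)    # (batch, num_features)
        return (features * weights).sum(dim=-1)  # scalar reward per example


def preference_loss(model, chosen, rejected, user_ids):
    """Bradley-Terry loss: the chosen response should receive the higher reward."""
    margin = model(chosen, user_ids) - model(rejected, user_ids)
    return -F.logsigmoid(margin).mean()


def adapt_to_new_user(model, chosen, rejected, steps=100, lr=1e-2):
    """Fit only a new weight vector for an unseen user, keeping phi frozen."""
    num_features = model.user_weights.embedding_dim
    w = torch.zeros(num_features, requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    with torch.no_grad():
        phi_c, phi_r = model.phi(chosen), model.phi(rejected)
    for _ in range(steps):
        margin = (phi_c @ w) - (phi_r @ w)
        loss = -F.logsigmoid(margin).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()

Because only the low-dimensional weight vector is optimised at adaptation time, a new user's reward model can in principle be fit from a handful of preference pairs, which is the kind of fast specialisation the abstract refers to.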

View on arXiv
@article{barreto2025_2503.17338,
  title={Capturing Individual Human Preferences with Reward Features},
  author={André Barreto and Vincent Dumoulin and Yiran Mao and Nicolas Perez-Nieves and Bobak Shahriari and Yann Dauphin and Doina Precup and Hugo Larochelle},
  journal={arXiv preprint arXiv:2503.17338},
  year={2025}
}