Vision-Language Models as a Source of Rewards

14 December 2023
Kate Baumli, Satinder Baveja, Feryal M. P. Behbahani, Harris Chan, Gheorghe Comanici, Sebastian Flennerhag, Maxime Gazeau, Kristian Holsheimer, Dan Horgan, Michael Laskin, Clare Lyle, Hussain Masoom, Kay McKinney, Volodymyr Mnih, Alexander Neitz, Dmitry Nikulin, Fabio Pardo, Jack Parker-Holder, John Quan, Tim Rocktäschel, Himanshu Sahni, Tom Schaul, Yannick Schroecker, Stephen Spencer, Richie Steigerwald, Luyu Wang, Lei Zhang
Communities: VLM, LRM
Abstract

Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number of reward functions for achieving different goals. We investigate the feasibility of using off-the-shelf vision-language models, or VLMs, as sources of rewards for reinforcement learning agents. We show how rewards for visual achievement of a variety of language goals can be derived from the CLIP family of models, and used to train RL agents that can achieve a variety of language goals. We showcase this approach in two distinct visual domains and present a scaling trend showing how larger VLMs lead to more accurate rewards for visual goal achievement, which in turn produces more capable RL agents.
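
The abstract describes deriving a reward signal for visual achievement of a language goal from CLIP-style image-text similarity. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation: the open_clip package, the ViT-B-32/openai checkpoint, the function name clip_goal_reward, and the 0.3 threshold are all assumptions chosen for the example. It scores how well an observation frame matches a language goal and thresholds that score into a sparse binary reward.

# Illustrative sketch only (assumed: open_clip, ViT-B-32 "openai" weights, threshold 0.3).
# Shows a CLIP-derived binary reward for a language goal; not the paper's exact code.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

def clip_goal_reward(frame: Image.Image, goal_text: str, threshold: float = 0.3) -> float:
    """Return 1.0 if the observation appears to achieve the language goal, else 0.0."""
    image = preprocess(frame).unsqueeze(0)   # (1, 3, H, W) image tensor
    tokens = tokenizer([goal_text])          # (1, context_length) token ids
    with torch.no_grad():
        img_emb = model.encode_image(image)
        txt_emb = model.encode_text(tokens)
        # Cosine similarity between L2-normalised image and text embeddings.
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
        similarity = (img_emb * txt_emb).sum(dim=-1).item()
    # Thresholding the continuous score yields a sparse binary reward for the RL agent.
    return 1.0 if similarity > threshold else 0.0

In an RL loop, such a function could replace a hand-written reward: the agent receives clip_goal_reward(observation, goal) at each step, so adding a new goal only requires writing a new sentence rather than a new reward function.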

View on arXiv: https://arxiv.org/abs/2312.09187