Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning

19 October 2023
Juan Rocamonde
Victoriano Montesinos
Elvis Nava
Ethan Perez
David Lindner
    VLM
ArXiv (abs) | PDF | HTML | HuggingFace (20 upvotes)

Papers citing "Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning"

50 / 86 papers shown
Goal-Driven Reward by Video Diffusion Models for Reinforcement Learning
Q. Wang
Mian Wu
Y. Zhang
Mingqi Yuan
Wenyao Zhang
Haoxiang You
Yunbo Wang
Xin Jin
Xiaokang Yang
Wenjun Zeng
VGen
190
1
0
30 Nov 2025
Leveraging LLMs for reward function design in reinforcement learning control tasks
Franklin Cardenoso
Wouter Caarls
124
1
0
24 Nov 2025
AutoFocus-IL: VLM-based Saliency Maps for Data-Efficient Visual Imitation Learning without Extra Human Annotations
Litian Gong
Fatemeh Bahrani
Yutai Zhou
Amin Banayeeanzade
Jiachen Li
Erdem Bıyık
185
2
0
23 Nov 2025
Automated Reward Design for Gran Turismo
Michel Ma
Takuma Seno
K. Subramanian
Peter R. Wurman
Peter Stone
Craig Sherstan
231
1
0
03 Nov 2025
World-in-World: World Models in a Closed-Loop World
Jiahan Zhang
Muqing Jiang
Nanru Dai
Taiming Lu
Arda Uzunoglu
...
Rama Chellappa
Tianmin Shu
Alan Yuille
Yilun Du
Jieneng Chen
VGen, VLM
283
13
0
20 Oct 2025
ARM-FM: Automated Reward Machines via Foundation Models for Compositional Reinforcement Learning
Roger Creus Castanyer
Faisal Mohamed
Pablo Samuel Castro
Cyrus Neary
Glen Berseth
OffRL, LRM, AI4CE
266
2
0
16 Oct 2025
CDE: Concept-Driven Exploration for Reinforcement Learning
Le Mao
Andrew H. Liu
Renos Zabounidis
Zachary Kingston
Joseph Campbell
131
0
0
09 Oct 2025
Zero-shot reasoning for simulating scholarly peer-review
Khalid M. Saqr
152
0
0
02 Oct 2025
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
Qinsi Wang
Bo Liu
Tianyi Zhou
Jing Shi
Yueqian Lin
Yiran Chen
Hai Helen Li
Kun Wan
Wentian Zhao
OffRL, VLM, LRM
160
12
0
29 Sep 2025
LAGEA: Language Guided Embodied Agents for Robotic Manipulation
Abdul Monaf Chowdhury
Akm Moshiur Rahman Mazumder
Rabeya Akter
S. Arib
LM&Ro
172
1
0
27 Sep 2025
OpenGVL -- Benchmarking Visual Temporal Progress for Data Curation
Paweł Budzianowski
Emilia Wisnios
Gracjan Góral
Igor Kulakov
Viktor Petrenko
Krzysztof Walas
227
2
0
22 Sep 2025
CRAFT: Coaching Reinforcement Learning Autonomously using Foundation Models for Multi-Robot Coordination Tasks
Seoyeon Choi
Kanghyun Ryu
Jonghoon Ock
Negar Mehr
216
2
0
17 Sep 2025
Human-Aligned Procedural Level Generation Reinforcement Learning via Text-Level-Sketch Shared Representation
In-Chang Baek
Seoyoung Lee
Sung-Hyun Kim
Geumhwan Hwang
KyungJoong Kim
141
1
0
13 Aug 2025
Policy Learning from Large Vision-Language Model Feedback without Reward Modeling
Tung M. Luu
Donghoon Lee
Younghwan Lee
Chang D. Yoo
OffRL
250
3
0
31 Jul 2025
GoalLadder: Incremental Goal Discovery with Vision-Language Models
Alexey Zakharov
Shimon Whiteson
331
1
0
19 Jun 2025
Reward Models in Deep Reinforcement Learning: A Survey
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Rui Yu
Shenghua Wan
Yucen Wang
Chen-Xiao Gao
Le Gan
Zongzhang Zhang
De-Chuan Zhan
OffRL
206
17
0
18 Jun 2025
RobotSmith: Generative Robotic Tool Design for Acquisition of Complex Manipulation Skills
Chunru Lin
Haotian Yuan
Yian Wang
Xiaowen Qiu
Tsun-Hsuan Wang
Minghao Guo
Bohan Wang
Yashraj S. Narang
Dieter Fox
Chuang Gan
242
4
0
17 Jun 2025
Enhancing Rating-Based Reinforcement Learning to Effectively Leverage Feedback from Large Vision-Language Models
Tung M. Luu
Younghwan Lee
Donghoon Lee
Sunho Kim
Min Jun Kim
Chang D. Yoo
ALM, VLM
230
9
0
15 Jun 2025
VITA: Zero-Shot Value Functions via Test-Time Adaptation of Vision-Language Models
Christos Ziakas
Alessandra Russo
TTA
352
0
0
11 Jun 2025
Truly Self-Improving Agents Require Intrinsic Metacognitive Learning
Tennison Liu
M. Schaar
AIFin, LRM
438
6
0
05 Jun 2025
DriveMind: A Dual Visual Language Model-based Reinforcement Learning Framework for Autonomous Driving
Dawood Wasif
T. Moore
Chandan K Reddy
Jin-Hee Cho
Seunghyun Yoon
Hyuk Lim
Dan Dongseong Kim
VLM, LRM
228
0
0
01 Jun 2025
TeViR: Text-to-Video Reward with Diffusion Models for Efficient Reinforcement Learning
Yuhui Chen
Haoran Li
Zhennan Jiang
Haowei Wen
Dongbin Zhao
333
10
0
26 May 2025
Sample Efficient Reinforcement Learning via Large Vision Language Model Distillation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Donghoon Lee
Tung M. Luu
Younghwan Lee
Chang D. Yoo
OffRL, VLM
353
1
0
16 May 2025
ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations
Jiahui Zhang
Yusen Luo
Abrar Anwar
Sumedh Anand Sontakke
Joseph J Lim
Jesse Thomason
Erdem Biyik
Jesse Zhang
OffRL, LM&Ro
472
34
0
16 May 2025
MA-ROESL: Motion-aware Rapid Reward Optimization for Efficient Robot Skill Learning from Single Videos
Xinyu Wang
Xinming Zhang
Yanjun Chen
Xiaoyu Shen
Wei Zhang
316
0
0
13 May 2025
TREND: Tri-teaching for Robust Preference-based Reinforcement Learning with Demonstrations
IEEE International Conference on Robotics and Automation (ICRA), 2025
Shuaiyi Huang
Mara Levy
Anubhav Gupta
Daniel Ekpo
Ruijie Zheng
Abhinav Shrivastava
304
6
0
09 May 2025
VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making
Jake Grigsby
Yuke Zhu
Michael S Ryoo
Juan Carlos Niebles
OffRL, VLM
401
3
0
06 May 2025
Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning
Pengxiang Li
Zhi Gao
Bofei Zhang
Yapeng Mi
Xiaojian Ma
...
Tao Yuan
Yuwei Wu
Yunde Jia
Song-Chun Zhu
Qing Li
LLMAG
699
0
0
30 Apr 2025
PRISM: Projection-based Reward Integration for Scene-Aware Real-to-Sim-to-Real Transfer with Few Demonstrations
Haowen Sun
Jian Shu
Chengzhong Ma
Shaolong Zhang
Jiawei Ye
Xingyu Chen
Xuguang Lan
OffRL
304
1
0
29 Apr 2025
Text-to-Decision Agent: Offline Meta-Reinforcement Learning from Natural Language Supervision
Shilin Zhang
Zican Hu
Wenhao Wu
Xinyi Xie
Jianxiang Tang
Chunlin Chen
Daoyi Dong
Yu Cheng
Zhenhong Sun
Zhi Wang
OffRL
1.2K
0
0
21 Apr 2025
A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future
Jialun Zhong
Wei Shen
Yanzeng Li
Songyang Gao
Hua Lu
Yicheng Chen
Yang Zhang
Wei Zhou
Jinjie Gu
Lei Zou
LRM
428
39
0
12 Apr 2025
GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill
Computer Vision and Pattern Recognition (CVPR), 2025
Jieming Cui
Tengyu Liu
Ziyu Meng
Jiale Yu
Ran Song
Wei Zhang
Yixin Zhu
Siyuan Huang
VLM
468
7
0
05 Apr 2025
Reward Generation via Large Vision-Language Model in Offline Reinforcement Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Younghwan Lee
Tung M. Luu
Donghoon Lee
Chang D. Yoo
3DV, VLM, OffRL
388
1
0
03 Apr 2025
Option Discovery Using LLM-guided Semantic Hierarchical Reinforcement Learning
Chak Lam Shek
Erfaun Noorani
262
4
0
24 Mar 2025
LaMOuR: Leveraging Language Models for Out-of-Distribution Recovery in Reinforcement Learning
Chan Kim
Seung-Woo Seo
Seong-Woo Kim
OODD
1.2K
1
0
21 Mar 2025
Towards Automated Semantic Interpretability in Reinforcement Learning via Vision-Language Models
Zhaoxin Li
Zhang Xi-Jia
Batuhan Altundas
Letian Chen
Rohan R. Paleja
Matthew C. Gombolay
OffRL
427
1
0
20 Mar 2025
PANDORA: Diffusion Policy Learning for Dexterous Robotic Piano Playing
Yanjia Huang
Renjie Li
Zhengzhong Tu
VGen
388
1
0
17 Mar 2025
LuciBot: Automated Robot Policy Learning from Generated Videos
Xiaowen Qiu
Yian Wang
Jiting Cai
Zhehuan Chen
Chunru Lin
Tsun-Hsuan Wang
Chuang Gan
LM&Ro, VGen
417
3
0
12 Mar 2025
Provably Correct Automata Embeddings for Optimal Automata-Conditioned Reinforcement Learning
Beyazit Yalcinkaya
Niklas Lauffer
Marcell Vazquez-Chanlatte
Sanjit A. Seshia
OffRL
428
3
0
06 Mar 2025
Enhancing Collective Intelligence in Large Language Models Through Emotional Integration
Likith Kadiyala
Ramteja Sajja
Y. Sermet
Ibrahim Demir
927
4
0
05 Mar 2025
Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning
Adrià López Escoriza
Nicklas Hansen
Stone Tao
Tongzhou Mu
H. Su
OffRL
378
5
0
03 Mar 2025
SENSEI: Semantic Exploration Guided by Foundation Models to Learn Versatile World Models
Cansu Sancaktar
Christian Gumbsch
Antonios Tragoudaras
Pavel Kolev
Georg Martius
LM&Ro, VLM
900
4
0
03 Mar 2025
Offline RLAIF: Piloting VLM Feedback for RL via SFO
Jacob Beck
OffRL
594
0
0
02 Mar 2025
Subtask-Aware Visual Reward Learning from Segmented Demonstrations
International Conference on Learning Representations (ICLR), 2025
Changyeon Kim
Minho Heo
Doohyun Lee
Jinwoo Shin
Honglak Lee
Joseph J. Lim
Kimin Lee
265
6
0
28 Feb 2025
The Evolving Landscape of LLM- and VLM-Integrated Reinforcement Learning
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Sheila Schoepp
Masoud Jafaripour
Yingyue Cao
Zhenxing Ge
Fatemeh Abdollahi
Shadan Golestan
Zahin Sufiyan
Osmar Zaiane
Matthew E. Taylor
OffRLLM&Ro
504
10
0
24 Feb 2025
Imitation Learning from a Single Temporally Misaligned Video
William Huey
Huaxiaoyue Wang
Anne Wu
Yoav Artzi
Sanjiban Choudhury
AI4TS
503
3
0
08 Feb 2025
Preference VLM: Leveraging VLMs for Scalable Preference-Based Reinforcement Learning
Udita Ghosh
Dripta S. Raychaudhuri
Jiachen Li
Konstantinos Karydis
Amit K. Roy-Chowdhury
VLM
294
2
0
03 Feb 2025
INSIGHT: Enhancing Autonomous Driving Safety through Vision-Language Models on Context-Aware Hazard Detection and Edge Case Evaluation
Dianwei Chen
Zifan Zhang
Yuchen Liu
Xianfeng Terry Yang
VLM
578
8
0
01 Feb 2025
LLM-Based Offline Learning for Embodied Agents via Consistency-Guided Reward Ensemble
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Yujeong Lee
Sangwoo Shin
Wei-Jin Park
Honguk Woo
OffRL, 3DV
332
3
0
26 Nov 2024
Vision Language Models are In-Context Value Learners
International Conference on Learning Representations (ICLR), 2024
Yecheng Jason Ma
Joey Hejna
Ayzaan Wahid
Chuyuan Fu
Dhruv Shah
...
Dinesh Jayaraman
Wenhao Yu
Tingnan Zhang
Dorsa Sadigh
Fei Xia
287
59
0
07 Nov 2024