Enhancing Rating-Based Reinforcement Learning to Effectively Leverage Feedback from Large Vision-Language Models

15 June 2025
Tung M. Luu, Younghwan Lee, Donghoon Lee, Sunho Kim, Min Jun Kim, Chang D. Yoo
ALM, VLM
ArXiv (abs) · PDF · HTML

Papers citing "Enhancing Rating-Based Reinforcement Learning to Effectively Leverage Feedback from Large Vision-Language Models"

6 / 6 papers shown

Oracle-RLAIF: An Improved Fine-Tuning Framework for Multi-modal Video Models through Reinforcement Learning from Ranking Feedback
Derek Shi, Ruben Glatt, Christine Klymko, Shubham Mohole, Hongjun Choi, Shashank Kushwaha, Sam Sakla, Felipe Leno Da Silva
AI4TS, VLM · 02 Oct 2025

Teaching RL Agents to Act Better: VLM as Action Advisor for Online Reinforcement Learning
Xiefeng Wu, Jing Zhao, Shu Zhang, Mingyu Hu
OffRL · 25 Sep 2025

Self-Rewarding Vision-Language Model via Reasoning Decomposition
Zongxia Li, Wenhao Yu, Chengsong Huang, Rui Liu, Zhenwen Liang, ..., Jingxi Che, Dian Yu, Jordan L. Boyd-Graber, Haitao Mi, Dong Yu
ReLM, VLM, LRM · 27 Aug 2025

Occlusion-robust Stylization for Drawing-based 3D Animation
Sunjae Yoon, Gwanhyeong Koo, Younghwan Lee, Ji Woo Hong, C. Yoo
3DH · 01 Aug 2025

Policy Learning from Large Vision-Language Model Feedback without Reward Modeling
Tung M. Luu, Donghoon Lee, Younghwan Lee, Chang D. Yoo
OffRL · 31 Jul 2025

TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis
Tri Ton, Ji Woo Hong, Chang D. Yoo
VGen · 08 Apr 2025