Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

28 August 2025
Authors: Y. Wang, Zhimin Li, Yuhang Zang, Yujie Zhou, Jiazi Bu, Chunyu Wang, Qinglin Lu, Cheng Jin, Jiaqi Wang
Communities: EGVM
Links: arXiv (abs) · PDF · HTML · HuggingFace (85 upvotes) · GitHub (24,224★)

Papers citing "Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning"

10 of 10 citing papers shown:
1. UniREditBench: A Unified Reasoning-based Image Editing Benchmark
   Feng Han, Y. Wang, Chenglin Li, Zheming Liang, Dianyi Wang, ..., Zhipeng Wei, Chao Gong, Cheng Jin, Yue Yu, J. Wang
   03 Nov 2025 · 56 · 0 · 0

2. Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation
   Yifu Luo, Penghui Du, Bo Li, Sinan Du, Tiantian Zhang, Yongzhe Chang, Kai Wu, Kun Gai, Xueqian Wang
   24 Oct 2025 · 64 · 0 · 0

3. UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation
   Yibin Wang, Zhimin Li, Yuhang Zang, Jiazi Bu, Yujie Zhou, ..., Junjun He, Chunyu Wang, Qinglin Lu, Cheng Jin, J. Wang
   Communities: EGVM, VLM
   21 Oct 2025 · 137 · 1 · 0

4. Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback
   Zongjian Li, Zheyuan Liu, Qihui Zhang, Bin Lin, Feize Wu, ..., Wangbo Yu, Yuwei Niu, Shaodong Wang, Xinhua Cheng, Li Yuan
   19 Oct 2025 · 195 · 1 · 0

5. Taming the Judge: Deconflicting AI Feedback for Stable Reinforcement Learning
   Boyin Liu, Zhuo Zhang, Sen Huang, Lipeng Xie, Qingxu Fu, ..., Li Yu, Tianyi Hu, Zhaoyang Liu, Bolin Ding, Dongbin Zhao
   17 Oct 2025 · 89 · 0 · 0

6. The False Promise of Zero-Shot Super-Resolution in Machine-Learned Operators
   Mansi Sakarvadia, Kareem Hegazy, A. Totounferoush, Kyle Chard, Yaoqing Yang, Ian Foster, Michael W. Mahoney
   Communities: SupR
   08 Oct 2025 · 160 · 8 · 0

7. Smart-GRPO: Smartly Sampling Noise for Efficient RL of Flow-Matching Models
   Benjamin Yu, Jackie Liu, Justin Cui
   03 Oct 2025 · 76 · 0 · 0

8. Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
   Xiangwei Shen, Zhimin Li, Zhantao Yang, Shiyi Zhang, Yingfang Zhang, Donghao Li, Chunyu Wang, Qinglin Lu, Yansong Tang
   08 Sep 2025 · 149 · 6 · 0

9. Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?
   Ouxiang Li, Yuan Wang, Xinting Hu, Huijuan Huang, Rui Chen, Jiarong Ou, Xin Tao, Pengfei Wan, Xiaojuan Qi, Fuli Feng
   Communities: EGVM, CoGe, LRM
   03 Sep 2025 · 165 · 2 · 0

10. Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
    Yibin Wang, Zhimin Li, Yuhang Zang, Chunyu Wang, Qinglin Lu, Cheng Jin, Jinqiao Wang
    Communities: LRM
    06 May 2025 · 348 · 30 · 0