ResearchTrend.AI
Proximal Policy Optimization Algorithms

20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
    OffRL
arXiv: 1707.06347

Papers citing "Proximal Policy Optimization Algorithms"

50 / 7,062 papers shown
BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation
Haiquan Wen
Yiwei He
Zhenglin Huang
Tianxiao Li
Zihan Yu
Xingru Huang
Lu Qi
Baoyuan Wu
Xuelong Li
Guangliang Cheng
VGen
19 May 2025
Dribble Master: Learning Agile Humanoid Dribbling Through Legged Locomotion
Zhuoheng Wang
Jinyin Zhou
Qi Wu
19 May 2025
Effective and Transparent RAG: Adaptive-Reward Reinforcement Learning for Decision Traceability
Jingyi Ren
Yekun Xu
Xiaolong Wang
Weitao Li
Weizhi Ma
Yang Liu
RALM
19 May 2025
KinTwin: Imitation Learning with Torque and Muscle Driven Biomechanical Models Enables Precise Replication of Able-Bodied and Impaired Movement from Markerless Motion Capture
R. James Cotton
19 May 2025
J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization
Austin Xu
Yilun Zhou
Xuan-Phi Nguyen
Caiming Xiong
Shafiq Joty
ELM
LRM
19 May 2025
RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs
Soumya Rani Samineni
Durgesh Kalwar
Karthik Valmeekam
Kaya Stechly
Subbarao Kambhampati
OffRL
19 May 2025
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space
Hengli Li
Chenxi Li
Tong Wu
Xuekai Zhu
Yuxuan Wang
...
Eric Hanchen Jiang
Song-Chun Zhu
Zixia Jia
Ying Nian Wu
Zilong Zheng
LRM
19 May 2025
Action-Dependent Optimality-Preserving Reward Shaping
Grant C. Forbes
Jianxun Wang
Leonardo Villalobos-Arias
Arnav Jhala
David L. Roberts
OffRL
19 May 2025
LiBOG: Lifelong Learning for Black-Box Optimizer Generation
Jiyuan Pei
Yi Mei
Jialin Liu
Mengjie Zhang
19 May 2025
A Dataless Reinforcement Learning Approach to Rounding Hyperplane Optimization for Max-Cut
Gabriel Malikal
Ismail R. Alkhouri
Alvaro Velasquez
Adam M Alessio
S. Ravishankar
19 May 2025
Dynamic Sight Range Selection in Multi-Agent Reinforcement Learning
Wei-Chen Liao
Ti-Rong Wu
I-Chen Wu
19 May 2025
On-Policy Optimization with Group Equivalent Preference for Multi-Programming Language Understanding
Haoyuan Wu
Rui Ming
Jilong Gao
Hangyu Zhao
Xueyi Chen
Yikai Yang
Haisheng Zheng
Zhuolun He
Bei Yu
19 May 2025
Reasoning BO: Enhancing Bayesian Optimization with Long-Context Reasoning Power of LLMs
Zhuo Yang
Lingli Ge
Dong Han
Tianfan Fu
Yuqiang Li
19 May 2025
Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization
Sunghwan Kim
Dongjin Kang
Taeyoon Kwon
Hyungjoo Chae
Dongha Lee
Jinyoung Yeo
ALM
19 May 2025
A universal policy wrapper with guarantees
Anton Bolychev
Georgiy Malaniya
Grigory Yaremenko
Anastasia Krasnaya
Pavel Osinenko
OffRL
18 May 2025
UIShift: Enhancing VLM-based GUI Agents through Self-supervised Reinforcement Learning
Longxi Gao
Li Lyna Zhang
Mengwei Xu
18 May 2025
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Gang Li
Ming Lin
Tomer Galanti
Zhengzhong Tu
Tianbao Yang
18 May 2025
Observe-R1: Unlocking Reasoning Abilities of MLLMs with Dynamic Progressive Reinforcement Learning
Zirun Guo
Minjie Hong
Tao Jin
OffRL
LRM
18 May 2025
Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward
Han Weng
Boyi Liu
Yuanfeng Song
Dun Zeng
Yingxiang Yang
Yi Zhan
Longjie Cui
Xiaoming Yin
Yang Sun
18 May 2025
Enriching Patent Claim Generation with European Patent Dataset
Lekang Jiang
Chengzu Li
Stephan Goetz
18 May 2025
Design of a 3-DOF Hopping Robot with an Optimized Gearbox: An Intermediate Platform Toward Bipedal Robots
JongHun Choe
Gijeong Kim
Hajun Kim
Dongyun Kang
Min-Su Kim
Hae-Won Park
18 May 2025
SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization
Minghan Chen
Guikun Chen
Wenguan Wang
Yi Yang
18 May 2025
Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning
Xinbin Yuan
Jian Zhang
K. Li
Zhuoxuan Cai
Lujian Yao
...
Enguang Wang
Qibin Hou
Jinwei Chen
Peng-Tao Jiang
Bo Li
18 May 2025
UFO-RL: Uncertainty-Focused Optimization for Efficient Reinforcement Learning Data Selection
Yang Zhao
Kai Xiong
Xiao Ding
Li Du
Yangou Ouyang
...
Wenbin Zhang
Bin Liu
Dong Hu
Bing Qin
Ting Liu
OffRL
18 May 2025
Table-R1: Region-based Reinforcement Learning for Table Understanding
Zhenhe Wu
Jian Yang
Jiaheng Liu
Xianjie Wu
Changzai Pan
Jie Zhang
Yu Zhao
Shuangyong Song
Yongxiang Li
Zhoujun Li
LMTD
LRM
18 May 2025
Multi-CALF: A Policy Combination Approach with Statistical Guarantees
Georgiy Malaniya
Anton Bolychev
Grigory Yaremenko
Anastasia Krasnaya
Pavel Osinenko
18 May 2025
Growable and Interpretable Neural Control with Online Continual Learning for Autonomous Lifelong Locomotion Learning Machines
Arthicha Srisuchinnawong
Poramate Manoonpong
CLL
LRM
17 May 2025
SAINT: Attention-Based Modeling of Sub-Action Dependencies in Multi-Action Policies
Matthew Landers
Taylor W. Killian
Thomas Hartvigsen
Afsaneh Doryab
17 May 2025
Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment
Siliang Zeng
Quan Wei
William Brown
Oana Frunza
Yuriy Nevmyvaka
Mingyi Hong
LRM
17 May 2025
Integrating Model-based Control and RL for Sim2Real Transfer of Tight Insertion Policies
Isidoros Marougkas
Dhruv Metha Ramesh
Joe H. Doerr
Edgar Granados
Aravind Sivaramakrishnan
Abdeslam Boularias
Kostas E. Bekris
OffRL
17 May 2025
VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation
Yiting Wang
Guoheng Sun
Wanghao Ye
Gang Qu
Ang Li
OffRL
3DV
LRM
VLM
17 May 2025
CorBenchX: Large-Scale Chest X-Ray Error Dataset and Vision-Language Model Benchmark for Report Error Correction
Jing Zou
Qingqiu Li
Chenyu Lian
Lihao Liu
Xiaohan Yan
Shujun Wang
Jing Qin
VLM
17 May 2025
PROBE: Proprioceptive Obstacle Detection and Estimation while Navigating in Clutter
Dhruv Metha Ramesh
Aravind Sivaramakrishnan
Shreesh Keskar
Kostas E. Bekris
Jingjin Yu
Abdeslam Boularias
17 May 2025
JULI: Jailbreak Large Language Models by Self-Introspection
Jesson Wang
Zhanhao Hu
David Wagner
17 May 2025
Master Rules from Chaos: Learning to Reason, Plan, and Interact from Chaos for Tangram Assembly
Chao Zhao
Chunli Jiang
Lifan Luo
Guanlan Zhang
Hongyu Yu
Michael Yu Wang
Qifeng Chen
LRM
17 May 2025
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning
Yuqi Liu
Tianyuan Qu
Zhisheng Zhong
Bohao Peng
Shu Liu
Bei Yu
Jiaya Jia
VLM
LRM
17 May 2025
Bench-NPIN: Benchmarking Non-prehensile Interactive Navigation
Ninghan Zhong
Steven Caro
Avraiem Iskandar
Megnath Ramesh
Stephen L. Smith
17 May 2025
CrafText Benchmark: Advancing Instruction Following in Complex Multimodal Open-Ended World
Zoya Volovikova
G. Gorbov
Petr Kuderov
Aleksandr I. Panov
A. Skrynnik
17 May 2025
Real-Time Verification of Embodied Reasoning for Generative Skill Acquisition
Bo Yue
Shuqi Guo
Kaiyu Hu
Chujiao Wang
Benyou Wang
Kui Jia
Guiliang Liu
LRM
16 May 2025
Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models
Fu-Yun Wang
Yunhao Shui
Jingtan Piao
Keqiang Sun
Hongsheng Li
16 May 2025
Meta-World+: An Improved, Standardized, RL Benchmark
Reginald McLean
Evangelos Chatzaroulas
Luc McCutcheon
Frank Röder
Tianhe Yu
...
Ryan Julian
Jordan Terry
Isaac Woungang
Nariman Farsad
Pablo Samuel Castro
OffRL
16 May 2025
HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization
Chengyu Huang
Zhengxin Zhang
Claire Cardie
LRM
16 May 2025
Unifying Segment Anything in Microscopy with Multimodal Large Language Model
Manyu Li
Ruian He
Zixian Zhang
Weimin Tan
Bo Yan
VLM
16 May 2025
HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages
Junyao Xing
Jiaqi Zeng
Olivier Delalleau
Hoo-Chang Shin
Felipe Soares
Alexander Bukharin
Ellie Evans
Yi Dong
Oleksii Kuchaiev
16 May 2025
REMOR: Automated Peer Review Generation with LLM Reasoning and Multi-Objective Reinforcement Learning
Pawin Taechoyotin
Daniel Acuna
LRM
16 May 2025
Certifying Stability of Reinforcement Learning Policies using Generalized Lyapunov Functions
Kehan Long
Jorge Cortés
Nikolay Atanasov
16 May 2025
Continuous Optimization for Feature Selection with Permutation-Invariant Embedding and Policy-Guided Search
Rui Liu
Rui Xie
Zijun Yao
Yanjie Fu
Dongjie Wang
16 May 2025
Exploration by Random Distribution Distillation
Zhirui Fang
Kai Yang
Jian Tao
Jiafei Lyu
Lusong Li
Li Shen
Xiu Li
16 May 2025
Reasoning with OmniThought: A Large CoT Dataset with Verbosity and Cognitive Difficulty Annotations
Wenrui Cai
Chengyu Wang
Junbing Yan
Jun Huang
Xiangzhong Fang
LRM
16 May 2025
Scalability of Reinforcement Learning Methods for Dispatching in Semiconductor Frontend Fabs: A Comparison of Open-Source Models with Real Industry Datasets
Patrick Stöckermann
Henning Südfeld
Alessandro Immordino
Thomas Altenmüller
Marc Wegmann
Martin Gebser
Konstantin Schekotihin
Georg Seidel
Chew Wye Chan
Fei Fei Zhang
OffRL
16 May 2025