Proximal Policy Optimization Algorithms

20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
    OffRL

Papers citing "Proximal Policy Optimization Algorithms"

50 / 11,419 papers shown
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
International Symposium on Computer Architecture (ISCA), 2025
Chenggang Zhao
Chengqi Deng
Chong Ruan
Damai Dai
Huazuo Gao
...
Wenfeng Liang
Ying He
Yun Wang
Yuxuan Liu
Y. X. Wei
MoE
253
33
0
24 Dec 2025
Tactile-based Object Retrieval From Granular Media
Jingxi Xu
Yinsen Jia
Dongxiao Yang
Patrick Meng
Xinyue Zhu
Zihan Guo
Shuran Song
M. Ciocarlie
195
11
0
24 Dec 2025
C$^2$GSPG: Confidence-calibrated Group Sequence Policy Gradient towards Self-aware Reasoning
Haotian Liu
Shuo Wang
Hongteng Xu
LRM
181
0
0
24 Dec 2025
Fast LLM Post-training via Decoupled and Fastest-of-N Speculation
Rongxin Cheng
Kai Zhou
Xingda Wei
Siyuan Liu
Mingcong Han
...
Yeju Zhou
Baoquan Zhong
W. L. Xiao
Rong Chen
Haibo Chen
OffRL, LRM
436
0
0
24 Dec 2025
Reinforcement Learning for Large Model: A Survey
Weijia Wu
Chen Gao
Joya Chen
Kevin Lin
Qingwei Meng
Yiming Zhang
Yuke Qiu
Hong Zhou
Mike Zheng Shou
316
2
0
24 Dec 2025
RemoteReasoner: Towards Unifying Geospatial Reasoning Workflow
Liang Yao
Fan Liu
Hongbo Lu
Chuanyi Zhang
Rui Min
Shengxiang Xu
Shimin Di
Pai Peng
LRM
235
7
0
24 Dec 2025
Deformable Cluster Manipulation via Whole-Arm Policy Learning
Jayadeep Jacob
Wenzheng Zhang
Houston Warren
Paulo Borges
T. Bandyopadhyay
Fabio Ramos
219
0
0
24 Dec 2025
CARL: Critical Action Focused Reinforcement Learning for Multi-Step Agent
Leyang Shen
Y. Zhang
Chun Kai Ling
Xiaoyan Zhao
Tat-Seng Chua
131
0
0
04 Dec 2025
Diffusion Fine-Tuning via Reparameterized Policy Gradient of the Soft Q-Function
Hyeongyu Kang
Jaewoo Lee
Woocheol Shin
Kiyoung Om
Jinkyoo Park
101
0
0
04 Dec 2025
Learning to Orchestrate Agents in Natural Language with the Conductor
Stefan Nielsen
Edoardo Cetin
Peter Schwendeman
Qi Sun
Jinglue Xu
Yujin Tang
LLMAG
100
1
0
04 Dec 2025
Structured Document Translation via Format Reinforcement Learning
Haiyue Song
Johannes Eschbach-Dymanus
Hour Kaing
Sumire Honda
Hideki Tanaka
Bianka Buschbeck
Masao Utiyama
60
0
0
04 Dec 2025
RRPO: Robust Reward Policy Optimization for LLM-based Emotional TTS
Cong Wang
Changfeng Gao
Yang Xiang
Zhihao Du
Keyu An
Han Zhao
Qian Chen
Xiangang Li
Yingming Gao
Ya Li
35
0
0
04 Dec 2025
Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space
Joey Hong
Kang Liu
Zhan Ling
Jiecao Chen
Sergey Levine
LLMAG, OffRL
160
0
0
04 Dec 2025
Using Machine Learning to Take Stay-or-Go Decisions in Data-driven Drone Missions
Giorgos Polychronis
Foivos Pournaropoulos
C. Antonopoulos
S. Lalis
252
0
0
04 Dec 2025
Multi-Agent Reinforcement Learning for Intraday Operating Rooms Scheduling under Uncertainty
Kailiang Liu
Ying Chen
Ralf Borndörfer
Thorsten Koch
7
0
0
04 Dec 2025
LangSAT: A Novel Framework Combining NLP and Reinforcement Learning for SAT Solving
Muyu Pan
Matthew Walter
Dheeraj Kodakandla
Mahfuza Farooque
28
0
0
04 Dec 2025
FALCON: Actively Decoupled Visuomotor Policies for Loco-Manipulation with Foundation-Model-Based Coordination
Chengyang He
Ge Sun
Yue Bai
Junkai Lu
Jiadong Zhao
Guillaume Sartoretti
143
0
0
04 Dec 2025
Value Gradient Guidance for Flow Matching Alignment
Zhen Liu
Tim Z. Xiao
Carles Domingo-Enrich
Weiyang Liu
Dinghuai Zhang
57
0
0
04 Dec 2025
YingMusic-Singer: Zero-shot Singing Voice Synthesis and Editing with Annotation-free Melody Guidance
Junjie Zheng
Chunbo Hao
Guobin Ma
Xiaoyu Zhang
Gongyu Chen
Chaofan Ding
Zihao Chen
Lei Xie
DiffM
157
0
0
04 Dec 2025
On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral
Wenlong Deng
Yushu Li
Boying Gong
Yi Ren
Christos Thrampoulidis
Xiaoxiao Li
52
2
0
03 Dec 2025
PretrainZero: Reinforcement Active Pretraining
Xingrun Xing
Zhiyuan Fan
Jie Lou
G. Li
Jiajun Zhang
Debing Zhang
OffRL, AIMat, ReLM, LRM, AI4CE
443
1
0
03 Dec 2025
Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective
Jingyang Ou
Jiaqi Han
Minkai Xu
Shaoxuan Xu
Jianwen Xie
Stefano Ermon
Yi Wu
Chongxuan Li
DiffM
120
0
0
03 Dec 2025
Digital Twin-based Control Co-Design of Full Vehicle Active Suspensions via Deep Reinforcement Learning
Ying-Kuan Tsai
Yi-Ping Chen
V. Karkaria
Wei Chen
45
1
0
03 Dec 2025
Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment
Huy Nghiem
Swetasudha Panda
Devashish Khatwani
Huy Nguyen
Krishnaram Kenthapadi
Hal Daumé III
LM&MA
133
0
0
03 Dec 2025
Crossing the Sim2Real Gap Between Simulation and Ground Testing to Space Deployment of Autonomous Free-flyer Control
Kenneth Stewart
Samantha Chapin
Roxana Leontie
C. Henshaw
47
2
0
03 Dec 2025
Deep Reinforcement Learning for Dynamic Algorithm Configuration: A Case Study on Optimizing OneMax with the (1+($λ$,$λ$))-GA
Tai Nguyen
Phong Le
André Biedenkapp
Carola Doerr
Nguyen Dang
62
0
0
03 Dec 2025
Towards better dense rewards in Reinforcement Learning Applications
Shuyuan Zhang
OffRL
91
0
0
03 Dec 2025
RoboScape-R: Unified Reward-Observation World Models for Generalizable Robotics Training via RL
Yinzhou Tang
Yu Shang
Yinuo Chen
Bingwen Wei
Xin Zhang
...
Liangzhi Shi
Chao Yu
Chen Gao
Wei Wu
Yong Li
107
0
0
03 Dec 2025
PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design
Jiazhe Wei
Ken Li
Tianyu Lao
Haofan Wang
Liang Wang
Caifeng Shan
Chenyang Si
97
0
0
03 Dec 2025
LSRS: Latent Scale Rejection Sampling for Visual Autoregressive Modeling
Hong-Kai Zheng
Piji Li
58
0
0
03 Dec 2025
A Learning-based Control Methodology for Transitioning VTOL UAVs
Zexin Lin
Yebin Zhong
Hanwen Wan
Jiu Cheng
Zhenglong Sun
Xiaoqiang Ji
65
0
0
03 Dec 2025
Autonomous Reinforcement Learning Robot Control with Intel's Loihi 2 Neuromorphic Hardware
Kenneth Stewart
Roxana Leontie
Samantha Chapin
Joe Hays
Sumit Bam Shrestha
C. Henshaw
99
0
0
03 Dec 2025
Thinking with Programming Vision: Towards a Unified View for Thinking with Images
Zirun Guo
Minjie Hong
Feng Zhang
Kai Jia
Tao Jin
OffRL, LRM, VLM
207
0
0
03 Dec 2025
MarkTune: Improving the Quality-Detectability Trade-off in Open-Weight LLM Watermarking
Yizhou Zhao
Zhiwei Steven Wu
Adam Block
119
0
0
03 Dec 2025
Safety Reinforced Model Predictive Control (SRMPC): Improving MPC with Reinforcement Learning for Motion Planning in Autonomous Driving
Johannes Fischer
Marlon Steiner
Omer Sahin Tas
Christoph Stiller
52
3
0
03 Dec 2025
Generative Multi-modal Feedback for Singing Voice Synthesis Evaluation
Xueyan Li
Y. Wang
Mengjie Jiang
Qingzi Zhu
Jiang Zhang
Zoey Kim
Yazhe Niu
EGVM
81
0
0
02 Dec 2025
Dynamic Configuration of On-Street Parking Spaces using Multi Agent Reinforcement Learning
Oshada Jayasinghe
Farhana Choudhury
E. Tanin
S. Karunasekera
AI4CE
105
0
0
02 Dec 2025
Zero-Shot Instruction Following in RL via Structured LTL Representations
Mattia Giuri
Mathias Jackermeier
Alessandro Abate
OffRL
146
0
0
02 Dec 2025
ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning
Y. Li
Yingda Yin
Lingting Zhu
Weikai Chen
Shengju Qian
Xin Wang
Yanwei Fu
VOS, LRM
385
0
0
02 Dec 2025
OptPO: Optimal Rollout Allocation for Test-time Policy Optimization
Youkang Wang
Jian Wang
Rubing Chen
Tianyi Zeng
Xiao-Yong Wei
Qing Li
59
0
0
02 Dec 2025
Nav-$R^2$ Dual-Relation Reasoning for Generalizable Open-Vocabulary Object-Goal Navigation
Wentao Xiang
H. Zhang
Tianhang Yang
Zedong Chu
Ruihang Chu
...
Zhining Gu
Junjie Wang
Xiaolong Wu
Mu Xu
Yujiu Yang
156
1
0
02 Dec 2025
GoRL: An Algorithm-Agnostic Framework for Online Reinforcement Learning with Generative Policies
Chubin Zhang
Zhenglin Wan
Feng Chen
Xingrui Yu
Ivor W. Tsang
Bo An
83
0
0
02 Dec 2025
Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models
Xinyue Ai
Yutong He
Albert Gu
Ruslan Salakhutdinov
J. Zico Kolter
Nicholas Matthew Boffi
Max Simchowitz
64
1
0
02 Dec 2025
Plantain: Plan-Answer Interleaved Reasoning
Anthony Liang
Jonathan Berant
Adam Fisch
Abhimanyu Goyal
Kalpesh Krishna
Jacob Eisenstein
ReLM, LRM
232
0
0
02 Dec 2025
SMP: Reusable Score-Matching Motion Priors for Physics-Based Character Control
Yuxuan Mu
Ziyu Zhang
Yi Shi
Minami Matsumoto
Kotaro Imamura
...
Chuan Guo
Michael Taylor
Chang Shu
Pengcheng Xi
Xue Bin Peng
115
0
0
02 Dec 2025
ADORE: Autonomous Domain-Oriented Relevance Engine for E-commerce
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Zheng Fang
Donghao Xie
Ming Pang
Chunyuan Yuan
Xue Jiang
Changping Peng
Zhangang Lin
Zheng Luo
92
2
0
02 Dec 2025
Vehicle Dynamics Embedded World Models for Autonomous Driving
Huiqian Li
Wei Pan
Haodong Zhang
Jin Huang
Zhihua Zhong
148
0
0
02 Dec 2025
Artemis: Structured Visual Reasoning for Perception Policy Learning
Wei Tang
Yanpeng Sun
Shan Zhang
Xiaofan Li
Piotr Koniusz
Wei Li
Na Zhao
Z. Li
LRM, VLM
110
0
0
01 Dec 2025
Learning Dexterous Manipulation Skills from Imperfect Simulations
Elvis Hsieh
Wen-Han Hsieh
Yen-Jen Wang
Toru Lin
Jitendra Malik
Koushil Sreenath
Haozhi Qi
220
1
0
01 Dec 2025
Discovering Self-Protective Falling Policy for Humanoid Robot via Deep Reinforcement Learning
Diyuan Shi
Shangke Lyu
Donglin Wang
127
0
0
01 Dec 2025