Proximal Policy Optimization Algorithms

20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
    OffRL
arXiv:1707.06347

Papers citing "Proximal Policy Optimization Algorithms"

50 / 11,418 papers shown
Subgoal Graph-Augmented Planning for LLM-Guided Open-World Reinforcement Learning
Shanwei Fan
Bin Zhang
Zhiwei Xu
Yingxuan Teng
Siqi Dai
Lin Cheng
Guoliang Fan
161
0
0
26 Nov 2025
Kinematics-Aware Multi-Policy Reinforcement Learning for Force-Capable Humanoid Loco-Manipulation
Kaiyan Xiao
Zihan Xu
Cheng Zhe
Chengju Liu
Qijun Chen
AI4CE
442
0
0
26 Nov 2025
Aligning LLMs Toward Multi-Turn Conversational Outcomes Using Iterative PPO
Daniel Jiang
Jalaj Bhandari
Yukai Yang
Rémi Munos
Tyler Lu
OffRL
585
1
0
26 Nov 2025
Learning When to Stop: Adaptive Latent Reasoning via Reinforcement Learning
Alex Ning
Yen-Ling Kuo
Gabe Gomes
OffRL, ReLM, LRM
263
0
0
26 Nov 2025
ST-PPO: Stabilized Off-Policy Proximal Policy Optimization for Multi-Turn Agents Training
Chenliang Li
Adel Elmahdy
Alex Boyd
Zhongruo Wang
Alfredo García
Parminder Bhatia
Taha A. Kass-Hout
Cao Xiao
Mingyi Hong
OffRL
172
0
0
25 Nov 2025
DRAFT-RL: Multi-Agent Chain-of-Draft Reasoning for Reinforcement Learning-Enhanced LLMs
Yuanhao Li
Mingshan Liu
Hongbo Wang
Yiding Zhang
Yifei Ma
Wei Tan
AI4TS, KELM, LRM, AI4CE
390
0
0
25 Nov 2025
Quantum-Enhanced Reinforcement Learning for Accelerating Newton-Raphson Convergence with Ising Machines: A Case Study for Power Flow Analysis
Zeynab Kaseb
Matthias Möller
Lindsay Spoor
Jerry Guo
Yu Xiang
Peter Palensky
Pedro P. Vergara
113
0
0
25 Nov 2025
Complex Instruction Following with Diverse Style Policies in Football Games
Chenglu Sun
Shuo Shen
Haonan Hu
Wei Zhou
Chen Chen
85
0
0
25 Nov 2025
A Hierarchical Framework for Humanoid Locomotion with Supernumerary Limbs
Bowen Zhi
51
0
0
25 Nov 2025
Reinforcing Action Policies by Prophesying
Jiahui Zhang
Ze Huang
Chun Gu
Zipei Ma
Li Zhang
233
1
0
25 Nov 2025
CostNav: A Navigation Benchmark for Cost-Aware Evaluation of Embodied Agents
Haebin Seong
Sungmin Kim
Minchan Kim
Yongjun Cho
Myunchul Joe
...
Yoonshik Kim
Samwoo Seong
Yubeen Park
Youngjae Yu
Yunsung Lee
128
1
0
25 Nov 2025
Improving Language Agents through BREW
Shashank Kirtania
Param Biyani
Priyanshu Gupta
Yasharth Bajpai
Roshni Iyer
Sumit Gulwani
Gustavo Soares
LLMAG, OffRL
252
0
0
25 Nov 2025
SOMBRL: Scalable and Optimistic Model-Based RL
Bhavya Sukhija
Lenart Treven
Carmelo Sferrazza
Florian Dorfler
Pieter Abbeel
Andreas Krause
OffRL
249
2
0
25 Nov 2025
Energy Costs and Neural Complexity Evolution in Changing Environments
IEEE Symposium on Artificial Life (AL), 2025
Sian Heesom-Green
Jonathan Shock
Geoff Nitschke
30
0
0
25 Nov 2025
OpenApps: Simulating Environment Variations to Measure UI-Agent Reliability
Karen Ullrich
Jingtong Su
Claudia Shi
Arjun Subramonian
Amir Bar
Ivan Evtimov
Nikolaos Tsilivis
Randall Balestriero
Julia Kempe
Mark Ibrahim
117
0
0
25 Nov 2025
QiMeng-Kernel: Macro-Thinking Micro-Coding Paradigm for LLM-Based High-Performance GPU Kernel Generation
Xinguo Zhu
Shaohui Peng
Jiaming Guo
Yunji Chen
Qi Guo
...
Qirui Zhou
Ke Gao
Yanjun Wu
Chen Zhao
Ling Li
84
1
0
25 Nov 2025
A Reason-then-Describe Instruction Interpreter for Controllable Video Generation
Shengqiong Wu
Weicai Ye
Y. Zhang
Jiahao Wang
Quande Liu
Xintao Wang
Pengfei Wan
Kun Gai
Hao Fei
Tat-Seng Chua
VGen, LRM
184
0
0
25 Nov 2025
Differential Smoothing Mitigates Sharpening and Improves LLM Reasoning
Jingchu Gai
Guanning Zeng
Huaqing Zhang
Aditi Raghunathan
109
0
0
25 Nov 2025
MapReduce LoRA: Advancing the Pareto Front in Multi-Preference Optimization for Generative Models
Chieh-Yun Chen
Zhonghao Wang
Qi-An Chen
Zhifan Ye
Min Shi
...
Wei-An Lin
Yiru Shen
Ajinkya Kale
Irfan Essa
Humphrey Shi
130
0
0
25 Nov 2025
QiMeng-CRUX: Narrowing the Gap between Natural Language and Verilog via Core Refined Understanding eXpression
Lei Huang
Rui Zhang
Jiaming Guo
Yang Zhang
Di Huang
...
Chongxiao Li
Zidong Du
Xing Hu
Qi Guo
Y. Chen
101
0
0
25 Nov 2025
Manifold Percolation: from generative model to Reinforce learning
Rui Tong
34
0
0
25 Nov 2025
CropVLM: Learning to Zoom for Fine-Grained Vision-Language Perception
Miguel Carvalho
Helder Dias
Bruno Martins
VLM
215
0
0
25 Nov 2025
MIMIC-MJX: Neuromechanical Emulation of Animal Behavior
Charles Y. Zhang
Yuanjia Yang
Aidan Sirbu
Elliott T.T. Abe
Emil Wärnberg
...
Blake A. Richards
Bingni W. Brunton
Eiman Azim
Bence Olveczky
Talmo Pereira
81
1
0
25 Nov 2025
RubricRL: Simple Generalizable Rewards for Text-to-Image Generation
Xuelu Feng
Yunsheng Li
Ziyu Wan
Zixuan Gao
Junsong Yuan
Dongdong Chen
Chunming Qiao
EGVM
274
0
0
25 Nov 2025
Attention Trajectories as a Diagnostic Axis for Deep Reinforcement Learning
Charlotte Beylier
Hannah Selder
Arthur Fleig
S. M. Hofmann
Nico Scherf
121
0
0
25 Nov 2025
HAFO: A Force-Adaptive Control Framework for Humanoid Robots in Intense Interaction Environments
Chenhui Dong
HaoZhe Xu
Wenhao Feng
Zhipeng Wang
Yanmin Zhou
Yifei Zhao
Bin He
141
0
0
25 Nov 2025
BRIC: Bridging Kinematic Plans and Physical Control at Test Time
Dohun Lim
Minji Kim
Jaewoon Lim
Sungchan Kim
TTA
333
0
0
25 Nov 2025
Leveraging weights signals - Predicting and improving generalizability in reinforcement learning
Olivier Moulin
Vincent François-Lavet
Paul Elbers
Mark Hoogendoorn
88
0
0
25 Nov 2025
Dynamic Mixture of Experts Against Severe Distribution Shifts
Donghu Kim
CLL
149
0
0
24 Nov 2025
Seeing What Matters: Visual Preference Policy Optimization for Visual Generation
Ziqi Ni
Yuanzhi Liang
Rui Li
Yi Zhou
H. Huang
Chi Zhang
Xuelong Li
115
0
0
24 Nov 2025
Learning What to Trust: Bayesian Prior-Guided Optimization for Visual Generation
Ruiying Liu
Yuanzhi Liang
Haibin Huang
Tianshu Yu
Chi Zhang
93
0
0
24 Nov 2025
SENTINEL: A Fully End-to-End Language-Action Model for Humanoid Whole Body Control
Yuxuan Wang
Haobin Jiang
Shiqing Yao
Ziluo Ding
Zongqing Lu
LM&Ro
372
1
0
24 Nov 2025
ProxT2I: Efficient Reward-Guided Text-to-Image Generation via Proximal Diffusion
Zhenghan Fang
Jian Zheng
Qiaozi Gao
Xiaofeng Gao
Jeremias Sulam
212
0
0
24 Nov 2025
Periodic Asynchrony: An On-Policy Approach for Accelerating LLM Reinforcement Learning
Jian Lu
Yi Luo
224
0
0
24 Nov 2025
Test-Time Preference Optimization for Image Restoration
Bingchen Li
Xin Li
Jiaqi Xu
Jiaming Guo
Wenbo Li
Renjing Pei
Zhibo Chen
125
0
0
24 Nov 2025
FastForward Pruning: Efficient LLM Pruning via Single-Step Reinforcement Learning
Xin Yuan
S. Li
Jiateng Wei
Chengrui Zhu
Yanming Wu
Qingpeng Li
Jiajun Lv
Xiaoke Lan
Jun Chen
Yong-Jin Liu
OffRL
373
0
0
24 Nov 2025
STORE: Semantic Tokenization, Orthogonal Rotation and Efficient Attention for Scaling Up Ranking Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Y. Xu
Chaofan Fan
J. Hu
Yu Zhang
Zeng Xiaoyi
J. Zhang
157
1
0
24 Nov 2025
Multimodal Large Language Models with Adaptive Preference Optimization for Sequential Recommendation
Y. Wang
Yonghui Yang
Le Wu
Y. Zhang
Richang Hong
AI4TS
280
0
0
24 Nov 2025
SpeedAug: Policy Acceleration via Tempo-Enriched Policy and RL Fine-Tuning
Taewook Nam
Sung Ju Hwang
79
0
0
24 Nov 2025
Learning Massively Multitask World Models for Continuous Control
Nicklas Hansen
Hao Su
Xiaolong Wang
OffRL, CLL, LM&Ro
528
0
0
24 Nov 2025
CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization
X. Hou
Shaoyuan Xu
Manan Biyani
Mayan Li
Jia-Wei Liu
Todd C. Hollon
Bryan Wang
136
0
0
24 Nov 2025
LLM-Driven Stationarity-Aware Expert Demonstrations for Multi-Agent Reinforcement Learning in Mobile Systems
Tianyang Duan
Zongyuan Zhang
Zheng Lin
Songxiao Guo
Xiuxian Guan
...
Xia Du
Ji-Zhe Zhou
Heming Cui
Jun Luo
Yue Gao
85
2
0
24 Nov 2025
Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs
Meng Lu
Ran Xu
Yi Fang
Wenxuan Zhang
Yue Yu
...
Guanghua Xiao
Hanrui Wang
Di Jin
W. Shi
Xuan Wang
LRM
139
1
0
24 Nov 2025
An Anatomy Aware Hybrid Deep Learning Framework for Lung Cancer Tumor Stage Classification
Saniah Kayenat Chowdhury
Rusab Sarmun
M. Chowdhury
S. Zoghoul
Israa Al-Hashimi
Adam Mushtak
Amith Khandakar
109
0
0
24 Nov 2025
Object-centric Task Representation and Transfer using Diffused Orientation Fields
Cem Bilaloglu
Tobias Löw
Sylvain Calinon
85
0
0
23 Nov 2025
SafeFall: Learning Protective Control for Humanoid Robots
Ziyu Meng
Tengyu Liu
Le Ma
Yingying Wu
Ran Song
Wei Emma Zhang
Siyuan Huang
79
0
0
23 Nov 2025
TASO: Jailbreak LLMs via Alternative Template and Suffix Optimization
Yanting Wang
Runpeng Geng
Jinghui Chen
Minhao Cheng
Jinyuan Jia
276
0
0
23 Nov 2025
Wireless Power Transfer and Intent-Driven Network Optimization in AAVs-assisted IoT for 6G Sustainable Connectivity
Yue Hu
Xiaoming He
Rui Yuan
Shahid Mumtaz
65
0
0
23 Nov 2025
ORIGAMISPACE: Benchmarking Multimodal LLMs in Multi-Step Spatial Reasoning with Mathematical Constraints
R. Xu
Dakuan Lu
Zicheng Zhao
Xiaoyu Tan
X. Wang
Siyu Yuan
Jiangjie Chen
Yinghui Xu
84
0
0
23 Nov 2025
Reward Engineering for Spatial Epidemic Simulations: A Reinforcement Learning Platform for Individual Behavioral Learning
Radman Rakhshandehroo
Daniel Coombs
106
0
0
22 Nov 2025