1707.06347
Proximal Policy Optimization Algorithms
20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
Papers citing
"Proximal Policy Optimization Algorithms"
50 / 11,418 papers shown
Think Before You Prune: Self-Reflective Structured Pruning for Reasoning Language Models
Ziyan Wang
Enmao Diao
Qi Le
Pu Wang
G. Wang
Minwoo Lee
Shu-ping Yeh
Li Yang
ReLM
LRM
VLM
124
0
0
01 Dec 2025
Learning Sim-to-Real Humanoid Locomotion in 15 Minutes
Younggyo Seo
Carmelo Sferrazza
Juyue Chen
Guanya Shi
Rocky Duan
Pieter Abbeel
168
0
0
01 Dec 2025
Learning Dexterous Manipulation Skills from Imperfect Simulations
Elvis Hsieh
Wen-Han Hsieh
Yen-Jen Wang
Toru Lin
Jitendra Malik
Koushil Sreenath
Haozhi Qi
218
1
0
01 Dec 2025
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Chujie Zheng
Kai Dang
Bowen Yu
Mingze Li
Huiqiang Jiang
...
Chencan Wu
Feng Hu
An Yang
Jingren Zhou
Junyang Lin
OffRL
240
2
0
01 Dec 2025
OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic
Songyan Zhang
Wenhui Huang
Zhan Chen
Chua Jiahao Collister
Qihang Huang
Chen Lv
OffRL
LRM
195
2
0
01 Dec 2025
Discovering Self-Protective Falling Policy for Humanoid Robot via Deep Reinforcement Learning
Diyuan Shi
Shangke Lyu
Donglin Wang
121
0
0
01 Dec 2025
Improved Training Mechanism for Reinforcement Learning via Online Model Selection
Aida Afshar
Aldo Pacchiano
46
0
0
01 Dec 2025
PSR: Scaling Multi-Subject Personalized Image Generation with Pairwise Subject-Consistency Rewards
Shulei Wang
Longhui Wei
Xin He
Jianbo Ouyang
H. Lu
Zhou Zhao
Qi Tian
EGVM
209
0
0
01 Dec 2025
From Atomic to Composite: Reinforcement Learning Enables Generalization in Complementary Reasoning
Sitao Cheng
Xunjian Yin
Ruiwen Zhou
Yuxuan Li
Xinyi Wang
Liangming Pan
William Yang Wang
Victor Zhong
OffRL
LRM
237
1
0
01 Dec 2025
Artemis: Structured Visual Reasoning for Perception Policy Learning
Wei Tang
Yanpeng Sun
Shan Zhang
Xiaofan Li
Piotr Koniusz
Wei Li
Na Zhao
Z. Li
LRM
VLM
107
0
0
01 Dec 2025
Rectifying LLM Thought from Lens of Optimization
J. Liu
Hongwei Liu
Songyang Zhang
Kai Chen
LRM
121
1
0
01 Dec 2025
On the Tension Between Optimality and Adversarial Robustness in Policy Optimization
Haoran Li
Jiayu Lv
Congying Han
Zicheng Zhang
Anqi Li
Y. Liu
Tiande Guo
Nan Jiang
AAML
138
0
0
01 Dec 2025
Generative Adversarial Gumbel MCTS for Abstract Visual Composition Generation
Zirui Zhao
Boye Niu
David Hsu
W. Lee
GAN
183
0
0
01 Dec 2025
Directed evolution algorithm drives neural prediction
Yanlin Wang
Nancy M Young
Patrick C M Wong
125
0
0
01 Dec 2025
Beyond SFT: Reinforcement Learning for Safer Large Reasoning Models with Better Reasoning Ability
Jinghan Jia
Nathalie Baracaldo
Sijia Liu
OffRL
ReLM
LRM
229
0
0
01 Dec 2025
Agentic Policy Optimization via Instruction-Policy Co-Evolution
Han Zhou
Xingchen Wan
Ivan Vulić
Anna Korhonen
99
0
0
01 Dec 2025
On The Finetuning of MLIPs Through the Lens of Iterated Maps With BPTT
Evan Dramko
Yizhi Zhu
Aleksandar Krivokapic
Geoffroy Hautier
Thomas Reps
C. Jermaine
Anastasios Kyrillidis
73
0
0
30 Nov 2025
Optimizing LVLMs with On-Policy Data for Effective Hallucination Mitigation
Chengzhi Yu
Yifan Xu
Yifan Chen
Wenyi Zhang
MLLM
OffRL
262
0
0
30 Nov 2025
Shielded Controller Units for RL with Operational Constraints Applied to Remote Microgrids
Hadi Nekoei
Alexandre Blondin Massé
Rachid Hassani
Sarath Chandar
Vincent Mai
61
0
0
30 Nov 2025
Automating the Refinement of Reinforcement Learning Specifications
Tanmay Ambadkar
Đorđe Žikelić
Abhinav Verma
67
0
0
30 Nov 2025
MS-PPO: Morphological-Symmetry-Equivariant Policy for Legged Robot Locomotion
Sizhe Wei
Xulin Chen
Fengze Xie
Garrett E. Katz
Zhenyu Gan
Lu Gan
53
0
0
30 Nov 2025
What Is Preference Optimization Doing, How and Why?
Yue Wang
Qizhou Wang
Zizhuo Zhang
Ang Li
Gang Niu
Bo Han
Masashi Sugiyama
68
0
0
30 Nov 2025
Upcycled and Merged MoE Reward Model for Mitigating Reward Hacking
Lingling Fu
MoMe
125
0
0
30 Nov 2025
When Human Preferences Flip: An Instance-Dependent Robust Loss for RLHF
Yifan Xu
Xichen Ye
Yifan Chen
Qiaosheng Zhang
64
0
0
30 Nov 2025
Optimizing Generative Ranking Relevance via Reinforcement Learning in Xiaohongshu Search
Ziyang Zeng
Heming Jing
Jindong Chen
X. Li
Hongyu Liu
...
Yuqing Yang
Shaosheng Cao
Jun Fan
Yi-Chen Wu
Yao Hu
LRM
170
0
0
30 Nov 2025
Multi-GRPO: Multi-Group Advantage Estimation for Text-to-Image Generation with Tree-Based Trajectories and Multiple Rewards
Qiang Lyu
Z. Chen
C. Wang
Haolin Shi
Shibo Gao
...
Jianlou Si
Fei Ding
Jing Li
Chun Pong Lau
Weiqiang Wang
EGVM
121
1
0
30 Nov 2025
The Silence that Speaks: Neural Estimation via Communication Gaps
Shubham Aggarwal
Dipankar Maity
Tamer Basar
45
0
0
30 Nov 2025
Reinforcement Learning for Gliding Projectile Guidance and Control
Joel Cahn
Antonin Thomas
Philippe Pastor
28
0
0
30 Nov 2025
GreenPlanner: Practical Floorplan Layout Generation via an Energy-Aware and Function-Feasible Generative Framework
Pengyu Zeng
Yuqin Dai
Jun Yin
Jing Zhong
Ziyang Han
Chaoyang Shi
ZhanXiang Jin
Maowei Jiang
Yuxing Han
Shuai Lu
59
0
0
29 Nov 2025
ESPO: Entropy Importance Sampling Policy Optimization
Yuepeng Sheng
Yuwei Huang
Shuman Liu
Haibo Zhang
Anxiang Zeng
49
1
0
29 Nov 2025
Hardware-Software Collaborative Computing of Photonic Spiking Reinforcement Learning for Robotic Continuous Control
Mengting Yu
Shuiying Xiang
Changjian Xie
Yonghang Chen
Haowen Zhao
Xingxing Guo
Yahui Zhang
Yanan Han
Yue Hao
81
0
0
29 Nov 2025
Truthful and Trustworthy IoT AI Agents via Immediate-Penalty Enforcement under Approximate VCG Mechanisms
Xun Shao
Ryuuto Shimizu
Zhi Liu
K. Ota
M. Dong
48
0
0
29 Nov 2025
Adversarial Training for Process Reward Models
Gurusha Juneja
Deepak Nathani
William Yang Wang
LRM
134
0
0
28 Nov 2025
ORION: Teaching Language Models to Reason Efficiently in the Language of Thought
Kumar Tanmay
Kriti Aggarwal
Paul Pu Liang
Subhabrata Mukherjee
ReLM
LRM
253
0
0
28 Nov 2025
Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction
Bao Shu
Yan Cai
Jianjian Sun
Chunrui Han
En Yu
...
Yuang Peng
Zheng Ge
Xiangyu Zhang
Daxin Jiang
Xiangyu Yue
LLMAG
KELM
LRM
255
0
0
28 Nov 2025
OBLR-PO: A Theoretical Framework for Stable Reinforcement Learning
Zixun Huang
Jiayi Sheng
Zeyu Zheng
OffRL
97
0
0
28 Nov 2025
Asking like Socrates: Socrates helps VLMs understand remote sensing images
Run Shao
Ziyu Li
Zhaoyang Zhang
Linrui Xu
Xinran He
...
Yongxing Dai
Yiming Yan
Yijun Chen
Wang Guo
Haifeng Li
LRM
129
1
0
27 Nov 2025
Beyond Egocentric Limits: Multi-View Depth-Based Learning for Robust Quadrupedal Locomotion
Rémy Rahem
Wael Suleiman
114
0
0
27 Nov 2025
Co-Evolving Agents: Learning from Failures as Hard Negatives
Yeonsung Jung
Trilok Padhi
Sina Shaham
Dipika Khullar
Joonhyun Jeong
Ninareh Mehrabi
Eunho Yang
87
0
0
27 Nov 2025
Exposing Vulnerabilities in RL: A Novel Stealthy Backdoor Attack through Reward Poisoning
Bokang Zhang
Chaojun Lu
Jianhui Li
Junfeng Wu
AAML
121
0
0
27 Nov 2025
Improving Stochastic Action-Constrained Reinforcement Learning via Truncated Distributions
Roland Stolz
Michael Eichelbeck
Matthias Althoff
21
0
0
27 Nov 2025
TinyLLM: Evaluation and Optimization of Small Language Models for Agentic Tasks on Edge Devices
Mohd Ariful Haque
Fahad Rahman
Kishor Datta Gupta
Khalil Shujaee
Roy George
LLMAG
150
0
0
27 Nov 2025
TTSnap: Test-Time Scaling of Diffusion Models via Noise-Aware Pruning
Qingtao Yu
Changlin Song
Minghao Sun
Zhengyang Yu
Vinay Kumar Verma
Soumya Roy
Sumit Negi
Hongdong Li
Dylan Campbell
92
0
0
27 Nov 2025
Selecting User Histories to Generate LLM Users for Cold-Start Item Recommendation
Nachiket Subbaraman
Jaskinder Sarai
Aniruddh Nath
Lichan Hong
Lukasz Heldt
Li Wei
Zhe Zhao
RALM
93
0
0
27 Nov 2025
Aligning LLMs Toward Multi-Turn Conversational Outcomes Using Iterative PPO
Daniel Jiang
Jalaj Bhandari
Yukai Yang
Rémi Munos
Tyler Lu
OffRL
585
1
0
26 Nov 2025
Heterogeneous Multi-Agent Reinforcement Learning with Attention for Cooperative and Scalable Feature Transformation
Tao Zhe
Huazhen Fang
Kunpeng Liu
Qian Lou
Tamzidul Hoque
Dongjie Wang
OffRL
56
0
0
26 Nov 2025
Staggered Environment Resets Improve Massively Parallel On-Policy Reinforcement Learning
Sid Bharthulwar
Stone Tao
Hao Su
OffRL
208
0
0
26 Nov 2025
Maglev-Pentabot: Magnetic Levitation System for Non-Contact Manipulation using Deep Reinforcement Learning
Guoming Huang
Qingyi Zhou
Dianjing Liu
Shuai Zhang
M. Zhou
Zongfu Yu
117
0
0
26 Nov 2025
Kinematics-Aware Multi-Policy Reinforcement Learning for Force-Capable Humanoid Loco-Manipulation
Kaiyan Xiao
Zihan Xu
Cheng Zhe
Chengju Liu
Qijun Chen
AI4CE
442
0
0
26 Nov 2025
Independent policy gradient-based reinforcement learning for economic and reliable energy management of multi-microgrid systems
Junkai Hu
Li Xia
372
0
0
26 Nov 2025