Proximal Policy Optimization Algorithms

20 July 2017

Prafulla Dhariwal

Papers citing "Proximal Policy Optimization Algorithms"

50 / 11,422 papers shown

Differentiable Weightless Controllers: Learning Logic Circuits for Continuous Control

Christoph H. Lampert

207

0

0

01 Dec 2025

Learning Sim-to-Real Humanoid Locomotion in 15 Minutes

Carmelo Sferrazza

182

0

0

01 Dec 2025

Learning Dexterous Manipulation Skills from Imperfect Simulations

Koushil Sreenath

225

1

0

01 Dec 2025

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

...

249

2

0

01 Dec 2025

PSR: Scaling Multi-Subject Personalized Image Generation with Pairwise Subject-Consistency Rewards

210

0

0

01 Dec 2025

OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic

Chua Jiahao Collister

203

2

0

01 Dec 2025

Discovering Self-Protective Falling Policy for Humanoid Robot via Deep Reinforcement Learning

130

0

0

01 Dec 2025

Beyond SFT: Reinforcement Learning for Safer Large Reasoning Models with Better Reasoning Ability

Nathalie Baracaldo

232

0

0

01 Dec 2025

Generative Adversarial Gumbel MCTS for Abstract Visual Composition Generation

193

0

0

01 Dec 2025

Artemis: Structured Visual Reasoning for Perception Policy Learning

110

0

0

01 Dec 2025

Improved Training Mechanism for Reinforcement Learning via Online Model Selection

56

0

0

01 Dec 2025

From Atomic to Composite: Reinforcement Learning Enables Generalization in Complementary Reasoning

William Yang Wang

241

1

0

01 Dec 2025

Think Before You Prune: Self-Reflective Structured Pruning for Reasoning Language Models

128

0

0

01 Dec 2025

Rectifying LLM Thought from Lens of Optimization

128

1

0

01 Dec 2025

Directed evolution algorithm drives neural prediction

Patrick C M Wong

130

0

0

01 Dec 2025

Agentic Policy Optimization via Instruction-Policy Co-Evolution

109

0

0

01 Dec 2025

On The Finetuning of MLIPs Through the Lens of Iterated Maps With BPTT

Aleksandar Krivokapic

Geoffroy Hautier

Anastasios Kyrillidis

77

0

0

30 Nov 2025

Shielded Controller Units for RL with Operational Constraints Applied to Remote Microgrids

Alexandre Blondin Massé

61

0

0

30 Nov 2025

Optimizing LVLMs with On-Policy Data for Effective Hallucination Mitigation

266

1

0

30 Nov 2025

MS-PPO: Morphological-Symmetry-Equivariant Policy for Legged Robot Locomotion

Garrett E. Katz

60

0

0

30 Nov 2025

What Is Preference Optimization Doing, How and Why?

Masashi Sugiyama

72

0

0

30 Nov 2025

Automating the Refinement of Reinforcement Learning Specifications

Tanmay Ambadkar

Đorđe Žikelić

70

0

0

30 Nov 2025

Upcycled and Merged MoE Reward Model for Mitigating Reward Hacking

129

0

0

30 Nov 2025

Optimizing Generative Ranking Relevance via Reinforcement Learning in Xiaohongshu Search

...

176

1

0

30 Nov 2025

Multi-GRPO: Multi-Group Advantage Estimation for Text-to-Image Generation with Tree-Based Trajectories and Multiple Rewards

...

129

1

0

30 Nov 2025

When Human Preferences Flip: An Instance-Dependent Robust Loss for RLHF

Qiaosheng Zhang

70

0

0

30 Nov 2025

The Silence that Speaks: Neural Estimation via Communication Gaps

Shubham Aggarwal

58

0

0

30 Nov 2025

Reinforcement Learning for Gliding Projectile Guidance and Control

Philippe Pastor

34

0

0

30 Nov 2025

GreenPlanner: Practical Floorplan Layout Generation via an Energy-Aware and Function-Feasible Generative Framework

64

0

0

29 Nov 2025

ESPO: Entropy Importance Sampling Policy Optimization

56

1

0

29 Nov 2025

Hardware-Software Collaborative Computing of Photonic Spiking Reinforcement Learning for Robotic Continuous Control

90

0

0

29 Nov 2025

Truthful and Trustworthy IoT AI Agents via Immediate-Penalty Enforcement under Approximate VCG Mechanisms

48

0

0

29 Nov 2025

Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction

...

262

0

0

28 Nov 2025

ORION: Teaching Language Models to Reason Efficiently in the Language of Thought

Subhabrata Mukherjee

255

0

0

28 Nov 2025

OBLR-PO: A Theoretical Framework for Stable Reinforcement Learning

97

0

0

28 Nov 2025

Adversarial Training for Process Reward Models

William Yang Wang

146

0

0

28 Nov 2025

Asking like Socrates: Socrates helps VLMs understand remote sensing images

...

135

1

0

27 Nov 2025

Beyond Egocentric Limits: Multi-View Depth-Based Learning for Robust Quadrupedal Locomotion

117

0

0

27 Nov 2025

Co-Evolving Agents: Learning from Failures as Hard Negatives

Ninareh Mehrabi

91

0

0

27 Nov 2025

Selecting User Histories to Generate LLM Users for Cold-Start Item Recommendation

Nachiket Subbaraman

Jaskinder Sarai

110

0

0

27 Nov 2025

Exposing Vulnerabilities in RL: A Novel Stealthy Backdoor Attack through Reward Poisoning

133

0

0

27 Nov 2025

Improving Stochastic Action-Constrained Reinforcement Learning via Truncated Distributions

Michael Eichelbeck

Matthias Althoff

23

0

0

27 Nov 2025

TinyLLM: Evaluation and Optimization of Small Language Models for Agentic Tasks on Edge Devices

Mohd Ariful Haque

Kishor Datta Gupta

158

0

0

27 Nov 2025

TTSnap: Test-Time Scaling of Diffusion Models via Noise-Aware Pruning

Vinay Kumar Verma

96

0

0

27 Nov 2025

Heterogeneous Multi-Agent Reinforcement Learning with Attention for Cooperative and Scalable Feature Transformation

60

0

0

26 Nov 2025

Aligning LLMs Toward Multi-Turn Conversational Outcomes Using Iterative PPO

593

1

0

26 Nov 2025

Staggered Environment Resets Improve Massively Parallel On-Policy Reinforcement Learning

Sid Bharthulwar

215

0

0

26 Nov 2025

Maglev-Pentabot: Magnetic Levitation System for Non-Contact Manipulation using Deep Reinforcement Learning

126

0

0

26 Nov 2025

Kinematics-Aware Multi-Policy Reinforcement Learning for Force-Capable Humanoid Loco-Manipulation

449

0

0

26 Nov 2025

Independent policy gradient-based reinforcement learning for economic and reliable energy management of multi-microgrid systems

375

0

0

26 Nov 2025
