arXiv:1707.06347

Proximal Policy Optimization Algorithms

20 July 2017

Prafulla Dhariwal

Papers citing "Proximal Policy Optimization Algorithms"

50 / 11,421 papers shown

Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

International Symposium on Computer Architecture (ISCA), 2025

...

253

34

0

24 Dec 2025

Tactile-based Object Retrieval From Granular Media

Shuran Song

195

11

0

24 Dec 2025

C$^2$GSPG: Confidence-calibrated Group Sequence Policy Gradient towards Self-aware Reasoning

181

0

0

24 Dec 2025

Fast LLM Post-training via Decoupled and Fastest-of-N Speculation

...

436

0

0

24 Dec 2025

Reinforcement Learning for Large Model: A Survey

Mike Zheng Shou

316

2

0

24 Dec 2025

RemoteReasoner: Towards Unifying Geospatial Reasoning Workflow

235

7

0

24 Dec 2025

Deformable Cluster Manipulation via Whole-Arm Policy Learning

T. Bandyopadhyay

219

0

0

24 Dec 2025

CARL: Critical Action Focused Reinforcement Learning for Multi-Step Agent

135

0

0

04 Dec 2025

Diffusion Fine-Tuning via Reparameterized Policy Gradient of the Soft Q-Function

101

0

0

04 Dec 2025

Learning to Orchestrate Agents in Natural Language with the Conductor

Peter Schwendeman

103

1

0

04 Dec 2025

Structured Document Translation via Format Reinforcement Learning

Johannes Eschbach-Dymanus

Bianka Buschbeck

60

0

0

04 Dec 2025

RRPO: Robust Reward Policy Optimization for LLM-based Emotional TTS

35

0

0

04 Dec 2025

Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space

160

0

0

04 Dec 2025

Using Machine Learning to Take Stay-or-Go Decisions in Data-driven Drone Missions

Giorgos Polychronis

Foivos Pournaropoulos

C. Antonopoulos

253

0

0

04 Dec 2025

Multi-Agent Reinforcement Learning for Intraday Operating Rooms Scheduling under Uncertainty

Ralf Borndörfer

7

0

0

04 Dec 2025

LangSAT: A Novel Framework Combining NLP and Reinforcement Learning for SAT Solving

Dheeraj Kodakandla

Mahfuza Farooque

28

0

0

04 Dec 2025

FALCON: Actively Decoupled Visuomotor Policies for Loco-Manipulation with Foundation-Model-Based Coordination

Guillaume Sartoretti

146

0

0

04 Dec 2025

Value Gradient Guidance for Flow Matching Alignment

Carles Domingo-Enrich

57

0

0

04 Dec 2025

YingMusic-Singer: Zero-shot Singing Voice Synthesis and Editing with Annotation-free Melody Guidance

157

0

0

04 Dec 2025

On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral

Christos Thrampoulidis

55

2

0

03 Dec 2025

PretrainZero: Reinforcement Active Pretraining

OffRL AIMat ReLM LRM AI4CE

443

1

0

03 Dec 2025

Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective

120

0

0

03 Dec 2025

Digital Twin-based Control Co-Design of Full Vehicle Active Suspensions via Deep Reinforcement Learning

45

1

0

03 Dec 2025

Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment

Swetasudha Panda

Devashish Khatwani

Krishnaram Kenthapadi

133

0

0

03 Dec 2025

Crossing the Sim2Real Gap Between Simulation and Ground Testing to Space Deployment of Autonomous Free-flyer Control

Kenneth Stewart

Samantha Chapin

53

2

0

03 Dec 2025

Deep Reinforcement Learning for Dynamic Algorithm Configuration: A Case Study on Optimizing OneMax with the (1+($λ$,$λ$))-GA

André Biedenkapp

62

0

0

03 Dec 2025

Towards better dense rewards in Reinforcement Learning Applications

94

0

0

03 Dec 2025

RoboScape-R: Unified Reward-Observation World Models for Generalizable Robotics Training via RL

...

107

0

0

03 Dec 2025

PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design

97

0

0

03 Dec 2025

LSRS: Latent Scale Rejection Sampling for Visual Autoregressive Modeling

58

0

0

03 Dec 2025

A Learning-based Control Methodology for Transitioning VTOL UAVs

66

0

0

03 Dec 2025

Autonomous Reinforcement Learning Robot Control with Intel's Loihi 2 Neuromorphic Hardware

Kenneth Stewart

Samantha Chapin

Sumit Bam Shrestha

100

0

0

03 Dec 2025

Thinking with Programming Vision: Towards a Unified View for Thinking with Images

209

0

0

03 Dec 2025

MarkTune: Improving the Quality-Detectability Trade-off in Open-Weight LLM Watermarking

Zhiwei Steven Wu

123

0

0

03 Dec 2025

Safety Reinforced Model Predictive Control (SRMPC): Improving MPC with Reinforcement Learning for Motion Planning in Autonomous Driving

Johannes Fischer

Christoph Stiller

56

3

0

03 Dec 2025

Generative Multi-modal Feedback for Singing Voice Synthesis Evaluation

81

0

0

02 Dec 2025

Dynamic Configuration of On-Street Parking Spaces using Multi Agent Reinforcement Learning

Oshada Jayasinghe

Farhana Choudhury

S. Karunasekera

108

0

0

02 Dec 2025

Zero-Shot Instruction Following in RL via Structured LTL Representations

Mathias Jackermeier

Alessandro Abate

146

0

0

02 Dec 2025

ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning

386

0

0

02 Dec 2025

OptPO: Optimal Rollout Allocation for Test-time Policy Optimization

59

0

0

02 Dec 2025

Nav-$R^2$ Dual-Relation Reasoning for Generalizable Open-Vocabulary Object-Goal Navigation

...

156

1

0

02 Dec 2025

GoRL: An Algorithm-Agnostic Framework for Online Reinforcement Learning with Generative Policies

84

0

0

02 Dec 2025

Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models

Ruslan Salakhutdinov

Nicholas Matthew Boffi

67

1

0

02 Dec 2025

Plantain: Plan-Answer Interleaved Reasoning

Jonathan Berant

Abhimanyu Goyal

Kalpesh Krishna

Jacob Eisenstein

235

0

0

02 Dec 2025

SMP: Reusable Score-Matching Motion Priors for Physics-Based Character Control

Minami Matsumoto

...

115

0

0

02 Dec 2025

ADORE: Autonomous Domain-Oriented Relevance Engine for E-commerce

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025

92

2

0

02 Dec 2025

Vehicle Dynamics Embedded World Models for Autonomous Driving

148

0

0

02 Dec 2025

Artemis: Structured Visual Reasoning for Perception Policy Learning

110

0

0

01 Dec 2025

Learning Dexterous Manipulation Skills from Imperfect Simulations

Koushil Sreenath

220

1

0

01 Dec 2025

Discovering Self-Protective Falling Policy for Humanoid Robot via Deep Reinforcement Learning

127

0

0

01 Dec 2025
