Proximal Policy Optimization Algorithms

20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
    OffRL

Papers citing "Proximal Policy Optimization Algorithms"

50 / 11,419 papers shown
PrefixGPT: Prefix Adder Optimization by a Generative Pre-trained Transformer
Ruogu Ding
Xin Ning
Ulf Schlichtmann
Weikang Qian
81
0
0
22 Nov 2025
Deep Gaussian Process Proximal Policy Optimization
Matthijs van der Lende
Juan Cardenas-Cartagena
GP BDL UQCV
384
0
0
22 Nov 2025
The Alignment Paradox of Medical Large Language Models in Infertility Care: Decoupling Algorithmic Improvement from Clinical Decision-making Quality
Dou Liu
Ying Long
Sophia Zuoqiu
Kaipeng Xie
Runze Yang
...
Kang Li
Yiting Lin
Hanyi Liu
Rong Yin
Tian Tang
127
0
0
22 Nov 2025
Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction
Yusong Wu
Stephen Brade
Teng Ma
Tia-Jane Fowler
Enning Yang
Berker Banar
Aaron Courville
Natasha Jaques
Cheng-Zhi Anna Huang
AAML
139
0
0
22 Nov 2025
Hierarchical biomarker thresholding: a model-agnostic framework for stability
O. Debeaupuis
8
0
0
22 Nov 2025
Training Emergent Joint Associations: A Reinforcement Learning Approach to Creative Thinking in Language Models
Mukul Singh
Ananya Singha
Aishni Parab
Pronita Mehrotra
Sumit Gulwani
LRM AI4CE
132
0
0
22 Nov 2025
Scaling Competence, Shrinking Reasoning: Cognitive Signatures in Language Model Learning
Mukul Singh
Ananya Singha
Arjun Radhakrishna
Sumit Gulwani
ReLM LRM
88
0
0
22 Nov 2025
Reward Engineering for Spatial Epidemic Simulations: A Reinforcement Learning Platform for Individual Behavioral Learning
Radman Rakhshandehroo
Daniel Coombs
106
0
0
22 Nov 2025
Bringing Stability to Diffusion: Decomposing and Reducing Variance of Training Masked Diffusion Models
Mengni Jia
Mengyu Zhou
Yihao Liu
Xiaoxi Jiang
Guanjun Jiang
100
0
0
22 Nov 2025
Transformers with RL or SFT Provably Learn Sparse Boolean Functions, But Differently
Bochen Lyu
Yiyang Jia
Xiaohao Cai
Zhanxing Zhu
MoE
145
0
0
22 Nov 2025
Neighbor GRPO: Contrastive ODE Policy Optimization Aligns Flow Models
Dailan He
Guanlin Feng
Xingtong Ge
Yazhe Niu
Yi Zhang
Bingqi Ma
Guanglu Song
Y. Liu
Hongsheng Li
228
0
0
21 Nov 2025
FireScope: Wildfire Risk Prediction with a Chain-of-Thought Oracle
Mario Markov
Stefan Maria Ailuro
Luc Van Gool
Konrad Schindler
D. Paudel
LRM
162
0
0
21 Nov 2025
Physical Reinforcement Learning
Sam Dillavou
Shruti Mishra
OffRL
157
0
0
21 Nov 2025
Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination
Y. Tang
Daiki Shimada
Hang Hua
Chao Huang
Jing Bi
Rogerio Feris
Chenliang Xu
241
0
0
21 Nov 2025
FIRM: Federated In-client Regularized Multi-objective Alignment for Large Language Models
Fatemeh Nourzad
Amirhossein Roknilamouki
Eylem Ekici
Ness B. Shroff
FedML
305
0
0
21 Nov 2025
The PLLuM Instruction Corpus
Piotr Pęzik
Filip Żarnecki
Konrad Kaczyński
A. Cichosz
Zuzanna Deckert
...
Konrad Wojtasik
Arkadiusz Janz
P. Kazienko
Julia Moska
Jan Kocoń
104
0
0
21 Nov 2025
Multi-Agent Pointer Transformer: Seq-to-Seq Reinforcement Learning for Multi-Vehicle Dynamic Pickup-Delivery Problems
Zengyu Zou
Jingyuan Wang
Yixuan Huang
Junjie Wu
124
0
0
21 Nov 2025
MolSight: Optical Chemical Structure Recognition with SMILES Pretraining, Multi-Granularity Learning and Reinforcement Learning
Wenrui Zhang
Xinggang Wang
Bin Feng
Wenyu Liu
95
0
0
21 Nov 2025
Agility Meets Stability: Versatile Humanoid Control with Heterogeneous Data
Yixuan Pan
Ruoyi Qiao
L. Chen
Kashyap Chitta
Liang Pan
...
Qingwen Bu
Hang Zhao
Cunyuan Zheng
Ping Luo
Hongyang Li
284
0
0
21 Nov 2025
Human Imitated Bipedal Locomotion with Frequency Based Gait Generator Network
Yusuf Baran Ates
Omer Morgul
105
0
0
21 Nov 2025
LEARN: Learning End-to-End Aerial Resource-Constrained Multi-Robot Navigation
Darren Chiu
Zhehui Huang
Ruohai Ge
Gaurav Sukhatme
69
0
0
21 Nov 2025
Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
Qinghao Hu
S. Yang
Junxian Guo
Xiaozhe Yao
Yujun Lin
Yuxian Gu
Han Cai
Chuang Gan
Ana Klimovic
Song Han
122
2
0
20 Nov 2025
Revisiting Fairness-aware Interactive Recommendation: Item Lifecycle as a Control Knob
Yun Lu
Xiaoyu Shi
Hong Xie
Chongjun Xia
Zhenhui Gong
Mingsheng Shang
74
0
0
20 Nov 2025
HGCN2SP: Hierarchical Graph Convolutional Network for Two-Stage Stochastic Programming
International Conference on Machine Learning (ICML), 2025
Yang Wu
Yifan Zhang
Zhenxing Liang
Jian Cheng
174
4
0
20 Nov 2025
Stabilizing Policy Gradient Methods via Reward Profiling
Shihab Ahmed
El Houcine Bergou
A. Dutta
Yue Wang
204
0
0
20 Nov 2025
Flow-Aided Flight Through Dynamic Clutters From Point To Motion
IEEE Robotics and Automation Letters (IEEE RA-L), 2025
Bowen Xu
Zexuan Yan
Minghao Lu
Xiyu Fan
Yi Luo
Youshen Lin
Zhiqiang Chen
Yeke Chen
Qiyuan Qiao
Peng Lu
141
0
0
20 Nov 2025
Large Language Model-Based Reward Design for Deep Reinforcement Learning-Driven Autonomous Cyber Defense
Sayak Mukherjee
Samrat Chatterjee
Emilie Purvine
Ted Fujimoto
Tegan H. Emerson
68
0
0
20 Nov 2025
Optimizing Operation Recipes with Reinforcement Learning for Safe and Interpretable Control of Chemical Processes
D. Brandner
Sergio Lucia
143
0
0
20 Nov 2025
Mitigating Estimation Bias with Representation Learning in TD Error-Driven Regularization
Haohui Chen
Zhiyong Chen
Aoxiang Liu
Wentuo Fang
132
0
0
20 Nov 2025
SDA: Steering-Driven Distribution Alignment for Open LLMs without Fine-Tuning
Wei Xia
Zhi-Hong Deng
ALM
271
0
0
20 Nov 2025
Bridging VLMs and Embodied Intelligence with Deliberate Practice Policy Optimization
Yi Zhang
Che Liu
Xiancong Ren
Hanchu Ni
Yingji Zhang
...
Zenglin Xu
Bin Shen
Qifan Wang
Jian Tang
Xiaozhu Ju
VLM
157
0
0
20 Nov 2025
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
Peng Xia
K. Zeng
Jiaqi Liu
Can Qin
Fang Wu
Yiyang Zhou
Caiming Xiong
Huaxiu Yao
LLMAG LM&Ro SyDa
728
3
0
20 Nov 2025
A Hybrid Proactive And Predictive Framework For Edge Cloud Resource Management
Hrikshesh Kumar
Anika Garg
Anshul Gupta
Yashika Agarwal
180
0
0
20 Nov 2025
EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control
Kai Yang
Xin Xu
Yangkun Chen
Weijie Liu
Jiafei Lyu
Zichuan Lin
Deheng Ye
Saiyong Yang
237
1
0
19 Nov 2025
Platform-Agnostic Reinforcement Learning Framework for Safe Exploration of Cluttered Environments with Graph Attention
Gabriele Calzolari
Vidya Sumathy
Christoforos Kanellakis
G. Nikolakopoulos
139
0
0
19 Nov 2025
Reinforcement Learning in Queue-Reactive Models: Application to Optimal Execution
Tomas Espana
Yadh Hafsi
Fabrizio Lillo
Edoardo Vittori
152
0
0
19 Nov 2025
Symmetry-Breaking in Multi-Agent Navigation: Winding Number-Aware MPC with a Learned Topological Strategy
Tomoki Nakao
Kazumi Kasaura
Tadashi Kozuno
74
0
0
19 Nov 2025
VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation
Tairan He
Zi Wang
Haoru Xue
Qingwei Ben
Zhengyi Luo
...
S. Sastry
C. K. Liu
Guanya Shi
Linxi Fan
Yuke Zhu
254
0
0
19 Nov 2025
IPR-1: Interactive Physical Reasoner
Mingyu Zhang
Lifeng Zhuo
Tianxi Tan
Guocan Xie
Xian Nie
...
Renjie Zhao
Zizhu He
Z. Wang
Jiting Cai
Yong-Lu Li
PINN LRM AI4CE
402
0
0
19 Nov 2025
Step-Audio-R1 Technical Report
Fei Tian
Xiangyu Zhang
Y. Zhang
Haoyang Zhang
Yuxin Li
...
Eng Siong Chng
Xuerui Yang
Xiangyu Zhang
Daxin Jiang
Gang Yu
AuLLM LRM
351
0
0
19 Nov 2025
Continual Reinforcement Learning for Cyber-Physical Systems: Lessons Learned and Open Challenges
Kim N. Nolle
Ivana Dusparic
Rhodri Cusack
Vinny Cahill
CLL
243
0
0
19 Nov 2025
Vehicle Routing Problems via Quantum Graph Attention Network Deep Reinforcement Learning
Le Tung Giang
Vu Hoang Viet
Nguyen Xuan Tung
Trinh Van Chien
Won-Joo Hwang
GNN
251
0
0
19 Nov 2025
BD-Net: Has Depth-Wise Convolution Ever Been Applied in Binary Neural Networks?
DoYoung Kim
Jin-Seop Lee
Noo-Ri Kim
SungJoon Lee
Jee-Hyong Lee
MQ
153
3
0
19 Nov 2025
DEPO: Dual-Efficiency Preference Optimization for LLM Agents
Sirui Chen
Mengshi Zhao
Lei Xu
Yuying Zhao
B. Zhu
H. Zhang
Shengjie Zhao
Chaochao Lu
LLMAG
319
0
0
19 Nov 2025
Learning Where, What and How to Transfer: A Multi-Role Reinforcement Learning Approach for Evolutionary Multitasking
Jiajun Zhan
Zeyuan Ma
Yue-Jiao Gong
Kay Chen Tan
OffRL
204
0
0
19 Nov 2025
Entropy-Based Measurement of Value Drift and Alignment Work in Large Language Models
Samih Fadli
61
0
0
19 Nov 2025
GRPO-RM: Fine-Tuning Representation Models via GRPO-Driven Reinforcement Learning
Yanchen Xu
Ziheng Jiao
H. Zhang
Xuelong Li
315
0
0
19 Nov 2025
Reasoning in Diffusion Large Language Models is Concentrated in Dynamic Confusion Zones
Ranfei Chen
Ming Chen
Kaifei Wang
DiffM AI4CE LRM
198
0
0
19 Nov 2025
Aligning Generative Music AI with Human Preferences: Methods and Challenges
Dorien Herremans
Abhinaba Roy
132
0
0
19 Nov 2025
SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models
Senyu Fei
Siyin Wang
Li Ji
Ao Li
Shiduo Zhang
Liming Liu
Jinlong Hou
Jingjing Gong
Xianzhong Zhao
Xipeng Qiu
115
0
0
19 Nov 2025