Truly Proximal Policy Optimization

Conference on Uncertainty in Artificial Intelligence (UAI), 2019
19 March 2019
Yuhui Wang
Hao He
Chao Wen
Xiaoyang Tan

Papers citing "Truly Proximal Policy Optimization"

50 / 54 papers shown
Peer-to-Peer Energy Trading in Dairy Farms using Multi-Agent Reinforcement Learning
Applied Energy (Appl. Energy), 2025
Mian Ibad Ali Shah
Marcos Eduardo Cruz Victorio
Maeve Duffy
Enda Barrett
Karl Mason
71
0
0
28 Nov 2025
Discover, Learn, and Reinforce: Scaling Vision-Language-Action Pretraining with Diverse RL-Generated Trajectories
Rushuai Yang
Zhiyuan Feng
Tianxiang Zhang
Kaixin Wang
Chuheng Zhang
Li Zhao
Xiu Su
Yi-Ling Chen
Jiang Bian
OffRL
205
0
0
24 Nov 2025
Directional-Clamp PPO
Gilad Karpel
Ruida Zhou
Shoham Sabach
Mohammad Ghavamzadeh
73
0
0
04 Nov 2025
Latent Chain-of-Thought for Visual Reasoning
Guohao Sun
Hang Hua
Jian Wang
Jiebo Luo
S. Dianat
Majid Rabbani
Raghuveer Rao
Zhiqiang Tao
BDL, OffRL, LRM
273
7
0
27 Oct 2025
Reinforcement Fine-Tuning of Flow-Matching Policies for Vision-Language-Action Models
Mingyang Lyu
Yinqian Sun
Erliang Lin
Huangrui Li
Ruolin Chen
Feifei Zhao
Yi Zeng
113
0
0
11 Oct 2025
HINT: Helping Ineffective Rollouts Navigate Towards Effectiveness
X. Wang
Jinyi Han
Zishang Jiang
Tingyun Li
Jiaqing Liang
Sihang Jiang
Zhaoqian Dai
Shuguang Ma
Fei Yu
Yanghua Xiao
LRM
132
2
0
10 Oct 2025
TROLL: Trust Regions improve Reinforcement Learning for Large Language Models
P. Becker
Niklas Freymuth
Serge Thilges
Fabian Otto
Gerhard Neumann
84
0
0
04 Oct 2025
Failure Modes of Maximum Entropy RLHF
Ömer Veysel Çağatan
Barış Akgün
120
0
0
24 Sep 2025
BenchRL-QAS: Benchmarking reinforcement learning algorithms for quantum architecture search
Azhar Ikhtiarudin
Aditi Das
Param Thakkar
Akash Kundu
130
4
0
16 Jul 2025
Relative Entropy Pathwise Policy Optimization
C. Voelcker
Axel Brunnbauer
Marcel Hussing
Michal Nauman
Pieter Abbeel
Eric Eaton
Radu Grosu
Amir-massoud Farahmand
Igor Gilitschenski
369
0
0
15 Jul 2025
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
Jinyoung Park
Jeehye Na
Jinyoung Kim
H. Kim
OffRL
366
22
0
09 Jun 2025
PPO in the Fisher-Rao geometry
Razvan-Andrei Lascu
David Siska
Łukasz Szpruch
261
1
0
04 Jun 2025
Graph-attention-based Casual Discovery with Trust Region-navigated Clipping Policy Optimization
IEEE Transactions on Cybernetics (IEEE Trans. Cybern.), 2021
Shixuan Liu
Yanghe Feng
Keyu Wu
Guangquan Cheng
Jincai Huang
Zhong Liu
CML
250
8
0
27 Dec 2024
Enhancing Sample Efficiency and Exploration in Reinforcement Learning through the Integration of Diffusion Models and Proximal Policy Optimization
Gao Tianci
Konstantin A. Neusypin
Konstantin A. Neusypin
Yang Bo
Shengren Rao
OffRL
577
2
0
02 Sep 2024
Practical and Robust Safety Guarantees for Advanced Counterfactual Learning to Rank
International Conference on Information and Knowledge Management (CIKM), 2024
Shashank Gupta
Harrie Oosterhuis
Maarten de Rijke
450
9
0
29 Jul 2024
Diminishing Stereotype Bias in Image Generation Model using Reinforcemenlent Learning Feedback
Xin Chen
Virgile Foussereau
EGVM
149
1
0
27 Jun 2024
Systematically Exploring the Landscape of Grasp Affordances via Behavioral Manifolds
Michael Zechmair
Yannick Morel
249
0
0
07 May 2024
Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning
IEEE Access (IEEE Access), 2024
Xiao Hu
Tianshu Wang
Min Gong
Shaoshi Yang
77
3
0
04 May 2024
No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO
Skander Moalla
Andrea Miele
Razvan Pascanu
Çağlar Gülçehre
321
17
0
01 May 2024
Discovering Temporally-Aware Reinforcement Learning Algorithms
Matthew Jackson
Chris Xiaoxuan Lu
Louis Kirsch
R. T. Lange
Shimon Whiteson
Jakob N. Foerster
307
21
0
08 Feb 2024
Dropout Strategy in Reinforcement Learning: Limiting the Surrogate Objective Variance in Policy Optimization Methods
Zhengpeng Xie
Changdong Yu
Weizheng Qiao
393
2
0
31 Oct 2023
A Review of Reinforcement Learning for Natural Language Processing, and Applications in Healthcare
Ying Liu
Haozhu Wang
Huixue Zhou
Mingchen Li
Yu Hou
Sicheng Zhou
Fang Wang
Rama Hoetzlein
Rui Zhang
OffRL, LM&MA
210
3
0
23 Oct 2023
Absolute Policy Optimization
Weiye Zhao
Feihan Li
Yifan Sun
Rui Chen
Tianhao Wei
Changliu Liu
434
5
0
20 Oct 2023
Machine Learning Meets Advanced Robotic Manipulation
Information Fusion (Inf. Fusion), 2023
Saeid Nahavandi
R. Alizadehsani
D. Nahavandi
Chee Peng Lim
Kevin Kelly
Fernando Bello
235
24
0
22 Sep 2023
Reinforcement Learning Informed Evolutionary Search for Autonomous Systems Testing
ACM Transactions on Software Engineering and Methodology (TOSEM), 2023
D. Humeniuk
Foutse Khomh
G. Antoniol
151
5
0
24 Aug 2023
Heterogeneous Multi-Agent Reinforcement Learning via Mirror Descent Policy Optimization
Mohammad Mehdi Nasiri
M. Rezghi
282
0
0
13 Aug 2023
Deep Q-Learning versus Proximal Policy Optimization: Performance Comparison in a Material Sorting Task
International Symposium on Industrial Electronics (ISIE), 2023
Reuf Kozlica
S. Wegenkittl
Simon Hirlaender
OffRL
119
13
0
02 Jun 2023
Neuroevolution of Recurrent Architectures on Control Tasks
Maximilien Le Clei
Pierre C. Bellec
67
5
0
03 Apr 2023
Robustness of Utilizing Feedback in Embodied Visual Navigation
Jenny Zhang
Samson Yu
Jiafei Duan
Cheston Tan
109
1
0
06 Mar 2023
Order Matters: Agent-by-agent Policy Optimization
International Conference on Learning Representations (ICLR), 2023
Xihuai Wang
Zheng Tian
Bo Liu
Ying Wen
Jun Wang
Weinan Zhang
308
43
0
13 Feb 2023
Sample Dropout: A Simple yet Effective Variance Reduction Technique in Deep Policy Optimization
Zichuan Lin
Xiapeng Wu
Mingfei Sun
Deheng Ye
Qiang Fu
Wei Yang
Wei Liu
226
3
0
05 Feb 2023
Partial advantage estimator for proximal policy optimization
Xiulei Song
Yi-Fan Jin
Greg Slabaugh
Simon Lucas
OffRL
91
0
0
26 Jan 2023
Joint action loss for proximal policy optimization
Xiulei Song
Yi-Fan Jin
Greg Slabaugh
Simon Lucas
202
0
0
26 Jan 2023
Discovered Policy Optimisation
Neural Information Processing Systems (NeurIPS), 2022
Chris Xiaoxuan Lu
J. Kuba
Alistair Letcher
Luke Metz
Christian Schroeder de Witt
Jakob N. Foerster
OffRL
337
109
0
11 Oct 2022
Entropy Augmented Reinforcement Learning
Jianfei Ma
250
1
0
19 Aug 2022
Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to Cooperative MARL
J. Kuba
Xidong Feng
Shiyao Ding
Hao Dong
Jun Wang
Yaodong Yang
161
29
0
02 Aug 2022
Generalized Policy Improvement Algorithms with Theoretically Supported Sample Reuse
IEEE Transactions on Automatic Control (TAC), 2022
James Queeney
I. Paschalidis
Christos G. Cassandras
OffRL
303
3
0
28 Jun 2022
Good Time to Ask: A Learning Framework for Asking for Help in Embodied Visual Navigation
Jenny Zhang
Samson Yu
Jiafei Duan
Cheston Tan
294
5
0
20 Jun 2022
The Sufficiency of Off-Policyness and Soft Clipping: PPO is still Insufficient according to an Off-Policy Measure
AAAI Conference on Artificial Intelligence (AAAI), 2022
Xing Chen
Dongcui Diao
Hechang Chen
Hengshuai Yao
Haiyin Piao
Zhixiao Sun
Zhiwei Yang
Randy Goebel
Bei Jiang
Yi-Ju Chang
OffRL
423
23
0
20 May 2022
Proximal Policy Optimization Learning based Control of Congested Freeway Traffic
Optimal control applications & methods (OCAM), 2022
Shurong Mo
Nailong Wu
Jie Qi
Anqi Pan
Zhiguang Feng
Huaicheng Yan
Yueying Wang
147
2
0
12 Apr 2022
Proximal Policy Optimization with Adaptive Threshold for Symmetric Relative Density Ratio
Results in Control and Optimization (RCO), 2022
Taisuke Kobayashi
111
6
0
18 Mar 2022
Autonomous Drone Swarm Navigation and Multi-target Tracking in 3D Environments with Dynamic Obstacles
IEEE Access (IEEE Access), 2022
Suleman Qamar
Dr. Saddam Hussain Khan
Muhammad Arif Arshad
Maryam Qamar
Asifullah Khan
145
25
0
13 Feb 2022
You May Not Need Ratio Clipping in PPO
Mingfei Sun
Vitaly Kurin
Guoqing Liu
Sam Devlin
Tao Qin
Katja Hofmann
Shimon Whiteson
172
18
0
31 Jan 2022
Mirror Learning: A Unifying Framework of Policy Optimisation
International Conference on Machine Learning (ICML), 2022
J. Kuba
Christian Schroeder de Witt
Jakob N. Foerster
704
35
0
07 Jan 2022
Generalized Proximal Policy Optimization with Sample Reuse
Neural Information Processing Systems (NeurIPS), 2021
James Queeney
I. Paschalidis
Christos G. Cassandras
OffRL
260
54
0
29 Oct 2021
CIM-PPO:Proximal Policy Optimization with Liu-Correntropy Induced Metric
Yunxiao Guo
Han Long
Xiaojun Duan
Kaiyuan Feng
Maochu Li
Xiaying Ma
79
4
0
20 Oct 2021
Offline Reinforcement Learning with Soft Behavior Regularization
Haoran Xu
Xianyuan Zhan
Jianxiong Li
Honglei Yin
OffRL
129
33
0
14 Oct 2021
A Reinforcement Learning based Path Planning Approach in 3D Environment
Mathematical Methods in Technologies and Technics (MMTT), 2021
Geesara Kulathunga
183
33
0
21 May 2021
Proximal Policy Optimization Smoothed Algorithm
Wangshu Zhu
A. Rosendo
122
2
0
04 Dec 2020
Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?
Christian Schroeder de Witt
Tarun Gupta
Denys Makoviichuk
Viktor Makoviychuk
Juil Sock
Mingfei Sun
Shimon Whiteson
253
472
0
18 Nov 2020