1903.07940
Cited By
Truly Proximal Policy Optimization
Conference on Uncertainty in Artificial Intelligence (UAI), 2019
19 March 2019
Yuhui Wang
Hao He
Chao Wen
Xiaoyang Tan
Papers citing "Truly Proximal Policy Optimization"
50 / 54 papers shown
Peer-to-Peer Energy Trading in Dairy Farms using Multi-Agent Reinforcement Learning
Applied Energy (Appl. Energy), 2025
Mian Ibad Ali Shah
Marcos Eduardo Cruz Victorio
Maeve Duffy
Enda Barrett
Karl Mason
71
0
0
28 Nov 2025
Discover, Learn, and Reinforce: Scaling Vision-Language-Action Pretraining with Diverse RL-Generated Trajectories
Rushuai Yang
Zhiyuan Feng
Tianxiang Zhang
Kaixin Wang
Chuheng Zhang
Li Zhao
Xiu Su
Yi-Ling Chen
Jiang Bian
OffRL
205
0
0
24 Nov 2025
Directional-Clamp PPO
Gilad Karpel
Ruida Zhou
Shoham Sabach
Mohammad Ghavamzadeh
73
0
0
04 Nov 2025
Latent Chain-of-Thought for Visual Reasoning
Guohao Sun
Hang Hua
Jian Wang
Jiebo Luo
S. Dianat
Majid Rabbani
Raghuveer Rao
Zhiqiang Tao
BDL
OffRL
LRM
273
7
0
27 Oct 2025
Reinforcement Fine-Tuning of Flow-Matching Policies for Vision-Language-Action Models
Mingyang Lyu
Yinqian Sun
Erliang Lin
Huangrui Li
Ruolin Chen
Feifei Zhao
Yi Zeng
113
0
0
11 Oct 2025
HINT: Helping Ineffective Rollouts Navigate Towards Effectiveness
X. Wang
Jinyi Han
Zishang Jiang
Tingyun Li
Jiaqing Liang
Sihang Jiang
Zhaoqian Dai
Shuguang Ma
Fei Yu
Yanghua Xiao
LRM
132
2
0
10 Oct 2025
TROLL: Trust Regions improve Reinforcement Learning for Large Language Models
P. Becker
Niklas Freymuth
Serge Thilges
Fabian Otto
Gerhard Neumann
84
0
0
04 Oct 2025
Failure Modes of Maximum Entropy RLHF
Ömer Veysel Çağatan
Barış Akgün
120
0
0
24 Sep 2025
BenchRL-QAS: Benchmarking reinforcement learning algorithms for quantum architecture search
Azhar Ikhtiarudin
Aditi Das
Param Thakkar
Akash Kundu
130
4
0
16 Jul 2025
Relative Entropy Pathwise Policy Optimization
C. Voelcker
Axel Brunnbauer
Marcel Hussing
Michal Nauman
Pieter Abbeel
Eric Eaton
Radu Grosu
Amir-massoud Farahmand
Igor Gilitschenski
369
0
0
15 Jul 2025
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
Jinyoung Park
Jeehye Na
Jinyoung Kim
H. Kim
OffRL
366
22
0
09 Jun 2025
PPO in the Fisher-Rao geometry
Razvan-Andrei Lascu
David Siska
Łukasz Szpruch
261
1
0
04 Jun 2025
Graph-attention-based Causal Discovery with Trust Region-navigated Clipping Policy Optimization
IEEE Transactions on Cybernetics (IEEE Trans. Cybern.), 2021
Shixuan Liu
Yanghe Feng
Keyu Wu
Guangquan Cheng
Jincai Huang
Zhong Liu
CML
250
8
0
27 Dec 2024
Enhancing Sample Efficiency and Exploration in Reinforcement Learning through the Integration of Diffusion Models and Proximal Policy Optimization
Gao Tianci
Konstantin A. Neusypin
Yang Bo
Shengren Rao
OffRL
577
2
0
02 Sep 2024
Practical and Robust Safety Guarantees for Advanced Counterfactual Learning to Rank
International Conference on Information and Knowledge Management (CIKM), 2024
Shashank Gupta
Harrie Oosterhuis
Maarten de Rijke
450
9
0
29 Jul 2024
Diminishing Stereotype Bias in Image Generation Model using Reinforcement Learning Feedback
Xin Chen
Virgile Foussereau
EGVM
149
1
0
27 Jun 2024
Systematically Exploring the Landscape of Grasp Affordances via Behavioral Manifolds
Michael Zechmair
Yannick Morel
249
0
0
07 May 2024
Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning
IEEE Access (IEEE Access), 2024
Xiao Hu
Tianshu Wang
Min Gong
Shaoshi Yang
77
3
0
04 May 2024
No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO
Skander Moalla
Andrea Miele
Razvan Pascanu
Çağlar Gülçehre
321
17
0
01 May 2024
Discovering Temporally-Aware Reinforcement Learning Algorithms
Matthew Jackson
Chris Xiaoxuan Lu
Louis Kirsch
R. T. Lange
Shimon Whiteson
Jakob N. Foerster
307
21
0
08 Feb 2024
Dropout Strategy in Reinforcement Learning: Limiting the Surrogate Objective Variance in Policy Optimization Methods
Zhengpeng Xie
Changdong Yu
Weizheng Qiao
393
2
0
31 Oct 2023
A Review of Reinforcement Learning for Natural Language Processing, and Applications in Healthcare
Ying Liu
Haozhu Wang
Huixue Zhou
Mingchen Li
Yu Hou
Sicheng Zhou
Fang Wang
Rama Hoetzlein
Rui Zhang
OffRL
LM&MA
210
3
0
23 Oct 2023
Absolute Policy Optimization
Weiye Zhao
Feihan Li
Yifan Sun
Rui Chen
Tianhao Wei
Changliu Liu
434
5
0
20 Oct 2023
Machine Learning Meets Advanced Robotic Manipulation
Information Fusion (Inf. Fusion), 2023
Saeid Nahavandi
R. Alizadehsani
D. Nahavandi
Chee Peng Lim
Kevin Kelly
Fernando Bello
235
24
0
22 Sep 2023
Reinforcement Learning Informed Evolutionary Search for Autonomous Systems Testing
ACM Transactions on Software Engineering and Methodology (TOSEM), 2023
D. Humeniuk
Foutse Khomh
G. Antoniol
151
5
0
24 Aug 2023
Heterogeneous Multi-Agent Reinforcement Learning via Mirror Descent Policy Optimization
Mohammad Mehdi Nasiri
M. Rezghi
282
0
0
13 Aug 2023
Deep Q-Learning versus Proximal Policy Optimization: Performance Comparison in a Material Sorting Task
International Symposium on Industrial Electronics (ISIE), 2023
Reuf Kozlica
S. Wegenkittl
Simon Hirlaender
OffRL
119
13
0
02 Jun 2023
Neuroevolution of Recurrent Architectures on Control Tasks
Maximilien Le Clei
Pierre C. Bellec
67
5
0
03 Apr 2023
Robustness of Utilizing Feedback in Embodied Visual Navigation
Jenny Zhang
Samson Yu
Jiafei Duan
Cheston Tan
109
1
0
06 Mar 2023
Order Matters: Agent-by-agent Policy Optimization
International Conference on Learning Representations (ICLR), 2023
Xihuai Wang
Zheng Tian
Bo Liu
Ying Wen
Jun Wang
Weinan Zhang
308
43
0
13 Feb 2023
Sample Dropout: A Simple yet Effective Variance Reduction Technique in Deep Policy Optimization
Zichuan Lin
Xiapeng Wu
Mingfei Sun
Deheng Ye
Qiang Fu
Wei Yang
Wei Liu
226
3
0
05 Feb 2023
Partial advantage estimator for proximal policy optimization
Xiulei Song
Yi-Fan Jin
Greg Slabaugh
Simon Lucas
OffRL
91
0
0
26 Jan 2023
Joint action loss for proximal policy optimization
Xiulei Song
Yi-Fan Jin
Greg Slabaugh
Simon Lucas
202
0
0
26 Jan 2023
Discovered Policy Optimisation
Neural Information Processing Systems (NeurIPS), 2022
Chris Xiaoxuan Lu
J. Kuba
Alistair Letcher
Luke Metz
Christian Schroeder de Witt
Jakob N. Foerster
OffRL
337
109
0
11 Oct 2022
Entropy Augmented Reinforcement Learning
Jianfei Ma
250
1
0
19 Aug 2022
Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to Cooperative MARL
J. Kuba
Xidong Feng
Shiyao Ding
Hao Dong
Jun Wang
Yaodong Yang
161
29
0
02 Aug 2022
Generalized Policy Improvement Algorithms with Theoretically Supported Sample Reuse
IEEE Transactions on Automatic Control (TAC), 2022
James Queeney
I. Paschalidis
Christos G. Cassandras
OffRL
303
3
0
28 Jun 2022
Good Time to Ask: A Learning Framework for Asking for Help in Embodied Visual Navigation
Jenny Zhang
Samson Yu
Jiafei Duan
Cheston Tan
294
5
0
20 Jun 2022
The Sufficiency of Off-Policyness and Soft Clipping: PPO is still Insufficient according to an Off-Policy Measure
AAAI Conference on Artificial Intelligence (AAAI), 2022
Xing Chen
Dongcui Diao
Hechang Chen
Hengshuai Yao
Haiyin Piao
Zhixiao Sun
Zhiwei Yang
Randy Goebel
Bei Jiang
Yi-Ju Chang
OffRL
423
23
0
20 May 2022
Proximal Policy Optimization Learning based Control of Congested Freeway Traffic
Optimal Control Applications & Methods (OCAM), 2022
Shurong Mo
Nailong Wu
Jie Qi
Anqi Pan
Zhiguang Feng
Huaicheng Yan
Yueying Wang
147
2
0
12 Apr 2022
Proximal Policy Optimization with Adaptive Threshold for Symmetric Relative Density Ratio
Results in Control and Optimization (RCO), 2022
Taisuke Kobayashi
111
6
0
18 Mar 2022
Autonomous Drone Swarm Navigation and Multi-target Tracking in 3D Environments with Dynamic Obstacles
IEEE Access (IEEE Access), 2022
Suleman Qamar
Saddam Hussain Khan
Muhammad Arif Arshad
Maryam Qamar
Asifullah Khan
145
25
0
13 Feb 2022
You May Not Need Ratio Clipping in PPO
Mingfei Sun
Vitaly Kurin
Guoqing Liu
Sam Devlin
Tao Qin
Katja Hofmann
Shimon Whiteson
172
18
0
31 Jan 2022
Mirror Learning: A Unifying Framework of Policy Optimisation
International Conference on Machine Learning (ICML), 2022
J. Kuba
Christian Schroeder de Witt
Jakob N. Foerster
704
35
0
07 Jan 2022
Generalized Proximal Policy Optimization with Sample Reuse
Neural Information Processing Systems (NeurIPS), 2021
James Queeney
I. Paschalidis
Christos G. Cassandras
OffRL
260
54
0
29 Oct 2021
CIM-PPO: Proximal Policy Optimization with Liu-Correntropy Induced Metric
Yunxiao Guo
Han Long
Xiaojun Duan
Kaiyuan Feng
Maochu Li
Xiaying Ma
79
4
0
20 Oct 2021
Offline Reinforcement Learning with Soft Behavior Regularization
Haoran Xu
Xianyuan Zhan
Jianxiong Li
Honglei Yin
OffRL
129
33
0
14 Oct 2021
A Reinforcement Learning based Path Planning Approach in 3D Environment
Mathematical Methods in Technologies and Technics (MMTT), 2021
Geesara Kulathunga
183
33
0
21 May 2021
Proximal Policy Optimization Smoothed Algorithm
Wangshu Zhu
A. Rosendo
122
2
0
04 Dec 2020
Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?
Christian Schroeder de Witt
Tarun Gupta
Denys Makoviichuk
Viktor Makoviychuk
Juil Sock
Mingfei Sun
Shimon Whiteson
253
472
0
18 Nov 2020