arXiv:2009.10897
Revisiting Design Choices in Proximal Policy Optimization
23 September 2020
Chloe Ching-Yun Hsu
Celestine Mendler-Dünner
Moritz Hardt
Papers citing "Revisiting Design Choices in Proximal Policy Optimization" (27)
RankPO: Preference Optimization for Job-Talent Matching
Yuyao Zhang
Hao Wu
Yu Wang
Xiaohui Wang
51
0
0
13 Mar 2025
Context Filtering with Reward Modeling in Question Answering
Sangryul Kim
James Thorne
66
0
0
16 Dec 2024
Beyond the Boundaries of Proximal Policy Optimization
Charlie B. Tan
Edan Toledo
Benjamin Ellis
Jakob Foerster
Ferenc Huszár
23
0
0
01 Nov 2024
Learning Coordinated Maneuver in Adversarial Environments
Zechen Hu
Manshi Limbu
Daigo Shishika
Xuesu Xiao
Xuan Wang
31
0
0
12 Jul 2024
REBEL: Reinforcement Learning via Regressing Relative Rewards
Zhaolin Gao
Jonathan D. Chang
Wenhao Zhan
Owen Oertell
Gokul Swamy
Kianté Brantley
Thorsten Joachims
J. Andrew Bagnell
Jason D. Lee
Wen Sun
OffRL
38
31
0
25 Apr 2024
Extremum-Seeking Action Selection for Accelerating Policy Optimization
Ya-Chien Chang
Sicun Gao
32
0
0
02 Apr 2024
Don't Forget Your Reward Values: Language Model Alignment via Value-based Calibration
Xin Mao
Fengming Li
Huimin Xu
Wei Zhang
A. Luu
ALM
45
6
0
25 Feb 2024
Guaranteed Trust Region Optimization via Two-Phase KL Penalization
K.R. Zentner
Ujjwal Puri
Zhehui Huang
Gaurav Sukhatme
OffRL
19
0
0
08 Dec 2023
Uncertainty Estimation for Safety-critical Scene Segmentation via Fine-grained Reward Maximization
Hongzheng Yang
Cheng Chen
Yueyao Chen
Markus Scheppach
Hon-Chi Yip
Qi Dou
EDL
UQCV
18
8
0
05 Nov 2023
A Statistical Guarantee for Representation Transfer in Multitask Imitation Learning
Bryan Chan
Karime Pereida
James Bergstra
44
1
0
02 Nov 2023
Hyperparameters in Reinforcement Learning and How To Tune Them
Theresa Eimer
Marius Lindauer
Roberta Raileanu
OffRL
27
34
0
02 Jun 2023
Lexicographic Multi-Objective Reinforcement Learning
Joar Skalse
Lewis Hammond
Charlie Griffin
Alessandro Abate
17
19
0
28 Dec 2022
Theta-Resonance: A Single-Step Reinforcement Learning Method for Design Space Exploration
Masood S. Mortazavi
Tiancheng Qin
Ning Yan
17
2
0
03 Nov 2022
Teacher-student curriculum learning for reinforcement learning
Yanick Schraner
OffRL
37
2
0
31 Oct 2022
RMBench: Benchmarking Deep Reinforcement Learning for Robotic Manipulator Control
Yanfei Xiang
Xin Wang
Shu Hu
Bin Zhu
Xiaomeng Huang
Xi Wu
Siwei Lyu
SSL
29
5
0
20 Oct 2022
Discovered Policy Optimisation
Chris Xiaoxuan Lu
J. Kuba
Alistair Letcher
Luke Metz
Christian Schroeder de Witt
Jakob N. Foerster
OffRL
39
75
0
11 Oct 2022
Entropy Augmented Reinforcement Learning
Jianfei Ma
28
0
0
19 Aug 2022
Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to Cooperative MARL
J. Kuba
Xidong Feng
Shiyao Ding
Hao Dong
Jun Wang
Yaodong Yang
23
16
0
02 Aug 2022
A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games
Samuel Sokota
Ryan D'Orazio
J. Zico Kolter
Nicolas Loizou
Marc Lanctot
Ioannis Mitliagkas
Noam Brown
Christian Kroer
23
1
0
12 Jun 2022
Generalization, Mayhems and Limits in Recurrent Proximal Policy Optimization
Marco Pleines
Matthias Pallasch
F. Zimmer
Mike Preuss
23
13
0
23 May 2022
The Sufficiency of Off-Policyness and Soft Clipping: PPO is still Insufficient according to an Off-Policy Measure
Xing Chen
Dongcui Diao
Hechang Chen
Hengshuai Yao
Haiyin Piao
Zhixiao Sun
Zhiwei Yang
Randy Goebel
Bei Jiang
Yi-Ju Chang
OffRL
30
8
0
20 May 2022
Automatic Parameter Optimization Using Genetic Algorithm in Deep Reinforcement Learning for Robotic Manipulation Tasks
Adarsh Sehgal
Nicholas Ward
Hung M. La
S. Louis
16
1
0
07 Apr 2022
Learning to Schedule Heuristics for the Simultaneous Stochastic Optimization of Mining Complexes
Yassine Yaakoubi
R. Dimitrakopoulos
33
10
0
25 Feb 2022
Mirror Learning: A Unifying Framework of Policy Optimisation
J. Kuba
Christian Schroeder de Witt
Jakob N. Foerster
20
24
0
07 Jan 2022
A general class of surrogate functions for stable and efficient reinforcement learning
Sharan Vaswani
Olivier Bachem
Simone Totaro
Robert Mueller
Shivam Garg
M. Geist
Marlos C. Machado
Pablo Samuel Castro
Nicolas Le Roux
OffRL
29
15
0
12 Aug 2021
Learning to Design and Construct Bridge without Blueprint
Yunfei Li
Tao Kong
Lei Li
Yifeng Li
Yi Wu
24
7
0
05 Aug 2021
Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?
Christian Schroeder de Witt
Tarun Gupta
Denys Makoviichuk
Viktor Makoviychuk
Philip H. S. Torr
Mingfei Sun
Shimon Whiteson
21
319
0
18 Nov 2020