What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study
arXiv:2006.05990

10 June 2020
Marcin Andrychowicz
Anton Raichuk
Piotr Stańczyk
Manu Orsini
Sertan Girgin
Raphaël Marinier
Léonard Hussenot
Matthieu Geist
Olivier Pietquin
Marcin Michalski
Sylvain Gelly
Olivier Bachem
    OffRL

Papers citing "What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study"

50 / 136 papers shown
Tactile-based Object Retrieval From Granular Media
Jingxi Xu
Yinsen Jia
Dongxiao Yang
Patrick Meng
Xinyue Zhu
Zihan Guo
Shuran Song
M. Ciocarlie
195
11
0
24 Dec 2025
Deep Reinforcement Learning for Dynamic Algorithm Configuration: A Case Study on Optimizing OneMax with the (1+($λ$,$λ$))-GA
Tai Nguyen
Phong Le
André Biedenkapp
Carola Doerr
Nguyen Dang
62
0
0
03 Dec 2025
Differentiable Weightless Controllers: Learning Logic Circuits for Continuous Control
Fabian Kresse
Christoph H. Lampert
204
0
0
01 Dec 2025
Boosting Reinforcement Learning in 3D Visuospatial Tasks Through Human-Informed Curriculum Design
M. Solbach
John K. Tsotsos
OffRL
162
0
0
17 Nov 2025
Learning Without Critics? Revisiting GRPO in Classical Reinforcement Learning Environments
Bryan L. M. de Oliveira
Felipe V. Frujeri
Marcos P. C. M. Queiroz
Luana G. B. Martins
Telma W. de L. Soares
Luckeciano C. Melo
OffRL
175
0
0
05 Nov 2025
Empirical Study on Robustness and Resilience in Cooperative Multi-Agent Reinforcement Learning
Simin Li
Zihao Mao
Hanxiao Li
Zonglei Jing
Zhuohang Bian
...
Yuqing Ma
Bo An
Yaodong Yang
Weifeng Lv
Xianglong Liu
146
0
0
13 Oct 2025
Single-stream Policy Optimization
Zhongwen Xu
Zihan Ding
OffRL
187
5
0
16 Sep 2025
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Zihe Liu
Jiashun Liu
Yancheng He
Weixun Wang
Jiaheng Liu
...
Siran Yang
Jiamang Wang
Yuchi Xu
Bo Zheng
B. Zheng
OffRL
119
26
0
11 Aug 2025
Magistral
Mistral-AI
Abhinav Rastogi
Albert Q. Jiang
Andy Lo
Gabrielle Berrada
...
Virgile Richard
Wen-Ding Li
William Marshall
Xuanyu Zhang
Yunhao Tang
OffRL ReLM MoE AI4TS LRM
304
9
0
12 Jun 2025
FAuNO: Semi-Asynchronous Federated Reinforcement Learning Framework for Task Offloading in Edge Systems
Frederico Metelo
Alexandre Oliveira
Stevo Racković
Pedro Ákos Costa
Cláudia Soares
OffRL FedML
149
1
0
03 Jun 2025
Learning coordinated badminton skills for legged manipulators
Yuntao Ma
Andrei Cramariuc
Farbod Farshidian
Marco Hutter
269
21
0
29 May 2025
A critical assessment of reinforcement learning methods for microswimmer navigation in complex flows
Selim Mecanna
Aurore Loisy
Christophe Eloy
250
1
0
08 May 2025
Dynamic Action Interpolation: A Universal Approach for Accelerating Reinforcement Learning with Expert Guidance
Wenjun Cao
223
0
0
26 Apr 2025
Adaptive Insurance Reserving with CVaR-Constrained Reinforcement Learning under Macroeconomic Regimes
Stella C. Dong
James R. Finlay
154
1
0
13 Apr 2025
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Andreas Hochlehnert
Hardik Bhatnagar
Vishaal Udandarao
Samuel Albanie
Christian Schroeder de Witt
Matthias Bethge
ReLM ALM LRM
602
67
0
09 Apr 2025
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
Yan Ma
Steffi Chern
Xuyang Shen
Yiran Zhong
Pengfei Liu
OffRL LRM
427
13
0
03 Apr 2025
Differentiable Information Enhanced Model-Based Reinforcement Learning
AAAI Conference on Artificial Intelligence (AAAI), 2025
Xiaoyuan Zhang
Xinyan Cai
Bo Liu
Weidong Huang
Song-Chun Zhu
Siyuan Qi
Y. Yang
248
3
0
03 Mar 2025
Average-Reward Soft Actor-Critic
Jacob Adamczyk
Volodymyr Makarenko
Stas Tiomkin
R. Kulkarni
OOD
264
2
0
15 Jan 2025
Adam on Local Time: Addressing Nonstationarity in RL with Relative Adam Timesteps
Neural Information Processing Systems (NeurIPS), 2024
Benjamin Ellis
Matthew Jackson
Andrei Lupu
Alexander David Goldie
Mattie Fellows
Shimon Whiteson
Jakob Foerster
354
6
0
22 Dec 2024
Multi-Task Reinforcement Learning for Quadrotors
IEEE Robotics and Automation Letters (RA-L), 2024
Jiaxu Xing
Ismail Geles
Yunlong Song
Elie Aljalbout
Davide Scaramuzza
360
18
0
17 Dec 2024
A Method for Evaluating Hyperparameter Sensitivity in Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2024
Jacob Adkins
Michael Bowling
Adam White
334
17
0
10 Dec 2024
Beyond the Boundaries of Proximal Policy Optimization
Charlie B. Tan
Edan Toledo
Benjamin Ellis
Jakob Foerster
Ferenc Huszár
219
1
0
01 Nov 2024
Fast Deep Hedging with Second-Order Optimization
International Conference on AI in Finance (ICAF), 2024
Konrad Mueller
Amira Akkari
Lukas Gonon
Ben Wood
ODL
247
4
0
29 Oct 2024
AgentForge: A Flexible Low-Code Platform for Reinforcement Learning Agent Design
International Conference on Agents and Artificial Intelligence (ICAART), 2024
Francisco Erivaldo Fernandes Junior
Antti Oulasvirta
1.1K
0
0
25 Oct 2024
Streaming Deep Reinforcement Learning Finally Works
Mohamed Elsayed
Gautham Vasan
A. R. Mahmood
OffRL
277
13
0
18 Oct 2024
SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning
International Conference on Learning Representations (ICLR), 2024
Hojoon Lee
Dongyoon Hwang
Donghu Kim
Hyunseung Kim
Jun Jet Tai
K. Subramanian
Peter R. Wurman
Jaegul Choo
Peter Stone
Takuma Seno
OffRL
464
41
0
13 Oct 2024
Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient
International Conference on Learning Representations (ICLR), 2024
Wenlong Wang
Ivana Dusparic
Yucheng Shi
Ke Zhang
Vinny Cahill
Mamba
1.0K
3
0
11 Oct 2024
Effective Tuning Strategies for Generalist Robot Manipulation Policies
IEEE International Conference on Robotics and Automation (ICRA), 2024
Wenbo Zhang
Yang Li
Yanyuan Qiao
Siyuan Huang
Jiajun Liu
Feras Dayoub
Xiao Ma
Lingqiao Liu
173
5
0
02 Oct 2024
Gradient Boosting Reinforcement Learning
Benjamin Fuhrer
Chen Tessler
Gal Dalal
OffRL AI4CE
476
4
0
11 Jul 2024
Structural Design Through Reinforcement Learning
Thomas Rochefort-Beaudoin
Aurelian Vadean
Niels Aage
S. Achiche
AI4CE
143
2
0
10 Jul 2024
Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner
Kenneth Li
Yiming Wang
Fernanda Viégas
Martin Wattenberg
264
10
0
17 Jun 2024
FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning
Wenzhe Li
Zihan Ding
Seth Karten
Chi Jin
347
9
0
04 Jun 2024
A Study of Plasticity Loss in On-Policy Deep Reinforcement Learning
Arthur Juliani
Jordan T. Ash
OffRL OnRL CLL
280
18
0
29 May 2024
Bigger, Regularized, Optimistic: scaling for compute and sample-efficient continuous control
Michal Nauman
M. Ostaszewski
Krzysztof Jankowski
Piotr Miłoś
Marek Cygan
OffRL
253
62
0
25 May 2024
Multi-turn Reinforcement Learning from Preference Human Feedback
Lior Shani
Aviv Rosenberg
Asaf B. Cassel
Oran Lang
Daniele Calandriello
...
Bilal Piot
Idan Szpektor
Avinatan Hassidim
Yossi Matias
Rémi Munos
219
61
0
23 May 2024
Decentralized Coordination of Distributed Energy Resources through Local Energy Markets and Deep Reinforcement Learning
Daniel May
Matthew E. Taylor
Petr Musílek
155
4
0
19 Apr 2024
If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
Reza Esfandiarpoor
Cristina Menghini
Stephen H. Bach
CoGe VLM
305
15
0
25 Mar 2024
Simple Ingredients for Offline Reinforcement Learning
Edoardo Cetin
Andrea Tirinzoni
Matteo Pirotta
A. Lazaric
Yann Ollivier
Ahmed Touati
OffRL
327
2
0
19 Mar 2024
Generalising Multi-Agent Cooperation through Task-Agnostic Communication
Dulhan Jayalath
Steven D. Morad
Amanda Prorok
164
0
0
11 Mar 2024
A Case for Validation Buffer in Pessimistic Actor-Critic
Michal Nauman
M. Ostaszewski
Marek Cygan
227
0
0
01 Mar 2024
Overestimation, Overfitting, and Plasticity in Actor-Critic: the Bitter Lesson of Reinforcement Learning
Michal Nauman
Michal Bortkiewicz
Piotr Miłoś
Tomasz Trzciński
M. Ostaszewski
Marek Cygan
OffRL
343
41
0
01 Mar 2024
Beacon, a lightweight deep reinforcement learning benchmark library for flow control
J. Viquerat
P. Meliga
Pablo Jeken
E. Hachem
AI4CE
216
1
0
27 Feb 2024
Natural Language Reinforcement Learning
Xidong Feng
Bo Liu
Mengyue Yang
Ziyan Wang
Girish A. Koushiks
Yali Du
Ying Wen
Jun Wang
OffRL
278
12
0
11 Feb 2024
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning
Shengyi Huang
Quentin Gallouedec
Florian Felten
Antonin Raffin
Rousslan Fernand Julien Dossa
...
Alexander Nikulin
Xiao Hu
Tianlin Liu
Jongwook Choi
Brent Yi
OffRL
263
20
0
05 Feb 2024
Behind the Myth of Exploration in Policy Gradients
Adrien Bolland
Gaspard Lambrechts
Damien Ernst
358
1
0
31 Jan 2024
The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations
Matthias Lehmann
283
8
0
24 Jan 2024
Retrieval-Guided Reinforcement Learning for Boolean Circuit Minimization
International Conference on Learning Representations (ICLR), 2024
A. B. Chowdhury
Marco Romanelli
Benjamin Tan
Ramesh Karri
Siddharth Garg
193
14
0
22 Jan 2024
ReFT: Reasoning with Reinforced Fine-Tuning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Trung Quoc Luong
Xinbo Zhang
Zhanming Jie
Yang Liu
Xiaoran Jin
Hang Li
OffRL LRM ReLM
321
238
0
17 Jan 2024
EgoGen: An Egocentric Synthetic Data Generator
Computer Vision and Pattern Recognition (CVPR), 2024
Gen Li
Kai Zhao
Siwei Zhang
X. Lyu
Mihai Dusmanu
Yan Zhang
Marc Pollefeys
Siyu Tang
EgoV VGen
447
24
0
16 Jan 2024
XLand-MiniGrid: Scalable Meta-Reinforcement Learning Environments in JAX
Alexander Nikulin
Vladislav Kurenkov
Ilya Zisman
Artem Agarkov
Viacheslav Sinii
Sergey Kolesnikov
455
47
0
19 Dec 2023