Wasserstein Policy Optimization. David Pfau, Ian Davies, Diana Borsa, Joao G. M. Araujo, Brendan D. Tracey, H. V. Hasselt. 01 May 2025.
SocialJax: An Evaluation Suite for Multi-agent Reinforcement Learning in Sequential Social Dilemmas. Zihao Guo, Richard Willis, Tristan Tomilin, Joel Z Leibo, Yali Du. 18 Mar 2025.
Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining. Jie Cheng, Ruixi Qiao, Gang Xiong, Binhua Li, Yingwei Ma, Yongbin Li, Yisheng Lv. 01 Oct 2024.
Discretizing Continuous Action Space with Unimodal Probability Distributions for On-Policy Reinforcement Learning. Yuanyang Zhu, Zhi Wang, Yuanheng Zhu, Chunlin Chen, Dongbin Zhao. 01 Aug 2024.
Natural Gradient Interpretation of Rank-One Update in CMA-ES. Ryoki Hamano, Shinichi Shirakawa, Masahiro Nomura. 24 Jun 2024.
Advantage Alignment Algorithms. Juan Agustin Duque, Milad Aghajohari, Tim Cooijmans, Tianyu Zhang, Aaron C. Courville, Gauthier Gidel. 20 Jun 2024.
A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning. Khimya Khetarpal, Z. Guo, Bernardo Avila-Pires, Yunhao Tang, Clare Lyle, Mark Rowland, N. Heess, Diana Borsa, A. Guez, Will Dabney. 04 Jun 2024.
Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning. Lucas-Andrei Thil, Mirela Popa, Gerasimos Spanakis. 01 May 2024.
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent. Quentin Gallouédec, E. Beeching, Clément Romac, Emmanuel Dellandréa. 15 Feb 2024.
SPO: Sequential Monte Carlo Policy Optimisation. Matthew Macfarlane, Edan Toledo, Donal Byrne, Paul Duckworth, Alexandre Laterre. 12 Feb 2024.
The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations. Matthias Lehmann. 24 Jan 2024.
A dynamical clipping approach with task feedback for Proximal Policy Optimization. Ziqi Zhang, Jingzehua Xu, Zifeng Zhuang, Jinxin Liu, Donglin Wang, Shuai Zhang. 12 Dec 2023.
Guaranteed Trust Region Optimization via Two-Phase KL Penalization. K.R. Zentner, Ujjwal Puri, Zhehui Huang, Gaurav Sukhatme. 08 Dec 2023.
H-GAP: Humanoid Control with a Generalist Planner. Zhengyao Jiang, Yingchen Xu, Nolan Wagener, Yicheng Luo, Michael Janner, Edward Grefenstette, Tim Rocktäschel, Yuandong Tian. 05 Dec 2023.
Replay across Experiments: A Natural Extension of Off-Policy RL. Dhruva Tirumala, Thomas Lampe, José Enrique Chen, Tuomas Haarnoja, Sandy Huang, ..., Tim Hertweck, Leonard Hasenclever, Martin Riedmiller, N. Heess, Markus Wulfmeier. 27 Nov 2023.
Agent as Cerebrum, Controller as Cerebellum: Implementing an Embodied LMM-based Agent on Drones. Haoran Zhao, Fengxing Pan, Huqiuyue Ping, Yaoming Zhou. 25 Nov 2023.
DrM: Mastering Visual Reinforcement Learning through Dormant Ratio Minimization. Guowei Xu, Ruijie Zheng, Yongyuan Liang, Xiyao Wang, Zhecheng Yuan, ..., Shuzhen Li, Yanjie Ze, Hal Daumé, Furong Huang, Huazhe Xu. 30 Oct 2023.
Absolute Policy Optimization. Weiye Zhao, Feihan Li, Yifan Sun, Rui Chen, Tianhao Wei, Changliu Liu. 20 Oct 2023.
Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents. Marco Pleines, Matthias Pallasch, Frank Zimmer, Mike Preuss. 29 Sep 2023.
RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking. Homanga Bharadhwaj, Jay Vakil, Mohit Sharma, Abhi Gupta, Shubham Tulsiani, Vikash Kumar. 05 Sep 2023.
Reinforced Self-Training (ReST) for Language Modeling. Çağlar Gülçehre, T. Paine, S. Srinivasan, Ksenia Konyushkova, L. Weerts, ..., Chenjie Gu, Wolfgang Macherey, Arnaud Doucet, Orhan Firat, Nando de Freitas. 17 Aug 2023.
RLBoost: Boosting Supervised Models using Deep Reinforcement Learning. Eloy Anguiano Batanero, Ángela Fernández Pascual, Á. Jiménez. 23 May 2023.
ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints. Akhil Agnihotri, R. Jain, Haipeng Luo. 02 Feb 2023.
On Transforming Reinforcement Learning by Transformer: The Development Trajectory. Shengchao Hu, Li Shen, Ya-Qin Zhang, Yixin Chen, Dacheng Tao. 29 Dec 2022.
Understanding Self-Predictive Learning for Reinforcement Learning. Yunhao Tang, Z. Guo, Pierre Harvey Richemond, Bernardo Avila-Pires, Yash Chandak, ..., S. Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko. 06 Dec 2022.
Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes. Aviral Kumar, Rishabh Agarwal, Xinyang Geng, George Tucker, Sergey Levine. 28 Nov 2022.
Melting Pot 2.0. J. Agapiou, A. Vezhnevets, Edgar A. Duénez-Guzmán, Jayd Matyas, Yiran Mao, ..., Sukhdeep Singh, Julia Haas, Igor Mordatch, D. Mobbs, Joel Z Leibo. 24 Nov 2022.
Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments. Daniel Jarrett, Corentin Tallec, Florent Altché, Thomas Mesnard, Rémi Munos, Michal Valko. 18 Nov 2022.
Efficient Deep Reinforcement Learning with Predictive Processing Proximal Policy Optimization. Burcu Küçükoglu, Walraaf Borkent, Bodo Rueckauer, Nasir Ahmad, Umut Güçlü, Marcel van Gerven. 11 Nov 2022.
Leveraging Demonstrations with Latent Space Priors. Jonas Gehring, Deepak Gopinath, Jungdam Won, Andreas Krause, Gabriel Synnaeve, Nicolas Usunier. 26 Oct 2022.
Augmentative Topology Agents For Open-Ended Learning. Muhammad Umair Nasir, Michael Beukman, Steven D. James, C. Cleghorn. 20 Oct 2022.
Deep Black-Box Reinforcement Learning with Movement Primitives. Fabian Otto, Onur Celik, Hongyi Zhou, Hanna Ziesche, Ngo Anh Vien, Gerhard Neumann. 18 Oct 2022.
Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals. Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, J. Uesato, Zachary Kenton. 04 Oct 2022.
Improving alignment of dialogue agents via targeted human judgements. Amelia Glaese, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, ..., John F. J. Mellor, Demis Hassabis, Koray Kavukcuoglu, Lisa Anne Hendricks, G. Irving. 28 Sep 2022.
Human-level Atari 200x faster. Steven Kapturowski, Victor Campos, Ray Jiang, Nemanja Rakićević, Hado van Hasselt, Charles Blundell, Adria Puigdomenech Badia. 15 Sep 2022.
A model-based approach to meta-Reinforcement Learning: Transformers and tree search. Brieuc Pinon, Jean-Charles Delvenne, Raphaël Jungers. 24 Aug 2022.
Generalized Policy Improvement Algorithms with Theoretically Supported Sample Reuse. James Queeney, I. Paschalidis, Christos G. Cassandras. 28 Jun 2022.
BYOL-Explore: Exploration by Bootstrapped Prediction. Z. Guo, S. Thakoor, Miruna Pislar, Bernardo Avila-Pires, Florent Altché, ..., Yunhao Tang, Michal Valko, Rémi Munos, M. G. Azar, Bilal Piot. 16 Jun 2022.
Intra-agent speech permits zero-shot task acquisition. Chen Yan, Federico Carnevale, Petko Georgiev, Adam Santoro, Aurelia Guy, Alistair Muldal, Chia-Chun Hung, Josh Abramson, Timothy Lillicrap, Greg Wayne. 07 Jun 2022.
Critic Sequential Monte Carlo. Vasileios Lioutas, J. Lavington, Justice Sefas, Matthew Niedoba, Yunpeng Liu, Berend Zwartsenberg, Setareh Dabiri, Frank D. Wood, Adam Scibior. 30 May 2022.
Data augmentation for efficient learning from parametric experts. Alexandre Galashov, J. Merel, N. Heess. 23 May 2022.
A Generalist Agent. Scott E. Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, ..., Yutian Chen, R. Hadsell, Oriol Vinyals, Mahyar Bordbar, Nando de Freitas. 12 May 2022.
Learning to Constrain Policy Optimization with Virtual Trust Region. Hung Le, Thommen Karimpanal George, Majid Abdolshah, D. Nguyen, Kien Do, Sunil R. Gupta, Svetha Venkatesh. 20 Apr 2022.
JORLDY: a fully customizable open source framework for reinforcement learning. Kyushik Min, Hyunho Lee, Kwansu Shin, Tae-woo Lee, Hojoon Lee, Jinwon Choi, Sung-Hyun Son. 11 Apr 2022.
Imitate and Repurpose: Learning Reusable Robot Movement Skills From Human and Animal Behaviors. Steven Bohez, S. Tunyasuvunakool, Philemon Brakel, Fereshteh Sadeghi, Leonard Hasenclever, ..., Nathan Batchelor, Federico Casarini, J. Merel, R. Hadsell, N. Heess. 31 Mar 2022.
Zipfian environments for Reinforcement Learning. Stephanie C. Y. Chan, Andrew Kyle Lampinen, Pierre Harvey Richemond, Felix Hill. 15 Mar 2022.
A data-driven approach for learning to control computers. Peter C. Humphreys, David Raposo, Tobias Pohlen, Gregory Thornton, Rachita Chhaparia, ..., Josh Abramson, Petko Georgiev, Alex Goldin, Adam Santoro, Timothy Lillicrap. 16 Feb 2022.
Constrained Variational Policy Optimization for Safe Reinforcement Learning. Zuxin Liu, Zhepeng Cen, Vladislav Isenbaev, Wei Liu, Zhiwei Steven Wu, Bo-wen Li, Ding Zhao. 28 Jan 2022.
How to Learn and Represent Abstractions: An Investigation using Symbolic Alchemy. Badr AlKhamissi, Akshay Srinivasan, Zeb Kurth-Nelson, Samuel Ritter. 14 Dec 2021.
Towards an Understanding of Default Policies in Multitask Policy Optimization. Theodore H. Moskovitz, Michael Arbel, Jack Parker-Holder, Aldo Pacchiano. 04 Nov 2021.