ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control

26 September 2019
H. F. Song
A. Abdolmaleki
Jost Tobias Springenberg
Aidan Clark
Hubert Soyer
Jack W. Rae
Seb Noury
Arun Ahuja
Siqi Liu
Dhruva Tirumala
N. Heess
Dan Belov
Martin Riedmiller
M. Botvinick

Papers citing "V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control"

50 / 88 papers shown
Wasserstein Policy Optimization
David Pfau
Ian Davies
Diana Borsa
Joao G. M. Araujo
Brendan D. Tracey
H. V. Hasselt
29
0
0
01 May 2025
SocialJax: An Evaluation Suite for Multi-agent Reinforcement Learning in Sequential Social Dilemmas
Zihao Guo
Richard Willis
Tristan Tomilin
Joel Z Leibo
Yali Du
55
0
0
18 Mar 2025
Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining
Jie Cheng
Ruixi Qiao
Gang Xiong
Binhua Li
Yingwei Ma
Yongbin Li
Yisheng Lv
OffRL
OnRL
LM&Ro
50
3
0
01 Oct 2024
Discretizing Continuous Action Space with Unimodal Probability Distributions for On-Policy Reinforcement Learning
Yuanyang Zhu
Zhi Wang
Yuanheng Zhu
Chunlin Chen
Dongbin Zhao
21
0
0
01 Aug 2024
Natural Gradient Interpretation of Rank-One Update in CMA-ES
Ryoki Hamano
Shinichi Shirakawa
Masahiro Nomura
34
0
0
24 Jun 2024
Advantage Alignment Algorithms
Juan Agustin Duque
Milad Aghajohari
Tim Cooijmans
Tianyu Zhang
Aaron C. Courville
Gauthier Gidel
Aaron Courville
23
0
0
20 Jun 2024
A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning
Khimya Khetarpal
Z. Guo
Bernardo Avila-Pires
Yunhao Tang
Clare Lyle
Mark Rowland
N. Heess
Diana Borsa
A. Guez
Will Dabney
37
2
0
04 Jun 2024
Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning
Lucas-Andrei Thil
Mirela Popa
Gerasimos Spanakis
LLMAG
27
2
0
01 May 2024
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent
Quentin Gallouedec
E. Beeching
Clément Romac
Emmanuel Dellandréa
21
11
0
15 Feb 2024
SPO: Sequential Monte Carlo Policy Optimisation
Matthew Macfarlane
Edan Toledo
Donal Byrne
Paul Duckworth
Alexandre Laterre
30
1
0
12 Feb 2024
The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations
Matthias Lehmann
38
0
0
24 Jan 2024
A dynamical clipping approach with task feedback for Proximal Policy Optimization
Ziqi Zhang
Jingzehua Xu
Zifeng Zhuang
Jinxin Liu
Donglin Wang
Shuai Zhang
22
1
0
12 Dec 2023
Guaranteed Trust Region Optimization via Two-Phase KL Penalization
K.R. Zentner
Ujjwal Puri
Zhehui Huang
Gaurav Sukhatme
OffRL
19
0
0
08 Dec 2023
H-GAP: Humanoid Control with a Generalist Planner
Zhengyao Jiang
Yingchen Xu
Nolan Wagener
Yicheng Luo
Michael Janner
Edward Grefenstette
Tim Rocktaschel
Yuandong Tian
AI4CE
21
5
0
05 Dec 2023
Replay across Experiments: A Natural Extension of Off-Policy RL
Dhruva Tirumala
Thomas Lampe
José Enrique Chen
Tuomas Haarnoja
Sandy Huang
...
Tim Hertweck
Leonard Hasenclever
Martin Riedmiller
N. Heess
Markus Wulfmeier
OffRL
32
8
0
27 Nov 2023
Agent as Cerebrum, Controller as Cerebellum: Implementing an Embodied LMM-based Agent on Drones
Haoran Zhao
Fengxing Pan
Huqiuyue Ping
Yaoming Zhou
AI4CE
42
12
0
25 Nov 2023
DrM: Mastering Visual Reinforcement Learning through Dormant Ratio Minimization
Guowei Xu
Ruijie Zheng
Yongyuan Liang
Xiyao Wang
Zhecheng Yuan
...
Shuzhen Li
Yanjie Ze
Hal Daumé
Furong Huang
Huazhe Xu
40
28
0
30 Oct 2023
Absolute Policy Optimization
Weiye Zhao
Feihan Li
Yifan Sun
Rui Chen
Tianhao Wei
Changliu Liu
31
4
0
20 Oct 2023
Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents
Marco Pleines
Matthias Pallasch
Frank Zimmer
Mike Preuss
OffRL
29
0
0
29 Sep 2023
RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking
Homanga Bharadhwaj
Jay Vakil
Mohit Sharma
Abhi Gupta
Shubham Tulsiani
Vikash Kumar
LM&Ro
21
116
0
05 Sep 2023
Reinforced Self-Training (ReST) for Language Modeling
Çağlar Gülçehre
T. Paine
S. Srinivasan
Ksenia Konyushkova
L. Weerts
...
Chenjie Gu
Wolfgang Macherey
Arnaud Doucet
Orhan Firat
Nando de Freitas
OffRL
31
274
0
17 Aug 2023
RLBoost: Boosting Supervised Models using Deep Reinforcement Learning
Eloy Anguiano Batanero
Ángela Fernández Pascual
Á. Jiménez
OffRL
13
0
0
23 May 2023
ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints
Akhil Agnihotri
R. Jain
Haipeng Luo
18
2
0
02 Feb 2023
On Transforming Reinforcement Learning by Transformer: The Development Trajectory
Shengchao Hu
Li Shen
Ya-Qin Zhang
Yixin Chen
Dacheng Tao
OffRL
27
25
0
29 Dec 2022
Understanding Self-Predictive Learning for Reinforcement Learning
Yunhao Tang
Z. Guo
Pierre Harvey Richemond
Bernardo Avila-Pires
Yash Chandak
...
S. Thakoor
Will Dabney
Bilal Piot
Daniele Calandriello
Michal Valko
SSL
27
28
0
06 Dec 2022
Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes
Aviral Kumar
Rishabh Agarwal
Xinyang Geng
George Tucker
Sergey Levine
OffRL
39
48
0
28 Nov 2022
Melting Pot 2.0
J. Agapiou
A. Vezhnevets
Edgar A. Duénez-Guzmán
Jayd Matyas
Yiran Mao
...
Sukhdeep Singh
Julia Haas
Igor Mordatch
D. Mobbs
Joel Z Leibo
30
31
0
24 Nov 2022
Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments
Daniel Jarrett
Corentin Tallec
Florent Altché
Thomas Mesnard
Rémi Munos
Michal Valko
42
5
0
18 Nov 2022
Efficient Deep Reinforcement Learning with Predictive Processing Proximal Policy Optimization
Burcu Küçükoglu
Walraaf Borkent
Bodo Rueckauer
Nasir Ahmad
Umut Güçlü
Marcel van Gerven
23
2
0
11 Nov 2022
Leveraging Demonstrations with Latent Space Priors
Jonas Gehring
Deepak Gopinath
Jungdam Won
Andreas Krause
Gabriel Synnaeve
Nicolas Usunier
33
4
0
26 Oct 2022
Augmentative Topology Agents For Open-Ended Learning
Muhammad Umair Nasir
Michael Beukman
Steven D. James
C. Cleghorn
27
3
0
20 Oct 2022
Deep Black-Box Reinforcement Learning with Movement Primitives
Fabian Otto
Onur Celik
Hongyi Zhou
Hanna Ziesche
Ngo Anh Vien
Gerhard Neumann
OffRL
24
19
0
18 Oct 2022
Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals
Rohin Shah
Vikrant Varma
Ramana Kumar
Mary Phuong
Victoria Krakovna
J. Uesato
Zachary Kenton
34
68
0
04 Oct 2022
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trębacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
227
502
0
28 Sep 2022
Human-level Atari 200x faster
Steven Kapturowski
Victor Campos
Ray Jiang
Nemanja Rakićević
Hado van Hasselt
Charles Blundell
Adria Puigdomenech Badia
OffRL
52
28
0
15 Sep 2022
A model-based approach to meta-Reinforcement Learning: Transformers and tree search
Brieuc Pinon
Jean-Charles Delvenne
Raphaël Jungers
OffRL
24
3
0
24 Aug 2022
Generalized Policy Improvement Algorithms with Theoretically Supported Sample Reuse
James Queeney
I. Paschalidis
Christos G. Cassandras
OffRL
24
2
0
28 Jun 2022
BYOL-Explore: Exploration by Bootstrapped Prediction
Z. Guo
S. Thakoor
Miruna Pislar
Bernardo Avila-Pires
Florent Altché
...
Yunhao Tang
Michal Valko
Rémi Munos
M. G. Azar
Bilal Piot
22
68
0
16 Jun 2022
Intra-agent speech permits zero-shot task acquisition
Chen Yan
Federico Carnevale
Petko Georgiev
Adam Santoro
Aurelia Guy
Alistair Muldal
Chia-Chun Hung
Josh Abramson
Timothy Lillicrap
Greg Wayne
LM&Ro
36
9
0
07 Jun 2022
Critic Sequential Monte Carlo
Vasileios Lioutas
J. Lavington
Justice Sefas
Matthew Niedoba
Yunpeng Liu
Berend Zwartsenberg
Setareh Dabiri
Frank D. Wood
Adam Scibior
44
7
0
30 May 2022
Data augmentation for efficient learning from parametric experts
Alexandre Galashov
J. Merel
N. Heess
OffRL
14
5
0
23 May 2022
A Generalist Agent
Scott E. Reed
Konrad Zolna
Emilio Parisotto
Sergio Gomez Colmenarejo
Alexander Novikov
...
Yutian Chen
R. Hadsell
Oriol Vinyals
Mahyar Bordbar
Nando de Freitas
LM&Ro
LLMAG
AI4CE
56
785
0
12 May 2022
Learning to Constrain Policy Optimization with Virtual Trust Region
Hung Le
Thommen Karimpanal George
Majid Abdolshah
D. Nguyen
Kien Do
Sunil R. Gupta
Svetha Venkatesh
28
3
0
20 Apr 2022
JORLDY: a fully customizable open source framework for reinforcement learning
Kyushik Min
Hyunho Lee
Kwansu Shin
Tae-woo Lee
Hojoon Lee
Jinwon Choi
Sung-Hyun Son
OnRL
14
0
0
11 Apr 2022
Imitate and Repurpose: Learning Reusable Robot Movement Skills From Human and Animal Behaviors
Steven Bohez
S. Tunyasuvunakool
Philemon Brakel
Fereshteh Sadeghi
Leonard Hasenclever
...
Nathan Batchelor
Federico Casarini
J. Merel
R. Hadsell
N. Heess
32
51
0
31 Mar 2022
Zipfian environments for Reinforcement Learning
Stephanie C. Y. Chan
Andrew Kyle Lampinen
Pierre Harvey Richemond
Felix Hill
OffRL
13
15
0
15 Mar 2022
A data-driven approach for learning to control computers
Peter C. Humphreys
David Raposo
Tobias Pohlen
Gregory Thornton
Rachita Chhaparia
...
Josh Abramson
Petko Georgiev
Alex Goldin
Adam Santoro
Timothy Lillicrap
25
97
0
16 Feb 2022
Constrained Variational Policy Optimization for Safe Reinforcement Learning
Zuxin Liu
Zhepeng Cen
Vladislav Isenbaev
Wei Liu
Zhiwei Steven Wu
Bo-wen Li
Ding Zhao
14
76
0
28 Jan 2022
How to Learn and Represent Abstractions: An Investigation using Symbolic Alchemy
Badr AlKhamissi
Akshay Srinivasan
Zeb Kurth-Nelson
Samuel Ritter
28
1
0
14 Dec 2021
Towards an Understanding of Default Policies in Multitask Policy Optimization
Theodore H. Moskovitz
Michael Arbel
Jack Parker-Holder
Aldo Pacchiano
19
9
0
04 Nov 2021