Maximum a Posteriori Policy Optimisation

14 June 2018
A. Abdolmaleki
Jost Tobias Springenberg
Yuval Tassa
Rémi Munos
N. Heess
Martin Riedmiller

Papers citing "Maximum a Posteriori Policy Optimisation"

50 / 132 papers shown
Title
Trust-Region Twisted Policy Improvement
Joery A. de Vries
Jinke He
Yaniv Oren
M. Spaan
OffRL
LRM
30
0
0
08 Apr 2025
Mirror Descent Actor Critic via Bounded Advantage Learning
Ryo Iwaki
93
0
0
06 Feb 2025
Divergence-Augmented Policy Optimization
Qing Wang
Yingru Li
Jiechao Xiong
Tong Zhang
OffRL
44
16
0
28 Jan 2025
Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment
Chaoqi Wang
Zhuokai Zhao
Yibo Jiang
Zhaorun Chen
Chen Zhu
...
Jiayi Liu
Lizhu Zhang
Xiangjun Fan
Hao Ma
Sinong Wang
77
3
0
17 Jan 2025
DemoStart: Demonstration-led auto-curriculum applied to sim-to-real with multi-fingered robots
Maria Bauzá
José Enrique Chen
Valentin Dalibard
Nimrod Gileadi
Roland Hafner
...
Martin Riedmiller
Jon Scholz
Konstantinos Bousmalis
Francesco Nori
Nicolas Heess
34
5
0
10 Sep 2024
Game On: Towards Language Models as RL Experimenters
Jingwei Zhang
Thomas Lampe
A. Abdolmaleki
Jost Tobias Springenberg
Martin Riedmiller
LM&Ro
36
0
0
05 Sep 2024
Functional Acceleration for Policy Mirror Descent
Veronica Chelu
Doina Precup
30
0
0
23 Jul 2024
Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification
Thomas Kwa
Drake Thomas
Adrià Garriga-Alonso
26
1
0
19 Jul 2024
WARP: On the Benefits of Weight Averaged Rewarded Policies
Alexandre Ramé
Johan Ferret
Nino Vieillard
Robert Dadashi
Léonard Hussenot
Pierre-Louis Cedoz
Pier Giuseppe Sessa
Sertan Girgin
Arthur Douillard
Olivier Bachem
59
14
0
24 Jun 2024
Natural Gradient Interpretation of Rank-One Update in CMA-ES
Ryoki Hamano
Shinichi Shirakawa
Masahiro Nomura
34
0
0
24 Jun 2024
Value Improved Actor Critic Algorithms
Yaniv Oren
Moritz A. Zanger
Pascal R. van der Vaart
M. Spaan
Wendelin Bohmer
OffRL
31
0
0
03 Jun 2024
S$^2$AC: Energy-Based Reinforcement Learning with Stein Soft Actor Critic
Safa Messaoud
Billel Mokeddem
Zhenghai Xue
L. Pang
Bo An
Haipeng Chen
Sanjay Chawla
41
3
0
02 May 2024
Shared learning of powertrain control policies for vehicle fleets
Lindsey Kerbel
B. Ayalew
Andrej Ivanco
29
0
0
27 Apr 2024
Model-based Reinforcement Learning for Parameterized Action Spaces
Renhao Zhang
Haotian Fu
Yilin Miao
George Konidaris
28
3
0
03 Apr 2024
A Model-Based Approach for Improving Reinforcement Learning Efficiency Leveraging Expert Observations
E. C. Ozcan
Vittorio Giammarino
James Queeney
I. Paschalidis
OffRL
36
0
0
29 Feb 2024
Offline Actor-Critic Reinforcement Learning Scales to Large Models
Jost Tobias Springenberg
A. Abdolmaleki
Jingwei Zhang
Oliver Groth
Michael Bloesch
...
Sarah Bechtle
Steven Kapturowski
Roland Hafner
N. Heess
Martin Riedmiller
OffRL
LRM
27
12
0
08 Feb 2024
The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations
Matthias Lehmann
38
0
0
24 Jan 2024
Colored Noise in PPO: Improved Exploration and Performance through Correlated Action Sampling
Jakob J. Hollenstein
Georg Martius
J. Piater
16
3
0
18 Dec 2023
Enhancing Robotic Navigation: An Evaluation of Single and Multi-Objective Reinforcement Learning Strategies
Vicki Young
Jumman Hossain
Nirmalya Roy
24
1
0
13 Dec 2023
On the Theory of Risk-Aware Agents: Bridging Actor-Critic and Economics
Michal Nauman
Marek Cygan
35
1
0
30 Oct 2023
Boosting Continuous Control with Consistency Policy
Yuhui Chen
Haoran Li
Dongbin Zhao
OffRL
41
20
0
10 Oct 2023
Confronting Reward Model Overoptimization with Constrained RLHF
Ted Moskovitz
Aaditya K. Singh
DJ Strouse
T. Sandholm
Ruslan Salakhutdinov
Anca D. Dragan
Stephen Marcus McAleer
34
47
0
06 Oct 2023
Simplified Temporal Consistency Reinforcement Learning
Yi Zhao
Wenshuai Zhao
Rinu Boney
Arno Solin
Joni Pajarinen
OffRL
30
12
0
15 Jun 2023
Coherent Soft Imitation Learning
Joe Watson
Sandy H. Huang
Nicholas Heess
32
11
0
25 May 2023
Constrained Proximal Policy Optimization
Chengbin Xuan
Feng Zhang
Faliang Yin
H. Lam
16
0
0
23 May 2023
Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning
Tuomas Haarnoja
Ben Moran
Guy Lever
Sandy H. Huang
Dhruva Tirumala
...
Andrea Huber
N. Hurley
F. Nori
R. Hadsell
N. Heess
39
140
0
26 Apr 2023
Latent-Conditioned Policy Gradient for Multi-Objective Deep Reinforcement Learning
T. Kanazawa
Chetan Gupta
26
0
0
15 Mar 2023
Optimal Transport Perturbations for Safe Reinforcement Learning with Robustness Guarantees
James Queeney
E. C. Ozcan
I. Paschalidis
Christos G. Cassandras
OOD
OffRL
31
5
0
31 Jan 2023
Risk-Averse Model Uncertainty for Distributionally Robust Safe Reinforcement Learning
James Queeney
M. Benosman
OOD
OffRL
33
5
0
30 Jan 2023
Safe Reinforcement Learning for an Energy-Efficient Driver Assistance System
Habtamu Hailemichael
B. Ayalew
Lindsey Kerbel
Andrej Ivanco
K. Loiselle
14
4
0
03 Jan 2023
Policy Optimization to Learn Adaptive Motion Primitives in Path Planning with Dynamic Obstacles
Brian Angulo
Aleksandr I. Panov
Konstantin Yakovlev
16
12
0
29 Dec 2022
Driver Assistance Eco-driving and Transmission Control with Deep Reinforcement Learning
Lindsey Kerbel
B. Ayalew
Andrej Ivanco
K. Loiselle
OffRL
6
8
0
15 Dec 2022
SkillS: Adaptive Skill Sequencing for Efficient Temporally-Extended Exploration
Giulia Vezzani
Dhruva Tirumala
Markus Wulfmeier
Dushyant Rao
A. Abdolmaleki
...
Tim Hertweck
Thomas Lampe
Fereshteh Sadeghi
N. Heess
Martin Riedmiller
OffRL
33
6
0
24 Nov 2022
Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification
Takumi Tanabe
Reimi Sato
Kazuto Fukuchi
Jun Sakuma
Youhei Akimoto
OffRL
24
8
0
07 Nov 2022
Solving Continuous Control via Q-learning
Tim Seyde
Peter Werner
Wilko Schwarting
Igor Gilitschenski
Martin Riedmiller
Daniela Rus
Markus Wulfmeier
OffRL
LRM
35
22
0
22 Oct 2022
Bridging the Gap Between Target Networks and Functional Regularization
Alexandre Piché
Valentin Thomas
Joseph Marino
Rafael Pardiñas
Gian Maria Marconi
C. Pal
Mohammad Emtiyaz Khan
14
1
0
21 Oct 2022
Adaptive patch foraging in deep reinforcement learning agents
Nathan J. Wispinski
Andrew Butcher
K. Mathewson
Craig S. Chapman
M. Botvinick
P. Pilarski
16
8
0
14 Oct 2022
Real-Time Reinforcement Learning for Vision-Based Robotics Utilizing Local and Remote Computers
Yan Wang
G. Vasan
A. R. Mahmood
36
15
0
05 Oct 2022
Revisiting Discrete Soft Actor-Critic
Haibin Zhou
Zichuan Lin
Junyou Li
Qiang Fu
Wei Yang
Deheng Ye
46
12
0
21 Sep 2022
Age of Semantics in Cooperative Communications: To Expedite Simulation Towards Real via Offline Reinforcement Learning
Xianfu Chen
Zhifeng Zhao
S. Mao
Celimuge Wu
Honggang Zhang
M. Bennis
OffRL
20
3
0
19 Sep 2022
Minimum Description Length Control
Theodore H. Moskovitz
Ta-Chu Kao
M. Sahani
M. Botvinick
26
1
0
17 Jul 2022
Offline Equilibrium Finding
Shuxin Li
Xinrun Wang
Youzhi Zhang
Jakub Cerny
Pengdeng Li
Hau Chan
Bo An
OffRL
43
2
0
12 Jul 2022
Overcoming the Spectral Bias of Neural Value Approximation
Ge Yang
Anurag Ajay
Pulkit Agrawal
32
25
0
09 Jun 2022
Critic Sequential Monte Carlo
Vasileios Lioutas
J. Lavington
Justice Sefas
Matthew Niedoba
Yunpeng Liu
Berend Zwartsenberg
Setareh Dabiri
Frank D. Wood
Adam Scibior
44
7
0
30 May 2022
DEP-RL: Embodied Exploration for Reinforcement Learning in Overactuated and Musculoskeletal Systems
Pierre Schumacher
Daniel Haeufle
Le Chen
Syn Schmitt
Georg Martius
17
31
0
30 May 2022
Regret-Aware Black-Box Optimization with Natural Gradients, Trust-Regions and Entropy Control
Maximilian Hüttenrauch
Gerhard Neumann
11
1
0
24 May 2022
A Generalist Agent
Scott E. Reed
Konrad Zolna
Emilio Parisotto
Sergio Gomez Colmenarejo
Alexander Novikov
...
Yutian Chen
R. Hadsell
Oriol Vinyals
Mahyar Bordbar
Nando de Freitas
LM&Ro
LLMAG
AI4CE
56
787
0
12 May 2022
How to Spend Your Robot Time: Bridging Kickstarting and Offline Reinforcement Learning for Vision-based Robotic Manipulation
Alex X. Lee
Coline Devin
Jost Tobias Springenberg
Yuxiang Zhou
Thomas Lampe
A. Abdolmaleki
Konstantinos Bousmalis
OffRL
OnRL
16
15
0
06 May 2022
Revisiting Gaussian mixture critics in off-policy reinforcement learning: a sample-based approach
Bobak Shahriari
A. Abdolmaleki
Arunkumar Byravan
A. Friesen
Siqi Liu
Jost Tobias Springenberg
N. Heess
Matthew W. Hoffman
Martin Riedmiller
OffRL
41
9
0
21 Apr 2022
Learning to Constrain Policy Optimization with Virtual Trust Region
Hung Le
Thommen Karimpanal George
Majid Abdolshah
D. Nguyen
Kien Do
Sunil R. Gupta
Svetha Venkatesh
28
3
0
20 Apr 2022