Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2005.09814
Cited By
Mirror Descent Policy Optimization
20 May 2020
Manan Tomar
Lior Shani
Yonathan Efroni
Mohammad Ghavamzadeh
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Mirror Descent Policy Optimization"
50 / 58 papers shown
Title
FedDuA: Doubly Adaptive Federated Learning
Shokichi Takakura
Seng Pei Liew
Satoshi Hasegawa
FedML
12
0
0
16 May 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
Chong Chen
Jiadong Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRL
LRM
58
1
0
30 Apr 2025
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Yang Sui
Yu-Neng Chuang
Guanchu Wang
Jiamu Zhang
Tianyi Zhang
...
Hongyi Liu
Andrew Wen
Shaochen
Zhong
Hanjie Chen
OffRL
ReLM
LRM
83
31
0
20 Mar 2025
Mirror Descent Actor Critic via Bounded Advantage Learning
Ryo Iwaki
93
0
0
06 Feb 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
...
Zhilin Yang
Zhiqi Huang
Zihao Huang
Ziyao Xu
Zhengyuan Yang
VLM
ALM
OffRL
AI4TS
LRM
117
150
0
22 Jan 2025
Embedding Safety into RL: A New Take on Trust Region Methods
Nikola Milosevic
Johannes Müller
Nico Scherf
25
1
0
05 Nov 2024
Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment
Mingzhi Wang
Chengdong Ma
Qizhi Chen
Linjian Meng
Yang Han
Jiancong Xiao
Zhaowei Zhang
Jing Huo
Weijie Su
Yaodong Yang
32
4
0
22 Oct 2024
Dual Approximation Policy Optimization
Zhihan Xiong
Maryam Fazel
Lin Xiao
30
1
0
02 Oct 2024
Maximum Entropy On-Policy Actor-Critic via Entropy Advantage Estimation
Jean Seong Bjorn Choe
Jong-Kook Kim
46
2
0
25 Jul 2024
Functional Acceleration for Policy Mirror Descent
Veronica Chelu
Doina Precup
30
0
0
23 Jul 2024
WARP: On the Benefits of Weight Averaged Rewarded Policies
Alexandre Ramé
Johan Ferret
Nino Vieillard
Robert Dadashi
Léonard Hussenot
Pierre-Louis Cedoz
Pier Giuseppe Sessa
Sertan Girgin
Arthur Douillard
Olivier Bachem
62
14
0
24 Jun 2024
Provably Efficient Off-Policy Adversarial Imitation Learning with Convergence Guarantees
Yilei Chen
Vittorio Giammarino
James Queeney
I. Paschalidis
31
0
0
26 May 2024
Multi-turn Reinforcement Learning from Preference Human Feedback
Lior Shani
Aviv Rosenberg
Asaf B. Cassel
Oran Lang
Daniele Calandriello
...
Bilal Piot
Idan Szpektor
Avinatan Hassidim
Yossi Matias
Rémi Munos
49
26
0
23 May 2024
Configurable Mirror Descent: Towards a Unification of Decision Making
Pengdeng Li
Shuxin Li
Chang Yang
Xinrun Wang
Shuyue Hu
Xiao Huang
Hau Chan
Bo An
36
1
0
20 May 2024
Policy Mirror Descent with Lookahead
Kimon Protopapas
Anas Barakat
29
1
0
21 Mar 2024
Learning mirror maps in policy mirror descent
Carlo Alfano
Sebastian Towers
Silvia Sapora
Chris Xiaoxuan Lu
Patrick Rebeschini
32
0
0
07 Feb 2024
On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization
Ling Liang
Haizhao Yang
14
1
0
23 Jan 2024
Global Convergence of Natural Policy Gradient with Hessian-aided Momentum Variance Reduction
Jie Feng
Ke Wei
Jinchi Chen
36
1
0
02 Jan 2024
Dropout Strategy in Reinforcement Learning: Limiting the Surrogate Objective Variance in Policy Optimization Methods
Zhengpeng Xie
Changdong Yu
Weizheng Qiao
29
1
0
31 Oct 2023
A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces
B. Kerimkulov
J. Leahy
David Siska
Lukasz Szpruch
Yufei Zhang
29
7
0
04 Oct 2023
Vision Transformer Adapters for Generalizable Multitask Learning
Deblina Bhattacharjee
Sabine Süsstrunk
Mathieu Salzmann
ViT
21
8
0
23 Aug 2023
Heterogeneous Multi-Agent Reinforcement Learning via Mirror Descent Policy Optimization
Mohammad Mehdi Nasiri
M. Rezghi
38
0
0
13 Aug 2023
Acceleration in Policy Optimization
Veronica Chelu
Tom Zahavy
A. Guez
Doina Precup
Sebastian Flennerhag
45
0
0
18 Jun 2023
Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback
Paul Roit
Johan Ferret
Lior Shani
Roee Aharoni
Geoffrey Cideron
...
Olivier Bachem
G. Elidan
Avinatan Hassidim
Olivier Pietquin
Idan Szpektor
HILM
28
77
0
31 May 2023
Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees
Sharan Vaswani
A. Kazemi
Reza Babanezhad
Nicolas Le Roux
OffRL
32
3
0
24 May 2023
Sample Dropout: A Simple yet Effective Variance Reduction Technique in Deep Policy Optimization
Zichuan Lin
Xiapeng Wu
Mingfei Sun
Deheng Ye
Qiang Fu
Wei Yang
Wei Liu
18
3
0
05 Feb 2023
ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs
Theodore H. Moskovitz
Brendan O'Donoghue
Vivek Veeriah
Sebastian Flennerhag
Satinder Singh
Tom Zahavy
47
19
0
02 Feb 2023
A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence
Carlo Alfano
Rui Yuan
Patrick Rebeschini
65
15
0
30 Jan 2023
Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games
Batuhan Yardim
Semih Cayci
M. Geist
Niao He
53
27
0
29 Dec 2022
Robust Imitation via Mirror Descent Inverse Reinforcement Learning
Dong-Sig Han
Hyunseok Kim
Hyun-Dong Lee
Je-hwan Ryu
Byoung-Tak Zhang
28
2
0
20 Oct 2022
Entropy Augmented Reinforcement Learning
Jianfei Ma
30
0
0
19 Aug 2022
How Robust is your Fair Model? Exploring the Robustness of Diverse Fairness Strategies
E. Small
Wei Shao
Zeliang Zhang
Peihan Liu
Jeffrey Chan
Kacper Sokol
Flora D. Salim
60
2
0
11 Jul 2022
A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games
Samuel Sokota
Ryan DÓrazio
J. Zico Kolter
Nicolas Loizou
Marc Lanctot
Ioannis Mitliagkas
Noam Brown
Christian Kroer
23
1
0
12 Jun 2022
The Sufficiency of Off-Policyness and Soft Clipping: PPO is still Insufficient according to an Off-Policy Measure
Xing Chen
Dongcui Diao
Hechang Chen
Hengshuai Yao
Haiyin Piao
Zhixiao Sun
Zhiwei Yang
Randy Goebel
Bei Jiang
Yi-Ju Chang
OffRL
32
8
0
20 May 2022
Learning to Constrain Policy Optimization with Virtual Trust Region
Hung Le
Thommen Karimpanal George
Majid Abdolshah
D. Nguyen
Kien Do
Sunil R. Gupta
Svetha Venkatesh
28
3
0
20 Apr 2022
You May Not Need Ratio Clipping in PPO
Mingfei Sun
Vitaly Kurin
Guoqing Liu
Sam Devlin
Tao Qin
Katja Hofmann
Shimon Whiteson
13
15
0
31 Jan 2022
On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces
Amrit Singh Bedi
Souradip Chakraborty
Anjaly Parayil
Brian M Sadler
Pratap Tokekar
Alec Koppel
43
17
0
28 Jan 2022
Mirror Learning: A Unifying Framework of Policy Optimisation
J. Kuba
Christian Schroeder de Witt
Jakob N. Foerster
23
24
0
07 Jan 2022
Faster Deep Reinforcement Learning with Slower Online Network
Kavosh Asadi
Rasool Fakoor
Omer Gottesman
Taesup Kim
Michael L. Littman
Alexander J. Smola
OnRL
11
6
0
10 Dec 2021
Policy Search using Dynamic Mirror Descent MPC for Model Free Off Policy RL
Aarush Gupta
25
0
0
23 Oct 2021
Approximate Newton policy gradient algorithms
Haoya Li
Samarth Gupta
Hsiangfu Yu
Lexing Ying
Inderjit Dhillon
51
2
0
05 Oct 2021
Batch size-invariance for policy optimization
Jacob Hilton
K. Cobbe
John Schulman
17
11
0
01 Oct 2021
Bootstrapped Meta-Learning
Sebastian Flennerhag
Yannick Schroecker
Tom Zahavy
Hado van Hasselt
David Silver
Satinder Singh
38
59
0
09 Sep 2021
A general class of surrogate functions for stable and efficient reinforcement learning
Sharan Vaswani
Olivier Bachem
Simone Totaro
Robert Mueller
Shivam Garg
M. Geist
Marlos C. Machado
Pablo Samuel Castro
Nicolas Le Roux
OffRL
32
15
0
12 Aug 2021
A general sample complexity analysis of vanilla policy gradient
Rui Yuan
Robert Mansel Gower
A. Lazaric
76
62
0
23 Jul 2021
Bregman Gradient Policy Optimization
Feihu Huang
Shangqian Gao
Heng-Chiao Huang
25
16
0
23 Jun 2021
On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control
Amrit Singh Bedi
Anjaly Parayil
Junyu Zhang
Mengdi Wang
Alec Koppel
30
15
0
15 Jun 2021
Reward is enough for convex MDPs
Tom Zahavy
Brendan O'Donoghue
Guillaume Desjardins
Satinder Singh
72
72
0
01 Jun 2021
Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence
Wenhao Zhan
Shicong Cen
Baihe Huang
Yuxin Chen
Jason D. Lee
Yuejie Chi
19
76
0
24 May 2021
Muesli: Combining Improvements in Policy Optimization
Matteo Hessel
Ivo Danihelka
Fabio Viola
A. Guez
Simon Schmitt
Laurent Sifre
T. Weber
David Silver
H. V. Hasselt
16
66
0
13 Apr 2021
1
2
Next