Equivalence Between Policy Gradients and Soft Q-Learning

21 April 2017
John Schulman
Xi Chen
Pieter Abbeel
    OffRL

Papers citing "Equivalence Between Policy Gradients and Soft Q-Learning"

50 / 66 papers shown
A Two-Timescale Primal-Dual Framework for Reinforcement Learning via Online Dual Variable Guidance
Axel Friedrich Wolter
Tobias Sutter
OffRL
37
0
0
07 May 2025
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
Taiwei Shi
Yiyang Wu
Linxin Song
Tianyi Zhou
Jieyu Zhao
LRM
78
1
0
07 Apr 2025
Divergence-Augmented Policy Optimization
Qing Wang
Yingru Li
Jiechao Xiong
Tong Zhang
OffRL
39
16
0
28 Jan 2025
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
57
3
0
07 Nov 2024
Value Improved Actor Critic Algorithms
Yaniv Oren
Moritz A. Zanger
Pascal R. van der Vaart
M. Spaan
Wendelin Bohmer
OffRL
31
0
0
03 Jun 2024
Offline Regularised Reinforcement Learning for Large Language Models Alignment
Pierre Harvey Richemond
Yunhao Tang
Daniel Guo
Daniele Calandriello
M. G. Azar
...
Gil Shamir
Rishabh Joshi
Tianqi Liu
Rémi Munos
Bilal Piot
OffRL
46
22
0
29 May 2024
Learning diverse attacks on large language models for robust red-teaming and safety tuning
Seanie Lee
Minsu Kim
Lynn Cherif
David Dobre
Juho Lee
...
Kenji Kawaguchi
Gauthier Gidel
Yoshua Bengio
Nikolay Malkin
Moksh Jain
AAML
63
12
0
28 May 2024
Reinforcing Language Agents via Policy Optimization with Action Decomposition
Muning Wen
Ziyu Wan
Weinan Zhang
Jun Wang
Ying Wen
43
7
0
23 May 2024
A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints
Bram De Cooman
Johan A. K. Suykens
28
0
0
25 Apr 2024
Imitation-regularized Optimal Transport on Networks: Provable Robustness and Application to Logistics Planning
Koshi Oishi
Yota Hashizume
Tomohiko Jimbo
Hirotaka Kaji
Kenji Kashima
OOD
40
2
0
28 Feb 2024
Reinforcement Learning in the Era of LLMs: What is Essential? What is needed? An RL Perspective on RLHF, Prompting, and Beyond
Hao Sun
OffRL
34
21
0
09 Oct 2023
Fairness in Preference-based Reinforcement Learning
Umer Siddique
Abhinav Sinha
Yongcan Cao
11
4
0
16 Jun 2023
Policy Representation via Diffusion Probability Model for Reinforcement Learning
Long Yang
Zhixiong Huang
Fenghao Lei
Yucun Zhong
Yiming Yang
Cong Fang
Shiting Wen
Binbin Zhou
Zhouchen Lin
DiffM
28
39
0
22 May 2023
Efficient Quality-Diversity Optimization through Diverse Quality Species
Ryan Wickman
Bibek Poudel
Taylor Michael Villarreal
Xiaofei Zhang
Weizi Li
23
6
0
14 Apr 2023
Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality
François Ged
M. H. Veiga
23
0
0
22 Mar 2023
Fast Rates for Maximum Entropy Exploration
D. Tiapkin
Denis Belomestny
Daniele Calandriello
Eric Moulines
Rémi Munos
A. Naumov
Pierre Perrault
Yunhao Tang
Michal Valko
Pierre Menard
41
17
0
14 Mar 2023
Inference on Optimal Dynamic Policies via Softmax Approximation
Qizhao Chen
Morgane Austern
Vasilis Syrgkanis
OffRL
29
1
0
08 Mar 2023
Model-based Constrained MDP for Budget Allocation in Sequential Incentive Marketing
Shuai Xiao
Le Guo
Zaifan Jiang
Lei Lv
Yuanbo Chen
Jun Zhu
Shuang Yang
19
21
0
02 Mar 2023
Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage
Masatoshi Uehara
Nathan Kallus
Jason D. Lee
Wen Sun
OffRL
37
5
0
05 Feb 2023
A general Markov decision process formalism for action-state entropy-regularized reward maximization
D. Grytskyy
Jorge Ramírez-Ruiz
R. Moreno-Bote
22
3
0
02 Feb 2023
An Efficient Solution to s-Rectangular Robust Markov Decision Processes
Navdeep Kumar
Kfir Y. Levy
Kaixin Wang
Shie Mannor
36
1
0
31 Jan 2023
Policy Gradient for Rectangular Robust Markov Decision Processes
Navdeep Kumar
E. Derman
M. Geist
Kfir Y. Levy
Shie Mannor
18
19
0
31 Jan 2023
Cognitive Level-$k$ Meta-Learning for Safe and Pedestrian-Aware Autonomous Driving
Haozhe Lei
Quanyan Zhu
20
0
0
17 Dec 2022
MAN: Multi-Action Networks Learning
Keqin Wang
Alison Bartsch
A. Farimani
16
3
0
19 Sep 2022
Entropy Augmented Reinforcement Learning
Jianfei Ma
28
0
0
19 Aug 2022
Minimum Description Length Control
Theodore H. Moskovitz
Ta-Chu Kao
M. Sahani
M. Botvinick
20
1
0
17 Jul 2022
q-Learning in Continuous Time
Yanwei Jia
X. Zhou
OffRL
45
67
0
02 Jul 2022
Intra-agent speech permits zero-shot task acquisition
Chen Yan
Federico Carnevale
Petko Georgiev
Adam Santoro
Aurelia Guy
Alistair Muldal
Chia-Chun Hung
Josh Abramson
Timothy Lillicrap
Greg Wayne
LM&Ro
36
9
0
07 Jun 2022
Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms
Yanwei Jia
X. Zhou
OffRL
30
78
0
22 Nov 2021
Towards an Understanding of Default Policies in Multitask Policy Optimization
Theodore H. Moskovitz
Michael Arbel
Jack Parker-Holder
Aldo Pacchiano
17
9
0
04 Nov 2021
Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning
Wenzhuo Zhou
Ruoqing Zhu
A. Qu
27
22
0
20 Oct 2021
Improved Soft Actor-Critic: Mixing Prioritized Off-Policy Samples with On-Policy Experience
C. Banerjee
Zhiyong Chen
N. Noman
11
30
0
24 Sep 2021
Theoretical Guarantees of Fictitious Discount Algorithms for Episodic Reinforcement Learning and Global Convergence of Policy Gradient Methods
Xin Guo
Anran Hu
Junzi Zhang
OffRL
16
6
0
13 Sep 2021
OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation
Jongmin Lee
Wonseok Jeon
Byung-Jun Lee
J. Pineau
Kee-Eung Kim
OffRL
26
90
0
21 Jun 2021
Characterizing the Gap Between Actor-Critic and Policy Gradient
Junfeng Wen
Saurabh Kumar
Ramki Gummadi
Dale Schuurmans
24
14
0
13 Jun 2021
A New Formalism, Method and Open Issues for Zero-Shot Coordination
Johannes Treutlein
Michael Dennis
Caspar Oesterheld
Jakob N. Foerster
OffRL
21
35
0
11 Jun 2021
An Entropy Regularization Free Mechanism for Policy-based Reinforcement Learning
Changnan Xiao
Haosen Shi
Jiajun Fan
Shihong Deng
18
5
0
01 Jun 2021
Hierarchical Reinforcement Learning for Air-to-Air Combat
Adrian P. Pope
J. Ide
Daria Mićović
Henry Diaz
D. Rosenbluth
Lee Ritholtz
Jason C. Twedt
Thayne T. Walker
K. Alcedo
D. Javorsek
17
72
0
03 May 2021
A Survey on Deep Reinforcement Learning for Audio-Based Applications
S. Latif
Heriberto Cuayáhuitl
Farrukh Pervez
Fahad Shamshad
Hafiz Shehbaz Ali
Erik Cambria
OffRL
44
73
0
01 Jan 2021
A Tutorial on Sparse Gaussian Processes and Variational Inference
Felix Leibfried
Vincent Dutordoir
S. T. John
N. Durrande
GP
32
49
0
27 Dec 2020
Behavior Priors for Efficient Reinforcement Learning
Dhruva Tirumala
Alexandre Galashov
Hyeonwoo Noh
Leonard Hasenclever
Razvan Pascanu
...
Guillaume Desjardins
Wojciech M. Czarnecki
Arun Ahuja
Yee Whye Teh
N. Heess
37
39
0
27 Oct 2020
Sample Efficient Reinforcement Learning with REINFORCE
Junzi Zhang
Jongho Kim
Brendan O'Donoghue
Stephen P. Boyd
37
99
0
22 Oct 2020
SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning
Kimin Lee
Michael Laskin
A. Srinivas
Pieter Abbeel
OffRL
11
199
0
09 Jul 2020
Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
Yufeng Zhang
Qi Cai
Zhuoran Yang
Yongxin Chen
Zhaoran Wang
OOD
MLT
78
11
0
08 Jun 2020
Leverage the Average: an Analysis of KL Regularization in RL
Nino Vieillard
Tadashi Kozuno
B. Scherrer
Olivier Pietquin
Rémi Munos
M. Geist
17
42
0
31 Mar 2020
Off-Policy Deep Reinforcement Learning with Analogous Disentangled Exploration
Anji Liu
Yitao Liang
Guy Van den Broeck
OffRL
12
3
0
25 Feb 2020
Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors
Jingliang Duan
Yang Guan
Shengbo Eben Li
Yangang Ren
B. Cheng
OffRL
17
173
0
09 Jan 2020
A Survey of Deep Reinforcement Learning in Video Games
Kun Shao
Zhentao Tang
Yuanheng Zhu
Nannan Li
Dongbin Zhao
OffRL
AI4TS
34
188
0
23 Dec 2019
Direct and indirect reinforcement learning
Yang Guan
Shengbo Eben Li
Jingliang Duan
Jie Li
Yangang Ren
Qi Sun
B. Cheng
OffRL
30
34
0
23 Dec 2019
A Regularized Opponent Model with Maximum Entropy Objective
Zheng Tian
Ying Wen
Zhichen Gong
Faiz Punakkath
Shihao Zou
Jun Wang
22
30
0
17 May 2019