arXiv: 1611.01626
Combining policy gradient and Q-learning


5 November 2016
Brendan O'Donoghue
Rémi Munos
Koray Kavukcuoglu
Volodymyr Mnih
    OffRL
    OnRL

Papers citing "Combining policy gradient and Q-learning"

50 / 90 papers shown
Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow
Chen-Hao Chao
Chien Feng
Wei-Fang Sun
Cheng-Kuang Lee
Simon See
Chun-Yi Lee
41
1
0
22 May 2024
On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling
Nicholas Corrado
Josiah P. Hanna
OffRL
20
1
0
14 Nov 2023
Inverse Decision Modeling: Learning Interpretable Representations of Behavior
Daniel Jarrett
Alihan Huyuk
M. Schaar
AI4CE
17
27
0
28 Oct 2023
Soft Decomposed Policy-Critic: Bridging the Gap for Effective Continuous Control with Discrete RL
Ye Zhang
Jian Sun
G. Wang
Zhuoxian Li
Wei Chen
OffRL
21
0
0
20 Aug 2023
Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality
François Ged
M. H. Veiga
33
0
0
22 Mar 2023
Efficient Exploration via Epistemic-Risk-Seeking Policy Optimization
Brendan O'Donoghue
OffRL
35
6
0
18 Feb 2023
Distillation Policy Optimization
Jianfei Ma
OffRL
26
1
0
01 Feb 2023
Extending Open Bandit Pipeline to Simulate Industry Challenges
Bram van den Akker
N. Weber
Felipe Moraes
Dmitri Goldenberg
OffRL
18
1
0
09 Sep 2022
A Parametric Class of Approximate Gradient Updates for Policy Optimization
Ramki Gummadi
Saurabh Kumar
Junfeng Wen
Dale Schuurmans
26
0
0
17 Jun 2022
Reinforcement Learning for Navigation of Mobile Robot with LiDAR
Inhwan Kim
S. Nengroo
Dongsoo Har
27
13
0
06 Dec 2021
Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning
Nicolai Dorka
Tim Welschehold
Joschka Boedecker
Wolfram Burgard
OffRL
30
9
0
24 Nov 2021
Generalized Proximal Policy Optimization with Sample Reuse
James Queeney
I. Paschalidis
Christos G. Cassandras
OffRL
32
47
0
29 Oct 2021
Variational Bayesian Optimistic Sampling
Brendan O'Donoghue
Tor Lattimore
7
6
0
29 Oct 2021
Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences
Alan Chan
Hugo Silva
Sungsu Lim
Tadashi Kozuno
A. R. Mahmood
Martha White
25
29
0
17 Jul 2021
A Max-Min Entropy Framework for Reinforcement Learning
Seungyul Han
Y. Sung
30
20
0
19 Jun 2021
Probabilistic Mixture-of-Experts for Efficient Deep Reinforcement Learning
Jie Ren
Yewen Li
Zihan Ding
Wei Pan
Hao Dong
BDL
MoE
21
25
0
19 Apr 2021
A Bayesian Approach to Reinforcement Learning of Vision-Based Vehicular Control
Zahra Gharaee
Karl Holmquist
Linbo He
M. Felsberg
BDL
17
4
0
08 Apr 2021
A Survey on Deep Reinforcement Learning for Audio-Based Applications
S. Latif
Heriberto Cuayáhuitl
Farrukh Pervez
Fahad Shamshad
Hafiz Shehbaz Ali
Min Zhang
OffRL
47
73
0
01 Jan 2021
Reinforcement Learning for Robust Missile Autopilot Design
Bernardo Cortez
11
2
0
26 Nov 2020
Weighted Entropy Modification for Soft Actor-Critic
Yizhou Zhao
Song-Chun Zhu
16
0
0
18 Nov 2020
Average-reward model-free reinforcement learning: a systematic review and literature mapping
Vektor Dewanto
George Dunn
A. Eshragh
M. Gallagher
Fred Roosta
14
28
0
18 Oct 2020
Energy-based Surprise Minimization for Multi-Agent Value Factorization
Karush Suri
Xiaolong Shi
Konstantinos Plataniotis
Y. Lawryshyn
16
1
0
16 Sep 2020
Visualizing the Loss Landscape of Actor Critic Methods with Applications in Inventory Optimization
Recep Yusuf Bekci
M. Gümüş
12
4
0
04 Sep 2020
Monte-Carlo Tree Search as Regularized Policy Optimization
Jean-Bastien Grill
Florent Altché
Yunhao Tang
Thomas Hubert
Michal Valko
Ioannis Antonoglou
Rémi Munos
27
73
0
24 Jul 2020
Matrix games with bandit feedback
Brendan O'Donoghue
Tor Lattimore
Ian Osband
6
8
0
09 Jun 2020
Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
Yufeng Zhang
Qi Cai
Zhuoran Yang
Yongxin Chen
Zhaoran Wang
OOD
MLT
123
11
0
08 Jun 2020
Diversity Actor-Critic: Sample-Aware Entropy Regularization for Sample-Efficient Exploration
Seungyul Han
Y. Sung
6
24
0
02 Jun 2020
Deep Reinforcement Learning for Intelligent Transportation Systems: A Survey
Ammar Haydari
Y. Yilmaz
AI4TS
20
453
0
02 May 2020
First return, then explore
Adrien Ecoffet
Joost Huizinga
Joel Lehman
Kenneth O. Stanley
Jeff Clune
47
350
0
27 Apr 2020
Comprehensive Review of Deep Reinforcement Learning Methods and Applications in Economics
Amir H. Mosavi
Pedram Ghamisi
Yaser Faghan
Puhong Duan
OffRL
16
152
0
21 Mar 2020
Review, Analysis and Design of a Comprehensive Deep Reinforcement Learning Framework
Ngoc Duy Nguyen
Thanh Thi Nguyen
Hai V. Nguyen
Doug Creighton
S. Nahavandi
38
3
0
27 Feb 2020
BRPO: Batch Residual Policy Optimization
Kentaro Kanamori
Yinlam Chow
Takuya Takagi
Hiroki Arimura
Honglak Lee
Ken Kobayashi
Craig Boutilier
OffRL
141
46
0
08 Feb 2020
Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors
Jingliang Duan
Yang Guan
Shengbo Eben Li
Yangang Ren
B. Cheng
OffRL
22
173
0
09 Jan 2020
Making Sense of Reinforcement Learning and Probabilistic Inference
Brendan O'Donoghue
Ian Osband
Catalin Ionescu
OffRL
24
47
0
03 Jan 2020
A Survey of Deep Reinforcement Learning in Video Games
Kun Shao
Zhentao Tang
Yuanheng Zhu
Nannan Li
Dongbin Zhao
OffRL
AI4TS
43
188
0
23 Dec 2019
Direct and indirect reinforcement learning
Yang Guan
Shengbo Eben Li
Jingliang Duan
Jie Li
Yangang Ren
Qi Sun
B. Cheng
OffRL
38
34
0
23 Dec 2019
Merging Deterministic Policy Gradient Estimations with Varied Bias-Variance Tradeoff for Effective Deep Reinforcement Learning
Gang Chen
22
4
0
24 Nov 2019
Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation
Shangtong Zhang
Bo Liu
Hengshuai Yao
Shimon Whiteson
OffRL
21
8
0
11 Nov 2019
Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning
Wenjie Shi
Shiji Song
Cheng Wu
17
36
0
07 Sep 2019
A Convergence Result for Regularized Actor-Critic Methods
Wesley A Suttle
Zhuoran Yang
Kaiqing Zhang
Ji Liu
9
0
0
13 Jul 2019
Ranking Policy Gradient
Kaixiang Lin
Jiayu Zhou
OffRL
11
7
0
24 Jun 2019
Epistemic Risk-Sensitive Reinforcement Learning
Hannes Eriksson
Christos Dimitrakakis
19
29
0
14 Jun 2019
Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games
Kaiqing Zhang
Zhuoran Yang
Tamer Basar
27
125
0
31 May 2019
P3O: Policy-on Policy-off Policy Optimization
Rasool Fakoor
Pratik Chaudhari
Alex Smola
OffRL
20
51
0
05 May 2019
Similarities between policy gradient methods (PGM) in Reinforcement learning (RL) and supervised learning (SL)
Eric Benhamou
OffRL
14
1
0
12 Apr 2019
Generalized Off-Policy Actor-Critic
Shangtong Zhang
Wendelin Bohmer
Shimon Whiteson
OffRL
CML
14
43
0
27 Mar 2019
Reinforcement Learning Based Text Style Transfer without Parallel Training Corpus
Hongyu Gong
S. Bhat
Lingfei Wu
Jinjun Xiong
Wen-mei W. Hwu
OffRL
34
93
0
26 Mar 2019
Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics
Denis Steckelmacher
Hélène Plisnier
D. Roijers
A. Nowé
OffRL
23
17
0
11 Mar 2019
Off-Policy Actor-Critic in an Ensemble: Achieving Maximum General Entropy and Effective Environment Exploration in Deep Reinforcement Learning
Gang Chen
Yiming Peng
14
8
0
14 Feb 2019
A Bandit Framework for Optimal Selection of Reinforcement Learning Agents
A. Merentitis
Kashif Rasul
Roland Vollgraf
Abdul-Saboor Sheikh
Urs M. Bergmann
14
2
0
10 Feb 2019