1702.08892
Bridging the Gap Between Value and Policy Based Reinforcement Learning
28 February 2017
Ofir Nachum
Mohammad Norouzi
Kelvin Xu
Dale Schuurmans
Papers citing
"Bridging the Gap Between Value and Policy Based Reinforcement Learning"
50 / 97 papers shown
Title
Is there Value in Reinforcement Learning?
Lior Fox
Y. Loewenstein
OffRL
64
0
0
07 May 2025
Generative Auto-Bidding with Value-Guided Explorations
Jingtong Gao
Yewen Li
Shuai Mao
Peng Jiang
Nan Jiang
...
Fei Pan
Peng Jiang
Kun Gai
Bo An
Xiangyu Zhao
OffRL
46
0
0
20 Apr 2025
RL-finetuning LLMs from on- and off-policy data with a single algorithm
Yunhao Tang
Taco Cohen
David W. Zhang
Michal Valko
Rémi Munos
OffRL
44
2
0
25 Mar 2025
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Shicong Cen
Jincheng Mei
Katayoon Goshvadi
Hanjun Dai
Tong Yang
Sherry Yang
Dale Schuurmans
Yuejie Chi
Bo Dai
OffRL
65
24
0
20 Feb 2025
Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation
Taeyoung Yun
Dinghuai Zhang
Jinkyoo Park
Ling Pan
DiffM
84
2
0
17 Feb 2025
Divergence-Augmented Policy Optimization
Qing Wang
Yingru Li
Jiechao Xiong
Tong Zhang
OffRL
47
16
0
28 Jan 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
...
Zhilin Yang
Zhiqi Huang
Zihao Huang
Ziyao Xu
Zhiyong Yang
VLM
ALM
OffRL
AI4TS
LRM
120
163
0
22 Jan 2025
Genetic-guided GFlowNets for Sample Efficient Molecular Optimization
Hyeon-Seob Kim
Minsu Kim
Sanghyeok Choi
Jinkyoo Park
53
3
0
31 Dec 2024
Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization
Timofei Gritsaev
Nikita Morozov
S. Samsonov
D. Tiapkin
21
0
0
20 Oct 2024
U-net based prediction of cerebrospinal fluid distribution and ventricular reflux grading
Melanie Rieff
Fabian Holzberger
Oksana Lapina
Geir Ringstad
Lars Magnus Valnes
Bogna Warsza
Kent-Andre Mardal
Per Kristian Eide
Barbara Wohlmuth
41
0
0
06 Oct 2024
Decoupling regularization from the action space
Sobhan Mohammadpour
Emma Frejinger
Pierre-Luc Bacon
37
0
0
10 Jun 2024
Value Improved Actor Critic Algorithms
Yaniv Oren
Moritz A. Zanger
Pascal R. van der Vaart
M. Spaan
Wendelin Bohmer
OffRL
33
0
0
03 Jun 2024
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Yan Yang
Bin Gao
Ya-xiang Yuan
46
2
0
30 May 2024
Almost sure convergence rates of stochastic gradient methods under gradient domination
Simon Weissmann
Sara Klein
Waïss Azizian
Leif Döring
39
3
0
22 May 2024
BET: Explaining Deep Reinforcement Learning through The Error-Prone Decisions
Xiao Liu
Jie Zhao
Wubing Chen
Mao Tan
Yongxin Su
OffRL
FAtt
33
0
0
14 Jan 2024
Robotic Control of the Deformation of Soft Linear Objects Using Deep Reinforcement Learning
Mélodie Hani Daniel Zakaria
Miguel Aranda
Laurent Lequievre
S. Lengagne
J. Corrales
Y. Mezouar
AI4CE
20
6
0
08 Dec 2023
A Large Deviations Perspective on Policy Gradient Algorithms
Wouter Jongeneel
Daniel Kuhn
Mengmeng Li
31
1
0
13 Nov 2023
Amortizing intractable inference in large language models
Marvin Schmitt
Moksh Jain
Daniel Habermann
Younesse Kaddar
Ullrich Kothe
Stefan T. Radev
Nikolay Malkin
AIFin
BDL
32
47
0
06 Oct 2023
Deep reinforcement learning for process design: Review and perspective
Qitong Gao
Artur M. Schweidtmann
AI4CE
30
14
0
15 Aug 2023
Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions
Tongxin Li
Yiheng Lin
Shaolei Ren
Adam Wierman
AAML
OffRL
34
6
0
20 Jul 2023
A User Study on Explainable Online Reinforcement Learning for Adaptive Systems
Andreas Metzger
Jan Laufer
Felix Feit
Klaus Pohl
OffRL
OnRL
24
1
0
09 Jul 2023
Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization
Haoran Xu
Li Jiang
Jianxiong Li
Zhuoran Yang
Zhaoran Wang
Victor Chan
Xianyuan Zhan
OffRL
36
73
0
28 Mar 2023
Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality
François Ged
M. H. Veiga
33
0
0
22 Mar 2023
Inference on Optimal Dynamic Policies via Softmax Approximation
Qizhao Chen
Morgane Austern
Vasilis Syrgkanis
OffRL
33
1
0
08 Mar 2023
Model-based Constrained MDP for Budget Allocation in Sequential Incentive Marketing
Shuai Xiao
Le Guo
Zaifan Jiang
Lei Lv
Yuanbo Chen
Jun Zhu
Shuang Yang
30
21
0
02 Mar 2023
The In-Sample Softmax for Offline Reinforcement Learning
Chenjun Xiao
Han Wang
Yangchen Pan
Adam White
Martha White
OffRL
29
26
0
28 Feb 2023
A general Markov decision process formalism for action-state entropy-regularized reward maximization
D. Grytskyy
Jorge Ramírez-Ruiz
R. Moreno-Bote
22
3
0
02 Feb 2023
Understanding the Complexity Gains of Single-Task RL with a Curriculum
Qiyang Li
Yuexiang Zhai
Yi Ma
Sergey Levine
37
14
0
24 Dec 2022
A survey on text generation using generative adversarial networks
Gustavo de Rosa
João Paulo Papa
GAN
32
89
0
20 Dec 2022
Examining Policy Entropy of Reinforcement Learning Agents for Personalization Tasks
Anton Dereventsov
Andrew Starnes
Clayton Webster
26
4
0
21 Nov 2022
Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees
Siliang Zeng
Chenliang Li
Alfredo García
Min-Fong Hong
34
42
0
04 Oct 2022
Age of Semantics in Cooperative Communications: To Expedite Simulation Towards Real via Offline Reinforcement Learning
Xianfu Chen
Zhifeng Zhao
S. Mao
Celimuge Wu
Honggang Zhang
M. Bennis
OffRL
26
3
0
19 Sep 2022
Variational Inference for Model-Free and Model-Based Reinforcement Learning
Felix Leibfried
OffRL
20
0
0
04 Sep 2022
Entropy Augmented Reinforcement Learning
Jianfei Ma
36
0
0
19 Aug 2022
Choquet regularization for reinforcement learning
Xia Han
Ruodu Wang
X. Zhou
38
2
0
17 Aug 2022
Robust Knowledge Adaptation for Dynamic Graph Neural Networks
Han Li
Changsheng Li
Kaituo Feng
Ye Yuan
Guoren Wang
H. Zha
34
13
0
22 Jul 2022
Making Linear MDPs Practical via Contrastive Representation Learning
Tianjun Zhang
Tongzheng Ren
Mengjiao Yang
Joseph E. Gonzalez
Dale Schuurmans
Bo Dai
25
44
0
14 Jul 2022
Algorithm for Constrained Markov Decision Process with Linear Convergence
E. Gladin
Maksim Lavrik-Karmazin
K. Zainullina
Varvara Rudenko
Alexander V. Gasnikov
Martin Takáč
33
6
0
03 Jun 2022
Efficient and practical quantum compiler towards multi-qubit systems with deep reinforcement learning
Qiuhao Chen
Yuxuan Du
Qi Zhao
Yuliang Jiao
Xiliang Lu
Xingyao Wu
23
12
0
14 Apr 2022
DearFSAC: An Approach to Optimizing Unreliable Federated Learning via Deep Reinforcement Learning
Chenghao Huang
Weilong Chen
Yuxi Chen
Shunji Yang
Yanru Zhang
FedML
21
2
0
30 Jan 2022
MAMRL: Exploiting Multi-agent Meta Reinforcement Learning in WAN Traffic Engineering
Shan Sun
M. Kiran
Wei Ren
32
2
0
30 Nov 2021
A Free Lunch from the Noise: Provable and Practical Exploration for Representation Learning
Tongzheng Ren
Tianjun Zhang
Csaba Szepesvári
Bo Dai
27
19
0
22 Nov 2021
Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch
Shangtong Zhang
Rémi Tachet des Combes
Romain Laroche
30
10
0
04 Nov 2021
Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning
Wenzhuo Zhou
Ruoqing Zhu
Annie Qu
40
22
0
20 Oct 2021
Divergence-Regularized Multi-Agent Actor-Critic
Kefan Su
Zongqing Lu
46
25
0
01 Oct 2021
Implicitly Regularized RL with Implicit Q-Values
Nino Vieillard
Marcin Andrychowicz
Anton Raichuk
Olivier Pietquin
M. Geist
OffRL
24
9
0
16 Aug 2021
A general sample complexity analysis of vanilla policy gradient
Rui Yuan
Robert Mansel Gower
A. Lazaric
79
62
0
23 Jul 2021
Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences
Alan Chan
Hugo Silva
Sungsu Lim
Tadashi Kozuno
A. R. Mahmood
Martha White
25
29
0
17 Jul 2021
Convergent and Efficient Deep Q Network Algorithm
Zhikang T. Wang
Masahito Ueda
27
12
0
29 Jun 2021
Characterizing the Gap Between Actor-Critic and Policy Gradient
Junfeng Wen
Saurabh Kumar
Ramki Gummadi
Dale Schuurmans
31
15
0
13 Jun 2021