ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1907.00456
  4. Cited By
Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human
  Preferences in Dialog

Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog

30 June 2019
Natasha Jaques
Asma Ghandeharioun
J. Shen
Craig Ferguson
Àgata Lapedriza
Noah J. Jones
S. Gu
Rosalind W. Picard
    OffRL
ArXivPDFHTML

Papers citing "Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog"

50 / 102 papers shown
Title
Why is constrained neural language generation particularly challenging?
Why is constrained neural language generation particularly challenging?
Cristina Garbacea
Qiaozhu Mei
61
14
0
11 Jun 2022
On Reinforcement Learning and Distribution Matching for Fine-Tuning
  Language Models with no Catastrophic Forgetting
On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting
Tomasz Korbak
Hady ElSahar
Germán Kruszewski
Marc Dymetman
CLL
25
51
0
01 Jun 2022
A Mixture-of-Expert Approach to RL-based Dialogue Management
A Mixture-of-Expert Approach to RL-based Dialogue Management
Yinlam Chow
Azamat Tulepbergenov
Ofir Nachum
Moonkyung Ryu
Mohammad Ghavamzadeh
Craig Boutilier
MoE
25
14
0
31 May 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
372
12,081
0
04 Mar 2022
Supported Policy Optimization for Offline Reinforcement Learning
Supported Policy Optimization for Offline Reinforcement Learning
Jialong Wu
Haixu Wu
Zihan Qiu
Jianmin Wang
Mingsheng Long
OffRL
35
65
0
13 Feb 2022
Red Teaming Language Models with Language Models
Red Teaming Language Models with Language Models
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
13
611
0
07 Feb 2022
Offline Reinforcement Learning for Road Traffic Control
Offline Reinforcement Learning for Road Traffic Control
Mayuresh Kunjir
Sanjay Chawla
OffRL
32
4
0
07 Jan 2022
Can Reinforcement Learning Find Stackelberg-Nash Equilibria in
  General-Sum Markov Games with Myopic Followers?
Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopic Followers?
Han Zhong
Zhuoran Yang
Zhaoran Wang
Michael I. Jordan
31
30
0
27 Dec 2021
Dealing with the Unknown: Pessimistic Offline Reinforcement Learning
Dealing with the Unknown: Pessimistic Offline Reinforcement Learning
Jinning Li
Chen Tang
Masayoshi Tomizuka
Wei Zhan
OffRL
19
21
0
09 Nov 2021
Value Penalized Q-Learning for Recommender Systems
Value Penalized Q-Learning for Recommender Systems
Chengqian Gao
Ke Xu
Kuangqi Zhou
Lanqing Li
Xueqian Wang
Bo Yuan
P. Zhao
OffRL
54
20
0
15 Oct 2021
Medical Dead-ends and Learning to Identify High-risk States and
  Treatments
Medical Dead-ends and Learning to Identify High-risk States and Treatments
Mehdi Fatemi
Taylor W. Killian
J. Subramanian
Marzyeh Ghassemi
OffRL
30
37
0
08 Oct 2021
Dropout Q-Functions for Doubly Efficient Reinforcement Learning
Dropout Q-Functions for Doubly Efficient Reinforcement Learning
Takuya Hiraoka
Takahisa Imagawa
Taisei Hashimoto
Takashi Onishi
Yoshimasa Tsuruoka
11
105
0
05 Oct 2021
Recursively Summarizing Books with Human Feedback
Recursively Summarizing Books with Human Feedback
Jeff Wu
Long Ouyang
Daniel M. Ziegler
Nissan Stiennon
Ryan J. Lowe
Jan Leike
Paul Christiano
ALM
35
296
0
22 Sep 2021
A Workflow for Offline Model-Free Robotic Reinforcement Learning
A Workflow for Offline Model-Free Robotic Reinforcement Learning
Aviral Kumar
Anika Singh
Stephen Tian
Chelsea Finn
Sergey Levine
OffRL
143
85
0
22 Sep 2021
Conservative Data Sharing for Multi-Task Offline Reinforcement Learning
Conservative Data Sharing for Multi-Task Offline Reinforcement Learning
Tianhe Yu
Aviral Kumar
Yevgen Chebotar
Karol Hausman
Sergey Levine
Chelsea Finn
OffRL
35
77
0
16 Sep 2021
Provable Benefits of Actor-Critic Methods for Offline Reinforcement
  Learning
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
Andrea Zanette
Martin J. Wainwright
Emma Brunskill
OffRL
29
115
0
19 Aug 2021
Optimal Actor-Critic Policy with Optimized Training Datasets
Optimal Actor-Critic Policy with Optimized Training Datasets
C. Banerjee
Zhiyong Chen
N. Noman
M. Zamani
OffRL
33
7
0
16 Aug 2021
Offline Decentralized Multi-Agent Reinforcement Learning
Offline Decentralized Multi-Agent Reinforcement Learning
Jiechuan Jiang
Zongqing Lu
OffRL
28
37
0
04 Aug 2021
OptiDICE: Offline Policy Optimization via Stationary Distribution
  Correction Estimation
OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation
Jongmin Lee
Wonseok Jeon
Byung-Jun Lee
J. Pineau
Kee-Eung Kim
OffRL
37
91
0
21 Jun 2021
Behavioral Priors and Dynamics Models: Improving Performance and Domain
  Transfer in Offline RL
Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL
Catherine Cang
Aravind Rajeswaran
Pieter Abbeel
Michael Laskin
OffRL
32
29
0
16 Jun 2021
Offline RL Without Off-Policy Evaluation
Offline RL Without Off-Policy Evaluation
David Brandfonbrener
William F. Whitney
Rajesh Ranganath
Joan Bruna
OffRL
42
162
0
16 Jun 2021
A Minimalist Approach to Offline Reinforcement Learning
A Minimalist Approach to Offline Reinforcement Learning
Scott Fujimoto
S. Gu
OffRL
58
785
0
12 Jun 2021
Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning
Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning
Yue Wu
Shuangfei Zhai
Nitish Srivastava
J. Susskind
Jian Zhang
Ruslan Salakhutdinov
Hanlin Goh
EDL
OffRL
OnRL
21
184
0
17 May 2021
Regularized Behavior Value Estimation
Regularized Behavior Value Estimation
Çağlar Gülçehre
Sergio Gomez Colmenarejo
Ziyun Wang
Jakub Sygnowski
T. Paine
Konrad Zolna
Yutian Chen
Matthew W. Hoffman
Razvan Pascanu
Nando de Freitas
OffRL
31
37
0
17 Mar 2021
Offline Reinforcement Learning with Fisher Divergence Critic
  Regularization
Offline Reinforcement Learning with Fisher Divergence Critic Regularization
Ilya Kostrikov
Jonathan Tompson
Rob Fergus
Ofir Nachum
OffRL
29
300
0
14 Mar 2021
S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement
  Learning
S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement Learning
Samarth Sinha
Ajay Mandlekar
Animesh Garg
OffRL
26
106
0
10 Mar 2021
Offline Reinforcement Learning with Pseudometric Learning
Offline Reinforcement Learning with Pseudometric Learning
Robert Dadashi
Shideh Rezaeifar
Nino Vieillard
Léonard Hussenot
Olivier Pietquin
M. Geist
OffRL
39
40
0
02 Mar 2021
COMBO: Conservative Offline Model-Based Policy Optimization
COMBO: Conservative Offline Model-Based Policy Optimization
Tianhe Yu
Aviral Kumar
Rafael Rafailov
Aravind Rajeswaran
Sergey Levine
Chelsea Finn
OffRL
222
419
0
16 Feb 2021
A Survey on Deep Reinforcement Learning for Audio-Based Applications
A Survey on Deep Reinforcement Learning for Audio-Based Applications
S. Latif
Heriberto Cuayáhuitl
Farrukh Pervez
Fahad Shamshad
Hafiz Shehbaz Ali
Min Zhang
OffRL
60
73
0
01 Jan 2021
Is Pessimism Provably Efficient for Offline RL?
Is Pessimism Provably Efficient for Offline RL?
Ying Jin
Zhuoran Yang
Zhaoran Wang
OffRL
27
349
0
30 Dec 2020
PLAS: Latent Action Space for Offline Reinforcement Learning
PLAS: Latent Action Space for Offline Reinforcement Learning
Wenxuan Zhou
Sujay Bajracharya
David Held
OffRL
38
158
0
14 Nov 2020
COG: Connecting New Skills to Past Experience with Offline Reinforcement
  Learning
COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning
Avi Singh
Albert Yu
Jonathan Yang
Jesse Zhang
Aviral Kumar
Sergey Levine
SSL
OffRL
OnRL
35
103
0
27 Oct 2020
Behavior Priors for Efficient Reinforcement Learning
Behavior Priors for Efficient Reinforcement Learning
Dhruva Tirumala
Alexandre Galashov
Hyeonwoo Noh
Leonard Hasenclever
Razvan Pascanu
...
Guillaume Desjardins
Wojciech M. Czarnecki
Arun Ahuja
Yee Whye Teh
N. Heess
37
39
0
27 Oct 2020
OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement
  Learning
OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning
Anurag Ajay
Aviral Kumar
Pulkit Agrawal
Sergey Levine
Ofir Nachum
OffRL
OnRL
39
155
0
26 Oct 2020
The Importance of Pessimism in Fixed-Dataset Policy Optimization
The Importance of Pessimism in Fixed-Dataset Policy Optimization
Jacob Buckman
Carles Gelada
Marc G. Bellemare
OffRL
42
135
0
15 Sep 2020
Learning to summarize from human feedback
Learning to summarize from human feedback
Nisan Stiennon
Long Ouyang
Jeff Wu
Daniel M. Ziegler
Ryan J. Lowe
Chelsea Voss
Alec Radford
Dario Amodei
Paul Christiano
ALM
56
1,994
0
02 Sep 2020
Offline Meta-Reinforcement Learning with Advantage Weighting
Offline Meta-Reinforcement Learning with Advantage Weighting
E. Mitchell
Rafael Rafailov
Xue Bin Peng
Sergey Levine
Chelsea Finn
OffRL
38
104
0
13 Aug 2020
Near-Optimal Provable Uniform Convergence in Offline Policy Evaluation
  for Reinforcement Learning
Near-Optimal Provable Uniform Convergence in Offline Policy Evaluation for Reinforcement Learning
Ming Yin
Yu Bai
Yu-Xiang Wang
OffRL
41
31
0
07 Jul 2020
Critic Regularized Regression
Critic Regularized Regression
Ziyun Wang
Alexander Novikov
Konrad Zolna
Jost Tobias Springenberg
Scott E. Reed
...
Noah Y. Siegel
J. Merel
Çağlar Gülçehre
N. Heess
Nando de Freitas
OffRL
36
319
0
26 Jun 2020
AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
Ashvin Nair
Abhishek Gupta
Murtaza Dalal
Sergey Levine
OffRL
OnRL
46
591
0
16 Jun 2020
D4RL: Datasets for Deep Data-Driven Reinforcement Learning
D4RL: Datasets for Deep Data-Driven Reinforcement Learning
Justin Fu
Aviral Kumar
Ofir Nachum
George Tucker
Sergey Levine
GP
OffRL
78
1,315
0
15 Apr 2020
An empirical investigation of the challenges of real-world reinforcement
  learning
An empirical investigation of the challenges of real-world reinforcement learning
Gabriel Dulac-Arnold
Nir Levine
D. Mankowitz
Jerry Li
Cosmin Paduraru
Sven Gowal
Todd Hester
OffRL
34
120
0
24 Mar 2020
BRPO: Batch Residual Policy Optimization
BRPO: Batch Residual Policy Optimization
Kentaro Kanamori
Yinlam Chow
Takuya Takagi
Hiroki Arimura
Honglak Lee
Ken Kobayashi
Craig Boutilier
OffRL
141
46
0
08 Feb 2020
IRIS: Implicit Reinforcement without Interaction at Scale for Learning
  Control from Offline Robot Manipulation Data
IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data
Ajay Mandlekar
Fabio Ramos
Byron Boots
Silvio Savarese
Li Fei-Fei
Animesh Garg
Dieter Fox
OffRL
34
117
0
13 Nov 2019
BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement
  Learning
BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning
Xinyue Chen
Zijian Zhou
Ziwen Wang
Che Wang
Yanqiu Wu
Keith Ross
OffRL
30
121
0
27 Oct 2019
Benchmarking Batch Deep Reinforcement Learning Algorithms
Benchmarking Batch Deep Reinforcement Learning Algorithms
Shih-Han Chou
Wen-Yen Chang
W. Hsu
Jianlong Fu
OffRL
18
181
0
03 Oct 2019
Scaling data-driven robotics with reward sketching and batch
  reinforcement learning
Scaling data-driven robotics with reward sketching and batch reinforcement learning
Serkan Cabi
Sergio Gomez Colmenarejo
Alexander Novikov
Ksenia Konyushkova
Scott E. Reed
...
David Barker
Jonathan Scholz
Misha Denil
Nando de Freitas
Ziyun Wang
OffRL
28
29
0
26 Sep 2019
Fine-Tuning Language Models from Human Preferences
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
301
1,616
0
18 Sep 2019
Approximating Interactive Human Evaluation with Self-Play for
  Open-Domain Dialog Systems
Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems
Asma Ghandeharioun
J. Shen
Natasha Jaques
Craig Ferguson
Noah J. Jones
Àgata Lapedriza
Rosalind W. Picard
12
91
0
21 Jun 2019
Dialogue Learning With Human-In-The-Loop
Dialogue Learning With Human-In-The-Loop
Jiwei Li
Alexander H. Miller
S. Chopra
MarcÁurelio Ranzato
Jason Weston
OffRL
227
134
0
29 Nov 2016
Previous
123
Next