1711.05715
Cited By
BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems
15 November 2017
Zachary Chase Lipton
Xiujun Li
Jianfeng Gao
Lihong Li
Faisal Ahmed
Li Deng
Papers citing
"BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems"
30 / 30 papers shown
Title
Enhancing End-to-End Multi-Task Dialogue Systems: A Study on Intrinsic Motivation Reinforcement Learning Algorithms for Improved Training and Adaptability
Navin Kamuni
Hardik Shah
Sathishkumar Chintala
Naveen Kunchakuri
Sujatha Alla
39
19
0
31 Jan 2024
TOD-Flow: Modeling the Structure of Task-Oriented Dialogues
Sungryull Sohn
Yiwei Lyu
Anthony Z. Liu
Lajanugen Logeswaran
Dong-Ki Kim
Dongsub Shim
Honglak Lee
28
3
0
07 Dec 2023
Rescue Conversations from Dead-ends: Efficient Exploration for Task-oriented Dialogue Policy Optimization
Yangyang Zhao
Zhenyu Wang
Mehdi Dastani
Shihan Wang
24
0
0
05 May 2023
Learning How to Infer Partial MDPs for In-Context Adaptation and Exploration
Chentian Jiang
Nan Rosemary Ke
Hado van Hasselt
16
3
0
08 Feb 2023
Knowledge-Guided Exploration in Deep Reinforcement Learning
Sahisnu Mazumder
Bing-Quan Liu
Shuai Wang
Yingxuan Zhu
Xiaotian Yin
Lifeng Liu
Jian Li
50
4
0
26 Oct 2022
Learning Deformable Object Manipulation from Expert Demonstrations
G. Salhotra
Isabella Liu
Marcus Dominguez-Kuhne
Gaurav Sukhatme
36
27
0
20 Jul 2022
Reinforcement Learning of Multi-Domain Dialog Policies Via Action Embeddings
Jorge Armando Mendez Mendez
Alborz Geramifard
Mohammad Ghavamzadeh
Bing-Quan Liu
OffRL
27
6
0
01 Jul 2022
A Survey on Recent Advances and Challenges in Reinforcement Learning Methods for Task-Oriented Dialogue Policy Learning
Wai-Chung Kwan
Hongru Wang
Huimin Wang
Kam-Fai Wong
OffRL
38
43
0
28 Feb 2022
Fast online inference for nonlinear contextual bandit based on Generative Adversarial Network
Yun-Da Tsai
Shou-De Lin
48
5
0
17 Feb 2022
Neural Collaborative Filtering Bandits via Meta Learning
Yikun Ban
Yunzhe Qi
Tianxin Wei
Jingrui He
OffRL
39
9
0
31 Jan 2022
Exploration in Deep Reinforcement Learning: From Single-Agent to Multiagent Domain
Jianye Hao
Tianpei Yang
Hongyao Tang
Chenjia Bai
Jinyi Liu
Zhaopeng Meng
Peng Liu
Zhen Wang
OffRL
38
93
0
14 Sep 2021
Bayesian Bellman Operators
M. Fellows
Kristian Hartikainen
Shimon Whiteson
OffRL
42
15
0
09 Jun 2021
Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning
Yue Wu
Shuangfei Zhai
Nitish Srivastava
J. Susskind
Jian Zhang
Ruslan Salakhutdinov
Hanlin Goh
EDL
OffRL
OnRL
21
184
0
17 May 2021
A Survey on Deep Reinforcement Learning for Audio-Based Applications
S. Latif
Heriberto Cuayáhuitl
Farrukh Pervez
Fahad Shamshad
Hafiz Shehbaz Ali
Min Zhang
OffRL
60
73
0
01 Jan 2021
Neural Thompson Sampling
Weitong Zhang
Dongruo Zhou
Lihong Li
Quanquan Gu
34
115
0
02 Oct 2020
Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management
Zhi Chen
Lu Chen
Xiaoyuan Liu
Kai Yu
33
20
0
22 Sep 2020
Prototypical Q Networks for Automatic Conversational Diagnosis and Few-Shot New Disease Adaption
Hongyin Luo
Shang-Wen Li
James R. Glass
BDL
MedIm
25
9
0
19 May 2020
Sample-Efficient Model-based Actor-Critic for an Interactive Dialogue Task
Katya Kudashkina
Valliappa Chockalingam
Graham W. Taylor
Michael Bowling
OffRL
LLMAG
33
2
0
28 Apr 2020
Learning Goal-oriented Dialogue Policy with Opposite Agent Awareness
Zheng Zhang
Lizi Liao
Xiaoyan Zhu
Tat-Seng Chua
Zitao Liu
Yan Huang
Minlie Huang
LLMAG
30
19
0
21 Apr 2020
Neural Contextual Bandits with UCB-based Exploration
Dongruo Zhou
Lihong Li
Quanquan Gu
36
15
0
11 Nov 2019
Incremental Learning from Scratch for Task-Oriented Dialogue Systems
Weikang Wang
Jiajun Zhang
Q. Li
M. Hwang
Chengqing Zong
Zhifei Li
CLL
28
21
0
12 Jun 2019
AgentGraph: Towards Universal Dialogue Management with Structured Deep Reinforcement Learning
Lu Chen
Zhi Chen
Bowen Tan
Sishan Long
Milica Gasic
Kai Yu
19
35
0
27 May 2019
Perturbed-History Exploration in Stochastic Linear Bandits
B. Kveton
Csaba Szepesvári
Mohammad Ghavamzadeh
Craig Boutilier
16
41
0
21 Mar 2019
Where Do Human Heuristics Come From?
Marcel Binz
Dominik M. Endres
16
0
0
20 Feb 2019
Certified Reinforcement Learning with Logic Guidance
Mohammadhosein Hasanbeig
Daniel Kroening
Alessandro Abate
21
53
0
02 Feb 2019
End-to-End Knowledge-Routed Relational Dialogue System for Automatic Diagnosis
Lin Xu
Qixian Zhou
Ke Gong
Xiaodan Liang
Jianheng Tang
Liang Lin
MedIm
24
166
0
30 Jan 2019
Neural Approaches to Conversational AI
Jianfeng Gao
Michel Galley
Lihong Li
49
670
0
21 Sep 2018
Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning
Shang-Yu Su
Xiujun Li
Jianfeng Gao
Jingjing Liu
Yun-Nung Chen
OffRL
27
67
0
28 Aug 2018
Modeling Multi-turn Conversation with Deep Utterance Aggregation
ZhuoSheng Zhang
Jiangtong Li
Peng Fei Zhu
Zhao Hai
Gongshen Liu
27
251
0
24 Jun 2018
Deep Exploration via Randomized Value Functions
Ian Osband
Benjamin Van Roy
Daniel Russo
Zheng Wen
41
300
0
22 Mar 2017