ResearchTrend.AI

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

15 November 2017
Zachary Chase Lipton
Xiujun Li
Jianfeng Gao
Lihong Li
Faisal Ahmed
Li Deng
arXiv: 1711.05715

Papers citing "BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems"

30 / 30 papers shown
Enhancing End-to-End Multi-Task Dialogue Systems: A Study on Intrinsic Motivation Reinforcement Learning Algorithms for Improved Training and Adaptability
Navin Kamuni
Hardik Shah
Sathishkumar Chintala
Naveen Kunchakuri
Sujatha Alla Old Dominion
39
19
0
31 Jan 2024
TOD-Flow: Modeling the Structure of Task-Oriented Dialogues
Sungryull Sohn
Yiwei Lyu
Anthony Z. Liu
Lajanugen Logeswaran
Dong-Ki Kim
Dongsub Shim
Honglak Lee
28
3
0
07 Dec 2023
Rescue Conversations from Dead-ends: Efficient Exploration for Task-oriented Dialogue Policy Optimization
Yangyang Zhao
Zhenyu Wang
Mehdi Dastani
Shihan Wang
24
0
0
05 May 2023
Learning How to Infer Partial MDPs for In-Context Adaptation and Exploration
Chentian Jiang
Nan Rosemary Ke
Hado van Hasselt
16
3
0
08 Feb 2023
Knowledge-Guided Exploration in Deep Reinforcement Learning
Sahisnu Mazumder
Bing-Quan Liu
Shuai Wang
Yingxuan Zhu
Xiaotian Yin
Lifeng Liu
Jian Li
50
4
0
26 Oct 2022
Learning Deformable Object Manipulation from Expert Demonstrations
G. Salhotra
Isabella Liu
Marcus Dominguez-Kuhne
Gaurav Sukhatme
36
27
0
20 Jul 2022
Reinforcement Learning of Multi-Domain Dialog Policies Via Action Embeddings
Jorge Armando Mendez Mendez
Alborz Geramifard
Mohammad Ghavamzadeh
Bing-Quan Liu
OffRL
27
6
0
01 Jul 2022
A Survey on Recent Advances and Challenges in Reinforcement Learning Methods for Task-Oriented Dialogue Policy Learning
Wai-Chung Kwan
Hongru Wang
Huimin Wang
Kam-Fai Wong
OffRL
38
43
0
28 Feb 2022
Fast online inference for nonlinear contextual bandit based on Generative Adversarial Network
Yun-Da Tsai
Shou-De Lin
48
5
0
17 Feb 2022
Neural Collaborative Filtering Bandits via Meta Learning
Yikun Ban
Yunzhe Qi
Tianxin Wei
Jingrui He
OffRL
39
9
0
31 Jan 2022
Exploration in Deep Reinforcement Learning: From Single-Agent to Multiagent Domain
Jianye Hao
Tianpei Yang
Hongyao Tang
Chenjia Bai
Jinyi Liu
Zhaopeng Meng
Peng Liu
Zhen Wang
OffRL
38
93
0
14 Sep 2021
Bayesian Bellman Operators
M. Fellows
Kristian Hartikainen
Shimon Whiteson
OffRL
42
15
0
09 Jun 2021
Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning
Yue Wu
Shuangfei Zhai
Nitish Srivastava
J. Susskind
Jian Zhang
Ruslan Salakhutdinov
Hanlin Goh
EDL
OffRL
OnRL
21
184
0
17 May 2021
A Survey on Deep Reinforcement Learning for Audio-Based Applications
S. Latif
Heriberto Cuayáhuitl
Farrukh Pervez
Fahad Shamshad
Hafiz Shehbaz Ali
Min Zhang
OffRL
60
73
0
01 Jan 2021
Neural Thompson Sampling
Weitong Zhang
Dongruo Zhou
Lihong Li
Quanquan Gu
34
115
0
02 Oct 2020
Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management
Zhi Chen
Lu Chen
Xiaoyuan Liu
Kai Yu
33
20
0
22 Sep 2020
Prototypical Q Networks for Automatic Conversational Diagnosis and Few-Shot New Disease Adaption
Hongyin Luo
Shang-Wen Li
James R. Glass
BDL
MedIm
25
9
0
19 May 2020
Sample-Efficient Model-based Actor-Critic for an Interactive Dialogue Task
Katya Kudashkina
Valliappa Chockalingam
Graham W. Taylor
Michael Bowling
OffRL
LLMAG
33
2
0
28 Apr 2020
Learning Goal-oriented Dialogue Policy with Opposite Agent Awareness
Zheng Zhang
Lizi Liao
Xiaoyan Zhu
Tat-Seng Chua
Zitao Liu
Yan Huang
Minlie Huang
LLMAG
30
19
0
21 Apr 2020
Neural Contextual Bandits with UCB-based Exploration
Dongruo Zhou
Lihong Li
Quanquan Gu
36
15
0
11 Nov 2019
Incremental Learning from Scratch for Task-Oriented Dialogue Systems
Weikang Wang
Jiajun Zhang
Q. Li
M. Hwang
Chengqing Zong
Zhifei Li
CLL
28
21
0
12 Jun 2019
AgentGraph: Towards Universal Dialogue Management with Structured Deep Reinforcement Learning
Lu Chen
Zhi Chen
Bowen Tan
Sishan Long
Milica Gasic
Kai Yu
19
35
0
27 May 2019
Perturbed-History Exploration in Stochastic Linear Bandits
B. Kveton
Csaba Szepesvári
Mohammad Ghavamzadeh
Craig Boutilier
16
41
0
21 Mar 2019
Where Do Human Heuristics Come From?
Marcel Binz
Dominik M. Endres
16
0
0
20 Feb 2019
Certified Reinforcement Learning with Logic Guidance
Mohammadhosein Hasanbeig
Daniel Kroening
Alessandro Abate
21
53
0
02 Feb 2019
End-to-End Knowledge-Routed Relational Dialogue System for Automatic Diagnosis
Lin Xu
Qixian Zhou
Ke Gong
Xiaodan Liang
Jianheng Tang
Liang Lin
MedIm
24
166
0
30 Jan 2019
Neural Approaches to Conversational AI
Jianfeng Gao
Michel Galley
Lihong Li
49
670
0
21 Sep 2018
Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning
Shang-Yu Su
Xiujun Li
Jianfeng Gao
Jingjing Liu
Yun-Nung Chen
OffRL
27
67
0
28 Aug 2018
Modeling Multi-turn Conversation with Deep Utterance Aggregation
ZhuoSheng Zhang
Jiangtong Li
Peng Fei Zhu
Zhao Hai
Gongshen Liu
27
251
0
24 Jun 2018
Deep Exploration via Randomized Value Functions
Ian Osband
Benjamin Van Roy
Daniel Russo
Zheng Wen
41
300
0
22 Mar 2017