Imperfect also Deserves Reward: Multi-Level and Sequential Reward Modeling for Better Dialog Management

North American Chapter of the Association for Computational Linguistics (NAACL), 2021
10 April 2021
Zhengxu Hou
Bang Liu
Ruihui Zhao
Chinmay Pani
Yafei Liu
Xi Chen
Yefeng Zheng

Papers citing "Imperfect also Deserves Reward: Multi-Level and Sequential Reward Modeling for Better Dialog Management"

7 / 7 papers shown
Towards Reward Fairness in RLHF: From a Resource Allocation Perspective
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Sheng Ouyang
Yulan Hu
Ge Chen
Qingyang Li
Fuzheng Zhang
Yong Liu
283
8
0
29 May 2025
CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis
Junying Chen
Chi Gui
Anningzhe Gao
Ke Ji
Xidong Wang
Xiang Wan
Benyou Wang
MedIm, AI4CE, LM&MA
241
55
0
18 Jul 2024
Beyond Direct Diagnosis: LLM-based Multi-Specialist Agent Consultation for Automatic Diagnosis
Hao Wang
Sendong Zhao
Zewen Qiang
Nuwa Xi
Bing Qin
Ting Liu
LM&MA
309
33
0
29 Jan 2024
On Transforming Reinforcement Learning by Transformer: The Development Trajectory
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Shengchao Hu
Li Shen
Ya Zhang
Yixin Chen
Dacheng Tao
OffRL
387
75
0
29 Dec 2022
Post-processing Networks: Method for Optimizing Pipeline Task-oriented Dialogue Systems using Reinforcement Learning
SIGDIAL Conferences (SIGDIAL), 2022
Atsumoto Ohashi
Ryuichiro Higashinaka
OffRL
225
7
0
25 Jul 2022
Diaformer: Automatic Diagnosis via Symptoms Sequence Generation
AAAI Conference on Artificial Intelligence (AAAI), 2021
Junying Chen
Dongfang Li
Qingcai Chen
Wenxiu Zhou
Xin Liu
MedIm
257
35
0
20 Dec 2021
Hierarchical Reinforcement Learning for Automatic Disease Diagnosis
Cheng Zhong
Kangenbei Liao
Wei Chen
Qianlong Liu
Baolin Peng
Xuanjing Huang
J. Peng
Zhongyu Wei
OffRL
246
5
0
29 Apr 2020