2406.14457
Cited By
Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue
20 June 2024
Huifang Du
Shuqin Li
Minghao Wu
Xuejing Feng
Yuan-Fang Li
Haofen Wang
OffRL
Papers citing
"Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue"
7 / 7 papers shown
Title
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
Minghao Wu
Abdul Waheed
Chiyu Zhang
Muhammad Abdul-Mageed
Alham Fikri Aji
ALM
92
96
0
27 Apr 2023
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
Jingfeng Yang
Hongye Jin
Ruixiang Tang
Xiaotian Han
Qizhang Feng
Haoming Jiang
Bing Yin
Xia Hu
LM&MA
109
351
0
26 Apr 2023
Offline RL for Natural Language Generation with Implicit Language Q Learning
Charles Burton Snell
Ilya Kostrikov
Yi Su
Mengjiao Yang
Sergey Levine
OffRL
99
75
0
05 Jun 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
256
2,029
0
21 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
290
8,441
0
04 Mar 2022
Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System
Yixuan Su
Lei Shu
Elman Mansimov
Arshit Gupta
Deng Cai
Yi-An Lai
Yi Zhang
123
163
0
29 Sep 2021
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
259
1,151
0
18 Sep 2019