Learning from Dialogue after Deployment: Feed Yourself, Chatbot! (arXiv 1901.05415)
16 January 2019
Braden Hancock, Antoine Bordes, Pierre-Emmanuel Mazaré, Jason Weston

Papers citing "Learning from Dialogue after Deployment: Feed Yourself, Chatbot!"

50 / 109 papers shown
Multi-Turn Multi-Modal Question Clarification for Enhanced Conversational Understanding
Kimia Ramezan, Alireza Amiri Bavandpour, Yifei Yuan, Clemencia Siro, Mohammad Aliannejadi (17 Feb 2025)

DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization
Xuefeng Liu, Songhao Jiang, Siyu Chen, Zhuoran Yang, Yuxin Chen, Ian Foster, Rick L. Stevens (11 Feb 2025) [LM&MA, OffRL]

Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
Yueqin Yin, Shentao Yang, Yujia Xie, Ziyi Yang, Yuting Sun, Hany Awadalla, Weizhu Chen, Mingyuan Zhou (07 Jan 2025)

Retrospective Learning from Interactions
Zizhao Chen, Mustafa Omer Gul, Yiwei Chen, Gloria Geng, Anne Wu, Yoav Artzi (17 Oct 2024) [LRM]

An Approach for Auto Generation of Labeling Functions for Software Engineering Chatbots
Ebube Alor, Ahmad Abdellatif, SayedHassan Khatoonabadi, Emad Shihab (09 Oct 2024)

The Future of Open Human Feedback
Shachar Don-Yehiya, Ben Burtenshaw, Ramon Fernandez Astudillo, Cailean Osborne, Mimansa Jaiswal, ..., Omri Abend, Jennifer Ding, Sara Hooker, Hannah Rose Kirk, Leshem Choshen (15 Aug 2024) [VLM, ALM]

The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community
Shachar Don-Yehiya, Leshem Choshen, Omri Abend (15 Aug 2024)

Learning Random Numbers to Realize Appendable Memory System for Artificial Intelligence to Acquire New Knowledge after Deployment
Kazunori D Yamada (29 Jul 2024)
JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models
Haibo Jin, Leyang Hu, Xinuo Li, Peiyan Zhang, Chonghan Chen, Jun Zhuang, Haohan Wang (26 Jun 2024) [PILM]

A Survey on Human Preference Learning for Large Language Models
Ruili Jiang, Kehai Chen, Xuefeng Bai, Zhixuan He, Juntao Li, Muyun Yang, Tiejun Zhao, Liqiang Nie, Min Zhang (17 Jun 2024)

A Survey of Language-Based Communication in Robotics
William Hunt, Sarvapali D. Ramchurn, Mohammad D. Soorati (06 Jun 2024) [LM&Ro]

Aligning LLM Agents by Learning Latent Preference from User Edits
Ge Gao, Alexey Taymanov, Eduardo Salinas, Paul Mineiro, Dipendra Kumar Misra (23 Apr 2024) [LLMAG]

RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs
Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, Ameet Deshpande, Bruno Castro da Silva (12 Apr 2024)

Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model
Qi Gou, Cam-Tu Nguyen (28 Mar 2024)

RLVF: Learning from Verbal Feedback without Overgeneralization
Moritz Stephan, Alexander Khazatsky, Eric Mitchell, Annie S. Chen, Sheryl Hsu, Archit Sharma, Chelsea Finn (16 Feb 2024)

Asking Multimodal Clarifying Questions in Mixed-Initiative Conversational Search
Yifei Yuan, Clemencia Siro, Mohammad Aliannejadi, Maarten de Rijke, Wai Lam (12 Feb 2024)

Multi-User Chat Assistant (MUCA): a Framework Using LLMs to Facilitate Group Conversations
Manqing Mao, Paishun Ting, Yijian Xiang, Mingyang Xu, Julia Chen, Jianzhe Lin (10 Jan 2024) [LLMAG]

Learning From Free-Text Human Feedback -- Collect New Datasets Or Extend Existing Ones?
Dominic Petrak, N. Moosavi, Ye Tian, Nikolai Rozanov, Iryna Gurevych (24 Oct 2023)
Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression
Jiduan Liu, Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai, Dongyan Zhao, Ran Wang, Rui Yan (24 Oct 2023)

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
Hannah Rose Kirk, Andrew M. Bean, Bertie Vidgen, Paul Röttger, Scott A. Hale (11 Oct 2023) [ALM]

PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems
Bryan Wilie, Yan Xu, Willy Chung, Samuel Cahyawijaya, Holy Lovenia, Pascale Fung (19 Sep 2023)

An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing
Sonish Sivarajkumar, Mark Kelley, Alyssa Samolyk-Mazzanti, Shyam Visweswaran, Yanshan Wang (14 Sep 2023) [LM&MA]

Leveraging Implicit Feedback from Deployment Data in Dialogue
Richard Yuanzhe Pang, Stephen Roller, Kyunghyun Cho, He He, Jason Weston (26 Jul 2023)

Domain-Agnostic Neural Architecture for Class Incremental Continual Learning in Document Processing Platform
Mateusz Wójcik, Witold Kościukiewicz, Mateusz Baran, Tomasz Kajdanowicz, Adam Gonczarek (11 Jul 2023) [CLL]

Let Me Teach You: Pedagogical Foundations of Feedback for Language Models
Beatriz Borges, Niket Tandon, Tanja Käser, Antoine Bosselut (01 Jul 2023)

System-Level Natural Language Feedback
Weizhe Yuan, Kyunghyun Cho, Jason Weston (23 Jun 2023)
Improving Open Language Models by Learning from Organic Interactions
Jing Xu, Da Ju, Joshua Lane, M. Komeili, Eric Michael Smith, ..., Rashel Moritz, Sainbayar Sukhbaatar, Y-Lan Boureau, Jason Weston, Kurt Shuster (07 Jun 2023)

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi (02 Jun 2023) [ALM]

Preference-grounded Token-level Guidance for Language Model Fine-tuning
Shentao Yang, Shujian Zhang, Congying Xia, Yihao Feng, Caiming Xiong, Mi Zhou (01 Jun 2023)

AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois, Xuechen Li, Rohan Taori, Tianyi Zhang, Ishaan Gulrajani, Jimmy Ba, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto (22 May 2023) [ALM]

Modeling User Satisfaction Dynamics in Dialogue via Hawkes Process
Fanghua Ye, Zhiyuan Hu, Emine Yilmaz (21 May 2023)

Sentence Level Curriculum Learning for Improved Neural Conversational Models
S. Paulsen (15 May 2023)

Learning to Simulate Natural Language Feedback for Interactive Semantic Parsing
Hao Yan, Saurabh Srivastava, Yintao Tai, Sida I. Wang, Wen-tau Yih, Ziyu Yao (14 May 2023)

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Lokesh Nagalapatti, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander Ratner, Ranjay Krishna, Chen-Yu Lee, Tomas Pfister (03 May 2023) [ALM]
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Patrick Fernandes, Aman Madaan, Emmy Liu, António Farinhas, Pedro Henrique Martins, ..., José G. C. de Souza, Shuyan Zhou, Tongshuang Wu, Graham Neubig, André F. T. Martins (01 May 2023) [ALM]

Training Language Models with Language Feedback at Scale
Jérémy Scheurer, Jon Ander Campos, Tomasz Korbak, Jun Shern Chan, Angelica Chen, Kyunghyun Cho, Ethan Perez (28 Mar 2023) [ALM]

Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback
Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale (09 Mar 2023)

Chain of Hindsight Aligns Language Models with Feedback
Hao Liu, Carmelo Sferrazza, Pieter Abbeel (06 Feb 2023) [ALM]

JASMINE: Arabic GPT Models for Few-Shot Learning
El Moatez Billah Nagoudi, Muhammad Abdul-Mageed, AbdelRahim Elmadany, Alcides Alcoba Inciarte, Md. Tawkat Islam Khondaker (21 Dec 2022)

Human-in-the-loop Abstractive Dialogue Summarization
Jiaao Chen, Mohan Dodda, Diyi Yang (19 Dec 2022)

Optimizing Prompts for Text-to-Image Generation
Y. Hao, Zewen Chi, Li Dong, Furu Wei (19 Dec 2022)

The CRINGE Loss: Learning what language not to model
Leonard Adolphs, Tianyu Gao, Jing Xu, Kurt Shuster, Sainbayar Sukhbaatar, Jason Weston (10 Nov 2022) [MU]
Nano: Nested Human-in-the-Loop Reward Learning for Few-shot Language Model Control
Xiang Fan, Yiwei Lyu, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency (10 Nov 2022) [BDL]

User or Labor: An Interaction Framework for Human-Machine Relationships in NLP
Ruyuan Wan, Naome A. Etori, Karla A. Badillo-Urquiola, Dongyeop Kang (03 Nov 2022)

When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels
Weiyan Shi, Emily Dinan, Kurt Shuster, Jason Weston, Jing Xu (28 Oct 2022)

Adaptive Natural Language Generation for Task-oriented Dialogue via Reinforcement Learning
Atsumoto Ohashi, Ryuichiro Higashinaka (16 Sep 2022) [OffRL]

Towards Boosting the Open-Domain Chatbot with Human Feedback
Hua Lu, Siqi Bao, H. He, Fan Wang, Hua Wu, Haifeng Wang (30 Aug 2022) [ALM]

Learning from data in the mixed adversarial non-adversarial case: Finding the helpers and ignoring the trolls
Da Ju, Jing Xu, Y-Lan Boureau, Jason Weston (05 Aug 2022) [AAML]

Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedback
Jing Xu, Megan Ung, M. Komeili, Kushal Arora, Y-Lan Boureau, Jason Weston (05 Aug 2022)

BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage
Kurt Shuster, Jing Xu, M. Komeili, Da Ju, Eric Michael Smith, ..., Naman Goyal, Arthur Szlam, Y-Lan Boureau, Melanie Kambadur, Jason Weston (05 Aug 2022) [LM&Ro, KELM]