arXiv: 2305.12473
Continually Improving Extractive QA via Human Feedback
21 May 2023
Ge Gao, Hung-Ting Chen, Yoav Artzi, Eunsol Choi
Papers citing "Continually Improving Extractive QA via Human Feedback" (12 of 12 papers shown)

Title | Authors | Tags | Counts | Date
DRS: Deep Question Reformulation With Structured Output | Zhecheng Li, Y. Wang, Bryan Hooi, Yujun Cai, Nanyun Peng, Kai-Wei Chang | KELM | 71 / 0 / 0 | 27 Nov 2024
Retrospective Learning from Interactions | Zizhao Chen, Mustafa Omer Gul, Yiwei Chen, Gloria Geng, Anne Wu, Yoav Artzi | LRM | 21 / 1 / 0 | 17 Oct 2024
CoGen: Learning from Feedback with Coupled Comprehension and Generation | Mustafa Omer Gul, Yoav Artzi | - | 23 / 3 / 0 | 28 Aug 2024
An Empirical Study on Self-correcting Large Language Models for Data Science Code Generation | Thai Tang Quoc, Duc Ha Minh, Tho Quan Thanh, Anh Nguyen-Duc | LRM | 13 / 1 / 0 | 28 Aug 2024
I Could've Asked That: Reformulating Unanswerable Questions | Wenting Zhao, Ge Gao, Claire Cardie, Alexander M. Rush | ELM | 17 / 1 / 0 | 24 Jul 2024
RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs | Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, A. Kalyan, Karthik Narasimhan, A. Deshpande, Bruno Castro da Silva | - | 21 / 34 / 0 | 12 Apr 2024
Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies | Liangming Pan, Michael Stephen Saxon, Wenda Xu, Deepak Nathani, Xinyi Wang, William Yang Wang | KELM, LRM | 31 / 201 / 0 | 06 Aug 2023
Investigating Table-to-Text Generation Capabilities of LLMs in Real-World Information Seeking Scenarios | Yilun Zhao, Haowei Zhang, Shengyun Si, Linyong Nan, Xiangru Tang, Arman Cohan | LMTD | 17 / 12 / 0 | 24 May 2023
Training language models to follow instructions with human feedback | Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe | OSLM, ALM | 303 / 11,881 / 0 | 04 Mar 2022
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems | Sergey Levine, Aviral Kumar, George Tucker, Justin Fu | OffRL, GP | 329 / 1,944 / 0 | 04 May 2020
Fine-Tuning Language Models from Human Preferences | Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving | ALM | 275 / 1,583 / 0 | 18 Sep 2019
Improving a Neural Semantic Parser by Counterfactual Learning from Human Bandit Feedback | Carolin (Haas) Lawrence, Stefan Riezler | OffRL | 171 / 56 / 0 | 03 May 2018