Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning

27 May 2018

Papers citing "Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning"

Post-edits Are Preferences Too Nathaniel Berger Stefan Riezler M. Exel Matthias Huck 37 0 0 24 Feb 2025
Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings Miguel Moura Ramos Tomás Almeida Daniel Vareta Filipe Azevedo Sweta Agrawal Patrick Fernandes André F. T. Martins 31 1 0 08 Nov 2024
Your Weak LLM is Secretly a Strong Teacher for Alignment Leitian Tao Yixuan Li 88 5 0 13 Sep 2024
Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees Sijia Chen Yibo Wang Yi-Feng Wu Qing-Guo Chen Zhao Xu Weihua Luo Kaifu Zhang Lijun Zhang LLMAG LRM 50 10 0 11 Jun 2024
Improving Socratic Question Generation using Data Augmentation and Preference Optimization Nischal Ashok Kumar Andrew S. Lan 33 8 0 01 Mar 2024
RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models Jiong Wang Junlin Wu Muhao Chen Yevgeniy Vorobeychik Chaowei Xiao AAML 21 12 0 16 Nov 2023
LitSumm: Large language models for literature summarisation of non-coding RNAs Andrew Green C. Ribas Nancy Ontiveros-Palacios Sam Griffiths-Jones Anton I. Petrov Alex Bateman Blake Sweeney 24 4 0 06 Nov 2023
Continually Improving Extractive QA via Human Feedback Ge Gao Hung-Ting Chen Yoav Artzi Eunsol Choi 24 12 0 21 May 2023
Consistency is Key: Disentangling Label Variation in Natural Language Processing with Intra-Annotator Agreement Gavin Abercrombie Verena Rieser Dirk Hovy 51 16 0 25 Jan 2023
Mapping the Design Space of Human-AI Interaction in Text Summarization Ruijia Cheng Alison Smith-Renner Kecheng Zhang Joel R. Tetreault A. Jaimes 41 31 0 29 Jun 2022
Why is constrained neural language generation particularly challenging? Cristina Garbacea Qiaozhu Mei 59 14 0 11 Jun 2022
Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior Noriyuki Kojima Alane Suhr Yoav Artzi 25 24 0 10 Aug 2021
Interactive Learning from Activity Description Khanh Nguyen Dipendra Kumar Misra Robert Schapire Miroslav Dudík Patrick Shafto 47 34 0 13 Feb 2021
Open Problems in Cooperative AI Allan Dafoe Edward Hughes Yoram Bachrach Tantum Collins Kevin R. McKee Joel Z. Leibo Kate Larson T. Graepel 24 199 0 15 Dec 2020
Informed Machine Learning -- A Taxonomy and Survey of Integrating Knowledge into Learning Systems Laura von Rueden S. Mayer Katharina Beckh B. Georgiev Sven Giesselbach ... Rajkumar Ramamurthy Michal Walczak Jochen Garcke Christian Bauckhage Jannis Schuecker 34 626 0 29 Mar 2019
Scalable agent alignment via reward modeling: a research direction Jan Leike David M. Krueger Tom Everitt Miljan Martic Vishal Maini Shane Legg 28 395 0 19 Nov 2018
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Yonghui Wu M. Schuster Z. Chen Quoc V. Le Mohammad Norouzi ... Alex Rudnick Oriol Vinyals G. Corrado Macduff Hughes J. Dean AIMat 716 6,743 0 26 Sep 2016
Convolutional Neural Networks for Sentence Classification Yoon Kim AILaw VLM 255 13,364 0 25 Aug 2014