1805.10627
Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning
27 May 2018
Julia Kreutzer
Joshua Uyheng
Stefan Riezler
Papers citing
"Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning"
18 / 18 papers shown
Post-edits Are Preferences Too
Nathaniel Berger, Stefan Riezler, M. Exel, Matthias Huck
37 | 0 | 0
24 Feb 2025

Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings
Miguel Moura Ramos, Tomás Almeida, Daniel Vareta, Filipe Azevedo, Sweta Agrawal, Patrick Fernandes, André F. T. Martins
31 | 1 | 0
08 Nov 2024

Your Weak LLM is Secretly a Strong Teacher for Alignment
Leitian Tao, Yixuan Li
88 | 5 | 0
13 Sep 2024

Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees
Sijia Chen, Yibo Wang, Yi-Feng Wu, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Lijun Zhang
LLMAG, LRM
50 | 10 | 0
11 Jun 2024

Improving Socratic Question Generation using Data Augmentation and Preference Optimization
Nischal Ashok Kumar, Andrew S. Lan
33 | 8 | 0
01 Mar 2024

RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models
Jiong Wang, Junlin Wu, Muhao Chen, Yevgeniy Vorobeychik, Chaowei Xiao
AAML
21 | 12 | 0
16 Nov 2023

LitSumm: Large language models for literature summarisation of non-coding RNAs
Andrew Green, C. Ribas, Nancy Ontiveros-Palacios, Sam Griffiths-Jones, Anton I. Petrov, Alex Bateman, Blake Sweeney
24 | 4 | 0
06 Nov 2023

Continually Improving Extractive QA via Human Feedback
Ge Gao, Hung-Ting Chen, Yoav Artzi, Eunsol Choi
24 | 12 | 0
21 May 2023

Consistency is Key: Disentangling Label Variation in Natural Language Processing with Intra-Annotator Agreement
Gavin Abercrombie, Verena Rieser, Dirk Hovy
51 | 16 | 0
25 Jan 2023

Mapping the Design Space of Human-AI Interaction in Text Summarization
Ruijia Cheng, Alison Smith-Renner, Kecheng Zhang, Joel R. Tetreault, A. Jaimes
41 | 31 | 0
29 Jun 2022

Why is constrained neural language generation particularly challenging?
Cristina Garbacea, Qiaozhu Mei
59 | 14 | 0
11 Jun 2022

Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior
Noriyuki Kojima, Alane Suhr, Yoav Artzi
25 | 24 | 0
10 Aug 2021

Interactive Learning from Activity Description
Khanh Nguyen, Dipendra Kumar Misra, Robert Schapire, Miroslav Dudík, Patrick Shafto
47 | 34 | 0
13 Feb 2021

Open Problems in Cooperative AI
Allan Dafoe, Edward Hughes, Yoram Bachrach, Tantum Collins, Kevin R. McKee, Joel Z. Leibo, Kate Larson, T. Graepel
24 | 199 | 0
15 Dec 2020

Informed Machine Learning -- A Taxonomy and Survey of Integrating Knowledge into Learning Systems
Laura von Rueden, S. Mayer, Katharina Beckh, B. Georgiev, Sven Giesselbach, ..., Rajkumar Ramamurthy, Michal Walczak, Jochen Garcke, Christian Bauckhage, Jannis Schuecker
34 | 626 | 0
29 Mar 2019

Scalable agent alignment via reward modeling: a research direction
Jan Leike, David M. Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg
28 | 395 | 0
19 Nov 2018

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean
AIMat
716 | 6,743 | 0
26 Sep 2016

Convolutional Neural Networks for Sentence Classification
Yoon Kim
AILaw, VLM
255 | 13,364 | 0
25 Aug 2014