1707.09118
Cited By
v1
v2
v3 (latest)
Counterfactual Learning from Bandit Feedback under Deterministic Logging: A Case Study in Statistical Machine Translation
28 July 2017
Carolin (Haas) Lawrence
Artem Sokolov
Stefan Riezler
OffRL
Papers citing
"Counterfactual Learning from Bandit Feedback under Deterministic Logging: A Case Study in Statistical Machine Translation"
22 / 22 papers shown
Reinforcement learning
Florentin Wörgötter
401
2,920
0
16 May 2024
Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Zhiwei He
Xing Wang
Wenxiang Jiao
Zhuosheng Zhang
Rui Wang
Shuming Shi
Zhaopeng Tu
ALM
264
33
0
23 Jan 2024
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Hannah Rose Kirk
Andrew M. Bean
Bertie Vidgen
Paul Röttger
Scott A. Hale
ALM
283
60
0
11 Oct 2023
Positivity-free Policy Learning with Observational Data
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Pan Zhao
Antoine Chambaz
Julie Josse
Shu Yang
189
6
0
10 Oct 2023
Learning Complementary Policies for Human-AI Teams
Ruijiang Gao
M. Saar-Tsechansky
Maria De-Arteaga
291
10
0
06 Feb 2023
Simulating Bandit Learning from User Feedback for Extractive Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Ge Gao
Eunsol Choi
Yoav Artzi
191
18
0
18 Mar 2022
Loss Functions for Discrete Contextual Pricing with Observational Data
Max Biggs
Ruijiang Gao
Wei-Ju Sun
335
10
0
18 Nov 2021
Bandits Don't Follow Rules: Balancing Multi-Facet Machine Translation with Multi-Armed Bandits
Julia Kreutzer
David Vilar
Artem Sokolov
169
18
0
13 Oct 2021
Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior
Transactions of the Association for Computational Linguistics (TACL), 2021
Noriyuki Kojima
Alane Suhr
Yoav Artzi
155
28
0
10 Aug 2021
Offline Reinforcement Learning from Human Feedback in Real-World Sequence-to-Sequence Tasks
Julia Kreutzer
Stefan Riezler
Carolin (Haas) Lawrence
RALM
OffRL
228
17
0
04 Nov 2020
Back to the Future: Unsupervised Backprop-based Decoding for Counterfactual and Abductive Commonsense Reasoning
Lianhui Qin
Vered Shwartz
Peter West
Chandra Bhagavatula
Jena D. Hwang
Ronan Le Bras
Antoine Bosselut
Yejin Choi
OffRL
LRM
369
86
0
12 Oct 2020
Machine Translation System Selection from Bandit Feedback
Conference of the Association for Machine Translation in the Americas (AMTA), 2020
Jason Naradowsky
Xuan Zhang
Kevin Duh
OffRL
170
8
0
22 Feb 2020
On the Fairness of Randomized Trials for Recommendation with Heterogeneous Demographics and Beyond
Zifeng Wang
Xi Chen
Rui Wen
Shao-Lun Huang
239
1
0
25 Jan 2020
MultiVerse: Causal Reasoning using Importance Sampling in Probabilistic Programming
Symposium on Advances in Approximate Bayesian Inference (AABI), 2019
Yura N. Perov
L. Graham
Kostis Gourgoulias
Jonathan G. Richens
Ciarán M. Gilligan-Lee
Adam Baker
Saurabh Johri
LRM
192
17
0
17 Oct 2019
FIESTA: Fast IdEntification of State-of-The-Art models using adaptive bandit algorithms
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Henry B. Moss
Andrew Moore
David S. Leslie
Paul Rayson
99
5
0
28 Jun 2019
Counterfactual Learning from Human Proofreading Feedback for Semantic Parsing
Carolin (Haas) Lawrence
Stefan Riezler
OffRL
136
7
0
29 Nov 2018
Learning from Chunk-based Feedback in Neural Machine Translation
Pavel Petrushkov
Shahram Khadivi
E. Matusov
132
19
0
19 Jun 2018
Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning
Julia Kreutzer
Joshua Uyheng
Stefan Riezler
273
92
0
27 May 2018
Improving a Neural Semantic Parser by Counterfactual Learning from Human Bandit Feedback
Carolin (Haas) Lawrence
Stefan Riezler
OffRL
442
57
0
03 May 2018
Can Neural Machine Translation be Improved with User Feedback?
Julia Kreutzer
Shahram Khadivi
E. Matusov
Stefan Riezler
181
100
0
16 Apr 2018
Counterfactual Learning for Machine Translation: Degeneracies and Solutions
Carolin (Haas) Lawrence
Pratik Gajane
Stefan Riezler
OffRL
CML
99
7
0
23 Nov 2017
A Shared Task on Bandit Learning for Machine Translation
Artem Sokolov
Julia Kreutzer
Kellen Sunderland
Pavel Danchenko
Witold Szymaniak
Hagen Fürstenau
Stefan Riezler
139
16
0
27 Jul 2017