Papers citing 'Counterfactual Learning from Bandit Feedback under Deterministic Logging: A Case Study in Statistical Machine Translation'

Title
Reinforcement learning Florentin Wörgötter 401 2,920 0 16 May 2024
Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model. North American Chapter of the Association for Computational Linguistics (NAACL), 2024 Zhiwei He Xing Wang Wenxiang Jiao Zhuosheng Zhang Rui Wang Shuming Shi Zhaopeng Tu ALM 264 33 0 23 Jan 2024
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 Hannah Rose Kirk Andrew M. Bean Bertie Vidgen Paul Röttger Scott A. Hale ALM 283 60 0 11 Oct 2023
Positivity-free Policy Learning with Observational Data. International Conference on Artificial Intelligence and Statistics (AISTATS), 2023 Pan Zhao Antoine Chambaz Julie Josse Shu Yang 189 6 0 10 Oct 2023
Learning Complementary Policies for Human-AI Teams Ruijiang Gao M. Saar-Tsechansky Maria De-Arteaga 291 10 0 06 Feb 2023
Simulating Bandit Learning from User Feedback for Extractive Question Answering. Annual Meeting of the Association for Computational Linguistics (ACL), 2022 Ge Gao Eunsol Choi Yoav Artzi 191 18 0 18 Mar 2022
Loss Functions for Discrete Contextual Pricing with Observational Data Max Biggs Ruijiang Gao Wei-Ju Sun 335 10 0 18 Nov 2021
Bandits Don't Follow Rules: Balancing Multi-Facet Machine Translation with Multi-Armed Bandits Julia Kreutzer David Vilar Artem Sokolov 169 18 0 13 Oct 2021
Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior. Transactions of the Association for Computational Linguistics (TACL), 2021 Noriyuki Kojima Alane Suhr Yoav Artzi 155 28 0 10 Aug 2021
Offline Reinforcement Learning from Human Feedback in Real-World Sequence-to-Sequence Tasks Julia Kreutzer Stefan Riezler Carolin (Haas) Lawrence RALM OffRL 228 17 0 04 Nov 2020
Back to the Future: Unsupervised Backprop-based Decoding for Counterfactual and Abductive Commonsense Reasoning Lianhui Qin Vered Shwartz Peter West Chandra Bhagavatula Jena D. Hwang Ronan Le Bras Antoine Bosselut Yejin Choi OffRL LRM 369 86 0 12 Oct 2020
Machine Translation System Selection from Bandit Feedback. Conference of the Association for Machine Translation in the Americas (AMTA), 2020 Jason Naradowsky Xuan Zhang Kevin Duh OffRL 170 8 0 22 Feb 2020
On the Fairness of Randomized Trials for Recommendation with Heterogeneous Demographics and Beyond Zifeng Wang Xi Chen Rui Wen Shao-Lun Huang 239 1 0 25 Jan 2020
MultiVerse: Causal Reasoning using Importance Sampling in Probabilistic Programming. Symposium on Advances in Approximate Bayesian Inference (AABI), 2019 Yura N. Perov L. Graham Kostis Gourgoulias Jonathan G. Richens Ciarán M. Gilligan-Lee Adam Baker Saurabh Johri LRM 192 17 0 17 Oct 2019
FIESTA: Fast IdEntification of State-of-The-Art models using adaptive bandit algorithms. Annual Meeting of the Association for Computational Linguistics (ACL), 2019 Henry B. Moss Andrew Moore David S. Leslie Paul Rayson 99 5 0 28 Jun 2019
Counterfactual Learning from Human Proofreading Feedback for Semantic Parsing Carolin (Haas) Lawrence Stefan Riezler OffRL 136 7 0 29 Nov 2018
Learning from Chunk-based Feedback in Neural Machine Translation Pavel Petrushkov Shahram Khadivi E. Matusov 132 19 0 19 Jun 2018
Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning Julia Kreutzer Joshua Uyheng Stefan Riezler 273 92 0 27 May 2018
Improving a Neural Semantic Parser by Counterfactual Learning from Human Bandit Feedback Carolin (Haas) Lawrence Stefan Riezler OffRL 442 57 0 03 May 2018
Can Neural Machine Translation be Improved with User Feedback? Julia Kreutzer Shahram Khadivi E. Matusov Stefan Riezler 181 100 0 16 Apr 2018
Counterfactual Learning for Machine Translation: Degeneracies and Solutions Carolin (Haas) Lawrence Pratik Gajane Stefan Riezler OffRL CML 99 7 0 23 Nov 2017
A Shared Task on Bandit Learning for Machine Translation Artem Sokolov Julia Kreutzer Kellen Sunderland Pavel Danchenko Witold Szymaniak Hagen Fürstenau Stefan Riezler 139 16 0 27 Jul 2017