Deep reinforcement learning from human preferences

12 June 2017

Papers citing "Deep reinforcement learning from human preferences"

50 / 691 papers shown

The Alignment Problem from a Deep Learning Perspective Richard Ngo Lawrence Chan Sören Mindermann 68 183 0 30 Aug 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned Deep Ganguli Liane Lovitt John Kernion Amanda Askell Yuntao Bai ... Nicholas Joseph Sam McCandlish C. Olah Jared Kaplan Jack Clark 234 449 0 23 Aug 2022
Comparison-based Conversational Recommender System with Relative Bandit Feedback Zhihui Xie Tong Yu Canzhe Zhao Shuai Li 25 39 0 21 Aug 2022
Calculus on MDPs: Potential Shaping as a Gradient Erik Jenner H. V. Hoof Adam Gleave 22 4 0 20 Aug 2022
Transformers are Adaptable Task Planners Vidhi Jain Yixin Lin Eric Undersander Yonatan Bisk Akshara Rai 25 24 0 06 Jul 2022
Humans are not Boltzmann Distributions: Challenges and Opportunities for Modelling Human Feedback and Interaction in Reinforcement Learning David Lindner Mennatallah El-Assady OffRL 35 16 0 27 Jun 2022
A General Recipe for Likelihood-free Bayesian Optimization Jiaming Song Lantao Yu Willie Neiswanger Stefano Ermon 38 23 0 27 Jun 2022
Good Time to Ask: A Learning Framework for Asking for Help in Embodied Visual Navigation Jenny Zhang Samson Yu Jiafei Duan Cheston Tan 41 4 0 20 Jun 2022
How to talk so AI will learn: Instructions, descriptions, and autonomy T. Sumers Robert D. Hawkins Mark K. Ho Thomas Griffiths Dylan Hadfield-Menell LM&Ro 38 20 0 16 Jun 2022
Contrastive Learning as Goal-Conditioned Reinforcement Learning Benjamin Eysenbach Tianjun Zhang Ruslan Salakhutdinov Sergey Levine SSL OffRL 39 141 0 15 Jun 2022
Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning Yinglun Xu Qi Zeng Gagandeep Singh AAML 40 6 0 30 May 2022
Reward Uncertainty for Exploration in Preference-based Reinforcement Learning Xinran Liang Katherine Shu Kimin Lee Pieter Abbeel 21 58 0 24 May 2022
Towards Automated Document Revision: Grammatical Error Correction, Fluency Edits, and Beyond Masato Mita Keisuke Sakaguchi Masato Hagiwara Tomoya Mizumoto Jun Suzuki Kentaro Inui 48 15 0 23 May 2022
Learning Dense Reward with Temporal Variant Self-Supervision Yuning Wu Jieliang Luo Hui Li SSL 21 0 0 20 May 2022
Aligning Robot Representations with Humans Andreea Bobu Andi Peng 27 0 0 15 May 2022
Perspectives on Incorporating Expert Feedback into Model Updates Valerie Chen Umang Bhatt Hoda Heidari Adrian Weller Ameet Talwalkar 37 11 0 13 May 2022
Adversarial Training for High-Stakes Reliability Daniel M. Ziegler Seraphina Nix Lawrence Chan Tim Bauman Peter Schmidt-Nielsen ... Noa Nabeshima Benjamin Weinstein-Raun D. Haas Buck Shlegeris Nate Thomas AAML 38 59 0 03 May 2022
Counterfactual harm Jonathan G. Richens R. Beard Daniel H. Thompson 31 27 0 27 Apr 2022
Mind the gap: Challenges of deep learning approaches to Theory of Mind Jaan Aru Aqeel Labash Oriol Corcoll Raul Vicente 28 26 0 30 Mar 2022
Uncertainty Estimation for Language Reward Models Adam Gleave G. Irving UQLM 42 31 0 14 Mar 2022
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 384 12,081 0 04 Mar 2022
X2T: Training an X-to-Text Typing Interface with Online Learning from User Feedback Jensen Gao S. Reddy Glen Berseth Nicholas Hardy N. Natraj K. Ganguly Anca Dragan Sergey Levine 23 10 0 04 Mar 2022
Reinforcement Learning in Practice: Opportunities and Challenges Yuxi Li OffRL 38 9 0 23 Feb 2022
A Ranking Game for Imitation Learning Harshit S. Sikchi Akanksha Saran Wonjoon Goo S. Niekum OffRL 27 22 0 07 Feb 2022
ASHA: Assistive Teleoperation via Human-in-the-Loop Reinforcement Learning S. Chen Jensen Gao S. Reddy Glen Berseth Anca Dragan Sergey Levine OffRL 36 11 0 05 Feb 2022
Knowledge-Integrated Informed AI for National Security Anu Myne Kevin J. Leahy Ryan Soklaski 21 0 0 04 Feb 2022
Safe Deep RL in 3D Environments using Human Feedback Matthew Rahtz Vikrant Varma Ramana Kumar Zachary Kenton Shane Legg Jan Leike 32 4 0 20 Jan 2022
Inducing Structure in Reward Learning by Learning Features Andreea Bobu Marius Wiggert Claire Tomlin Anca Dragan 27 30 0 18 Jan 2022
The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models Alexander Pan Kush S. Bhatia Jacob Steinhardt 53 172 0 10 Jan 2022
Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning DeepMind Interactive Agents Team Josh Abramson Arun Ahuja Arthur Brussee Federico Carnevale ... Tamara von Glehn Greg Wayne Nathaniel Wong Chen Yan Rui Zhu LM&Ro 45 46 0 07 Dec 2021
Combining Learning from Human Feedback and Knowledge Engineering to Solve Hierarchical Tasks in Minecraft Vinicius G. Goecks Nicholas R. Waytowich David Watkins Bharat Prakash 13 7 0 07 Dec 2021
Dueling RL: Reinforcement Learning with Trajectory Preferences Aldo Pacchiano Aadirupa Saha Jonathan Lee 33 82 0 08 Nov 2021
B-Pref: Benchmarking Preference-Based Reinforcement Learning Kimin Lee Laura M. Smith Anca Dragan Pieter Abbeel OffRL 42 93 0 04 Nov 2021
RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning Sabela Ramos Sertan Girgin Léonard Hussenot Damien Vincent Hanna Yakubovich ... Piotr Stańczyk Raphaël Marinier Jeremiah Harmsen Olivier Pietquin Nikola Momchev OffRL 38 24 0 04 Nov 2021
On the Expressivity of Markov Reward David Abel Will Dabney Anna Harutyunyan Mark K. Ho Michael L. Littman Doina Precup Satinder Singh 29 82 0 01 Nov 2021
Collaborating with Humans without Human Data D. Strouse Kevin R. McKee M. Botvinick Edward Hughes Richard Everett 124 161 0 15 Oct 2021
A Framework for Learning to Request Rich and Contextually Useful Information from Humans Khanh Nguyen Yonatan Bisk Hal Daumé 49 16 0 14 Oct 2021
Correct Me if I am Wrong: Interactive Learning for Robotic Manipulation Eugenio Chisari Tim Welschehold Joschka Boedecker Wolfram Burgard Abhinav Valada 19 37 0 07 Oct 2021
Learning Multimodal Rewards from Rankings Vivek Myers Erdem Biyik Nima Anari Dorsa Sadigh OffRL 32 49 0 27 Sep 2021
Recursively Summarizing Books with Human Feedback Jeff Wu Long Ouyang Daniel M. Ziegler Nissan Stiennon Ryan J. Lowe Jan Leike Paul Christiano ALM 37 296 0 22 Sep 2021
Symbols as a Lingua Franca for Bridging Human-AI Chasm for Explainable and Advisable AI Systems Subbarao Kambhampati S. Sreedharan Mudit Verma Yantian Zha L. Guan 52 47 0 21 Sep 2021
ThriftyDAgger: Budget-Aware Novelty and Risk Gating for Interactive Imitation Learning Ryan Hoque Ashwin Balakrishna Ellen R. Novoseller Albert Wilcox Daniel S. Brown Ken Goldberg 35 84 0 17 Sep 2021
Conservative Data Sharing for Multi-Task Offline Reinforcement Learning Tianhe Yu Aviral Kumar Yevgen Chebotar Karol Hausman Sergey Levine Chelsea Finn OffRL 35 77 0 16 Sep 2021
Hindsight Reward Tweaking via Conditional Deep Reinforcement Learning Ning Wei Jiahua Liang Di Xie Shiliang Pu 25 0 0 06 Sep 2021
APReL: A Library for Active Preference-based Reward Learning Algorithms Erdem Biyik Aditi Talati Dorsa Sadigh 20 36 0 16 Aug 2021
Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback Xiaofei Wang Kimin Lee Kourosh Hakhamaneshi Pieter Abbeel Michael Laskin 34 42 0 11 Aug 2021
Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior Noriyuki Kojima Alane Suhr Yoav Artzi 27 24 0 10 Aug 2021
Accelerating the Learning of TAMER with Counterfactual Explanations Jakob Karalus F. Lindner OffRL 29 4 0 03 Aug 2021
Differential-Critic GAN: Generating What You Want by a Cue of Preferences Yinghua Yao Yuangang Pan Ivor W. Tsang Xin Yao DiffM 28 0 0 14 Jul 2021
Recent Advances in Leveraging Human Guidance for Sequential Decision-Making Tasks Ruohan Zhang F. Torabi Garrett A. Warnell Peter Stone 83 28 0 13 Jul 2021