Interactive Learning from Policy-Dependent Human Feedback

International Conference on Machine Learning (ICML), 2017

21 January 2017

Michael L. Littman

Papers citing "Interactive Learning from Policy-Dependent Human Feedback"

50 / 189 papers shown

QuickLAP: Quick Language-Action Preference Learning for Autonomous Driving Agents

177

22 Nov 2025

Reinforcement Learning from Implicit Neural Feedback for Human-Aligned Robot Control

Suzie Kim

OffRL

18 Nov 2025

Deployable Vision-driven UAV River Navigation via Human-in-the-loop Preference Alignment

187

02 Nov 2025

Optimistic Task Inference for Behavior Foundation Models

152

23 Oct 2025

TubeDAgger: Reducing the Number of Expert Interventions with Stochastic Reach-Tubes

112

01 Oct 2025

Fine-grained Video Dubbing Duration Alignment with Segment Supervised Preference Optimization

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

153

12 Aug 2025

Beyond Ordinal Preferences: Why Alignment Needs Cardinal Human Feedback

Parker Whitfill

Stewy Slocum

ALM

102

11 Aug 2025

Pref-GUIDE: Continual Policy Learning from Real-Time Human Feedback via Preference-Based Learning

Zhengran Ji

Boyuan Chen

271

10 Aug 2025

Can you see how I learn? Human observers' inferences about Reinforcement Learning agents' learning processes

Adaptive Agents and Multi-Agent Systems (AAMAS), 2025

206

16 Jun 2025

Enhancing Rating-Based Reinforcement Learning to Effectively Leverage Feedback from Large Vision-Language Models

236

15 Jun 2025

Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making

333

03 Jun 2025

Interactive Imitation Learning for Dexterous Robotic Manipulation: Challenges and Perspectives -- A Survey

Edgar Welte

Rania Rayyes

542

30 May 2025

Reinforcement Learning from Multi-level and Episodic Human Feedback

Conference on Learning for Dynamics & Control (L4DC), 2025

Muhammad Qasim Elahi

Somtochukwu Oguchienti

Maheed H. Ahmed

Mahsa Ghasemi

OffRL

591

20 Apr 2025

Safe Explicable Policy Search

Akkamahadevi Hanni

Jonathan Montaño

Yu Zhang

398

10 Mar 2025

Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning

International Conference on Learning Representations (ICLR), 2025

Hyungkyu Kang

Min-hwan Oh

OffRL

442

07 Mar 2025

High-Precision Transformer-Based Visual Servoing for Humanoid Robots in Aligning Tiny Objects

416

06 Mar 2025

Uncertainty Comes for Free: Human-in-the-Loop Policies with Diffusion Models

Zhanpeng He

Yifeng Cao

M. Ciocarlie

763

26 Feb 2025

Extracting and Understanding the Superficial Knowledge in Alignment

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

396

07 Feb 2025

CTR-Driven Advertising Image Generation with Multimodal Large Language Models

The Web Conference (WWW), 2025

...

384

05 Feb 2025

Learning from Active Human Involvement through Proxy Value Propagation

Neural Information Processing Systems (NeurIPS), 2025

490

05 Feb 2025

A Comprehensive Survey of Foundation Models in Medicine

IEEE Reviews in Biomedical Engineering (RBME), 2024

855

17 Jan 2025

Guiding Reinforcement Learning Using Uncertainty-Aware Large Language Models

Maryam Shoaeinaeini

Brent Harrison

153

15 Nov 2024

Adaptive Alignment: Dynamic Preference Adjustments via Multi-Objective Reinforcement Learning for Pluralistic AI

Francisco Cruz

411

31 Oct 2024

Prosody as a Teaching Signal for Agent Learning: Exploratory Studies and Algorithmic Implications

International Conference on Multimodal Interaction (ICMI), 2024

179

31 Oct 2024

Towards Assurance of LLM Adversarial Robustness using Ontology-Driven Argumentation

Tomas Bueno Momcilovic

Beat Buesser

Giulio Zizzo

Mark Purcell

AAML

160

10 Oct 2024

Incremental Learning for Robot Shared Autonomy

489

08 Oct 2024

Dynamic Policy Fusion for User Alignment Without Re-Interaction

Ajsal Shereef Palattuparambil

Thommen George Karimpanal

Santu Rana

468

30 Sep 2024

Align²LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation

...

Juncheng Li

Hao Jiang

Haoyuan Li

Yueting Zhuang

MLLM ALM

133

27 Sep 2024

CANDERE-COACH: Reinforcement Learning from Noisy Feedback

Yuxuan Li

Srijita Das

Matthew E. Taylor

231

23 Sep 2024

Rater Cohesion and Quality from a Vicarious Perspective

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Deepak Pandita

Tharindu Cyril Weerasooriya

Sujan Dutta

Sarah K. K. Luger

Tharindu Ranasinghe

Ashiqur R. KhudaBukhsh

Marcos Zampieri

Christopher M. Homan

242

15 Aug 2024

An Introduction to Reinforcement Learning: Fundamental Concepts and Practical Applications

286

13 Aug 2024

Is user feedback always informative? Retrieval Latent Defending for Semi-Supervised Domain Adaptation without Source Data

389

22 Jul 2024

Recent Advances in Generative AI and Large Language Models: Current Status, Challenges, and Perspectives

D. Hagos

Rick Battle

Danda B. Rawat

LM&MA OffRL

570

102

20 Jul 2024

A Comparative Analysis of Interactive Reinforcement Learning Algorithms in Warehouse Robot Grid Based Environment

Arunabh Bora

OffRL

144

16 Jul 2024

Three Dogmas of Reinforcement Learning

David Abel

Mark K. Ho

Anna Harutyunyan

425

15 Jul 2024

MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention

472

24 Jun 2024

Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis

461

11 Jun 2024

Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity

Adaptive Agents and Multi-Agent Systems (AAMAS), 2024

Calarina Muslimani

Bram Grooten

Deepak Ranganatha Sastry Mamillapalli

Mykola Pechenizkiy

Decebal Constantin Mocanu

Matthew E. Taylor

451

10 Jun 2024

Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors

355

03 Jun 2024

Transfer Q Star: Principled Decoding for LLM Alignment

Ming Yin

Mengdi Wang

Furong Huang

365

30 May 2024

A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback

Asuman Ozdaglar

358

20 May 2024

A Design Trajectory Map of Human-AI Collaborative Reinforcement Learning Systems: Survey and Taxonomy

Zhaoxing Li

230

16 May 2024

Enhancing Maritime Trajectory Forecasting via H3 Index and Causal Language Modelling (CLM)

Nicolas Drapier

Aladine Chetouani

A. Chateigner

167

15 May 2024

Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning

Calarina Muslimani

Matthew E. Taylor

OffRL

538

30 Apr 2024

An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models

365

23 Apr 2024

Dataset Reset Policy Optimization for RLHF

540

12 Apr 2024

Explainability in JupyterLab and Beyond: Interactive XAI Systems for Integrated and Collaborative Workflows

G. Guo

Dustin L. Arendt

Alex Endert

300

02 Apr 2024

Learning to Watermark LLM-generated Text via Reinforcement Learning

Xiaojun Xu

Yuanshun Yao

Yang Liu

384

13 Mar 2024

CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios

European Conference on Computer Vision (ECCV), 2024

489

07 Mar 2024

Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences

Andi Nika

Debmalya Mandal

Parameswaran Kamalaruban

Georgios Tzannetos

Goran Radanović

Adish Singla

222

04 Mar 2024