ResearchTrend.AI
Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces

28 September 2017
Garrett A. Warnell
Nicholas R. Waytowich
Vernon J. Lawhern
Peter Stone
arXiv: 1709.10163

Papers citing "Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces"

50 / 147 papers shown
A Systematic Approach to Design Real-World Human-in-the-Loop Deep Reinforcement Learning: Salient Features, Challenges and Trade-offs
Jalal Arabneydi
Saiful Islam
Srijita Das
S. Gottipati
William Duguay
Clodéric Mars
Matthew E. Taylor
Matthew J. Guzdial
Antoine Fagette
Younes Zerouali
26
0
0
23 Apr 2025
Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning
Hyungkyu Kang
Min-hwan Oh
OffRL
55
0
0
07 Mar 2025
High-Precision Transformer-Based Visual Servoing for Humanoid Robots in Aligning Tiny Objects
Jialong Xue
Wei Gao
Yu Wang
Chao Ji
Dongdong Zhao
Shi Yan
Shiwu Zhang
47
0
0
06 Mar 2025
Reducing Reward Dependence in RL Through Adaptive Confidence Discounting
Muhammed Yusuf Satici
David L. Roberts
OffRL
46
0
0
28 Feb 2025
Learning from Active Human Involvement through Proxy Value Propagation
Zhenghao Peng
Wenjie Mo
Chenda Duan
Quanyi Li
Bolei Zhou
109
14
0
05 Feb 2025
Preference-Based Multi-Agent Reinforcement Learning: Data Coverage and Algorithmic Techniques
Natalia Zhang
X. Wang
Qiwen Cui
Runlong Zhou
Sham Kakade
Simon S. Du
OffRL
61
0
0
10 Jan 2025
CREW: Facilitating Human-AI Teaming Research
Lingyu Zhang
Zhengran Ji
Boyuan Chen
59
3
0
03 Jan 2025
Efficient Diversity-based Experience Replay for Deep Reinforcement Learning
Kaiyan Zhao
Yiming Wang
Yuyang Chen
Yan Li
Leong Hou U
Xiaoguang Niu
44
1
0
27 Oct 2024
GUIDE: Real-Time Human-Shaped Agents
Lingyu Zhang
Zhengran Ji
Nicholas R Waytowich
Boyuan Chen
42
2
0
19 Oct 2024
Incremental Learning for Robot Shared Autonomy
Yiran Tao
Guixiu Qiao
Dan Ding
Zackory Erickson
CLL
40
0
0
08 Oct 2024
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang
Lei Ying
OffRL
40
2
0
25 Sep 2024
CANDERE-COACH: Reinforcement Learning from Noisy Feedback
Yuxuan Li
Srijita Das
Matthew E. Taylor
21
0
0
23 Sep 2024
Beyond Following: Mixing Active Initiative into Computational Creativity
Zhiyu Lin
Upol Ehsan
Rohan Agarwal
Samihan Dani
Vidushi Vashishth
Mark O. Riedl
46
0
0
06 Sep 2024
Bridging the gap between natural user expression with complex automation programming in smart homes
Yingtian Shi
Xiaoyi Liu
Chun Yu
Tianao Yang
Cheng Gao
Chen Liang
Yuanchun Shi
38
0
0
22 Aug 2024
Emotion-Agent: Unsupervised Deep Reinforcement Learning with Distribution-Prototype Reward for Continuous Emotional EEG Analysis
Zhihao Zhou
Qile Liu
Jiyuan Wang
Zhen Liang
34
0
0
22 Aug 2024
Is user feedback always informative? Retrieval Latent Defending for Semi-Supervised Domain Adaptation without Source Data
Junha Song
Tae Soo Kim
Junha Kim
Gunhee Nam
Thijs Kooi
Jaegul Choo
58
1
0
22 Jul 2024
How Much Progress Did I Make? An Unexplored Human Feedback Signal for Teaching Robots
Hang Yu
Qidi Fang
Shijie Fang
Reuben M. Aronson
E. Short
25
0
0
08 Jul 2024
MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention
Yuxin Chen
Chen Tang
Chenran Li
Ran Tian
Peter Stone
Masayoshi Tomizuka
Wei Zhan
28
1
0
24 Jun 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
67
1
0
11 Jun 2024
Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity
Calarina Muslimani
Bram Grooten
Deepak Ranganatha Sastry Mamillapalli
Mykola Pechenizkiy
Decebal Constantin Mocanu
Matthew E. Taylor
56
0
0
10 Jun 2024
Offline Regularised Reinforcement Learning for Large Language Models Alignment
Pierre Harvey Richemond
Yunhao Tang
Daniel Guo
Daniele Calandriello
M. G. Azar
...
Gil Shamir
Rishabh Joshi
Tianqi Liu
Rémi Munos
Bilal Piot
OffRL
46
24
0
29 May 2024
A Design Trajectory Map of Human-AI Collaborative Reinforcement Learning Systems: Survey and Taxonomy
Zhaoxing Li
30
2
0
16 May 2024
RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation
Chanwoo Park
Mingyang Liu
Dingwen Kong
Kaiqing Zhang
Asuman Ozdaglar
52
30
0
30 Apr 2024
Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning
Calarina Muslimani
Matthew E. Taylor
OffRL
51
2
0
30 Apr 2024
PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling
Utsav Singh
Wesley A Suttle
Brian M Sadler
Vinay P. Namboodiri
Amrit Singh Bedi
36
4
0
20 Apr 2024
Dataset Reset Policy Optimization for RLHF
Jonathan D. Chang
Wenhao Zhan
Owen Oertell
Kianté Brantley
Dipendra Kumar Misra
Jason D. Lee
Wen Sun
OffRL
32
21
0
12 Apr 2024
Explainability in JupyterLab and Beyond: Interactive XAI Systems for Integrated and Collaborative Workflows
G. Guo
Dustin L. Arendt
Alex Endert
53
1
0
02 Apr 2024
Human Alignment of Large Language Models through Online Preference Optimisation
Daniele Calandriello
Daniel Guo
Rémi Munos
Mark Rowland
Yunhao Tang
...
Michal Valko
Tianqi Liu
Rishabh Joshi
Zeyu Zheng
Bilal Piot
52
60
0
13 Mar 2024
Sample-Efficient Preference-based Reinforcement Learning with Dynamics Aware Rewards
Katherine Metcalf
Miguel Sarabia
Natalie Mackraz
B. Theobald
45
6
0
28 Feb 2024
Principled Preferential Bayesian Optimization
Wenjie Xu
Wenbin Wang
Yuning Jiang
B. Svetozarevic
Colin N. Jones
35
6
0
08 Feb 2024
Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback
Yifu Yuan
Jianye Hao
Yi Ma
Zibin Dong
Hebin Liang
Jinyi Liu
Zhixin Feng
Kai-Wen Zhao
Yan Zheng
OffRL
ALM
26
14
0
04 Feb 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
36
25
0
29 Jan 2024
Integrating Human Expertise in Continuous Spaces: A Novel Interactive Bayesian Optimization Framework with Preference Expected Improvement
Nikolaus Feith
Elmar Rueckert
37
1
0
23 Jan 2024
Human-AI Collaboration in Real-World Complex Environment with Reinforcement Learning
Md Saiful Islam
Srijita Das
S. Gottipati
William Duguay
Clodéric Mars
Jalal Arabneydi
Antoine Fagette
Matthew J. Guzdial
Matthew E. Taylor
41
1
0
23 Dec 2023
Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations
Zilin Wang
Hao-Wen Zhuang
Lu Li
Yinmin Zhang
Junjie Zhong
Jun Chen
Yu Yang
Boshi Tang
Zhiyong Wu
53
3
0
18 Dec 2023
A dynamical clipping approach with task feedback for Proximal Policy Optimization
Ziqi Zhang
Jingzehua Xu
Zifeng Zhuang
Jinxin Liu
Donglin Wang
Shuai Zhang
29
1
0
12 Dec 2023
A Review of Communicating Robot Learning during Human-Robot Interaction
Soheil Habibian
Antonio Alvarez Valdivia
Laura H. Blumenschein
Dylan P. Losey
34
6
0
01 Dec 2023
LLM Augmented Hierarchical Agents
Bharat Prakash
Tim Oates
T. Mohsenin
24
4
0
09 Nov 2023
Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models
Kun-Mo Chu
Xufeng Zhao
C. Weber
Mengdi Li
Stefan Wermter
LLMAG
LM&Ro
49
14
0
04 Nov 2023
COPR: Continual Learning Human Preference through Optimal Policy Regularization
Han Zhang
Lin Gui
Yuanzhao Zhai
Hui Wang
Yu Lei
Ruifeng Xu
CLL
51
0
0
24 Oct 2023
Bootstrapping Adaptive Human-Machine Interfaces with Offline Reinforcement Learning
Jensen Gao
S. Reddy
Glen Berseth
Anca Dragan
Sergey Levine
OffRL
33
0
0
07 Sep 2023
Iterative Reward Shaping using Human Feedback for Correcting Reward Misspecification
Jasmina Gajcin
J. McCarthy
Rahul Nair
Radu Marinescu
Elizabeth M. Daly
Ivana Dusparic
25
3
0
30 Aug 2023
RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback
Yannick Metz
David Lindner
Raphael Baur
Daniel A. Keim
Mennatallah El-Assady
AI4CE
40
10
0
08 Aug 2023
Rating-based Reinforcement Learning
Devin White
Mingkang Wu
Ellen R. Novoseller
Vernon J. Lawhern
Nicholas R. Waytowich
Yongcan Cao
ALM
24
6
0
30 Jul 2023
Primitive Skill-based Robot Learning from Human Evaluative Feedback
Ayano Hiranaka
Minjune Hwang
Sharon Lee
Chen Wang
Li Fei-Fei
Jiajun Wu
Ruohan Zhang
OffRL
13
11
0
28 Jul 2023
Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback
M. Torné
Max Balsells
Zihan Wang
Samedh Desai
Tao Chen
Pulkit Agrawal
Abhishek Gupta
30
8
0
20 Jul 2023
STRAPPER: Preference-based Reinforcement Learning via Self-training Augmentation and Peer Regularization
Yachen Kang
Li He
Jinxin Liu
Zifeng Zhuang
Donglin Wang
41
0
0
19 Jul 2023
Opening up ChatGPT: Tracking openness, transparency, and accountability in instruction-tuned text generators
Andreas Liesenfeld
Alianda Lopez
Mark Dingemanse
ALM
26
86
0
08 Jul 2023
Preference Ranking Optimization for Human Alignment
Feifan Song
Yu Bowen
Minghao Li
Haiyang Yu
Fei Huang
Yongbin Li
Houfeng Wang
ALM
34
240
0
30 Jun 2023
Is RLHF More Difficult than Standard RL?
Yuanhao Wang
Qinghua Liu
Chi Jin
OffRL
21
58
0
25 Jun 2023