Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits

International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
28 June 2021
Wenshuo Guo
Kumar Krishna Agrawal
Aditya Grover
Vidya Muthukumar
A. Pananjady

Papers citing "Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits"

7 / 7 papers shown
Eliciting Risk Aversion with Inverse Reinforcement Learning via Interactive Questioning
Ziteng Cheng
Anthony Coache
S. Jaimungal
134
1
0
16 Aug 2023
Diffusion Models for Black-Box Optimization
International Conference on Machine Learning (ICML), 2023
S. Krishnamoorthy
Satvik Mashkaria
Aditya Grover
DiffM
316
81
0
12 Jun 2023
MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning
Arundhati Banerjee
Soham R. Phade
Stefano Ermon
Stephan Zheng
OffRL
90
1
0
10 Apr 2023
Reliable Conditioning of Behavioral Cloning for Offline Reinforcement Learning
Tung Nguyen
Qinqing Zheng
Aditya Grover
OffRL
265
7
0
11 Oct 2022
Generative Pretraining for Black-Box Optimization
International Conference on Machine Learning (ICML), 2022
S. Krishnamoorthy
Satvik Mashkaria
Aditya Grover
OffRL, AI4CE
430
36
0
22 Jun 2022
Integrating Reward Maximization and Population Estimation: Sequential Decision-Making for Internal Revenue Service Audit Selection
AAAI Conference on Artificial Intelligence (AAAI), 2022
Peter Henderson
Ben Chugg
Brandon R. Anderson
Kristen M. Altenburger
Alex Turk
J. Guyton
Jacob Goldin
James Grimmelmann
OffRL
153
10
0
25 Apr 2022
Inverse Contextual Bandits: Learning How Behavior Evolves over Time
Alihan Huyuk
Daniel Jarrett
M. Schaar
CML, OffRL
203
11
0
13 Jul 2021