Accelerating Offline Reinforcement Learning Application in Real-Time Bidding and Recommendation: Potential Use of Simulation

17 September 2021
Haruka Kiyohara
K. Kawakami
Yuta Saito
    OffRL

Papers citing "Accelerating Offline Reinforcement Learning Application in Real-Time Bidding and Recommendation: Potential Use of Simulation"

13 / 13 papers shown
A Case for Leveraging Generative AI to Expand and Enhance Training in the Provision of Mental Health Services
Hannah R. Lawrence
Shannon Wiltsey Stirman
Samuel Dorison
Taedong Yun
Megan Jones Bell
AI4MH
195
0
0
08 Oct 2025
Generative Auto-Bidding with Value-Guided Explorations
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Jingtong Gao
Yewen Li
Shuai Mao
Peng Jiang
Nan Jiang
...
Fei Pan
Peng Jiang
Kun Gai
Rui Hu
Xiangyu Zhao
OffRL
531
15
0
20 Apr 2025
AutoOPE: Automated Off-Policy Estimator Selection
Nicolò Felicioni
Michael Benigni
Maurizio Ferrari Dacrema
OffRL
212
2
0
26 Jun 2024
Hyperparameter Optimization Can Even be Harmful in Off-Policy Learning and How to Deal with It
Yuta Saito
Masahiro Nomura
OffRL
338
5
0
23 Apr 2024
Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction
Haruka Kiyohara
Masahiro Nomura
Yuta Saito
693
17
0
03 Feb 2024
Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation
International Conference on Learning Representations (ICLR), 2023
Haruka Kiyohara
Ren Kishimoto
K. Kawakami
Ken Kobayashi
Kazuhide Nakata
Yuta Saito
OffRL
510
15
0
30 Nov 2023
SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation
Haruka Kiyohara
Ren Kishimoto
K. Kawakami
Ken Kobayashi
Kazuhide Nakata
Yuta Saito
OffRL, ELM
535
5
0
30 Nov 2023
Off-Policy Evaluation of Ranking Policies under Diverse User Behavior
Knowledge Discovery and Data Mining (KDD), 2023
Haruka Kiyohara
Masatoshi Uehara
Yusuke Narita
N. Shimizu
Yasuo Yamamoto
Yuta Saito
OffRL, CML
328
13
0
26 Jun 2023
User Behavior Simulation with Large Language Model based Agents
Lei Wang
Jingsen Zhang
Hao-ran Yang
Zhiyuan Chen
Jiakai Tang
...
Wayne Xin Zhao
Jun Xu
Zhicheng Dou
Jun Wang
Ji-Rong Wen
LM&Ro, LLMAG
465
151
0
05 Jun 2023
Policy-Adaptive Estimator Selection for Off-Policy Evaluation
AAAI Conference on Artificial Intelligence (AAAI), 2022
Takuma Udagawa
Haruka Kiyohara
Yusuke Narita
Yuta Saito
Keisuke Tateno
OffRL
296
29
0
25 Nov 2022
Synthetic Data-Based Simulators for Recommender Systems: A Survey
Elizaveta Stavinova
A. Grigorievskiy
A. Volodkevich
P. Chunaev
Klavdiya Olegovna Bochenina
D. Bugaychenko
SyDa
204
9
0
22 Jun 2022
Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model
Web Search and Data Mining (WSDM), 2022
Haruka Kiyohara
Yuta Saito
Tatsuya Matsuhiro
Yusuke Narita
N. Shimizu
Yasuo Yamamoto
OffRL
258
53
0
03 Feb 2022
Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation
Yuta Saito
Shunsuke Aihara
Megumi Matsutani
Yusuke Narita
OffRL
733
92
0
17 Aug 2020