arXiv: 2501.05407
On-line Policy Improvement using Monte-Carlo Search

9 January 2025
Gerald Tesauro
Gregory R. Galperin

Papers citing "On-line Policy Improvement using Monte-Carlo Search"

50 / 52 papers shown
A Survey on Self-play Methods in Reinforcement Learning
Ruize Zhang
Zelai Xu
Chengdong Ma
Chao Yu
Weijuan Tu
...
Deheng Ye
Wenbo Ding
Yaodong Yang
Yu Wang
SyDa
SSL
OnRL
51
8
0
02 Aug 2024
Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming
Dimitri Bertsekas
41
6
0
02 Jun 2024
An Approximate Dynamic Programming Framework for Occlusion-Robust Multi-Object Tracking
Pratyusha Musunuru
Yuchao Li
Jamison Weber
Dimitri P. Bertsekas
43
0
0
24 May 2024
Graph Reinforcement Learning for Combinatorial Optimization: A Survey and Unifying Perspective
Victor-Alexandru Darvariu
Stephen Hailes
Mirco Musolesi
AI4CE
50
6
0
09 Apr 2024
Tree Search in DAG Space with Model-based Reinforcement Learning for Causal Discovery
Victor-Alexandru Darvariu
Stephen Hailes
Mirco Musolesi
CML
46
2
0
20 Oct 2023
Iterative Option Discovery for Planning, by Planning
Kenny Young
Richard S. Sutton
25
2
0
02 Oct 2023
Magnetic Field-Based Reward Shaping for Goal-Conditioned Reinforcement Learning
Hongyu Ding
Yuan-Yan Tang
Qing Wu
Bo Wang
Chunlin Chen
Zhi Wang
37
4
0
16 Jul 2023
The Update-Equivalence Framework for Decision-Time Planning
Samuel Sokota
Gabriele Farina
David J. Wu
Hengyuan Hu
Kevin A. Wang
J. Zico Kolter
Noam Brown
30
3
0
25 Apr 2023
A New Policy Iteration Algorithm For Reinforcement Learning in Zero-Sum Markov Games
Anna Winnicki
R. Srikant
34
1
0
17 Mar 2023
Multiagent Rollout with Reshuffling for Warehouse Robots Path Planning
William Emanuelsson
Alejandro Penacho Riveiros
Yuchao Li
Karl H. Johansson
Jonas Mårtensson
22
1
0
15 Nov 2022
Nested Search versus Limited Discrepancy Search
Tristan Cazenave
32
0
0
01 Oct 2022
Regret Analysis for Hierarchical Experts Bandit Problem
Qihan Guo
Siwei Wang
Jun Zhu
24
0
0
11 Aug 2022
A Survey on Model-based Reinforcement Learning
Fan Luo
Tian Xu
Hang Lai
Xiong-Hui Chen
Weinan Zhang
Yang Yu
OffRL
LRM
44
101
0
19 Jun 2022
Learning from Drivers to Tackle the Amazon Last Mile Routing Research Challenge
Chen Wu
Yin Song
Verdi March
Eden Duthie
32
7
0
09 May 2022
Symphony: Learning Realistic and Diverse Agents for Autonomous Driving Simulation
Maximilian Igl
Daewoo Kim
Alex Kuefler
Paul Mougin
Punit Shah
K. Shiarlis
Drago Anguelov
Mark Palatucci
Brandyn White
Shimon Whiteson
35
64
0
06 May 2022
A Dynamic Programming Algorithm for Finding an Optimal Sequence of Informative Measurements
P. Loxley
Ka Wai Cheung
23
3
0
24 Sep 2021
Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control
Dimitri Bertsekas
AI4CE
50
55
0
20 Aug 2021
Model-Based Opponent Modeling
Xiaopeng Yu
Jiechuan Jiang
Wanpeng Zhang
Haobin Jiang
Zongqing Lu
OffRL
27
28
0
04 Aug 2021
Train on Small, Play the Large: Scaling Up Board Games with AlphaZero and GNN
Shai Ben-Assayag
Ran El-Yaniv
GNN
27
9
0
18 Jul 2021
Leveraging Tripartite Interaction Information from Live Stream E-Commerce for Improving Product Recommendation
Sanshi Lei Yu
Zhuoxuan Jiang
Dongdong Chen
Shanshan Feng
Dongsheng Li
Qi Liu
Jinfeng Yi
38
20
0
07 Jun 2021
Annotating Motion Primitives for Simplifying Action Search in Reinforcement Learning
I. Sledge
Darshan W. Bryner
José C. Príncipe
20
1
0
24 Feb 2021
Monte Carlo Rollout Policy for Recommendation Systems with Dynamic User Behavior
R. Meshram
Kesav Kaza
OffRL
19
1
0
08 Feb 2021
Deep Controlled Learning for Inventory Control
Tarkan Temizoz
Christina Imdahl
R. Dijkman
Douniel Lamghari-Idrissi
W. Jaarsveld
24
8
0
30 Nov 2020
On the role of planning in model-based deep reinforcement learning
Jessica B. Hamrick
A. Friesen
Feryal M. P. Behbahani
A. Guez
Fabio Viola
Sims Witherspoon
Thomas W. Anthony
Lars Buesing
Petar Velickovic
T. Weber
OffRL
19
65
0
08 Nov 2020
Lifelong Incremental Reinforcement Learning with Online Bayesian Inference
Zhi Wang
Chunlin Chen
D. Dong
CLL
OffRL
12
56
0
28 Jul 2020
Simulation Based Algorithms for Markov Decision Processes and Multi-Action Restless Bandits
R. Meshram
Kesav Kaza
22
10
0
25 Jul 2020
Model-based Reinforcement Learning: A Survey
Thomas M. Moerland
Joost Broekens
Aske Plaat
Catholijn M. Jonker
OffRL
25
47
0
30 Jun 2020
A Unifying Framework for Reinforcement Learning and Planning
Thomas M. Moerland
Joost Broekens
Aske Plaat
Catholijn M. Jonker
OffRL
27
9
0
26 Jun 2020
Continuous Control for Searching and Planning with a Learned Model
Xuxi Yang
Werner Duvaud
Peng Wei
16
5
0
12 Jun 2020
Review, Analysis and Design of a Comprehensive Deep Reinforcement Learning Framework
Ngoc Duy Nguyen
Thanh Thi Nguyen
Hai V. Nguyen
Doug Creighton
S. Nahavandi
27
3
0
27 Feb 2020
Constrained Multiagent Rollout and Multidimensional Assignment with the Auction Algorithm
Dimitri Bertsekas
22
11
0
18 Feb 2020
Reinforcement Learning for POMDP: Partitioned Rollout and Policy Iteration with Application to Autonomous Sequential Repair Problems
Sushmita Bhattacharya
Sahil Badyal
Thomas Wheeler
Stephanie Gil
Dimitri Bertsekas
27
33
0
11 Feb 2020
The Choice Function Framework for Online Policy Improvement
Murugeswari Issakkimuthu
Alan Fern
Prasad Tadepalli
OffRL
17
1
0
01 Oct 2019
Policy Gradient Search: Online Planning and Expert Iteration without Search Trees
Thomas W. Anthony
Robert Nishihara
Philipp Moritz
Tim Salimans
John Schulman
17
30
0
07 Apr 2019
Learn a Prior for RHEA for Better Online Planning
Xinyao Tong
W. Liu
Bin Li
OffRL
26
0
0
14 Feb 2019
Learning 6-DoF Grasping and Pick-Place Using Attention Focus
Marcus Gualtieri
Robert W. Platt
14
56
0
15 Jun 2018
Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning
Yonathan Efroni
Gal Dalal
B. Scherrer
Shie Mannor
OffRL
17
14
0
21 May 2018
Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations
Dimitri Bertsekas
OffRL
33
131
0
12 Apr 2018
Beyond the One Step Greedy Approach in Reinforcement Learning
Yonathan Efroni
Gal Dalal
B. Scherrer
Shie Mannor
OffRL
50
48
0
10 Feb 2018
Learning the Reward Function for a Misspecified Model
Erik Talvitie
22
10
0
29 Jan 2018
A Survey on Compiler Autotuning using Machine Learning
Amir H. Ashouri
W. Killian
John Cavazos
G. Palermo
Cristina Silvano
35
199
0
13 Jan 2018
Imagination-Augmented Agents for Deep Reinforcement Learning
T. Weber
S. Racanière
David P. Reichert
Lars Buesing
A. Guez
...
Razvan Pascanu
Peter W. Battaglia
Demis Hassabis
David Silver
Daan Wierstra
LM&Ro
51
549
0
19 Jul 2017
Multi-Labelled Value Networks for Computer Go
Ti-Rong Wu
I-Chen Wu
Guan-Wun Chen
Ting Han Wei
Tung-Yi Lai
Hung-Chun Wu
Li-Cheng Lan
36
22
0
30 May 2017
Self-Correcting Models for Model-Based Reinforcement Learning
Erik Talvitie
LRM
29
92
0
19 Dec 2016
Approximate Policy Iteration for Budgeted Semantic Video Segmentation
Behrooz Mahasseni
S. Todorovic
Alan Fern
22
4
0
26 Jul 2016
Using Monte Carlo Search With Data Aggregation to Improve Robot Soccer Policies
Francesco Riccio
Roberto Capobianco
Daniele Nardi
29
4
0
01 Jun 2016
Classification-based Approximate Policy Iteration: Experiments and Extended Discussions
Amir-massoud Farahmand
Doina Precup
André Barreto
Mohammad Ghavamzadeh
OffRL
50
7
0
02 Jul 2014
Analysis of Watson's Strategies for Playing Jeopardy!
Gerald Tesauro
David Gondek
J. Lenchner
James Fan
J. Prager
47
34
0
04 Feb 2014
Learning to Win by Reading Manuals in a Monte-Carlo Framework
S. Branavan
David Silver
Regina Barzilay
49
190
0
18 Jan 2014
Monte Carlo Search Algorithm Discovery for One Player Games
Francis Maes
D. St-Pierre
D. Ernst
65
3
0
23 Aug 2012