ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1712.01815
  4. Cited By
Mastering Chess and Shogi by Self-Play with a General Reinforcement
  Learning Algorithm

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

5 December 2017
David Silver
Thomas Hubert
Julian Schrittwieser
Ioannis Antonoglou
Matthew Lai
A. Guez
Marc Lanctot
Laurent Sifre
D. Kumaran
T. Graepel
Timothy Lillicrap
Karen Simonyan
Demis Hassabis
ArXivPDFHTML

Papers citing "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm"

50 / 207 papers shown
Title
ToolACE-DEV: Self-Improving Tool Learning via Decomposition and EVolution
ToolACE-DEV: Self-Improving Tool Learning via Decomposition and EVolution
X. Huang
Weiwen Liu
Xingshan Zeng
Y. Huang
Xinlong Hao
...
Yirong Zeng
Chuhan Wu
Y. Wang
R. Tang
Defu Lian
KELM
31
0
0
12 May 2025
Measuring General Intelligence with Generated Games
Measuring General Intelligence with Generated Games
Vivek Verma
David Huang
William Chen
Dan Klein
Nicholas Tomlin
ReLM
ELM
LM&MA
LRM
50
0
0
12 May 2025
Reinforcement Learning for Game-Theoretic Resource Allocation on Graphs
Reinforcement Learning for Game-Theoretic Resource Allocation on Graphs
Zijian An
Lifeng Zhou
26
0
0
08 May 2025
HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking
HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking
Runquan Gui
Z. Wang
J. Wang
Chi Ma
Huiling Zhen
M. Yuan
Jianye Hao
Defu Lian
Enhong Chen
Feng Wu
LRM
141
0
0
05 May 2025
Motion Generation for Food Topping Challenge 2024: Serving Salmon Roe Bowl and Picking Fried Chicken
Motion Generation for Food Topping Challenge 2024: Serving Salmon Roe Bowl and Picking Fried Chicken
Koki Inami
Masashi Konosu
Koki Yamane
Nozomu Masuya
Yunhan Li
Yu-Han Shu
Hiroshi Sato
Shinnosuke Homma
S. Sakaino
49
0
0
28 Apr 2025
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen
Bang Zhang
Ruotian Ma
Peisong Wang
Xiaodan Liang
Zhaopeng Tu
X. Li
Kwan-Yee Kenneth Wong
LLMAG
ReLM
LRM
86
0
0
27 Apr 2025
Cracking the Code of Action: a Generative Approach to Affordances for Reinforcement Learning
Cracking the Code of Action: a Generative Approach to Affordances for Reinforcement Learning
Lynn Cherif
Flemming Kondrup
David Venuto
Ankit Anand
Doina Precup
Khimya Khetarpal
LM&Ro
49
0
0
24 Apr 2025
Improving Human-AI Coordination through Adversarial Training and Generative Models
Improving Human-AI Coordination through Adversarial Training and Generative Models
Paresh Chaudhary
Yancheng Liang
Daphne Chen
S. Du
Natasha Jaques
64
0
0
21 Apr 2025
An Efficient Approach for Cooperative Multi-Agent Learning Problems
An Efficient Approach for Cooperative Multi-Agent Learning Problems
Ángel Aso-Mollar
Eva Onaindia
21
0
0
07 Apr 2025
Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning
Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning
Abdullah Vanlioglu
46
0
0
28 Mar 2025
Reinforcement Learning-based Heuristics to Guide Domain-Independent Dynamic Programming
Reinforcement Learning-based Heuristics to Guide Domain-Independent Dynamic Programming
Minori Narita
Ryo Kuroiwa
J. Christopher Beck
42
0
0
20 Mar 2025
ToMCAT: Theory-of-Mind for Cooperative Agents in Teams via Multiagent Diffusion Policies
ToMCAT: Theory-of-Mind for Cooperative Agents in Teams via Multiagent Diffusion Policies
Pedro Sequeira
Vidyasagar Sadhu
Melinda Gervasio
DiffM
86
0
0
25 Feb 2025
Two-Player Zero-Sum Differential Games with One-Sided Information
Two-Player Zero-Sum Differential Games with One-Sided Information
Mukesh Ghimire
Z. Xu
Yi Ren
SyDa
95
0
0
17 Feb 2025
Learning a Diffusion Model Policy from Rewards via Q-Score Matching
Learning a Diffusion Model Policy from Rewards via Q-Score Matching
Michael Psenka
Alejandro Escontrela
Pieter Abbeel
Yi-An Ma
DiffM
91
23
0
17 Feb 2025
LLMs Can Teach Themselves to Better Predict the Future
LLMs Can Teach Themselves to Better Predict the Future
Benjamin Turtel
Danny Franklin
Philipp Schoenegger
LRM
57
0
0
07 Feb 2025
Beyond Interpolation: Extrapolative Reasoning with Reinforcement Learning and Graph Neural Networks
Beyond Interpolation: Extrapolative Reasoning with Reinforcement Learning and Graph Neural Networks
Niccolò Grillo
Andrea Toccaceli
Joël Mathys
Benjamin Estermann
Stefania Fresca
Roger Wattenhofer
AI4CE
LRM
104
0
0
06 Feb 2025
Policy Guided Tree Search for Enhanced LLM Reasoning
Policy Guided Tree Search for Enhanced LLM Reasoning
Yang Li
LRM
53
0
0
04 Feb 2025
COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models
COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models
Tobias Materzok
LRM
69
0
0
28 Jan 2025
Optimizing Automatic Differentiation with Deep Reinforcement Learning
Optimizing Automatic Differentiation with Deep Reinforcement Learning
Jamie Lohoff
Emre Neftci
58
1
0
28 Jan 2025
HEPPO: Hardware-Efficient Proximal Policy Optimization -- A Universal Pipelined Architecture for Generalized Advantage Estimation
HEPPO: Hardware-Efficient Proximal Policy Optimization -- A Universal Pipelined Architecture for Generalized Advantage Estimation
Hazem Taha
Ameer M. S. Abdelhadi
35
1
0
22 Jan 2025
Revisiting Rogers' Paradox in the Context of Human-AI Interaction
Revisiting Rogers' Paradox in the Context of Human-AI Interaction
K. M. Collins
Umang Bhatt
Ilia Sucholutsky
46
1
0
16 Jan 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown
Jordan Juravsky
Ryan Ehrlich
Ronald Clark
Quoc V. Le
Christopher Ré
Azalia Mirhoseini
ALM
LRM
81
220
0
03 Jan 2025
ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze
ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze
Chunyu Xuan
Yazhe Niu
Yuan Pu
Shuai Hu
Yu Liu
Jing Yang
65
0
0
03 Jan 2025
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
Scott Geng
Cheng-Yu Hsieh
Vivek Ramanujan
Matthew Wallingford
Chun-Liang Li
Pang Wei Koh
Ranjay Krishna
DiffM
68
6
0
03 Jan 2025
Heterogeneous Multi-agent Zero-Shot Coordination by Coevolution
Heterogeneous Multi-agent Zero-Shot Coordination by Coevolution
Ke Xue
Yutong Wang
Cong Guan
Lei Yuan
Haobo Fu
Qiang Fu
Chao Qian
Yang Yu
42
16
0
03 Jan 2025
Predicting Chess Puzzle Difficulty with Transformers
Predicting Chess Puzzle Difficulty with Transformers
Szymon Miłosz
Paweł Kapusta
33
0
0
31 Dec 2024
Learning to Navigate in Mazes with Novel Layouts using Abstract Top-down
  Maps
Learning to Navigate in Mazes with Novel Layouts using Abstract Top-down Maps
Linfeng Zhao
Lawson L. S. Wong
74
1
0
16 Dec 2024
Bayes Adaptive Monte Carlo Tree Search for Offline Model-based
  Reinforcement Learning
Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning
Jiayu Chen
Wentse Chen
Jeff Schneider
OffRL
31
1
0
15 Oct 2024
Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition
Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition
Zhong Zheng
Haochen Zhang
Lingzhou Xue
OffRL
70
2
0
10 Oct 2024
MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders
MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders
Cheng-rong Li
May Fung
Qingyun Wang
Chi Han
Manling Li
Jindong Wang
Heng Ji
AI4MH
146
0
0
09 Oct 2024
Scalable Signal Temporal Logic Guided Reinforcement Learning via Value
  Function Space Optimization
Scalable Signal Temporal Logic Guided Reinforcement Learning via Value Function Space Optimization
Yiting He
Peiran Liu
Yiding Ji
OffRL
36
0
0
04 Aug 2024
Reinforcement Learning for Sustainable Energy: A Survey
Reinforcement Learning for Sustainable Energy: A Survey
Koen Ponse
Felix Kleuker
Márton Fejér
Álvaro Serra-Gómez
Aske Plaat
Thomas M. Moerland
OffRL
AI4CE
40
1
0
26 Jul 2024
Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay
Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay
Gonçalo Hora de Carvalho
Oscar Knap
R. Pollice
ReLM
ELM
LRM
34
1
0
12 Jul 2024
A Review of the Applications of Deep Learning-Based Emergent
  Communication
A Review of the Applications of Deep Learning-Based Emergent Communication
Brendon Boldt
David R. Mortensen
VLM
31
6
0
03 Jul 2024
UniZero: Generalized and Efficient Planning with Scalable Latent World Models
UniZero: Generalized and Efficient Planning with Scalable Latent World Models
Yuan Pu
Yazhe Niu
Jiyuan Ren
Zhenjie Yang
Hongsheng Li
Yu Liu
OffRL
43
1
0
15 Jun 2024
Open-Endedness is Essential for Artificial Superhuman Intelligence
Open-Endedness is Essential for Artificial Superhuman Intelligence
Edward Hughes
Michael Dennis
Jack Parker-Holder
Feryal M. P. Behbahani
Aditi Mavalankar
Yuge Shi
Tom Schaul
Tim Rocktaschel
LRM
34
18
0
06 Jun 2024
Counterfactual Explanations for Multivariate Time-Series without
  Training Datasets
Counterfactual Explanations for Multivariate Time-Series without Training Datasets
Xiangyu Sun
Raquel Aoki
Kevin H. Wilson
27
1
0
28 May 2024
Aspect-based Sentiment Evaluation of Chess Moves (ASSESS): an NLP-based
  Method for Evaluating Chess Strategies from Textbooks
Aspect-based Sentiment Evaluation of Chess Moves (ASSESS): an NLP-based Method for Evaluating Chess Strategies from Textbooks
Haifa Alrdahi
R. Batista-Navarro
44
0
0
10 May 2024
A Survey on Self-Evolution of Large Language Models
A Survey on Self-Evolution of Large Language Models
Zhengwei Tao
Ting-En Lin
Xiancai Chen
Hangyu Li
Yuchuan Wu
Yongbin Li
Zhi Jin
Fei Huang
Dacheng Tao
Jingren Zhou
LRM
LM&Ro
54
22
0
22 Apr 2024
Cooperative Evolutionary Pressure and Diminishing Returns Might Explain
  the Fermi Paradox: On What Super-AIs Are Like
Cooperative Evolutionary Pressure and Diminishing Returns Might Explain the Fermi Paradox: On What Super-AIs Are Like
Daniel Vallstrom
24
0
0
01 Apr 2024
Understanding Iterative Combinatorial Auction Designs via Multi-Agent
  Reinforcement Learning
Understanding Iterative Combinatorial Auction Designs via Multi-Agent Reinforcement Learning
G. dÉon
N. Newman
Kevin Leyton-Brown
27
0
0
29 Feb 2024
Puzzle Solving using Reasoning of Large Language Models: A Survey
Puzzle Solving using Reasoning of Large Language Models: A Survey
Panagiotis Giadikiaroglou
Maria Lymperaiou
Giorgos Filandrianos
Giorgos Stamou
ELM
ReLM
LRM
19
24
0
17 Feb 2024
Scaling Artificial Intelligence for Digital Wargaming in Support of
  Decision-Making
Scaling Artificial Intelligence for Digital Wargaming in Support of Decision-Making
Scotty Black
Christian J. Darken
14
2
0
08 Feb 2024
The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement
  Learning and Large Language Models
The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models
M. Pternea
Prerna Singh
Abir Chakraborty
Y. Oruganti
M. Milletarí
Sayli Bapat
Kebei Jiang
OffRL
21
7
0
02 Feb 2024
Generalized Nested Rollout Policy Adaptation with Limited Repetitions
Generalized Nested Rollout Policy Adaptation with Limited Repetitions
Tristan Cazenave
8
3
0
18 Jan 2024
From Images to Connections: Can DQN with GNNs learn the Strategic Game
  of Hex?
From Images to Connections: Can DQN with GNNs learn the Strategic Game of Hex?
Yannik Keller
Jannis Blüml
Gopika Sudhakaran
Kristian Kersting
GNN
19
0
0
22 Nov 2023
Runtime Verification of Learning Properties for Reinforcement Learning
  Algorithms
Runtime Verification of Learning Properties for Reinforcement Learning Algorithms
T. Mannucci
Julio de Oliveira Filho
OffRL
8
0
0
16 Nov 2023
Optimal Robotic Assembly Sequence Planning: A Sequential Decision-Making Approach
Optimal Robotic Assembly Sequence Planning: A Sequential Decision-Making Approach
Kartik Nagpal
Negar Mehr
25
0
0
26 Oct 2023
JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player
  Zero-Sum Games
JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games
Yang Li
Kun Xiong
Yingping Zhang
Jiangcheng Zhu
Stephen Marcus McAleer
Wei Pan
J. Wang
Zonghong Dai
Yaodong Yang
31
2
0
09 Aug 2023
Towards General Game Representations: Decomposing Games Pixels into
  Content and Style
Towards General Game Representations: Decomposing Games Pixels into Content and Style
C. Trivedi
Konstantinos Makantasis
Antonios Liapis
Georgios N. Yannakakis
OCL
38
3
0
20 Jul 2023
12345
Next