Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

5 December 2017
David Silver
Thomas Hubert
Julian Schrittwieser
Ioannis Antonoglou
Matthew Lai
A. Guez
Marc Lanctot
Laurent Sifre
D. Kumaran
T. Graepel
Timothy Lillicrap
Karen Simonyan
Demis Hassabis

Papers citing "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm"

50 / 839 papers shown
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
OffRL, AI4TS, LRM, ReLM, VLM
1.2K
5,342
0
22 Jan 2025
HEPPO-GAE: Hardware-Efficient Proximal Policy Optimization with Generalized Advantage Estimation
Hazem Taha
Ameer M. S. Abdelhadi
186
1
0
22 Jan 2025
Revisiting Rogers' Paradox in the Context of Human-AI Interaction
Katherine M. Collins
Umang Bhatt
Ilia Sucholutsky
378
2
0
16 Jan 2025
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Xinyu Guan
Guang Dai
Yifei Liu
Ning Shang
Youran Sun
Yi Zhu
Fan Yang
Mao Yang
LRM, SyDa, ReLM
352
244
0
08 Jan 2025
Heterogeneous Multi-agent Zero-Shot Coordination by Coevolution
IEEE Transactions on Evolutionary Computation (TEVC), 2022
Ke Xue
Yutong Wang
Cong Guan
Lei Yuan
Haobo Fu
Qiang Fu
Chao Qian
Yang Yu
555
22
0
03 Jan 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown
Jordan Juravsky
Ryan Ehrlich
Ronald Clark
Quoc V. Le
Christopher Ré
Azalia Mirhoseini
ALM, LRM
928
570
0
03 Jan 2025
ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze
Chunyu Xuan
Yazhe Niu
Yuan Pu
Shuai Hu
Yu Liu
Jing Yang
515
1
0
03 Jan 2025
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
Neural Information Processing Systems (NeurIPS), 2024
Scott Geng
Cheng-Yu Hsieh
Vivek Ramanujan
Matthew Wallingford
Chun-Liang Li
Pang Wei Koh
Ranjay Krishna
DiffM
772
15
0
03 Jan 2025
Predicting Chess Puzzle Difficulty with Transformers
BigData Congress [Services Society] (BSS), 2024
Szymon Miłosz
Paweł Kapusta
159
5
0
31 Dec 2024
Training Software Engineering Agents and Verifiers with SWE-Gym
Jiayi Pan
Xingyao Wang
Graham Neubig
Navdeep Jaitly
Chenhui Xu
Alane Suhr
Yizhe Zhang
408
36
0
30 Dec 2024
Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning
Huchen Jiang
Yangyang Ma
Chaofan Ding
Kexin Luan
Xinhan Di
ReLM, LRM
327
2
0
23 Dec 2024
Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Sungjin Park
Xiao Liu
Yeyun Gong
Edward Choi
LRM
275
26
0
20 Dec 2024
Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward Modeling
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Junyi Li
Hwee Tou Ng
LRM
412
4
0
19 Dec 2024
RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement
Jinhao Jiang
Jiayi Chen
Junyi Li
Ruiyang Ren
Shijie Wang
Wayne Xin Zhao
Yang Song
Tao Zhang
LRM
258
31
0
17 Dec 2024
Learning to Navigate in Mazes with Novel Layouts using Abstract Top-down Maps
Linfeng Zhao
Lawson L. S. Wong
341
2
0
16 Dec 2024
Monte Carlo Tree Search based Space Transfer for Black-box Optimization
Neural Information Processing Systems (NeurIPS), 2024
Shukuan Wang
Ke Xue
Lei Song
Xiaobin Huang
Chao Qian
291
6
0
10 Dec 2024
Learning World Models for Unconstrained Goal Navigation
Neural Information Processing Systems (NeurIPS), 2024
Yuanlin Duan
Wensen Mao
He Zhu
231
5
0
03 Nov 2024
Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers
Neural Information Processing Systems (NeurIPS), 2024
Kai Yan
Alex Schwing
Yu-Xiong Wang
OffRL, OnRL
242
4
0
31 Oct 2024
Enhancing Chess Reinforcement Learning with Graph Representation
Neural Information Processing Systems (NeurIPS), 2024
Tomas Rigaux
H. Kashima
GNN
164
2
0
31 Oct 2024
LLM Tree Search
Dylan Wilson
114
2
0
24 Oct 2024
NodeOP: Optimizing Node Management for Decentralized Networks
Angela Tsang
Jiankai Sun
Boo Xie
Azeem Khan
Ender Lu
Fletcher Fan
Maggie Wu
Jing Tang
88
0
0
22 Oct 2024
SoK: Dataset Copyright Auditing in Machine Learning Systems
IEEE Symposium on Security and Privacy (S&P), 2024
L. Du
Xuanru Zhou
M. Chen
Chusong Zhang
Zhou Su
Peng Cheng
Jiming Chen
Zhikun Zhang
MLAU
406
15
0
22 Oct 2024
Memory-Efficient Large Language Models for Program Repair with Semantic-Guided Patch Generation
Thanh Le-Cong
Bach Le
Toby Murray
187
0
0
22 Oct 2024
Improve Value Estimation of Q Function and Reshape Reward with Monte Carlo Tree Search
Jiamian Li
184
0
0
15 Oct 2024
Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning
Jiayu Chen
Wentse Chen
Shiyu Huang
Jeff Schneider
OffRL
427
8
0
15 Oct 2024
Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning
Bokai Hu
Sai Ashish Somayajula
Xin Pan
Zihan Huang
OffRL
395
5
0
14 Oct 2024
Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition
International Conference on Learning Representations (ICLR), 2024
Zhong Zheng
Haochen Zhang
Lingzhou Xue
OffRL
382
9
0
10 Oct 2024
MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders
Cheng-rong Li
May Fung
Qingyun Wang
Chi Han
Pengfei Yu
Jindong Wang
Heng Ji
AI4MH
887
1
0
09 Oct 2024
O1 Replication Journey: A Strategic Progress Report -- Part 1
Yiwei Qin
Xuefeng Li
Haoyang Zou
Yixiu Liu
Shijie Xia
...
Yixin Ye
Weizhe Yuan
Hector Liu
Rui Wang
Pengfei Liu
VLM
330
137
0
08 Oct 2024
Human-aligned Chess with a Bit of Search
Yiming Zhang
Athul Paul Jacob
Vivian Lai
Daniel Fried
Daphne Ippolito
132
4
0
04 Oct 2024
Learning to Better Search with Language Models via Guided Reinforced Self-Training
Seungyong Moon
Bumsoo Park
Hyun Oh Song
AIFin, RALM
275
4
0
03 Oct 2024
Interpretable Contrastive Monte Carlo Tree Search Reasoning
Zitian Gao
Boye Niu
Xuzheng He
Haotian Xu
Hongzhang Liu
Aiwei Liu
Xuming Hu
Lijie Wen
LRM
469
59
0
02 Oct 2024
Maia-2: A Unified Model for Human-AI Alignment in Chess
Neural Information Processing Systems (NeurIPS), 2024
Zhenwei Tang
Difan Jiao
Reid McIlroy-Young
Jon M. Kleinberg
Siddhartha Sen
Ashton Anderson
160
13
0
30 Sep 2024
Gaze-informed Signatures of Trust and Collaboration in Human-Autonomy Teams
Computers in Human Behavior (CHB), 2024
Anthony J. Ries
Stéphane Aroca-Ouellette
Alessandro Roncone
Ewart J. de Visser
123
4
0
27 Sep 2024
Refutation of Spectral Graph Theory Conjectures with Search Algorithms
Milo Roucairol
Tristan Cazenave
91
4
0
27 Sep 2024
Towards Efficient Neuro-Symbolic AI: From Workload Characterization to Hardware Architecture
Zishen Wan
Che-Kai Liu
Hanchen Yang
Ritik Raj
Chaojian Li
...
Yingyan Celine Lin
Mohamed Ibrahim
Jan M. Rabaey
Tushar Krishna
A. Raychowdhury
330
18
0
20 Sep 2024
A Case Study of Web App Coding with OpenAI Reasoning Models
Yi Cui
ELM, VLM, LRM
150
0
0
19 Sep 2024
Planning Transformer: Long-Horizon Offline Reinforcement Learning with Planning Tokens
Joseph Clinton
Robert Lieck
OffRL
199
6
0
14 Sep 2024
State and Action Factorization in Power Grids
Gianvito Losapio
Davide Beretta
Marco Mussi
Alberto Maria Metelli
Marcello Restelli
147
2
0
03 Sep 2024
Enhancing Analogical Reasoning in the Abstraction and Reasoning Corpus via Model-Based RL
Jihwan Lee
Woochang Sim
Sejin Kim
Sundong Kim
OffRL
199
2
0
27 Aug 2024
Localized Observation Abstraction Using Piecewise Linear Spatial Decay for Reinforcement Learning in Combat Simulations
Scotty Black
Christian J. Darken
107
0
0
23 Aug 2024
Enhancing Reinforcement Learning Through Guided Search
European Conference on Artificial Intelligence (ECAI), 2024
Jérôme Arjonilla
Abdallah Saffidine
Tristan Cazenave
OffRL
328
0
0
19 Aug 2024
ShortCircuit: AlphaZero-Driven Circuit Design
Dimitrios Tsaras
Antoine Grosnit
Lei Chen
Zhiyao Xie
Haitham Bou-Ammar
Mingxuan Yuan
218
0
0
19 Aug 2024
Perfect Information Monte Carlo with Postponing Reasoning
Jérôme Arjonilla
Abdallah Saffidine
Tristan Cazenave
149
1
0
05 Aug 2024
A Value Function Space Approach for Hierarchical Planning with Signal Temporal Logic Tasks
IEEE Control Systems Letters (L-CSS), 2024
Peiran Liu
Yiting He
Yihao Qin
Hang Zhou
Yiding Ji
OffRL
277
0
0
04 Aug 2024
TASI Lectures on Physics for Machine Learning
Jim Halverson
262
5
0
31 Jul 2024
Reinforcement Learning for Sustainable Energy: A Survey
Koen Ponse
Felix Kleuker
Márton Fejér
Álvaro Serra-Gómez
Aske Plaat
Thomas M. Moerland
OffRL, AI4CE
227
7
0
26 Jul 2024
Learning to Play Foosball: System and Baselines
Janosch Moos
Cedric Derstroff
Niklas Schröder
Debora Clever
174
1
0
23 Jul 2024
AlphaDou: High-Performance End-to-End Doudizhu AI Integrating Bidding
Chang Lei
Huan Lei
146
0
0
14 Jul 2024
Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay
Gonçalo Hora de Carvalho
Oscar Knap
R. Pollice
ReLM, ELM, LRM
414
1
0
12 Jul 2024