Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

5 December 2017
David Silver
Thomas Hubert
Julian Schrittwieser
Ioannis Antonoglou
Matthew Lai
A. Guez
Marc Lanctot
Laurent Sifre
D. Kumaran
T. Graepel
Timothy Lillicrap
Karen Simonyan
Demis Hassabis

Papers citing "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm"

50 / 839 papers shown
DecoupleSearch: Decouple Planning and Search via Hierarchical Reward Modeling
Hao Sun
Zile Qiao
Bo Wang
Guoxin Chen
Yingyan Hou
Yong Jiang
Pengjun Xie
Fei Huang
Yan Zhang
141
1
0
07 Sep 2025
SPFT-SQL: Enhancing Large Language Model for Text-to-SQL Parsing by Self-Play Fine-Tuning
Yuhao Zhang
Shaoming Duan
Jinhang Su
Chuanyi Liu
Peiyi Han
SyDa
195
0
0
04 Sep 2025
Improving Robustness of AlphaZero Algorithms to Test-Time Environment Changes
Isidoro Tamassia
Wendelin Böhmer
117
0
0
04 Sep 2025
Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents
Davide Paglieri
Bartłomiej Cupiał
Jonathan Cook
Ulyana Piterbarg
Jens Tuyls
Edward Grefenstette
Jakob Foerster
Jack Parker-Holder
Tim Rocktäschel
LLMAG
238
1
0
03 Sep 2025
On Entropy Control in LLM-RL Algorithms
Han Shen
154
12
0
03 Sep 2025
Scalable Option Learning in High-Throughput Environments
Mikael Henaff
Scott Fujimoto
Michael Matthews
Michael Rabbat
OffRL
191
1
0
30 Aug 2025
Mirage or Method? How Model-Task Alignment Induces Divergent RL Conclusions
Haoze Wu
Cheng Wang
Wenshuo Zhao
Junxian He
OffRL
129
3
0
28 Aug 2025
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling
Yi Zhou
Qingshui Gu
Zhoufutu Wen
Ziniu Li
Tianshun Xing
...
Qian Liu
C. D. Lin
Jian Yang
G. Zhang
Wenhao Huang
LRM
139
26
0
24 Aug 2025
In2x at WMT25 Translation Task
Lei Pang
Hanyi Mao
Quanjia Xiao
HaiXiao Liu
Xiangyi Li
116
0
0
20 Aug 2025
TOAST: Fast and scalable auto-partitioning based on principled static analysis
Sami Alabed
Dominik Grewe
Norman A. Rink
Masha Samsikova
Timur Sitdikov
Agnieszka Swietlik
Dimitrios Vytiniotis
Daniel Belov
127
0
0
20 Aug 2025
Edge General Intelligence Through World Models and Agentic AI: Fundamentals, Solutions, and Challenges
Changyuan Zhao
Guangyuan Liu
Ruichen Zhang
Yinqiu Liu
Jiacheng Wang
...
Shen
Zhu Han
Sumei Sun
Chau Yuen
Dong In Kim
204
5
0
13 Aug 2025
Evolutionary Optimization of Deep Learning Agents for Sparrow Mahjong
Jim O'Connor
Derin Gezgin
Gary B Parker
49
0
0
11 Aug 2025
Tail-Risk-Safe Monte Carlo Tree Search under PAC-Level Guarantees
Zuyuan Zhang
A. Ghosh
Tian-Shing Lan
130
0
0
07 Aug 2025
JSON-Bag: A generic game trajectory representation
Dien Nguyen
Diego Perez-Liebana
Simon Lucas
40
0
0
01 Aug 2025
SimuRA: A World-Model-Driven Simulative Reasoning Architecture for General Goal-Oriented Agents
Mingkai Deng
Jinyu Hou
Yilin Shen
Hongxia Jin
LLMAG, LM&Ro, LRM
267
2
0
31 Jul 2025
RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
Yihong Dong
Xue Jiang
Yongding Tao
Huanyu Liu
Kechi Zhang
...
Binhua Li
Zhi Jin
Fei Huang
Y. Li
Ge Li
LRM
366
17
0
31 Jul 2025
What Does it Mean for a Neural Network to Learn a "World Model"?
Kenneth Li
F. Viégas
Martin Wattenberg
NAI
117
1
0
29 Jul 2025
Learning to Imitate with Less: Efficient Individual Behavior Modeling in Chess
Zhenwei Tang
Difan Jiao
Eric Xue
Reid McIlroy-Young
Jon M. Kleinberg
S. Sen
Ashton Anderson
223
1
0
29 Jul 2025
Agentic Reinforced Policy Optimization
Guanting Dong
Hangyu Mao
Kai Ma
Licheng Bao
Yifei Chen
...
Fuzheng Zhang
Guorui Zhou
Yutao Zhu
Ji-Rong Wen
Zhicheng Dou
LRM
209
41
0
26 Jul 2025
The Impact of Language Mixing on Bilingual LLM Reasoning
Yihao Li
Jiayi Xin
Miranda Muqing Miao
Qi Long
Lyle Ungar
LRM
267
4
0
21 Jul 2025
What if Othello-Playing Language Models Could See?
Xinyi Chen
Yifei Yuan
Jiaang Li
Serge J. Belongie
Maarten de Rijke
Anders Søgaard
LRM
153
0
0
19 Jul 2025
Critiques of World Models
Eric P. Xing
Mingkai Deng
Jinyu Hou
Zhiting Hu
SyDa
222
6
0
07 Jul 2025
Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess
Dongyoon Hwang
Hojoon Lee
Jaegul Choo
D. Park
Jongho Park
ReLM, OffRL, LRM
201
1
0
01 Jul 2025
Style-Preserving Policy Optimization for Game Agents
Lingfeng Li
Yunlong Lu
Yongyi Wang
Wenxin Li
LLMAG
262
0
0
20 Jun 2025
Data-Driven Policy Mapping for Safe RL-based Energy Management Systems
Energy Reports (Energy Rep.), 2025
Theo Zangato
A. Osmani
Pegah Alizadeh
162
1
0
19 Jun 2025
Mxplainer: Explain and Learn Insights by Imitating Mahjong Agents
Lingfeng Li
Yunlong Lu
Yongyi Wang
Qifan Zheng
Wenxin Li
LLMAG
165
0
0
17 Jun 2025
Complexity Scaling Laws for Neural Models using Combinatorial Optimization
Lowell Weissman
Michael Krumdick
A. Lynn Abbott
294
0
0
15 Jun 2025
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhenyu Hou
Ziniu Hu
Yujiang Li
Rui Lu
Jie Tang
Yuxiao Dong
OffRL, LRM
192
22
0
13 Jun 2025
MOORL: A Framework for Integrating Offline-Online Reinforcement Learning
Gaurav Chaudhary
Wassim Uddin Mondal
Laxmidhar Behera
OffRL
400
2
0
11 Jun 2025
Subgoal-Guided Policy Heuristic Search with Learned Subgoals
International Conference on Machine Learning (ICML), 2025
Jake E. Tuero
M. Buro
Levi H. S. Lelis
176
0
0
08 Jun 2025
Boosting LLM Reasoning via Spontaneous Self-Correction
Xutong Zhao
Tengyu Xu
Xuewei Wang
Zhengxing Chen
Di Jin
...
Yun He
Sinong Wang
Han Fang
Sarath Chandar
Chen Zhu
ReLM, LRM, KELM
250
5
0
07 Jun 2025
LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning
Zhen Hao Wong
Jingwen Deng
Runming He
Zirong Chen
Qijie You
Hejun Dong
Hao Liang
Chengyu Shen
Bin Cui
Wentao Zhang
ReLM, LRM
302
0
0
05 Jun 2025
Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening
Andre He
Daniel Fried
Sean Welleck
289
29
0
03 Jun 2025
Bregman Centroid Guided Cross-Entropy Method
Yuliang Gu
H. Cao
Marco Caccamo
N. Hovakimyan
218
0
0
02 Jun 2025
Decomposing Elements of Problem Solving: What "Math" Does RL Teach?
Tian Qin
Core Francisco Park
Mujin Kwun
Aaron Walsman
Eran Malach
Nikhil Anand
Hidenori Tanaka
David Alvarez-Melis
ReLM, OffRL, LRM
207
4
0
28 May 2025
A Framework for Adversarial Analysis of Decision Support Systems Prior to Deployment
Brett Bissey
Kyle Gatesman
Walker Dimon
Mohammad Alam
Luis Robaina
Joseph Weissman
AAML
137
0
0
27 May 2025
DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning
Leander Diaz-Bone
Marco Bagatella
Jonas Hübotter
Andreas Krause
OffRL
307
4
0
26 May 2025
Large Language Models for Planning: A Comprehensive and Systematic Survey
Pengfei Cao
Tianyi Men
Wencan Liu
Jingwen Zhang
Xuzhao Li
Xixun Lin
Dianbo Sui
Yanan Cao
Kang Liu
Jun Zhao
LLMAG, LM&Ro, OffRL, ELM, LRM
437
15
0
26 May 2025
VideoGameBench: Can Vision-Language Models complete popular video games?
Alex Zhang
Thomas Griffiths
Karthik Narasimhan
Ofir Press
VLM
404
10
0
23 May 2025
DialogXpert: Driving Intelligent and Emotion-Aware Conversations through Online Value-Based Reinforcement Learning with LLM Priors
Tazeek Bin Abdur Rakib
Ambuj Mehrish
Lay-Ki Soon
Wern Han Lim
Soujanya Poria
OffRL
240
2
0
23 May 2025
Value-Guided Search for Efficient Chain-of-Thought Reasoning
Kaiwen Wang
Jin Peng Zhou
Jonathan D. Chang
Zhaolin Gao
Nathan Kallus
Kianté Brantley
Wen Sun
LRM
352
7
0
23 May 2025
A Temporal Difference Method for Stochastic Continuous Dynamics
Haruki Settai
Naoya Takeishi
Takehisa Yairi
524
0
0
21 May 2025
SHARP: Synthesizing High-quality Aligned Reasoning Problems for Large Reasoning Models Reinforcement Learning
Xiong Jun Wu
Zhenduo Zhang
ZuJie Wen
Zhiqiang Zhang
Wang Ren
...
Xudong Han
Chengfu Tang
Dingnan Jin
Qing Cui
Jun Zhou
LRM
584
2
0
20 May 2025
DiagnosisArena: Benchmarking Diagnostic Reasoning for Large Language Models
Yakun Zhu
Zhongzhen Huang
Linjie Mu
Yutong Huang
Wei Nie
Jiaji Liu
Shaoting Zhang
Pengfei Liu
Xiaofan Zhang
LM&MA, ELM, LRM
653
6
0
20 May 2025
Cost-Awareness in Tree-Search LLM Planning: A Systematic Study
Zihao Zhang
Fei Liu
Kenan Jiang
Shijia Pan
Shu Kai
Fei Liu
266
1
0
20 May 2025
Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents
Karina Zainullina
Alexander Golubev
Maria Trofimova
Sergei Polezhaev
Ibragim Badertdinov
...
Filipp Fisin
Sergei Skvortsov
Maksim Nekrashevich
Anton Shevtsov
Boris Yangel
240
3
0
19 May 2025
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Gang Li
Ming Lin
Tomer Galanti
Zhengzhong Tu
Tianbao Yang
486
8
0
18 May 2025
Enhancing Large Language Models with Reward-guided Tree Search for Knowledge Graph Question and Answering
Xiao Long
Liansheng Zhuang
Chen Shen
Shaotian Yan
Yifei Li
Shafei Wang
RALM, LRM
277
2
0
18 May 2025
Explaining Strategic Decisions in Multi-Agent Reinforcement Learning for Aerial Combat Tactics
Ardian Selmonaj
Alessandro Antonucci
Adrian Schneider
Michael Rüegsegger
Matthias Sommer
309
1
0
16 May 2025
Measuring General Intelligence with Generated Games
Vivek Verma
David Huang
William Chen
Dan Klein
Nicholas Tomlin
ReLM, ELM, LM&MA, LRM
292
7
0
12 May 2025