Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

5 December 2017

David Silver

Papers citing "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm"

Showing 50 of 839 papers

DecoupleSearch: Decouple Planning and Search via Hierarchical Reward Modeling

141

07 Sep 2025

SPFT-SQL: Enhancing Large Language Model for Text-to-SQL Parsing by Self-Play Fine-Tuning

195

04 Sep 2025

Improving Robustness of AlphaZero Algorithms to Test-Time Environment Changes

Isidoro Tamassia

Wendelin Böhmer

117

04 Sep 2025

Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents

238

03 Sep 2025

On Entropy Control in LLM-RL Algorithms

Han Shen

154

03 Sep 2025

Scalable Option Learning in High-Throughput Environments

191

30 Aug 2025

Mirage or Method? How Model-Task Alignment Induces Divergent RL Conclusions

129

28 Aug 2025

TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

...

139

24 Aug 2025

In2x at WMT25 Translation Task

116

20 Aug 2025

TOAST: Fast and scalable auto-partitioning based on principled static analysis

127

20 Aug 2025

Edge General Intelligence Through World Models and Agentic AI: Fundamentals, Solutions, and Challenges

...

204

13 Aug 2025

Evolutionary Optimization of Deep Learning Agents for Sparrow Mahjong

Jim O'Connor

Derin Gezgin

Gary B Parker

11 Aug 2025

Tail-Risk-Safe Monte Carlo Tree Search under PAC-Level Guarantees

Zuyuan Zhang

A. Ghosh

Tian-Shing Lan

130

07 Aug 2025

JSON-Bag: A generic game trajectory representation

Dien Nguyen

Diego Perez-Liebana

Simon Lucas

01 Aug 2025

SimuRA: A World-Model-Driven Simulative Reasoning Architecture for General Goal-Oriented Agents

267

31 Jul 2025

RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization

...

366

31 Jul 2025

What Does it Mean for a Neural Network to Learn a "World Model"?

117

29 Jul 2025

Learning to Imitate with Less: Efficient Individual Behavior Modeling in Chess

223

29 Jul 2025

Agentic Reinforced Policy Optimization

...

209

26 Jul 2025

The Impact of Language Mixing on Bilingual LLM Reasoning

267

21 Jul 2025

What if Othello-Playing Language Models Could See?

153

19 Jul 2025

Critiques of World Models

222

07 Jul 2025

Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess

201

01 Jul 2025

Style-Preserving Policy Optimization for Game Agents

262

20 Jun 2025

Data-Driven Policy Mapping for Safe RL-based Energy Management Systems. Energy Reports (Energy Rep.), 2025

Theo Zangato

A. Osmani

Pegah Alizadeh

162

19 Jun 2025

Mxplainer: Explain and Learn Insights by Imitating Mahjong Agents

165

17 Jun 2025

Complexity Scaling Laws for Neural Models using Combinatorial Optimization

Lowell Weissman

Michael Krumdick

A. Lynn Abbott

294

15 Jun 2025

TreeRL: LLM Reinforcement Learning with On-Policy Tree Search. Annual Meeting of the Association for Computational Linguistics (ACL), 2025

192

13 Jun 2025

MOORL: A Framework for Integrating Offline-Online Reinforcement Learning

400

11 Jun 2025

Subgoal-Guided Policy Heuristic Search with Learned Subgoals. International Conference on Machine Learning (ICML), 2025

Jake E. Tuero

M. Buro

Levi H. S. Lelis

176

08 Jun 2025

Boosting LLM Reasoning via Spontaneous Self-Correction

...

250

07 Jun 2025

LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning

302

05 Jun 2025

Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening

Andre He

Daniel Fried

Sean Welleck

289

03 Jun 2025

Bregman Centroid Guided Cross-Entropy Method

218

02 Jun 2025

Decomposing Elements of Problem Solving: What "Math" Does RL Teach?

207

28 May 2025

A Framework for Adversarial Analysis of Decision Support Systems Prior to Deployment

137

27 May 2025

DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning

307

26 May 2025

Large Language Models for Planning: A Comprehensive and Systematic Survey

437

26 May 2025

VideoGameBench: Can Vision-Language Models complete popular video games?

404

23 May 2025

DialogXpert: Driving Intelligent and Emotion-Aware Conversations through Online Value-Based Reinforcement Learning with LLM Priors

Tazeek Bin Abdur Rakib

240

23 May 2025

Value-Guided Search for Efficient Chain-of-Thought Reasoning

352

23 May 2025

A Temporal Difference Method for Stochastic Continuous Dynamics

Haruki Settai

Naoya Takeishi

Takehisa Yairi

524

21 May 2025

SHARP: Synthesizing High-quality Aligned Reasoning Problems for Large Reasoning Models Reinforcement Learning

...

584

20 May 2025

DiagnosisArena: Benchmarking Diagnostic Reasoning for Large Language Models

653

20 May 2025

Cost-Awareness in Tree-Search LLM Planning: A Systematic Study

Zihao Zhang

Fei Liu

Kenan Jiang

Shijia Pan

Shu Kai

Fei Liu

266

20 May 2025

Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents

...

240

19 May 2025

DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization

486

18 May 2025

Enhancing Large Language Models with Reward-guided Tree Search for Knowledge Graph Question and Answering

277

18 May 2025

Explaining Strategic Decisions in Multi-Agent Reinforcement Learning for Aerial Combat Tactics

309

16 May 2025

Measuring General Intelligence with Generated Games

292

12 May 2025