Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

5 December 2017

David Silver

Papers citing "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm"

50 / 839 papers shown

Guided Self-Evolving LLMs with Minimal Human Supervision

353

02 Dec 2025

On the Approximation of Phylogenetic Distance Functions by Artificial Neural Networks

Benjamin K. Rosenzweig

Matthew W. Hahn

01 Dec 2025

LLM CHESS: Benchmarking Reasoning and Instruction-Following in LLMs through Chess

156

01 Dec 2025

Breaking Algorithmic Collusion in Human-AI Ecosystems

Natalie Collina

Eshwar Ram Arunachaleswaran

Meena Jagadeesan

26 Nov 2025

Closed-Loop Transformers: Autoregressive Modeling as Iterative Latent Equilibrium

Akbar Anbar Jafari

G. Anbarjafari

26 Nov 2025

RPM-MCTS: Knowledge-Retrieval as Process Reward Model with Monte Carlo Tree Search for Code Generation

177

25 Nov 2025

UFO: Unfair-to-Fair Evolving Mitigates Unfairness in LLM-based Recommender Systems via Self-Play Fine-tuning

100

23 Nov 2025

Goal-Directed Search Outperforms Goal-Agnostic Memory Compression in Long-Context Memory Tasks

20 Nov 2025

Simulated Human Learning in a Dynamic, Partially-Observed, Time-Series Environment

19 Nov 2025

FLEX: Continuous Agent Evolution via Forward Learning from Experience

287

09 Nov 2025

Estimating cognitive biases with attention-aware inverse planning

29 Oct 2025

Exploring Human-AI Conceptual Alignment through the Prism of Chess

131

29 Oct 2025

Grouping Nodes With Known Value Differences: A Lossless UCT-based Abstraction Algorithm

Robin Schmöcker

Alexander Dockhorn

Bodo Rosenhahn

29 Oct 2025

SPICE: Self-Play In Corpus Environments Improves Reasoning

237

28 Oct 2025

Investigating Intra-Abstraction Policies For Non-exact Abstraction Algorithms

Robin Schmöcker

Alexander Dockhorn

Bodo Rosenhahn

28 Oct 2025

ChessQA: Evaluating Large Language Models for Chess Understanding

197

28 Oct 2025

Multi-Agent Evolve: LLM Self-Improve through Co-evolution

295

27 Oct 2025

Top-Down Semantic Refinement for Image Captioning

302

25 Oct 2025

Solving Continuous Mean Field Games: Deep Reinforcement Learning for Non-Stationary Dynamics

108

25 Oct 2025

Computational Hardness of Reinforcement Learning with Partial q^π-Realizability

Shayan Karimi

Xiaoqi Tan

159

24 Oct 2025

Out-of-distribution Tests Reveal Compositionality in Chess Transformers

171

23 Oct 2025

Enhancing Security in Deep Reinforcement Learning: A Comprehensive Survey on Adversarial Attacks and Defenses

161

23 Oct 2025

Can They Dixit? Yes they Can! Dixit as a Playground for Multimodal Language Model Capabilities

Nishant Balepur

Dang Nguyen

Dayeon Ki

145

22 Oct 2025

A Concrete Roadmap towards Safety Cases based on Chain-of-Thought Monitoring

Julian Schulz

LRM

130

22 Oct 2025

Enhancing Language Agent Strategic Reasoning through Self-Play in Adversarial Games

107

19 Oct 2025

Human-Allied Relational Reinforcement Learning

Fateme Golivand Darvishvand

117

17 Oct 2025

MATH-Beyond: A Benchmark for RL to Expand Beyond the Base Model

Prasanna Mayilvahanan

Ricardo Dominguez-Olmedo

Thaddäus Wiedemer

Wieland Brendel

OffRL AIMat ReLM LRM

206

13 Oct 2025

KnowRL: Teaching Language Models to Know What They Know

Sahil Kale

Devendra Singh Dhami

KELM

112

13 Oct 2025

Efficient Restarts in Non-Stationary Model-Free Reinforcement Learning

Hiroshi Nonaka

Simon Ambrozak

Sofia R. Miskala-Dinc

Amedeo Ercole

Aviva Prins

OffRL

13 Oct 2025

FORGE-Tree: Diffusion-Forcing Tree Search for Long-Horizon Robot Manipulation

133

07 Oct 2025

Learning on the Job: Test-Time Curricula for Targeted Reinforcement Learning

161

06 Oct 2025

AlphaApollo: Orchestrating Foundation Models and Professional Tools into a Self-Evolving System for Deep Agentic Reasoning

...

117

05 Oct 2025

Global Convergence of Policy Gradient for Entropy Regularized Linear-Quadratic Control with Multiplicative Noise

Gabriel Diaz

Lucky Li

Wenhao Zhang

285

03 Oct 2025

LegalSim: Multi-Agent Simulation of Legal Systems for Discovering Procedural Exploits

Sanket Badhe

AILaw

164

03 Oct 2025

Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning

162

02 Oct 2025

Rethinking Thinking Tokens: LLMs as Improvement Operators

200

01 Oct 2025

Expandable Decision-Making States for Multi-Agent Deep Reinforcement Learning in Soccer Tactical Analysis

184

01 Oct 2025

Diffusion Alignment as Variational Expectation-Maximization

112

01 Oct 2025

Risk-Sensitive RL for Alleviating Exploration Dilemmas in Large Language Models

120

29 Sep 2025

From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones

239

29 Sep 2025

Parallel Heuristic Search as Inference for Actor-Critic Reinforcement Learning Models

29 Sep 2025

Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play

142

29 Sep 2025

Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning

...

Lei Bai

458

29 Sep 2025

Adversarial Diffusion for Robust Reinforcement Learning

Daniele Foffano

Alessio Russo

Alexandre Proutiere

164

28 Sep 2025

Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning

202

26 Sep 2025

Physics of Learning: A Lagrangian perspective to different learning paradigms

Siyuan Guo

Bernhard Schölkopf

25 Sep 2025

Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories

109

20 Sep 2025

TransZero: Parallel Tree Expansion in MuZero using Transformer Networks

Emil Malmsten

Wendelin Böhmer

14 Sep 2025

From Correction to Mastery: Reinforced Distillation of Large Language Model Agents

279

12 Sep 2025

One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning

184

09 Sep 2025