1712.01815
Cited By
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
5 December 2017
David Silver
Thomas Hubert
Julian Schrittwieser
Ioannis Antonoglou
Matthew Lai
A. Guez
Marc Lanctot
Laurent Sifre
D. Kumaran
T. Graepel
Timothy Lillicrap
Karen Simonyan
Demis Hassabis
Papers citing
"Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm"
50 / 839 papers shown
Guided Self-Evolving LLMs with Minimal Human Supervision
Wenhao Yu
Zhenwen Liang
Chengsong Huang
Kishan Panaganti
Tianqing Fang
Haitao Mi
Dong Yu
SyDa
ReLM
LRM
352
3
0
02 Dec 2025
On the Approximation of Phylogenetic Distance Functions by Artificial Neural Networks
Benjamin K. Rosenzweig
Matthew W. Hahn
56
0
0
01 Dec 2025
LLM CHESS: Benchmarking Reasoning and Instruction-Following in LLMs through Chess
Sai Kolasani
Maxim Saplin
Nicholas Crispino
Kyle Montgomery
Jared Quincy Davis
Matei A. Zaharia
Chi Wang
Chenguang Wang
ELM
LRM
155
1
0
01 Dec 2025
Breaking Algorithmic Collusion in Human-AI Ecosystems
Natalie Collina
Eshwar Ram Arunachaleswaran
Meena Jagadeesan
56
0
0
26 Nov 2025
Closed-Loop Transformers: Autoregressive Modeling as Iterative Latent Equilibrium
Akbar Anbar Jafari
G. Anbarjafari
67
1
0
26 Nov 2025
RPM-MCTS: Knowledge-Retrieval as Process Reward Model with Monte Carlo Tree Search for Code Generation
Yuanyuan Lin
Xiangyu Ouyang
Teng Zhang
Kaixin Sui
175
0
0
25 Nov 2025
UFO: Unfair-to-Fair Evolving Mitigates Unfairness in LLM-based Recommender Systems via Self-Play Fine-tuning
J. Zhang
Yuyuan Li
Xiaohua Feng
Zhifei Ren
Li Zhang
Chaochao Chen
90
0
0
23 Nov 2025
Goal-Directed Search Outperforms Goal-Agnostic Memory Compression in Long-Context Memory Tasks
Yicong Zheng
Kevin L. McKee
Thomas Miconi
Zacharie Bugaud
Mick van Gelderen
Jed McCaleb
RALM
66
1
0
20 Nov 2025
Simulated Human Learning in a Dynamic, Partially-Observed, Time-Series Environment
Jeffrey Q. Jiang
Kevin Hong
Emily Kuczynski
Gregory Pottie
56
0
0
19 Nov 2025
FLEX: Continuous Agent Evolution via Forward Learning from Experience
Zhicheng Cai
Xinyuan Guo
Yu Pei
Jiangtao Feng
Jiangjie Chen
Ya Zhang
Wei-Ying Ma
Mingxuan Wang
Hao Zhou
CLL
LLMAG
LRM
279
4
0
09 Nov 2025
Estimating cognitive biases with attention-aware inverse planning
Sounak Banerjee
Daphne Cornelisse
Deepak Gopinath
Emily S. Sumner
Jonathan A. DeCastro
Guy Rosman
Eugene Vinitsky
Mark K. Ho
73
1
0
29 Oct 2025
Exploring Human-AI Conceptual Alignment through the Prism of Chess
Semyon Lomasov
Judah Goldfeder
Mehmet Hamza Erol
Matthew So
Yao Yan
Addison Howard
Nathan Kutz
Ravid Shwartz-Ziv
120
0
0
29 Oct 2025
Grouping Nodes With Known Value Differences: A Lossless UCT-based Abstraction Algorithm
Robin Schmöcker
Alexander Dockhorn
Bodo Rosenhahn
81
1
0
29 Oct 2025
SPICE: Self-Play In Corpus Environments Improves Reasoning
Bo Liu
Chuanyang Jin
Seungone Kim
Weizhe Yuan
Wenting Zhao
Ilia Kulikov
Xian Li
Sainbayar Sukhbaatar
Jack Lanchantin
Jason Weston
ReLM
LRM
237
7
0
28 Oct 2025
Investigating Intra-Abstraction Policies For Non-exact Abstraction Algorithms
Robin Schmöcker
Alexander Dockhorn
Bodo Rosenhahn
81
2
0
28 Oct 2025
ChessQA: Evaluating Large Language Models for Chess Understanding
Qianfeng Wen
Zhenwei Tang
Ashton Anderson
ELM
LRM
197
1
0
28 Oct 2025
Multi-Agent Evolve: LLM Self-Improve through Co-evolution
Yixing Chen
Yiding Wang
Siqi Zhu
Haofei Yu
Tao Feng
Muhan Zhang
M. Patwary
Jiaxuan You
LLMAG
LRM
295
5
0
27 Oct 2025
Top-Down Semantic Refinement for Image Captioning
Jusheng Zhang
Kaitong Cai
Jing Yang
Jian Wang
Chengpei Tang
Keze Wang
DiffM
MLLM
BDL
300
11
0
25 Oct 2025
Solving Continuous Mean Field Games: Deep Reinforcement Learning for Non-Stationary Dynamics
Lorenzo Magnino
Kai Shao
Zida Wu
Jiacheng Shen
Mathieu Laurière
AI4CE
104
1
0
25 Oct 2025
Computational Hardness of Reinforcement Learning with Partial q^π-Realizability
Shayan Karimi
Xiaoqi Tan
153
0
0
24 Oct 2025
Out-of-distribution Tests Reveal Compositionality in Chess Transformers
Anna Mészáros
Patrik Reizinger
Ferenc Huszár
CoGe
171
0
0
23 Oct 2025
Enhancing Security in Deep Reinforcement Learning: A Comprehensive Survey on Adversarial Attacks and Defenses
Wu Yichao
Wang Yirui
Ding Panpan
Wang Hailong
Zhu Bingqian
Liu Chun
AAML
145
2
0
23 Oct 2025
Can They Dixit? Yes they Can! Dixit as a Playground for Multimodal Language Model Capabilities
Nishant Balepur
Dang Nguyen
Dayeon Ki
136
0
0
22 Oct 2025
A Concrete Roadmap towards Safety Cases based on Chain-of-Thought Monitoring
Julian Schulz
LRM
122
0
0
22 Oct 2025
Enhancing Language Agent Strategic Reasoning through Self-Play in Adversarial Games
Yikai Zhang
Ye Rong
Siyu Yuan
Jiangjie Chen
Jian Xie
Yanghua Xiao
LLMAG
AAML
LRM
101
0
0
19 Oct 2025
Human-Allied Relational Reinforcement Learning
Fateme Golivand Darvishvand
Hikaru Shindo
Sahil Sidheekh
Kristian Kersting
S. Natarajan
OffRL
113
0
0
17 Oct 2025
MATH-Beyond: A Benchmark for RL to Expand Beyond the Base Model
Prasanna Mayilvahanan
Ricardo Dominguez-Olmedo
Thaddäus Wiedemer
Wieland Brendel
OffRL
AIMat
ReLM
LRM
206
1
0
13 Oct 2025
KnowRL: Teaching Language Models to Know What They Know
Sahil Kale
Devendra Singh Dhami
KELM
104
0
0
13 Oct 2025
Efficient Restarts in Non-Stationary Model-Free Reinforcement Learning
Hiroshi Nonaka
Simon Ambrozak
Sofia R. Miskala-Dinc
Amedeo Ercole
Aviva Prins
OffRL
97
0
0
13 Oct 2025
FORGE-Tree: Diffusion-Forcing Tree Search for Long-Horizon Robot Manipulation
Yanjia Huang
Shuo Liu
Sheng Liu
Qingxiao Xu
Mingyang Wu
Xiangbo Gao
Zhengzhong Tu
VGen
120
0
0
07 Oct 2025
Learning on the Job: Test-Time Curricula for Targeted Reinforcement Learning
Jonas Hübotter
Leander Diaz-Bone
Ido Hakimi
Andreas Krause
Moritz Hardt
160
1
0
06 Oct 2025
AlphaApollo: Orchestrating Foundation Models and Professional Tools into a Self-Evolving System for Deep Agentic Reasoning
Zhanke Zhou
Chentao Cao
Xiao Feng
Xuan Li
Zongze Li
...
Brando Miranda
Tongliang Liu
Sanmi Koyejo
Masashi Sugiyama
Bo Han
ReLM
LRM
117
0
0
05 Oct 2025
Global Convergence of Policy Gradient for Entropy Regularized Linear-Quadratic Control with Multiplicative Noise
Gabriel Diaz
Lucky Li
Wenhao Zhang
275
0
0
03 Oct 2025
LegalSim: Multi-Agent Simulation of Legal Systems for Discovering Procedural Exploits
Sanket Badhe
AILaw
161
1
0
03 Oct 2025
Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning
Ahmed Hendawy
Henrik Metternich
Théo Vincent
Mahdi Kallel
Jan Peters
Carlo D'Eramo
OffRL
159
0
0
02 Oct 2025
Rethinking Thinking Tokens: LLMs as Improvement Operators
Lovish Madaan
Aniket Didolkar
Suchin Gururangan
John Quan
Ruan Silva
Ruslan Salakhutdinov
Manzil Zaheer
Sanjeev Arora
Anirudh Goyal
ReLM
LRM
191
1
1
01 Oct 2025
Expandable Decision-Making States for Multi-Agent Deep Reinforcement Learning in Soccer Tactical Analysis
Kenjiro Ide
Taiga Someya
Kohei Kawaguchi
Keisuke Fujii
173
0
0
01 Oct 2025
Diffusion Alignment as Variational Expectation-Maximization
Jaewoo Lee
Minsu Kim
S. Choi
Inhyuck Song
Sujin Yun
Hyeongyu Kang
Woocheol Shin
Taeyoung Yun
Kiyoung Om
Jinkyoo Park
112
0
0
01 Oct 2025
Risk-Sensitive RL for Alleviating Exploration Dilemmas in Large Language Models
Y. Jiang
J. Huang
Yufeng Yuan
Xin Mao
Yu Yue
Qianchuan Zhao
Lin Yan
110
0
0
29 Sep 2025
From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones
L. Yuan
Weize Chen
Yuchen Zhang
Ganqu Cui
Hanbin Wang
Ziming You
Ning Ding
Zhiyuan Liu
Maosong Sun
Hao Peng
OffRL
CLL
233
1
0
29 Sep 2025
Parallel Heuristic Search as Inference for Actor-Critic Reinforcement Learning Models
Hanlan Yang
Itamar Mishani
Luca Pivetti
Zachary Kingston
Maxim Likhachev
OffRL
LRM
68
0
0
29 Sep 2025
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
Qinsi Wang
Bo Liu
Tianyi Zhou
Jing Shi
Yueqian Lin
Yiran Chen
Hai Helen Li
Kun Wan
Wentian Zhao
OffRL
VLM
LRM
141
5
0
29 Sep 2025
Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning
Zelin Tan
Hejia Geng
M. Zhang
Xiaohang Yu
Guancheng Wan
...
G. Zhang
Chen Zhang
Z. Yin
Wenlong Zhang
Lei Bai
OffRL
LRM
449
3
1
29 Sep 2025
Adversarial Diffusion for Robust Reinforcement Learning
Daniele Foffano
Alessio Russo
Alexandre Proutiere
159
1
0
28 Sep 2025
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
Yoonjeon Kim
Doohyuk Jang
Eunho Yang
ReLM
AIFin
LRM
199
1
0
26 Sep 2025
Physics of Learning: A Lagrangian perspective to different learning paradigms
Siyuan Guo
Bernhard Schölkopf
92
0
0
25 Sep 2025
Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories
Mohammad Beigi
Ying Shen
Parshin Shojaee
Qifan Wang
Zichao Wang
Chandan K. Reddy
Ming Jin
Lifu Huang
LRM
106
0
0
20 Sep 2025
TransZero: Parallel Tree Expansion in MuZero using Transformer Networks
Emil Malmsten
Wendelin Böhmer
95
0
0
14 Sep 2025
From Correction to Mastery: Reinforced Distillation of Large Language Model Agents
Yuanjie Lyu
Chengyu Wang
Jun Huang
Tong Xu
ALM
LRM
266
2
0
12 Sep 2025
One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning
Yuan Pu
Yazhe Niu
Jia Tang
Junyu Xiong
Shuai Hu
Hongsheng Li
MoMe
182
0
0
09 Sep 2025