Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1712.01815
Cited By
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
5 December 2017
David Silver
Thomas Hubert
Julian Schrittwieser
Ioannis Antonoglou
Matthew Lai
A. Guez
Marc Lanctot
Laurent Sifre
D. Kumaran
T. Graepel
Timothy Lillicrap
Karen Simonyan
Demis Hassabis
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm"
50 / 201 papers shown
Title
ToolACE-DEV: Self-Improving Tool Learning via Decomposition and EVolution
X. Huang
Weiwen Liu
Xingshan Zeng
Y. Huang
Xinlong Hao
...
Yirong Zeng
Chuhan Wu
Y. Wang
R. Tang
Defu Lian
KELM
31
0
0
12 May 2025
Measuring General Intelligence with Generated Games
Vivek Verma
David Huang
William Chen
Dan Klein
Nicholas Tomlin
ReLM
ELM
LM&MA
LRM
50
0
0
12 May 2025
Reinforcement Learning for Game-Theoretic Resource Allocation on Graphs
Zijian An
Lifeng Zhou
26
0
0
08 May 2025
HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking
Runquan Gui
Z. Wang
J. Wang
Chi Ma
Huiling Zhen
M. Yuan
Jianye Hao
Defu Lian
Enhong Chen
Feng Wu
LRM
138
0
0
05 May 2025
Motion Generation for Food Topping Challenge 2024: Serving Salmon Roe Bowl and Picking Fried Chicken
Koki Inami
Masashi Konosu
Koki Yamane
Nozomu Masuya
Yunhan Li
Yu-Han Shu
Hiroshi Sato
Shinnosuke Homma
S. Sakaino
49
0
0
28 Apr 2025
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen
Bang Zhang
Ruotian Ma
Peisong Wang
Xiaodan Liang
Zhaopeng Tu
X. Li
Kwan-Yee Kenneth Wong
LLMAG
ReLM
LRM
86
0
0
27 Apr 2025
Cracking the Code of Action: a Generative Approach to Affordances for Reinforcement Learning
Lynn Cherif
Flemming Kondrup
David Venuto
Ankit Anand
Doina Precup
Khimya Khetarpal
LM&Ro
49
0
0
24 Apr 2025
Improving Human-AI Coordination through Adversarial Training and Generative Models
Paresh Chaudhary
Yancheng Liang
Daphne Chen
S. Du
Natasha Jaques
64
0
0
21 Apr 2025
An Efficient Approach for Cooperative Multi-Agent Learning Problems
Ángel Aso-Mollar
Eva Onaindia
21
0
0
07 Apr 2025
Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning
Abdullah Vanlioglu
46
0
0
28 Mar 2025
Reinforcement Learning-based Heuristics to Guide Domain-Independent Dynamic Programming
Minori Narita
Ryo Kuroiwa
J. Christopher Beck
42
0
0
20 Mar 2025
ToMCAT: Theory-of-Mind for Cooperative Agents in Teams via Multiagent Diffusion Policies
Pedro Sequeira
Vidyasagar Sadhu
Melinda Gervasio
DiffM
84
0
0
25 Feb 2025
Learning a Diffusion Model Policy from Rewards via Q-Score Matching
Michael Psenka
Alejandro Escontrela
Pieter Abbeel
Yi-An Ma
DiffM
91
23
0
17 Feb 2025
Two-Player Zero-Sum Differential Games with One-Sided Information
Mukesh Ghimire
Z. Xu
Yi Ren
SyDa
95
0
0
17 Feb 2025
LLMs Can Teach Themselves to Better Predict the Future
Benjamin Turtel
Danny Franklin
Philipp Schoenegger
LRM
57
0
0
07 Feb 2025
Beyond Interpolation: Extrapolative Reasoning with Reinforcement Learning and Graph Neural Networks
Niccolò Grillo
Andrea Toccaceli
Joël Mathys
Benjamin Estermann
Stefania Fresca
Roger Wattenhofer
AI4CE
LRM
104
0
0
06 Feb 2025
Policy Guided Tree Search for Enhanced LLM Reasoning
Yang Li
LRM
53
0
0
04 Feb 2025
COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models
Tobias Materzok
LRM
69
0
0
28 Jan 2025
Optimizing Automatic Differentiation with Deep Reinforcement Learning
Jamie Lohoff
Emre Neftci
55
1
0
28 Jan 2025
HEPPO: Hardware-Efficient Proximal Policy Optimization -- A Universal Pipelined Architecture for Generalized Advantage Estimation
Hazem Taha
Ameer M. S. Abdelhadi
35
1
0
22 Jan 2025
Revisiting Rogers' Paradox in the Context of Human-AI Interaction
K. M. Collins
Umang Bhatt
Ilia Sucholutsky
46
1
0
16 Jan 2025
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
Scott Geng
Cheng-Yu Hsieh
Vivek Ramanujan
Matthew Wallingford
Chun-Liang Li
Pang Wei Koh
Ranjay Krishna
DiffM
68
6
0
03 Jan 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown
Jordan Juravsky
Ryan Ehrlich
Ronald Clark
Quoc V. Le
Christopher Ré
Azalia Mirhoseini
ALM
LRM
78
220
0
03 Jan 2025
ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze
Chunyu Xuan
Yazhe Niu
Yuan Pu
Shuai Hu
Yu Liu
Jing Yang
65
0
0
03 Jan 2025
Heterogeneous Multi-agent Zero-Shot Coordination by Coevolution
Ke Xue
Yutong Wang
Cong Guan
Lei Yuan
Haobo Fu
Qiang Fu
Chao Qian
Yang Yu
42
16
0
03 Jan 2025
Predicting Chess Puzzle Difficulty with Transformers
Szymon Miłosz
Paweł Kapusta
33
0
0
31 Dec 2024
Learning to Navigate in Mazes with Novel Layouts using Abstract Top-down Maps
Linfeng Zhao
Lawson L. S. Wong
74
1
0
16 Dec 2024
Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning
Jiayu Chen
Wentse Chen
Jeff Schneider
OffRL
29
1
0
15 Oct 2024
Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition
Zhong Zheng
Haochen Zhang
Lingzhou Xue
OffRL
70
2
0
10 Oct 2024
MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders
Cheng-rong Li
May Fung
Qingyun Wang
Chi Han
Manling Li
Jindong Wang
Heng Ji
AI4MH
140
0
0
09 Oct 2024
Scalable Signal Temporal Logic Guided Reinforcement Learning via Value Function Space Optimization
Yiting He
Peiran Liu
Yiding Ji
OffRL
34
0
0
04 Aug 2024
Reinforcement Learning for Sustainable Energy: A Survey
Koen Ponse
Felix Kleuker
Márton Fejér
Álvaro Serra-Gómez
Aske Plaat
Thomas M. Moerland
OffRL
AI4CE
40
1
0
26 Jul 2024
Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay
Gonçalo Hora de Carvalho
Oscar Knap
R. Pollice
ReLM
ELM
LRM
34
1
0
12 Jul 2024
A Review of the Applications of Deep Learning-Based Emergent Communication
Brendon Boldt
David R. Mortensen
VLM
31
6
0
03 Jul 2024
UniZero: Generalized and Efficient Planning with Scalable Latent World Models
Yuan Pu
Yazhe Niu
Jiyuan Ren
Zhenjie Yang
Hongsheng Li
Yu Liu
OffRL
43
1
0
15 Jun 2024
Open-Endedness is Essential for Artificial Superhuman Intelligence
Edward Hughes
Michael Dennis
Jack Parker-Holder
Feryal M. P. Behbahani
Aditi Mavalankar
Yuge Shi
Tom Schaul
Tim Rocktaschel
LRM
34
18
0
06 Jun 2024
Counterfactual Explanations for Multivariate Time-Series without Training Datasets
Xiangyu Sun
Raquel Aoki
Kevin H. Wilson
27
1
0
28 May 2024
Aspect-based Sentiment Evaluation of Chess Moves (ASSESS): an NLP-based Method for Evaluating Chess Strategies from Textbooks
Haifa Alrdahi
R. Batista-Navarro
44
0
0
10 May 2024
A Survey on Self-Evolution of Large Language Models
Zhengwei Tao
Ting-En Lin
Xiancai Chen
Hangyu Li
Yuchuan Wu
Yongbin Li
Zhi Jin
Fei Huang
Dacheng Tao
Jingren Zhou
LRM
LM&Ro
54
22
0
22 Apr 2024
Cooperative Evolutionary Pressure and Diminishing Returns Might Explain the Fermi Paradox: On What Super-AIs Are Like
Daniel Vallstrom
24
0
0
01 Apr 2024
Understanding Iterative Combinatorial Auction Designs via Multi-Agent Reinforcement Learning
G. dÉon
N. Newman
Kevin Leyton-Brown
27
0
0
29 Feb 2024
Puzzle Solving using Reasoning of Large Language Models: A Survey
Panagiotis Giadikiaroglou
Maria Lymperaiou
Giorgos Filandrianos
Giorgos Stamou
ELM
ReLM
LRM
16
24
0
17 Feb 2024
Scaling Artificial Intelligence for Digital Wargaming in Support of Decision-Making
Scotty Black
Christian J. Darken
14
2
0
08 Feb 2024
The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models
M. Pternea
Prerna Singh
Abir Chakraborty
Y. Oruganti
M. Milletarí
Sayli Bapat
Kebei Jiang
OffRL
18
7
0
02 Feb 2024
Generalized Nested Rollout Policy Adaptation with Limited Repetitions
Tristan Cazenave
6
3
0
18 Jan 2024
From Images to Connections: Can DQN with GNNs learn the Strategic Game of Hex?
Yannik Keller
Jannis Blüml
Gopika Sudhakaran
Kristian Kersting
GNN
19
0
0
22 Nov 2023
Runtime Verification of Learning Properties for Reinforcement Learning Algorithms
T. Mannucci
Julio de Oliveira Filho
OffRL
6
0
0
16 Nov 2023
Optimal Robotic Assembly Sequence Planning: A Sequential Decision-Making Approach
Kartik Nagpal
Negar Mehr
25
0
0
26 Oct 2023
JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games
Yang Li
Kun Xiong
Yingping Zhang
Jiangcheng Zhu
Stephen Marcus McAleer
Wei Pan
J. Wang
Zonghong Dai
Yaodong Yang
31
2
0
09 Aug 2023
Towards General Game Representations: Decomposing Games Pixels into Content and Style
C. Trivedi
Konstantinos Makantasis
Antonios Liapis
Georgios N. Yannakakis
OCL
35
3
0
20 Jul 2023
1
2
3
4
5
Next